Every commission moves through the same nine stations — whatever the dataset type. The catalogue grows, the architecture doesn't.
A buyer writes a commission brief — domain, volume, quality bar, provenance, delivery. The owner opens a new spec for it or maps it onto an existing one. The catalogue gains a lot.
The spec is the contract for the dataset. It defines schema, quality bar, provenance constraints, and the validation layers the output must pass. Specs are versioned and citable, identical for text, image, audio, video, or anything else.
Miners commit their read credentials on-chain. The validator metagraph discovers them. The floor opens for the next production interval.
Each miner produces a dataset shard plus a manifest for the interval and uploads it to their own bucket. The manifest carries the spec id, version, schema, hashes, and the producer's identity.
Validators independently download each submission and run the spec's layered checks — schema, integrity, sampled-asset verification, quality gate, provenance, overlap, and scoring.
Per-miner appraisals are aggregated across validators. Validators in quorum push their results to the validation API.
When a miner clears the spec's quality bar with quorum, the cycle's verdict is published. Aggregate scores become public on the floor.
Certified rows wear the spec hallmark. The owner-validator updates the shared overlap snapshot so future cycles never re-produce what's already been certified.
The subnet owner delivers the certified dataset to the buyer.
Manifest matches the miner hotkey and the interval id; spec and protocol versions are compatible and enabled.
The dataset's hash matches the manifest; the row count matches the dataset.
Every row parses against the spec's schema; type, range, and required-field constraints hold.
Random rows are sampled; the underlying assets are re-fetched and hashed against the manifest fields.
Spec-specific quality checks run on the sampled rows — whatever the spec calls for, from format to fidelity to semantic grounding.
Each row's declared source, licensing, and jurisdiction conform to the spec's provenance rules.
Rows already present in the shared overlap snapshot are pruned. Cross-miner duplicates are resolved by earliest manifest time.
Per-miner aggregate score is computed. Invalid hotkeys are zeroed before chain weights are set.