Dataset Rich Results

Dataset structured data makes your data discoverable in Google Dataset Search — a dedicated search engine for researchers, journalists, and data scientists.

Table of contents

  1. Required Fields
  2. Recommended Fields
  3. Basic Dataset
  4. Research Dataset with DataCatalog
  5. Common Mistakes

Required Fields

| Field | Type | Notes |
| --- | --- | --- |
| name | string | Dataset name |
| description | string | What the dataset contains |

Recommended Fields

| Field | Type | Notes |
| --- | --- | --- |
| url | string | Landing page URL |
| sameAs | string \| string[] | DOI, repository URL |
| license | string | License URL |
| creator | Person \| Organization | Who created it |
| publisher | Organization | Who published it |
| datePublished | string | When first published |
| dateModified | string | Last update date |
| version | string | Dataset version |
| keywords | string[] | Descriptive keywords |
| distribution | DataDownload[] | Where to get the data |
| variableMeasured | string \| PropertyValue[] | What variables are included |
| spatialCoverage | string \| Place | Geographic scope |
| temporalCoverage | string | Time period covered |
| measurementTechnique | string | How data was collected |
| includedInDataCatalog | DataCatalog | Parent catalog |
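As a quick pre-publish check on the two required fields, name and description, something like the following sketch may help. The helper `missingRequiredFields` is hypothetical and not part of schemaorg-kit. It works on a plain object and only tests that each required field is a non-empty string.

```typescript
// Hypothetical helper, not part of schemaorg-kit: reports which required
// Dataset fields are missing or blank in a plain object.
const REQUIRED_FIELDS = ['name', 'description'] as const;

function missingRequiredFields(input: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = input[field];
    // A field counts as missing if it is absent, not a string, or only whitespace.
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Usage: description is absent, so it is reported.
console.log(missingRequiredFields({ name: 'Global Carbon Emissions by Country 1990–2024' }));
// → [ 'description' ]
```

Whether the library performs a similar check itself is not stated here, so treat this as an independent safeguard.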

Basic Dataset

import { createDataset } from 'schemaorg-kit';

const dataset = createDataset({
  name: 'Global Carbon Emissions by Country 1990–2024',
  description: 'Annual CO₂ and greenhouse gas emissions data for 195 countries from 1990 to 2024. Includes total emissions, per-capita figures, and sector breakdowns.',
  url: 'https://climate-data.org/datasets/carbon-emissions',
  license: 'https://creativecommons.org/licenses/by/4.0/',
  creator: {
    '@type': 'Organization',
    name: 'Climate Research Institute',
    url: 'https://climate-data.org',
  },
  datePublished: '2025-01-15',
  dateModified: '2025-03-01',
  version: '4.2',
  keywords: ['climate', 'CO2', 'emissions', 'greenhouse gas', 'carbon footprint'],
  temporalCoverage: '1990/2024',
  spatialCoverage: {
    '@type': 'Place',
    name: 'Global',
  },
  distribution: [
    {
      '@type': 'DataDownload',
      encodingFormat: 'text/csv',
      contentUrl: 'https://climate-data.org/datasets/carbon-emissions.csv',
    },
    {
      '@type': 'DataDownload',
      encodingFormat: 'application/json',
      contentUrl: 'https://climate-data.org/datasets/carbon-emissions.json',
    },
    {
      '@type': 'DataDownload',
      encodingFormat: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
      contentUrl: 'https://climate-data.org/datasets/carbon-emissions.xlsx',
    },
  ],
  variableMeasured: [
    { '@type': 'PropertyValue', name: 'total_emissions_mt', description: 'Total annual emissions in megatons CO₂e' },
    { '@type': 'PropertyValue', name: 'per_capita_tons',    description: 'Per capita emissions in metric tons CO₂' },
    { '@type': 'PropertyValue', name: 'sector',             description: 'Emission source sector (energy, transport, industry, etc.)' },
  ],
  measurementTechnique: 'Satellite observation combined with national inventory reports (UNFCCC submission data)',
  sameAs: 'https://doi.org/10.5281/zenodo.1234567',
});

Research Dataset with DataCatalog

import { createDataset } from 'schemaorg-kit';

const dataset = createDataset({
  name: 'ImageNet-1M — Cleaned Subset',
  description: 'A curated 1-million image subset of ImageNet with corrected labels and balanced class distribution.',
  creator: { '@type': 'Organization', name: 'ML Research Lab', url: 'https://mllab.edu' },
  datePublished: '2024-09-01',
  license: 'https://creativecommons.org/licenses/by-nc/4.0/',
  distribution: [
    {
      '@type': 'DataDownload',
      encodingFormat: 'application/zip',
      contentUrl: 'https://mllab.edu/datasets/imagenet-1m-clean.zip',
      contentSize: '45GB',
    },
  ],
  includedInDataCatalog: {
    '@type': 'DataCatalog',
    name: 'ML Research Lab Open Datasets',
    url: 'https://mllab.edu/datasets',
  },
  sameAs: 'https://paperswithcode.com/dataset/imagenet-1m-clean',
});

Common Mistakes

Always include at least one distribution entry with contentUrl and encodingFormat. This is what allows Dataset Search to link directly to downloadable data.
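The distribution rule above can be sketched as a small check. The `DataDownloadLike` interface and `checkDistribution` helper are assumptions made for illustration and are not part of schemaorg-kit. The sketch is deliberately stricter than the rule: it flags every incomplete entry, not only the case where no entry is complete.

```typescript
// Hypothetical shape and helper for illustration; not part of schemaorg-kit.
interface DataDownloadLike {
  '@type'?: string;
  contentUrl?: string;
  encodingFormat?: string;
}

// Returns a list of human-readable problems. An empty list means the
// distribution array has at least one entry and every entry carries both
// contentUrl and encodingFormat.
function checkDistribution(distribution: DataDownloadLike[] | undefined): string[] {
  if (!distribution || distribution.length === 0) {
    return ['distribution is missing or empty'];
  }
  const problems: string[] = [];
  distribution.forEach((entry, index) => {
    if (!entry.contentUrl) problems.push(`distribution[${index}] has no contentUrl`);
    if (!entry.encodingFormat) problems.push(`distribution[${index}] has no encodingFormat`);
  });
  return problems;
}

// Usage: the entry below lacks encodingFormat, so one problem is reported.
console.log(
  checkDistribution([
    { '@type': 'DataDownload', contentUrl: 'https://climate-data.org/datasets/carbon-emissions.csv' },
  ]),
);
// → [ 'distribution[0] has no encodingFormat' ]
```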

If your dataset has a DOI (Digital Object Identifier), put its URL in sameAs, for example https://doi.org/10.5281/zenodo.1234567. The DOI gives academic search engines a stable identifier for recognizing your dataset.
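DOIs are often written in several forms (bare, with a `doi:` prefix, or as a full URL). A minimal sketch for turning them into one consistent URL for sameAs is shown below. The `doiToUrl` helper is hypothetical and not part of schemaorg-kit, and its pattern is a loose sanity check rather than a full DOI validator.

```typescript
// Hypothetical helper, not part of schemaorg-kit: converts common DOI
// spellings into a single https://doi.org/ URL suitable for sameAs.
function doiToUrl(doi: string): string {
  const bare = doi
    .trim()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, '') // strip a resolver URL prefix
    .replace(/^doi:\s*/i, '');                     // strip a "doi:" prefix

  // Loose sanity check (an assumption, not a complete validator):
  // "10." + 4 to 9 digits + "/" + a non-empty suffix without whitespace.
  if (!/^10\.\d{4,9}\/\S+$/.test(bare)) {
    throw new Error(`Not a recognizable DOI: ${doi}`);
  }
  return `https://doi.org/${bare}`;
}

// Usage: all three spellings produce the same URL.
console.log(doiToUrl('10.5281/zenodo.1234567'));
console.log(doiToUrl('doi:10.5281/zenodo.1234567'));
console.log(doiToUrl('https://doi.org/10.5281/zenodo.1234567'));
// each → https://doi.org/10.5281/zenodo.1234567
```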

temporalCoverage uses ISO 8601 interval notation: "1990/2024" means 1990 through 2024. For a dataset that is still being updated, leave the end open: "2020/..".
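The interval notation above can be produced with a small helper so that closed and open-ended ranges are formatted consistently. The `temporalCoverage` function below is a hypothetical sketch, not part of schemaorg-kit, and it assumes dates are given as YYYY, YYYY-MM, or YYYY-MM-DD strings.

```typescript
// Hypothetical helper, not part of schemaorg-kit: builds a temporalCoverage
// string from a start date and an optional end date.
function temporalCoverage(start: string, end?: string): string {
  // Accepts YYYY, YYYY-MM, or YYYY-MM-DD (format check only, no calendar validation).
  const isoDate = /^\d{4}(-\d{2}(-\d{2})?)?$/;
  if (!isoDate.test(start)) throw new Error(`Invalid start date: ${start}`);
  if (end === undefined) return `${start}/..`; // open-ended: dataset is ongoing
  if (!isoDate.test(end)) throw new Error(`Invalid end date: ${end}`);
  return `${start}/${end}`;
}

console.log(temporalCoverage('1990', '2024')); // → 1990/2024
console.log(temporalCoverage('2020'));         // → 2020/..
```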