Structured Contents Initiative

More Structured Data, Fewer BLOBs

Wikipedia uses wikitext, a markup language designed for formatting page content. While it has proven useful for editors authoring wiki articles, it creates complexity for developers parsing articles at scale.

When Wikimedia Enterprise launched in 2021, we built it to serve high-volume, high-frequency users of Wikimedia data. As part of that effort we also improved parsing by providing HTML blobs, a format that developers are more familiar with and for which many parsing libraries already exist.

The Structured Contents Initiative is the next step in serving easy-to-parse Wikimedia data. Currently in beta, it extracts infoboxes, sections, tables, references, and more from raw wikitext and HTML and delivers them as structured, machine-readable JSON.

What’s Available Now

Languages: English, French, German, Italian, Spanish, Portuguese, Dutch, Welsh, and Indonesian.

APIs: On-demand (free) & Snapshot (talk to sales)

Structured Contents currently extracts the following article parts into JSON:

  • abstract
  • description
  • main image
  • infobox
  • sections
  • images
  • lists
  • citations & references
  • tables

For a full explanation of the Structured Contents response schema, see the Beta section of our Data Dictionary.

Wikitext blob compared with Structured Contents JSON

Showcase: BLOBs vs Structured Contents

Below are examples using Josephine Baker’s English Wikipedia article. For each feature, the raw HTML BLOB is shown first, then the wikitext BLOB, then the clean JSON output from Structured Contents. Together they show how the data is transformed and why the result is easier for developers to use. Some of the payload output in these examples has been truncated (using […]).

Article Description

[...]\u003e\u003cdiv class=\"shortdescription nomobile noexcerpt noprint searchaux\" style=\"display:none\" about=\"#mwt1\" typeof=\"mw:Transclusion\" data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"short description\",\"href\":\"./Template:Short_description\"},\"params\":{\"1\":{\"wt\":\"American-born French entertainer (1906–1975)\"}},\"i\":0}}]}' id=\"mwAg\"\u003eAmerican-born French entertainer (1906–1975)\u003c/div[...]
{{short description|American-born French entertainer (1906–1975)}}\n
"description": "American-born French entertainer (1906–1975)"

Infobox

data-mw-deduplicate=\"TemplateStyles:r1295905060\" typeof=\"mw:Extension/templatestyles mw:Transclusion\" about=\"#mwt6\" data-mw='{\"name\":\"templatestyles\",\"attrs\":{\"src\":\"Module:Infobox/styles.css\"},\"body\":{\"extsrc\":\"\"},\"parts\":[{\"template\":{\"target\":{\"wt\":\"Infobox person\\n\",\"href\":\"./Template:Infobox_person\"},\"params\":{\"name\":{\"wt\":\"Josephine Baker\"},\"image\":{\"wt\":\"File:Baker Harcourt 1940 2.jpg\"},\"caption\":{\"wt\":\"Baker in 1940\"},\"birth_name\":{\"wt\":\"Freda Josephine McDonald\"},\"birth_date\":{\"wt\":\"{{birth date|mf=yes|1906|06|03}}\"},\"birth_place\":{\"wt\":\"[[St. Louis]], Missouri, U.S.\"}
[...]
{{Infobox person\n| name               = Josephine Baker\n| image              = File:Baker Harcourt 1940 2.jpg\n| caption            = Baker in 1940\n| birth_name         = Freda Josephine McDonald\n| birth_date         = {{birth date|mf=yes|1906|06|03}}\n| birth_place        = [[St. Louis]], Missouri, U.S.\n| [...]
"infoboxes": [{
  "name": "Infobox person",
  "type": "infobox",
  "has_parts": [
    {
      "name": "Josephine Baker",
      "type": "section",
      "has_parts": [
        {
          "type": "image",
          "value": "Baker in 1940",
          "images": [
            {
              "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/250px-Baker_Harcourt_1940_2.jpg",
              "caption": "Baker in 1940",
              "height": 250,
              "width": 250
            }
          ]
        },
        {
          "name": "Born",
          "type": "field",
          "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
          "links": [
            {
              "url": "https://en.wikipedia.org/wiki/St._Louis",
              "text": "St. Louis"
            }
          ]
        },
        [...]
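
As a rough illustration of how the infobox JSON above might be consumed, the sketch below walks the nested has_parts arrays and collects every part whose type is "field". The function name iter_infobox_fields is hypothetical, and the sample dictionary is trimmed from the example above; real payloads may contain part types not covered here.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_infobox_fields(part: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (name, value) for every part of type "field", at any depth."""
    if part.get("type") == "field":
        yield part.get("name", ""), part.get("value", "")
    # Infoboxes and their sections nest children under "has_parts".
    for child in part.get("has_parts", []):
        yield from iter_infobox_fields(child)


# Sample trimmed from the Josephine Baker infobox shown above.
infobox = {
    "name": "Infobox person",
    "type": "infobox",
    "has_parts": [
        {
            "name": "Josephine Baker",
            "type": "section",
            "has_parts": [
                {"type": "image", "value": "Baker in 1940"},
                {
                    "name": "Born",
                    "type": "field",
                    "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
                },
            ],
        }
    ],
}

fields = dict(iter_infobox_fields(infobox))
print(fields["Born"])
```

Collecting into a dictionary keeps only the last value when two fields share a name; keep the raw (name, value) pairs if duplicates matter for your use case.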

Article Sections

id=\"mwIQ\"\u003eDuring her early career, Baker was among the most celebrated performers to headline the revues of the \u003cspan title=\"French-language text\" about=\"#mwt45\" typeof=\"mw:Transclusion\" data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"lang\",\"href\":\"./Template:Lang\"},\"params\":{\"1\":{\"wt\":\"fr\"},\"2\":{\"wt\":\"[[Folies Bergère]]\"},\"italic\":{\"wt\":\"no\"}},\"i\":0}}]}' id=\"mwIg\"\u003e\u003cspan lang=\"fr\" style=\"font-style: normal;\"\u003e\u003ca rel=\"mw:WikiLink\" href=\"./Folies_Bergère\" title=\"Folies Bergère\"\u003eFolies Bergère\u003c/a\u003e\u003c/span\u003e\u003c/span\u003e\u003clink rel=\"mw:PageProp/Category\" href=\"./Category:Articles_containing_French-language_text\" about=\"#mwt45\" id=\"mwIw\"/\u003e in \u003ca rel=\"mw:WikiLink\" href=\"./Paris\" title=\"Paris\" id=\"mwJA\"\u003eParis\u003c/a\u003e.[...]
\n\nDuring her early career, Baker was among the most celebrated performers to headline the revues of the {{lang|fr|[[Folies Bergère]]|italic=no}} in [[Paris]].[...]
"sections": [{
  "type": "paragraph",
  "value": "During her early career, Baker was among the most celebrated performers to headline the revues of the Folies Bergère in Paris. [...]",
  "links": [
    {
      "url": "https://en.wikipedia.org/wiki/Folies_Bergère",
      "text": "Folies Bergère"
    },
    [...]
  ],
  "citations": [
    {
      "identifier": "cite_note-4",
      "text": "[4]"
    },
    [...]
  ]
}]
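
One way to use the sections JSON above is to pull out plain paragraph text together with its links, for example to feed a search index. The following is a minimal sketch under that assumption; iter_paragraphs is a hypothetical helper, and the sample is trimmed from the paragraph shown above.

```python
from typing import Any, Dict, Iterator, List


def iter_paragraphs(parts: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every part of type "paragraph", descending into has_parts."""
    for part in parts:
        if part.get("type") == "paragraph":
            yield part
        yield from iter_paragraphs(part.get("has_parts", []))


# Sample trimmed from the "sections" example above.
sections = [
    {
        "type": "paragraph",
        "value": "During her early career, Baker was among the most celebrated "
                 "performers to headline the revues of the Folies Bergère in Paris.",
        "links": [
            {
                "url": "https://en.wikipedia.org/wiki/Folies_Bergère",
                "text": "Folies Bergère",
            }
        ],
        "citations": [{"identifier": "cite_note-4", "text": "[4]"}],
    }
]

paragraphs = list(iter_paragraphs(sections))
for paragraph in paragraphs:
    link_texts = [link.get("text", "") for link in paragraph.get("links", [])]
    print(paragraph["value"])
    print("links:", link_texts)
```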

Article Main Image

<tr><td colspan=\"2\" class=\"infobox-image\"><span class=\"mw-default-size\" typeof=\"mw:File/Frameless\"><a href=\"./File:Baker_Harcourt_1940_2.jpg\" class=\"mw-file-description\"><img resource=\"./File:Baker_Harcourt_1940_2.jpg\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/250px-Baker_Harcourt_1940_2.jpg\" decoding=\"async\" data-file-width=\"540\" data-file-height=\"756\" data-file-type=\"bitmap\" height=\"350\" width=\"250\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/500px-Baker_Harcourt_1940_2.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/500px-Baker_Harcourt_1940_2.jpg 2x\" class=\"mw-file-element\"/></a></span><div class=\"infobox-caption\">Baker in 1940</div></td></tr> [...]
| image = File:Baker Harcourt 1940 2.jpg\n| caption = Baker in 1940\n| [...]
"image": {
  "content_url": "https://upload.wikimedia.org/wikipedia/commons/0/0b/Baker_Harcourt_1940_2.jpg",
  "width": 540,
  "height": 756
}

Article Lists

Unlike the other examples, this list comes from the Paralympic powerlifting article on English Wikipedia.

<section data-mw-section-id=\"1\" id=\"mwGw\"><h2 id=\"History\">History</h2>\n<ul id=\"mwHA\"><li id=\"mwHQ\">1964–1984: Wheelchair Powerlifting</li>\n<li id=\"mwHg\">1984–2016: Paralympic Powerlifting / IPC Powerlifting</li>\n<li id=\"mwHw\">2017–present: Para Powerlifting</li></ul>\n\n</section>
==History==\n* 1964–1984: Wheelchair Powerlifting\n* 1984–2016: Paralympic Powerlifting / IPC Powerlifting\n* 2017–present: Para Powerlifting\n\n
{
  "name": "History",
  "type": "section",
  "has_parts": [
  {
    "type": "list",
    "has_parts": [
    {
      "type": "list_item",
      "value": "1964–1984: Wheelchair Powerlifting"
    },
    {
      "type": "list_item",
      "value": "1984–2016: Paralympic Powerlifting / IPC Powerlifting"
    },
    {
      "type": "list_item",
      "value": "2017–present: Para Powerlifting"
    }]
  }]
}
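
With the list structure above, recovering the bullet items of a section takes only a few lines. This sketch assumes lists appear as direct children of a section, as in the example; list_items_by_section is a hypothetical name.

```python
from typing import Any, Dict, List


def list_items_by_section(section: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map a section name to the values of the list items directly under it."""
    items: List[str] = []
    for part in section.get("has_parts", []):
        if part.get("type") == "list":
            items.extend(
                item.get("value", "")
                for item in part.get("has_parts", [])
                if item.get("type") == "list_item"
            )
    return {section.get("name", ""): items}


# Sample copied from the "History" section shown above.
history = {
    "name": "History",
    "type": "section",
    "has_parts": [
        {
            "type": "list",
            "has_parts": [
                {"type": "list_item", "value": "1964–1984: Wheelchair Powerlifting"},
                {"type": "list_item", "value": "1984–2016: Paralympic Powerlifting / IPC Powerlifting"},
                {"type": "list_item", "value": "2017–present: Para Powerlifting"},
            ],
        }
    ],
}

history_items = list_items_by_section(history)
print(history_items["History"])
```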

Article Tables

[...]<table class=\"wikitable sortable\" id=\"mwBpA\">\n<caption id=\"mwBpE\">Film credits for Josephine Baker</caption>\n<tbody id=\"mwBpI\"><tr id=\"mwBpM\">\n<th scope=\"col\" id=\"mwBpQ\">Year</th>\n<th scope=\"col\" id=\"mwBpU\">Title</th>\n<th scope=\"col\" id=\"mwBpY\">Role</th>\n<th scope=\"col\" class=\"unsortable\" id=\"mwBpc\">Notes</th>\n<th scope=\"col\" class=\"unsortable\" id=\"mwBpg\"><abbr title=\"Reference\" about=\"#mwt748\" typeof=\"mw:Transclusion mw:ExpandedAttrs\" data-mw='{\"attribs\":[[{\"txt\":\"title\"},{\"html\":\"&lt;span typeof=\\\"mw:Nowiki\\\" data-parsoid=\\\"{}\\\">Reference&lt;/span>\"}]],\"parts\":[{\"template\":{\"target\":{\"wt\":\"abbr\",\"href\":\"./Template:Abbr\"},\"params\":{\"1\":{\"wt\":\"Ref.\"},\"2\":{\"wt\":\"Reference\"}},\"i\":0}}]}' id=\"mwBpk\">Ref.</abbr></th></tr>\n<tr[...]
[...]\n\n== Film credits ==\n{| class=\"wikitable sortable\"\n|+Film credits for Josephine Baker\n|-\n! scope=\"col\"| Year\n! scope=\"col\"| Title\n! scope=\"col\"| Role\n! scope=\"col\" class=\"unsortable\"| Notes\n! scope=\"col\" class=\"unsortable\"| {{abbr|Ref.|Reference}}\n|-\n!scope=row| 1927\n| {{lang|fr|La Sirène des Tropiques}} (''[[Siren of the Tropics]]'')\n| Papitou\n| [[silent film]]\n|align=\"center\" |{{sfnp|Bergfelder|Harris|Street|2007|p=193}}{{sfnp|Francis|2021|p=68}}\n|-\n!scope=row| 1927\n| {{lang|de|Die Frauen von Folies Bergères}} (''[[The Woman from the Folies Bergères]]'')\n|\n| [[silent film]]\n|align=\"center\" |[...]
"tables": [
  {
    "identifier": "film_credits_table1",
    "headers": [
      [
        { "value": "Year" },
        { "value": "Title" },
        { "value": "Role" },
        { "value": "Notes" },
        { "value": "Ref." }
      ]
    ],
    "rows": [
      [
        { "value": "1927" },
        { "value": "La Sirène des Tropiques (Siren of the Tropics)" },
        { "value": "Papitou" },
        { "value": "silent film" },
        { "value": "" }
      ],
      [
        { "value": "1927" },
        { "value": "Die Frauen von Folies Bergères (The Woman from the Folies Bergères)" },
        { "value": "" },
        { "value": "silent film" },
        { "value": "" }
      ],[...]
    ],
    "confidence_score": 0.8
  }
],
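
To show how the table JSON above could be turned into row records, here is a minimal sketch. It assumes a single header row (the first entry of "headers") names the columns, which holds for this example but may not hold for tables with multi-row headers; table_to_records is a hypothetical helper.

```python
from typing import Any, Dict, List


def table_to_records(table: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert a Structured Contents table into one dict per row, keyed by header."""
    header_rows = table.get("headers", [])
    # Assumption: the first header row carries the column names.
    columns = [cell.get("value", "") for cell in header_rows[0]] if header_rows else []
    records = []
    for row in table.get("rows", []):
        values = [cell.get("value", "") for cell in row]
        records.append(dict(zip(columns, values)))
    return records


# Sample trimmed from the film credits table shown above.
film_credits = {
    "identifier": "film_credits_table1",
    "headers": [
        [{"value": "Year"}, {"value": "Title"}, {"value": "Role"},
         {"value": "Notes"}, {"value": "Ref."}]
    ],
    "rows": [
        [{"value": "1927"},
         {"value": "La Sirène des Tropiques (Siren of the Tropics)"},
         {"value": "Papitou"}, {"value": "silent film"}, {"value": ""}],
        [{"value": "1927"},
         {"value": "Die Frauen von Folies Bergères (The Woman from the Folies Bergères)"},
         {"value": ""}, {"value": "silent film"}, {"value": ""}],
    ],
    "confidence_score": 0.8,
}

records = table_to_records(film_credits)
for record in records:
    print(record["Year"], "|", record["Title"], "|", record["Role"])
```

Records in this shape can be passed to csv.DictWriter or a dataframe library without further reshaping.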

How to Access Structured Contents

Structured Contents is currently available in two of our APIs:

On-demand API: Request individual articles from any project as structured JSON. Best for testing, post-training, or lightweight use.

Snapshot API: Get a compressed file of all articles in a project as structured JSON snapshots. Best for pre-training, indexing, and high-scale applications.

Wikimedians can also access beta Structured Contents through their Wikimedia Cloud Services accounts.
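
For the Snapshot API, processing usually means streaming through a large compressed file rather than loading it whole. The sketch below assumes the snapshot is a gzip-compressed tar archive whose members are newline-delimited JSON files with one article per line; this page does not state the file format, so treat that layout, the function name iter_snapshot_articles, and the member name in the demo as assumptions to check against the API documentation.

```python
import io
import json
import tarfile
from typing import Any, BinaryIO, Dict, Iterator


def iter_snapshot_articles(fileobj: BinaryIO) -> Iterator[Dict[str, Any]]:
    """Yield one article dict per non-empty line of every file in a .tar.gz archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            for line in extracted:
                line = line.strip()
                if line:
                    yield json.loads(line)


# Demo on a tiny in-memory archive (hypothetical member name and content).
payload = b'{"name": "Josephine Baker"}\n{"name": "Paralympic powerlifting"}\n'
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="sample_0.ndjson")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

names = [article["name"] for article in iter_snapshot_articles(buffer)]
print(names)
```

For a downloaded snapshot, the same function can be called with a file opened in binary mode, so articles are decoded one line at a time.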

Shaping Structured Contents Together

We continue to build on recent Structured Contents releases in response to your feedback. We’re refining our recent updates to tables and references so that they parse more in line with user expectations, and we’re working to expand coverage to more Wikipedia languages.

We welcome and encourage feedback on Structured Contents; it helps us strengthen current features and shape new ones. If you have a use case that warrants support for a new language, please let us know.

A free API account gives you access to the latest features. To make experimentation easy, we have also shared French and English Structured Contents snapshots on the open dataset platforms Hugging Face and Kaggle.

Structured Contents Payload Example

Our Structured Contents endpoints have the same familiar structure as our production responses, but also include beta fields and objects parsed from raw article data. Parsed objects that are unique to Structured Contents are: infoboxes, sections, description, references, and tables.

The On-demand API Structured Contents endpoint is freely available. Snapshot API Structured Contents dumps are available upon request.

Example: Run this cURL command with your access token (see auth docs) to get the Structured Contents response for the live English Wikipedia article on Josephine Baker. A truncated version of the response is shown below.

curl --location 'https://api.enterprise.wikimedia.com/v2/structured-contents/Josephine_Baker' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --data '{"filters":[{"field":"is_part_of.identifier","value":"enwiki"}]}'
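
The same request can be prepared from Python with the standard library. This is a sketch that mirrors the cURL command above: because cURL sends a POST when --data is given, the request uses POST as well. The helper name build_request is hypothetical, and percent-encoding the article name is our assumption rather than a documented requirement.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.enterprise.wikimedia.com/v2/structured-contents"


def build_request(article_name: str, access_token: str,
                  project: str = "enwiki") -> urllib.request.Request:
    """Build the POST request that the cURL example sends."""
    body = {"filters": [{"field": "is_part_of.identifier", "value": project}]}
    return urllib.request.Request(
        # Assumption: article names with special characters are percent-encoded.
        url=f"{API_BASE}/{urllib.parse.quote(article_name, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


request = build_request("Josephine_Baker", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)

# To send it, replace ACCESS_TOKEN with a real token and uncomment:
# with urllib.request.urlopen(request) as response:
#     articles = json.load(response)  # a list with one object per matching article
```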

For a full breakdown and explanation of all Structured Contents response fields, consult our Data Dictionary.

More questions? – We’re here to help.

[{
  "name": "Josephine Baker",
  "identifier": 255083,
  "abstract": "Freda Josephine Baker, naturalized as Joséphine Baker, was an...",
  "version": {...},
  "url": "https://en.wikipedia.org/wiki/Josephine_Baker",
  "date_created": "2003-06-29T19:16:19Z",
  "date_modified": "2025-09-08T23:58:22Z",
  "main_entity": {
    "identifier": "Q151972",
    "url": "https://www.wikidata.org/entity/Q151972"
  },
  "is_part_of": {...},
  "additional_entities": [...],
  "in_language": {...},
  "image": {...},
  "license": [...],
  "description": "American-born French entertainer (1906–1975)",
  "infoboxes": [
    {
      "name": "Infobox person",
      "type": "infobox",
      "has_parts": [
        {
          "name": "Josephine Baker",
          "type": "section",
          "has_parts": [
            {
              "type": "image",
              "value": "Baker in 1940",
              "images": [...]
            },
            {
              "name": "Born",
              "type": "field",
              "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
              "links": [...]
            },
            {
              "name": "Died",
              "type": "field",
              "value": "April 12, 1975 (aged 68) Paris, France"
            },{...}
          ]
        },{...}
      ]
    },{...}
  ],
  "sections": [
    {
      "name": "abstract",
      "type": "section",
      "has_parts": [
        {
          "type": "paragraph",
          "value": "Freda Josephine Baker (née McDonald; June 3, 1906 – April 12, 1975), naturalized as...",
          "links": [...],
          "citations": [
            {
              "identifier": "cite_note-3",
              "text": "[3]"
            }
          ]
        },{...}
      ]
    },
    {
      "name": "film_credits",
      "type": "section",
      "has_parts": [
        {
          "type": "table",
          "table_references": [
            {
              "identifier": "film_credits_table1",
              "confidence_score": 0.8
            }
          ]
        }
      ]
    }
  ],
  "tables": [
    {
      "identifier": "film_credits_table1",
      "headers": [
        [
          { "value": "Year" },
          { "value": "Title" },
          { "value": "Role" },
          { "value": "Notes" },
          { "value": "Ref." }
        ]
      ],
      "rows": [
        [
          { "value": "1927" },
          { "value": "La Sirène des Tropiques (Siren of the Tropics)" },
          { "value": "Papitou" },
          { "value": "silent film" },
          { "value": "" }
        ],
        [
          { "value": "1927" },
          { "value": "Die Frauen von Folies Bergères (The Woman from the Folies Bergères)" },
          { "value": "" },
          { "value": "silent film" },
          { "value": "" }
        ],[...]
      ],
      "confidence_score": 0.8
    }
  ],
  "references": [
    {
      "identifier": "cite_note-3",
      "type": "book",
      "metadata": {
        "first": "Kathryn",
        "isbn": "978-1-55652-961-0",
        "last": "Atwood",
        "page": "77",
        "publisher": "Chicago Review Press",
        "title": "Women Heroes of World War II",
        "year": "2011"
      },
      "text": {
        "value": "Atwood, Kathryn (2011). Women Heroes of World War II. Chicago Review Press. p. 77. ISBN 978-1-55652-961-0.",
        "links": [
          {
            "url": "https://en.wikipedia.org/wiki/ISBN_(identifier)",
            "text": "ISBN"
          },
          {
            "url": "https://en.wikipedia.org/wiki/Special:BookSources/978-1-55652-961-0",
            "text": "978-1-55652-961-0"
          }
        ]
      }
    },{...}
  ]
}]
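
In the payload above, citations inside paragraphs and entries in "references" share an identifier (for example cite_note-3), so the two can be joined. The sketch below shows one possible way to do that; resolve_citations is a hypothetical helper, and the sample article is trimmed from the payload above.

```python
from typing import Any, Dict, List, Tuple


def resolve_citations(article: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each in-text citation marker with the text of its reference."""
    references = {
        ref["identifier"]: ref
        for ref in article.get("references", [])
        if "identifier" in ref
    }
    resolved: List[Tuple[str, str]] = []

    def walk(parts: List[Dict[str, Any]]) -> None:
        for part in parts:
            for citation in part.get("citations", []):
                ref = references.get(citation.get("identifier"))
                if ref is not None:
                    resolved.append(
                        (citation.get("text", ""), ref.get("text", {}).get("value", ""))
                    )
            walk(part.get("has_parts", []))

    walk(article.get("sections", []))
    return resolved


# Sample trimmed from the payload above.
article = {
    "name": "Josephine Baker",
    "sections": [
        {
            "name": "abstract",
            "type": "section",
            "has_parts": [
                {
                    "type": "paragraph",
                    "citations": [{"identifier": "cite_note-3", "text": "[3]"}],
                }
            ],
        }
    ],
    "references": [
        {
            "identifier": "cite_note-3",
            "type": "book",
            "text": {
                "value": "Atwood, Kathryn (2011). Women Heroes of World War II. "
                         "Chicago Review Press. p. 77. ISBN 978-1-55652-961-0."
            },
        }
    ],
}

pairs = resolve_citations(article)
for marker, reference_text in pairs:
    print(marker, reference_text)
```

The same lookup-by-identifier pattern would apply to "table_references" in sections and the top-level "tables" array, which share identifiers such as film_credits_table1.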

See the Production Payload Example