
Wikimedia Enterprise & Pleias Partner for Ethical AI Innovation

As AI evolves, access to high-quality and ethically sourced data is more critical than ever. Wikimedia Enterprise and French AI startup Pleias have joined forces to show that structured, machine-readable knowledge can drive AI innovation while upholding openness, verifiability, and ethical development.

Pleias is a Franco-German startup developing AI Act-compliant, open-source foundation models. It combines energy-efficient small language and vision models (under 3B parameters) into orchestrated building blocks for a secure document-processing assistant that runs locally, on device and on premises. Pleias trains its models exclusively on permissively licensed data, which allows full auditability alongside excellent performance on complex multilingual documents.

Harnessing Open, Structured Data

“Pleias has developed the pioneering suite of Large Language Models trained exclusively on permissively licensed content. Through a strategic partnership with Wikimedia Enterprise, Pleias has enhanced its 2 trillion token open corpus by incorporating Wikimedia’s unique structured dataset, which spans multiple languages and has been instrumental in the model’s annealing phase. This high-quality dataset was specifically chosen for the annealing process, as it demands the most refined training data and has demonstrated significant improvements in model performance”, explains CEO and Co-Founder, Anastasia Stasenko.  

Pleias has leveraged Wikimedia Enterprise’s structured datasets to develop verifiable language models. The multilingual content is enriched with metadata and credibility signals such as RevertRisk, and its pre-parsed infoboxes, sections, and summaries eliminate the need for complex preprocessing, allowing AI models to focus on accuracy and reliability.
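To make the idea concrete, here is a minimal Python sketch of how pre-parsed records with a credibility score might be filtered and turned into training text. The record layout, the field names (`summary`, `sections`, `revert_risk`), the sample data, and the 0.2 threshold are all illustrative assumptions, not the actual Wikimedia Enterprise schema or the pipeline Pleias uses.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical records: field names and values are assumptions made for
# illustration only, not the real Wikimedia Enterprise response format.
SAMPLE_RECORDS = [
    {
        "name": "Example article A",
        "language": "en",
        "summary": "Short pre-parsed summary of article A.",
        "sections": [{"title": "History", "text": "Body of the history section."}],
        "revert_risk": 0.03,  # low score: revision unlikely to be reverted
    },
    {
        "name": "Example article B",
        "language": "fr",
        "summary": "Short pre-parsed summary of article B.",
        "sections": [],
        "revert_risk": 0.72,  # high score: revision likely to be reverted
    },
]


def select_training_text(
    records: Iterable[Dict], max_revert_risk: float = 0.2
) -> Iterator[str]:
    """Yield plain text for records whose risk score is at or below the threshold."""
    for record in records:
        # A record with no score is treated as risky and skipped.
        if record.get("revert_risk", 1.0) > max_revert_risk:
            continue
        # The content is already split into summary and sections,
        # so no wikitext or HTML parsing is needed here.
        parts = [record["summary"]]
        parts += [f'{s["title"]}\n{s["text"]}' for s in record.get("sections", [])]
        yield "\n\n".join(parts)


texts = list(select_training_text(SAMPLE_RECORDS))
print(len(texts))
```

Because the summary and sections arrive already separated in this sketch, the only work left is a threshold check and string assembly, which is the kind of preprocessing saving the paragraph above describes.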

Their approach moves beyond the “bigger is better” mindset, focusing on high-quality, verifiable data to build more reliable AI systems.

Why Wikimedia Enterprise?

Wikimedia Enterprise transforms Wikipedia’s vast, collaborative knowledge into structured, machine-readable data. AI developers get reliable content with built-in credibility signals across multiple languages, backed by dedicated support and real-time updates. Our APIs provide seamless access to high-quality data for LLM training, retrieval-augmented generation (RAG), and knowledge graphs, so developers can build efficient, reliable AI solutions at scale.

Shaping the Future of AI

The right data is key to building ethical and responsible AI. Wikimedia Enterprise provides structured, verifiable knowledge that fuels AI innovation while maintaining data integrity.

📚 Learn more about Pleias’ work and datasets on Hugging Face 🤗

If your organization seems like a good fit to work with us, please contact us!

— Wikimedia Enterprise Team