Focus On AI

Open Science In The Age Of AI

Getting their research peer reviewed and published in a respected scientific journal has long been the gold standard for scientists. But too often, this process fails to truly advance science because no one else can access or interrogate the underlying data and build on it, argues neuroscientist Sean Hill.

“We need to use this data to build the next iteration of science,” he says. “The fact that we can’t is a major barrier. Every year, billions of dollars in research value are lost because scientific data is difficult to find, access, and reuse. In fact, most scientific data disappears after only a few years. That drives me crazy. We have to change this and find a scalable way to solve the problem.”

To address this, Hill co-founded Senscience, the AI-driven initiative behind Frontiers’ new FAIR² Data Management service, launched on March 3. Its mission is to make research data AI-ready, aligned with Responsible AI principles, and structured for deep scientific reuse. The goal is to enable open-source science and ultimately help the pharmaceutical and other industries find and leverage AI-ready research data, he says.

The profound impact of AI on the pursuit of science is already evident. The 2024 Nobel Prizes in both physics and chemistry were awarded to pioneers of AI-driven research, underscoring how AI is no longer a peripheral tool but a central engine of discovery, notes a recent report by the Tony Blair Institute for Global Change. According to the report, advanced AI models are driving groundbreaking discoveries that push the boundaries of scientific knowledge in specific domains, from AlphaFold’s revolutionary breakthrough in protein-structure prediction to materials discovery, toxicity prediction in drug discovery, and predictive modelling in climate science. These domain-specific innovations are redefining what is possible and accelerating the pace at which society’s most pressing challenges can be addressed.

AI is essential for addressing critical challenges across health, the climate and security, but we can only leverage these breakthroughs if the data is AI-ready, says Hill.

Researchers waste valuable time cleaning data instead of making discoveries and rarely receive credit for the data they generate, says Hill. Meanwhile, funders are increasingly demanding that researchers publish their data, but researchers often lack the tools to comply. Without scalable solutions, vast pools of knowledge remain locked away, stalling scientific progress.

Increasing The Quality Of Science

Senscience is an AI venture of Switzerland-based scientific publisher Frontiers, which was founded by neuroscientists Henry and Kamila Markram with the stated goal of accelerating collaboration and increasing the quality of science across academia through open science.

Henry Markram, a leading figure in brain simulation, is the founder of the Blue Brain Project, which created detailed digital replicas of the brain, and founder of the Human Brain Project, a major EU initiative to advance understanding of the human brain. He has more than 450 publications and approximately 55,000 citations, and his work has significantly influenced research on brain architecture and on how the brain learns. Markram also established the Open Brain Institute, a not-for-profit foundation, to democratize access to brain simulation through virtual laboratories, making tools and data available so that researchers worldwide can simulate the brain. He is now focused on developing artificial general intelligence at inait, a company he formed to teach digital brains to acquire skills, continuing his mission to unlock the full potential of the brain.

Hill worked on the Blue Brain Project with Henry Markram. During the project they constantly struggled with how to organize petabytes of neuroscience data. “The challenge was how to organize diverse scientific data so we could receive it and combine it and use it for machine learning pipelines to build a brain,” says Hill, a serial entrepreneur. “It’s not like sharing text or numbers,” he says. “Scientific data is far more complicated because all the details matter.” For example, during the Blue Brain Project different groups of scientists traced neurons using different methods. To image the brain, one group would inject a dye and trace the neurons manually, while another group would reconstruct neurons using totally different techniques. “It was the same type of neuron, the same species, and the same binary format, and yet if you assume they are the same you would fail to build an accurate model of a neuron,” says Hill. “It is the subtleties that really matter to ensure valid scientific insights.”

After nearly a decade of trying different database solutions without success, Hill led the development of a platform called Blue Brain Nexus, a flexible knowledge graph data structure that could handle distributed data and capture all the details of each piece of scientific data. At the time, knowledge graphs were not widely used. Today they are often used to integrate heterogeneous data and knowledge (e.g. data models such as ontologies and schemas) coming from different sources, often in different formats (e.g. structured and unstructured). The latest iteration of Blue Brain Nexus now forms the backbone of Senscience.
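The neuron-tracing problem Hill describes, and why a knowledge graph helps with it, can be illustrated with a toy example. The sketch below is not Blue Brain Nexus’s API or data model; it assumes a minimal in-memory store of (subject, predicate, object) triples, and every identifier in it (the `ex:` prefix, `recon_a`, `recon_b`, the method and lab names, the `match` helper) is hypothetical. It shows how recording provenance such as the imaging method alongside each reconstruction lets a query separate datasets that a flat table keyed only on cell type and species would treat as interchangeable.

```python
from collections import defaultdict

# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All identifiers are hypothetical and only illustrate the idea.
triples = {
    ("ex:recon_a", "rdf:type", "ex:NeuronReconstruction"),
    ("ex:recon_a", "ex:cellType", "ex:L5PyramidalCell"),
    ("ex:recon_a", "ex:species", "ex:Rat"),
    ("ex:recon_a", "ex:method", "ex:DyeFillManualTracing"),
    ("ex:recon_a", "ex:generatedBy", "ex:LabOne"),
    ("ex:recon_b", "rdf:type", "ex:NeuronReconstruction"),
    ("ex:recon_b", "ex:cellType", "ex:L5PyramidalCell"),
    ("ex:recon_b", "ex:species", "ex:Rat"),
    ("ex:recon_b", "ex:method", "ex:AutomatedReconstruction"),
    ("ex:recon_b", "ex:generatedBy", "ex:LabTwo"),
}


def properties(subject):
    """Collect predicate -> set of objects recorded for one subject."""
    props = defaultdict(set)
    for s, p, o in triples:
        if s == subject:
            props[p].add(o)
    return props


def match(**constraints):
    """Return subjects satisfying every predicate=value constraint.

    Keyword names are mapped onto predicates in the hypothetical ex: namespace.
    """
    result = set()
    for s in {s for s, _, _ in triples}:
        props = properties(s)
        if all(value in props.get(f"ex:{key}", set())
               for key, value in constraints.items()):
            result.add(s)
    return result


# Same cell type and species: a naive merge would pool both reconstructions.
print(sorted(match(cellType="ex:L5PyramidalCell", species="ex:Rat")))
# Adding the provenance constraint keeps only the dye-filled, manually traced cell.
print(sorted(match(cellType="ex:L5PyramidalCell", method="ex:DyeFillManualTracing")))
```

A production knowledge graph would use a standard such as RDF with a real query language rather than Python sets, but the design point is the same: provenance is stored as first-class data next to each result, so it can be queried rather than lost.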

FAIR² Data Sharing

For years the FAIR principles (Findable, Accessible, Interoperable, Reusable) have provided a foundation for research data sharing. However, as machine learning and AI become increasingly important tools in scientific research, data must be structured for both humans and machines.

Senscience says FAIR² Data Management goes beyond the FAIR principles by providing an AI-powered solution that transforms research data into a structured, machine-actionable resource. It ensures that data is richly documented and linked to its provenance, methodology, and a detailed data dictionary, creating a context-rich representation of each dataset. It also uses an AI data steward to automate data organization, improve usability, and assist with governance.

Senscience’s open specification, FAIR² (see fair2.ai), is compatible with MLCommons Croissant, a high-level format for machine learning datasets that combines metadata, resource file descriptions, data structure, and default ML semantics into a single file. It also integrates with TensorFlow, JAX, and PyTorch, enabling AI-driven analysis and easy sharing on Kaggle and Hugging Face, amplifying its impact across disciplines, says Hill.
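For readers unfamiliar with Croissant, the sketch below builds a heavily simplified Croissant-style description as a Python dictionary and serializes it to JSON-LD. It is not a FAIR² Data Package and has not been validated against the Croissant specification; the dataset name, file name, URL, and column names are hypothetical, the `@context` is abridged, and a real file needs further required properties (such as a licence, a dataset URL, and file checksums). It illustrates the layers the format combines in one file: dataset-level metadata, a `distribution` describing the resource files, and a `recordSet` whose fields map onto columns of those files.

```python
import json

# Hypothetical file identifier; all names below are illustrative only.
CSV_ID = "species_counts.csv"


def field(name, data_type, column):
    """Describe one record field and the CSV column it is extracted from."""
    return {
        "@type": "cr:Field",
        "@id": f"records/{name}",
        "name": name,
        "dataType": data_type,
        "source": {"fileObject": {"@id": CSV_ID}, "extract": {"column": column}},
    }


metadata = {
    # Abridged context; the official Croissant context defines many more terms.
    "@context": {
        "@vocab": "https://schema.org/",
        "sc": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
        "dct": "http://purl.org/dc/terms/",
        "conformsTo": "dct:conformsTo",
        "recordSet": "cr:recordSet",
        "field": "cr:field",
        "source": "cr:source",
        "extract": "cr:extract",
        "column": "cr:column",
        "fileObject": "cr:fileObject",
        "dataType": {"@id": "cr:dataType", "@type": "@vocab"},
    },
    "@type": "sc:Dataset",
    "name": "example-species-counts",
    "description": "Hypothetical yearly species counts per monitoring station.",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    # Resource file descriptions.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": CSV_ID,
            "name": CSV_ID,
            "contentUrl": "https://example.org/data/species_counts.csv",
            "encodingFormat": "text/csv",
        }
    ],
    # Data structure: typed fields mapped onto columns of the CSV file.
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "records",
            "name": "records",
            "field": [
                field("station", "sc:Text", "station"),
                field("year", "sc:Integer", "year"),
                field("abundance", "sc:Float", "abundance"),
            ],
        }
    ],
}

print(json.dumps(metadata, indent=2))
```

Because the result is plain JSON-LD, it can in principle be read by Croissant tooling such as MLCommons’ `mlcroissant` Python package; the current Croissant documentation should be consulted for the exact set of required properties.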

Researchers benefit from an AI-assisted workflow that streamlines data preparation and sharing, turning their datasets into a FAIR² Data Package, an interactive exploration portal, and a peer-reviewed FAIR² data article in a Frontiers journal—increasing visibility, recognition, and citations.

“If someone has already finished a scientific study, we can make the data available in usable form by dragging and dropping their manuscripts, spreadsheets, and additional files into our platform,” says Hill. “Our AI data steward cleans the data, visualizes it in a data portal and generates a data article draft that they edit and approve before it is submitted to Frontiers.”

This gives researchers not only the classical value of a peer-reviewed publication but also an interactive data portal, where others can explore the data and use an AI chat to ask questions of it, says Hill. In addition, an AI-generated podcast and integration with Python and Jupyter Notebooks allow researchers to interact with and analyze their data in completely new ways. “This creates huge opportunities for collaboration and scientific advancement,” he says.

The first peer-reviewed FAIR² Data Article and FAIR² Data Portal, published March 3, showcase what Senscience offers. Led by Dr. Ángel Borja of AZTI Foundation (Spain), this dataset, which spans nearly three decades of marine biodiversity monitoring in the Basque Country and is managed by the Basque Water Agency (URA), has been curated using FAIR², transforming long-term environmental data in Spanish into an AI-ready English-language resource.

“AI-assisted curation is a game changer,” Borja said in a statement. “AI-assisted metadata creation makes ocean sustainability research more accessible, providing scientists, managers, and decision-makers with faster, more accurate insights.”

For now, Senscience operates as an AI venture within Frontiers. But the aim is to become an independent company. “Science thrives when data is open and accessible,” says Hill. “We want FAIR² to be the standard across publishers, researchers, and industries, not confined to a single organization.”

“Other sectors are also eager for a solution like this,” he says. “For example, pharma companies want to be able to organize protocols and the data produced by those protocols so they can find, assess and use data in a meaningful way.”

During the current pilot period Senscience is waiving fees. Eventually it plans to charge for its services.

“Everyone sees the value of making research data AI-accessible and computable,” says Hill. “The ability to structure and reuse data at scale will transform scientific discovery.”


To access more of The Innovator’s Focus On AI stories click here.

About the author

Jennifer L. Schenker

Jennifer L. Schenker, an award-winning journalist, has been covering the global tech industry from Europe since 1985, working full-time at various points in her career for the Wall Street Journal Europe, Time Magazine, International Herald Tribune, Red Herring and BusinessWeek. She is currently the editor-in-chief of The Innovator, an English-language global publication about the digital transformation of business. Jennifer was voted one of the 50 most inspiring women in technology in Europe in 2015 and 2016 and was named by Forbes Magazine in 2018 as one of the 30 women leaders disrupting tech in France. She has been a World Economic Forum Tech Pioneers judge for 20 years. She lives in Paris and has dual U.S. and French citizenship.