On May 27 French President Emmanuel Macron and German Chancellor Olaf Scholz published an op-ed in the Financial Times that focused on a plan to make Europe a strong world-class industrial and technological leader, while simultaneously making the EU the first climate neutral continent.
“Together, we will advocate to strengthen the EU’s sovereignty and reduce our critical dependencies…,” wrote the French and German leaders. “With an ambitious industrial policy, we can enable the development and rollout of key technologies of tomorrow, such as AI, quantum technologies, space, 5G/6G, biotechnologies, Net Zero technologies, mobility and chemicals. We call for strengthening the EU’s technological capabilities by promoting cutting-edge research and innovation and necessary infrastructures, including those regarding artificial intelligence and health.”
The op-ed followed a May 21 meeting of France’s top AI talents, convened by Macron, which focused in part on using open source to reinforce Europe’s technological autonomy, on the grounds that an open approach is the best way to make the necessary building blocks available to create sovereign AI for Europe.
France’s national 2030 program, which treats AI as both an innovation accelerator and differentiator, has set aside a budget of €32 million to develop and maintain open source tools. Part of that budget is going to a startup called Probabl, a spin-off of French research center Inria that has been financing a global open source data science library called scikit-learn, which is widely used for performing complex AI and machine learning tasks.
The reinforcement of the scikit-learn open source library is “particularly targeted” for support, says a recently released French government press kit about France’s 2030 program and its AI strategy.
Only two other such libraries exist in the world at that scale: TensorFlow, created by Google, and PyTorch, created by Meta, which powers, among other things, OpenAI’s ChatGPT and Tesla’s Autopilot. While it does not cover the same machine learning techniques, the open source scikit-learn library is ahead of both in popularity and usage.
According to independent measurement by Pypistats.org, the open source scikit-learn library, which is backed by Inria, supported by a global community of contributors, and now overseen by Probabl, has been downloaded 1.5 billion times, averaging 65 million downloads per month (22% from the U.S., 25% from China and 3% from France). It creates more dependencies than PyTorch and TensorFlow combined. Dependencies are the projects and packages that depend on scikit-learn, i.e. where scikit-learn is a core component that helps build additional value. Since it’s middleware and popular, it is foundational to nearly a million projects and 15,000 “packages” (i.e. projects that are more structured and significant).
“We need to take advantage of open source to minimize dependencies on monolithic, proprietary and captive technologies,” says serial entrepreneur and Probabl CEO Yann Lechelle. “The U.S. enjoys near supremacy when it comes to chips, cloud and software. This isn’t great for Europe or any other nation for that matter. If you don’t control your technological infrastructure, you have no sovereignty.”
Scikit-learn, Probabl’s capstone software, is a Python library that is widely used by machine learning teams working on tabular and quantitative data. It specializes in solving a large range of problems, notably classification, regression, clustering, and dimensionality reduction. Scikit-learn can handle diverse algorithms, ranging from traditional statistical models to neural networks. The number one use case is health, i.e. accelerating the discovery of medicine and identifying patterns and symptoms, says Lechelle. Other use cases include fraud detection for the financial services industry, logistics optimization and forecasting. “Scikit-learn is ideal for everything that looks like a spreadsheet,” says Lechelle. “Whether there is a stream of data or patterns of behavior or logs this is the best tool because it can pinpoint and create probabilistic and predictive models out of all of those things.”
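For readers who have not used the library, the kind of “spreadsheet-like” workflow Lechelle describes can be sketched in a few lines. The example below is only an illustration, not Probabl’s product or any code from the company: it assumes scikit-learn is installed, uses the small iris table bundled with the library as stand-in data, and the particular steps chosen (scaling, reducing four columns to two, then a logistic regression classifier) are arbitrary choices made for the sake of the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in tabular data bundled with scikit-learn: 150 rows, 4 numeric columns.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative pipeline: scale columns, reduce 4 columns to 2, then classify.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Share of held-out rows classified correctly.
accuracy = model.score(X_test, y_test)

# Per-class probabilities for the first three held-out rows.
probabilities = model.predict_proba(X_test[:3])

print(f"held-out accuracy: {accuracy:.2f}")
print(probabilities.round(2))
```

The same pattern of chaining steps and then calling `fit` and `predict` applies when the classifier is swapped for a regression or clustering estimator, which is part of why the library lends itself to being embedded in other projects.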
The company, which was launched February 1 by a team of 14 including 10 from Inria and Lechelle, the former CEO of cloud hosting company Scaleway, and founder of five other startups, is intent on keeping the software library open-source. It has published a manifesto that says:
- Individuals, academics and researchers, engineers and data scientists, small companies and large enterprises alike, as well as nation states should be on a level playing field and have unbridled access to such resources and technologies.
- Users should choose how they are deployed, on-premises or in a cloud-agnostic way.
- Users should not be locked-in with any provider.
- The open-source approach is a way to maximize adoption, trust and eco-systemic value creation.
- Choosing open-source strengthens data and operational sovereignty at all levels.
The company has inscribed its open source mission into its bylaws, a rare occurrence and a strong signal to all stakeholders, providing much needed alignment over the long term, says Lechelle.
“What is interesting about our approach is that you can do anything you want with the library,” he says. “We have a very permissive license.”
While scikit-learn was largely developed by researchers and research engineers working at Inria, it has received financial support in the form of donations from BNP Paribas Cardif, Chanel, AXA and the startups Hugging Face and Dataiku.
Now that the library is being overseen by Probabl, a for-profit company, the hope is that corporate buyers and the government will pay for the commercial managed software-as-a-service (SaaS) model that is being built on top of scikit-learn. Probabl’s commercial activities will include training, support, certification, hosting and providing managed services and professional services for both government and big corporate clients, says Lechelle.
It is currently in the process of raising money from private investors to augment the funding from the French government.
Probabl is one of a growing number of private French companies targeting AI. A company called H, founded by former Google DeepMind scientists, which claims it has developed a more powerful and efficient way to build foundational models, has just raised a $220 million seed financing round that includes European, U.S. and Asian investors as well as the French government. France now has more than 600 AI startups and 76 of them are focused on GenAI; 50% are profitable or envision that they will be in the next three years, according to the French government. Few are open source.
What Does Openness Mean In The Age Of AI?
During the earlier days of the Internet, open source (technologies whose source code users can download and use for free) played a core role in promoting innovation and safety. Open source technology provided a core set of building blocks that software developers have used to do everything from create art to design vaccines to develop apps that are used by people all over the world; it is estimated that open source software is worth over $8 trillion in value, says a report entitled The Columbia Convening On Openness and AI compiled by Mozilla and the Institute of Global Politics.
Today, open source approaches for artificial intelligence — and especially for foundation models — offer the promise of similar benefits to society, says the report. However, defining and empowering “open source” for foundation models has proven tricky, given its significant differences from traditional software development.
Indeed, critics say OpenAI’s name is a misnomer as its products are closed source and proprietary. Meanwhile, Mistral AI, a French startup founded with the promise of building a European champion that could compete against U.S. giants like ChatGPT creator OpenAI with an open-source product that reflects the Continent’s push for more transparency in the sector, has disappointed some supporters. Mistral AI is partnering with Microsoft (the tech giant took a stake in Mistral) to distribute its new large language model (LLM), Mistral Large, which is not open source. Unlike Mistral’s first open-source model releases, Mistral Large is a ready-made API that can be used for a fee and gives no access to the code.
The Columbia Convening on Openness and AI report seeks to define openness in the age of AI, a move welcomed by Lechelle.
“A nuanced framework will help structure the conversation and avoid ‘open source washing,’” the Probabl CEO wrote on LinkedIn. “Open source should normally be a strict definition, i.e. every item should be provided so that from the source (code and data), it is possible to recompile/rebuild the output. For better or worse, the term ‘open sourcing’ has become a common verb, signifying ‘putting something out there in the open to let people play with’. With deep-learning and the encoding of massive knowledge bases, some LLMs have been released as ‘open source’ without paying much attention to the semantic of the term… Let’s avoid concentration of knowledge by just a handful of companies.”
Probabl’s mission “is to build, sustain and maintain open source libraries for data scientists but we are also a for-profit company that is building a product around it with added value for data scientists,” says Lechelle. “We will offer paid and non-paid access. Our approach is reversible so if at some point a data scientist has more time than money they can revert to free.”
In both cases, what Probabl offers its clients is sovereignty, says Lechelle. “Our tagline is: Own your Data Science.”
Lechelle says he is out to prove that an open source approach to AI can be profitable. His inspiration is Red Hat, an American open source software company that did just that. It was sold to IBM in 2019 for $34 billion.
“We are a hybrid play, one where economic interest binds with public interest, yielding both financial and societal dividends,” says Lechelle. “We want to show that another way is possible to distribute wealth and impact.”
If it succeeds, Probabl’s library could end up helping Europe achieve its goal of technology sovereignty and pave the way for an alternative to the offerings of U.S. and Chinese AI giants.