- Currently, many of the biggest large language models, like OpenAI's GPT and Anthropic's Claude, use data centers based in the U.S. to store data and process requests via the cloud.
- This has led to concern from politicians and regulators in Europe, who see this dependence on U.S. technology as harmful to the continent's competitiveness.
- Enter "sovereign" AI: the idea that AI services in a given jurisdiction should be built upon data from within that region so results are grounded in local language and culture.
LISBON, Portugal — Tech giants are increasingly investing in the development of so-called "sovereign" artificial intelligence models as they seek to boost competitiveness by focusing more on local infrastructure.
Data sovereignty refers to the idea that people's data should be stored on infrastructure within the country or continent they reside in.
"Sovereign AI is a relatively new term that's emerged in the last year or so," Chris Gow, IT networking giant Cisco's Brussels-based EU public policy lead, told CNBC.
Currently, many of the biggest large language models (LLMs), like OpenAI's ChatGPT and Anthropic's Claude, use data centers based in the U.S. to store data and process requests via the cloud.
This has led to concern from politicians and regulators in Europe, who see dependence on U.S. technology as harmful to the continent's competitiveness — and, more worryingly, technological resilience.
Where did 'AI sovereignty' come from?
The notion of data and technological sovereignty is something that has previously been on Europe's agenda. It came about, in part, as a result of businesses reacting to new regulations.
The European Union's General Data Protection Regulation (GDPR), for example, requires companies to handle user data in a secure, compliant way that respects users' right to privacy. High-profile cases in the EU have also raised doubts over whether data on European citizens can be transferred across borders safely.
The European Court of Justice in 2020 invalidated an EU-U.S. data-sharing framework on the grounds that the pact did not afford the same level of protection as that guaranteed within the EU by the GDPR. Last year, the EU-U.S. Data Privacy Framework was adopted to ensure that data can flow safely between the EU and U.S.
These political developments have ultimately resulted in a push toward localization of cloud infrastructure, where data is stored and processed for many online services.
Filippo Sanesi, global head of startup marketing and operations at OVHCloud, said the French cloud firm is seeing lots of demand for its European-located infrastructure, as customers "understand the value of having their data in Europe, which are subject to European legislation."
"As this concept of data sovereignty becomes more mature and people understand what it means, we see more and more companies understanding the importance of having your data locally and under a specific jurisdiction and governance," Sanesi told CNBC. "We host a lot of data," he added. "This data is sovereign in specific countries, under specific regulations."
"Now, with this data, you can actually make products and services for AI, and those services should then be sovereign, should be controlled, deployed and developed locally by local talent for the local population or businesses."
The AI sovereignty push hasn't been driven forward by regulators — at least, not yet, according to Cisco's Gow. Rather, it's come from private companies, which are opening more data centers — facilities containing vast amounts of computing equipment to enable cloud-based AI tools — in Europe, he said.
Sovereign AI is "more driven by the industry naming it that, than it is from the policymakers' side," Gow said. "You don't see the 'AI sovereignty' terminology used on the regulator side yet."
Countries are pushing the idea of AI sovereignty because they recognize AI is "the future" and a "massively strategic technology," Gow said.
Governments are focusing on boosting their domestic tech companies and ecosystems, as well as the all-important backend infrastructure that enables AI services.
"The AI workload uses 20 times the bandwidth of a traditional workload," Gow said. It's also about enabling the workforce, according to Gow, as firms need skilled workers to be successful.
Most important of all, however, is the data. "What you're seeing is quite a few attempts from that side to think about training LLMs on localized data, in language," Gow said.
'Reflecting values'
In Italy, the first LLM trained specifically on Italian-language data, called Italia 9B, launched this summer.
The aim of the Italia project is to store data in a given jurisdiction and rely on data from citizens within that region, so that results produced by the AI systems there are more grounded in local languages, culture and history.
"Sovereign AI is about reflecting the values of an organization or, equally, the country that you're in and the values and the language," David Hogan, EMEA head of enterprise sales for chipmaking giant Nvidia, told CNBC.
"The core challenge is that most of the frontier models today have been trained primarily on Western data generally," Hogan added.
In Denmark, for example, where Nvidia has a major presence, officials are concerned about vital services such as health care and telecoms being delivered by AI systems that aren't "reflective" of local Danish culture and values, according to Hogan.
On Wednesday, Denmark published a landmark white paper outlining how companies can use AI in compliance with the incoming EU AI Act, the world's first major AI law. The document is meant to serve as a blueprint for other EU nations to follow and adopt.
"If you're in a European country that's not one of the major language countries that's spoken internationally, probably less than 2% of the data is trained on your language -- let alone your culture," Hogan said.
How regulation fueled a mindset shift
That's not to say regulations haven't proven an important factor in getting tech giants to think more about building localized AI infrastructure within Europe.
OVHCloud's Sanesi said regulations like the EU's GDPR catalyzed a lot of the interest in onshoring the processing of data in a given region.
The concept of AI sovereignty is also getting buy-in from local European tech firms.
Earlier this week, Berlin-headquartered search engine Ecosia and its Paris-based peer Qwant announced a joint venture to develop a European search index from scratch, aiming to serve improved French- and German-language results.
Meanwhile, French telecom operator Orange has said it's in discussions with a number of foundational AI model companies about building a smartphone-based "sovereign AI" model for its customers that more accurately reflects their own language and culture.
"It wouldn't make sense to build our own LLMs. So there's a lot of discussion right now about, how do we partner with existing providers to make it more local and safer?" Bruno Zerbib, Orange's chief technology officer, told CNBC.
"There are a lot of use cases where [AI data] can be processed locally [on a phone] instead of processed on the cloud," Zerbib added. Orange hasn't yet selected a partner for these sovereign AI model ambitions.