An AI-powered transcription tool widely used in the medical field to help doctors communicate with their patients touts itself as having “human-level robustness and accuracy.” But the tool sometimes invents things that no one ever said, posing potential risks to patient safety. It also deletes the underlying audio from which the transcriptions are generated, leaving medical staff no way to verify their accuracy, AP News reported on October 26.
Meanwhile, Wired reported October 24 that AI-enhanced search engines from Google, Microsoft, and Perplexity have been surfacing debunked and racist research claiming genetic superiority of white people over other racial groups. This finding, revealed through investigative work by Hope Not Hate, a UK-based anti-racism organization, has added to concerns about racial bias and radicalization in AI-powered search.
Both stories illustrate the gulf between AI hype and reality as well as the dangers of overestimating the technology.
Careless Whisper: Speech-to-Text Hallucination Harms
Medical centers have rushed to use AI to transcribe patients’ consultations, even though OpenAI, the developer of the underlying technology, has warned that the tool should not be used in “high-risk domains.”
OpenAI’s Whisper is integrated into medical transcription services from Nabla, which the company says are used by over 30,000 clinicians at more than 70 organizations. Nabla told AP its product had been used to transcribe around 7 million medical visits.
Whisper is also embedded in Microsoft’s and Oracle’s cloud computing platforms and integrated with certain versions of ChatGPT. Despite its wide adoption, researchers are now raising serious concerns about its accuracy.
In a study conducted by researchers from Cornell University, the University of Washington, and other institutions, Whisper was found to “hallucinate” in about 1.4% of its transcriptions, sometimes inventing entire sentences, nonsensical phrases, or even dangerous content, including violent and racially charged remarks.
The study, Careless Whisper: Speech-to-Text Hallucination Harms, found that Whisper often inserted phrases during moments of silence in medical conversations, particularly when transcribing patients with aphasia, a condition that affects language and speech patterns.
In these cases, the AI sometimes fabricated unrelated phrases, invented fictional medications like “hyperactivated antibiotics,” and even injected racial commentary into transcripts, AP reported.
Such mistakes could have “really grave consequences,” particularly in hospital settings, Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year, told AP. “Nobody wants a misdiagnosis,” said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There should be a higher bar.”
Whisper is also used to create closed captioning for the deaf and hard of hearing, a population at particular risk from faulty transcriptions. That’s because they have no way of identifying fabrications “hidden amongst all this other text,” Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program, was quoted as saying in the AP article.
OpenAI’s Whisper is being used in a slew of industries worldwide to translate and transcribe interviews and public meetings, generate text in popular consumer technologies and create subtitles for videos. These applications are also plagued with hallucinations.
A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model, according to the AP story. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.
The problems persist even in well-recorded, short audio samples, according to the AP story. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers told the AP.
The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call for the federal government to consider AI regulations. At minimum, they said, OpenAI needs to address the flaw.
An OpenAI spokesperson told AP the company continually studies how to reduce hallucinations and appreciated the researchers’ findings, adding that OpenAI incorporates feedback in model updates.
Microsoft, which offers Whisper as part of its cloud computing services, advises companies incorporating it in the solutions they offer to “obtain appropriate legal advice to review your solution, particularly if you will use it in sensitive or high-risk applications.”
Disseminating Disinformation
Search engines are having accuracy problems of their own.
Patrik Hermansson, a researcher with anti-racism group Hope Not Hate, was investigating the resurgence of scientific racism when he found that AI-driven search engines often promote discredited “race science.”
When he searched for average IQ scores in different nations, Google’s AI-driven “Overviews” feature displayed figures derived from the work of Richard Lynn, a University of Ulster professor who died in 2023, whose research relies on dubious samples and questionable methodologies and has been used to support racial hierarchies.
A Wired investigation confirmed Hermansson’s findings and found that Microsoft’s Copilot, which is integrated into Bing, and Perplexity also referenced Lynn’s work when queried about IQ scores in various countries.
Lynn’s flawed research has long been used by far-right extremists, white supremacists, and proponents of eugenics as evidence that the white race is genetically and intellectually superior to non-white races. Experts worry that its promotion through AI could help radicalize others, according to the Wired story.
The misinformation appeared in Google’s AI Overviews, which was launched earlier this year as part of the company’s effort to revamp its powerful search tool for the age of AI. For some queries, the tool, which is only available in certain countries right now, gives an AI-generated summary of its findings. The tool pulls information from the Internet and gives users answers to their queries without their needing to click on a link.
When contacted by Wired, Google said it has guardrails and policies in place to protect against low-quality responses, and that when it finds Overviews that don’t align with its policies, it takes action. But even after Google removed the information, Wired said, search results still amplified the flawed figures from Lynn’s work in what is called a “featured snippet,” which displays some of the text from a website above the link.
Google told Wired that part of the problem it faces in generating AI Overviews is that for some very specific queries there is an absence of high-quality information on the Web.
Microsoft, for its part, told Wired that Copilot answers questions by distilling information from multiple web sources into a single response and “the user can further explore and research as they would with traditional search.”
Based on this week’s news, perhaps AI-powered transcriptions and search results should come with a label that says “caveat emptor.”
IN OTHER NEWS THIS WEEK:
SUSTAINABILITY
Sweden’s Syre and Selenis Forge Strategic Partnership To Establish A U.S.-Based Textile-To-Textile Recycling Plant
Syre, a new venture to scale textile-to-textile recycled polyester, is partnering with Selenis, a global supplier of high-quality specialty polyester solutions, to establish a textile-to-textile recycling plant in Cedar Creek, North Carolina, USA. The two companies said their partnership will combine new technologies in depolymerizing and polymerizing for textile-to-textile recycling all in one place, allowing for a cost-efficient, industrial-scale operation. Syre, which is backed by H&M, one of the world’s largest and most recognizable fast fashion brands, and Vargas, a Swedish impact company builder behind H2 Green Steel and Northvolt, launched in March this year with a mission to establish multiple gigascale textile-to-textile plants producing circular polyester across the globe, reducing CO2 emissions by up to 85% compared with the production of oil-based virgin polyester. The new plant, which will deliver volumes of up to 10,000 metric tons of circular polyester annually, is scheduled to be operational in mid-2025, with the aim of making its first commercial sales to customers later that year. For more on Syre, read The Innovator’s story about its launch.
Data Centers Could Be Source of Heat for European Cities
Data centers, often criticized for their intensive energy needs, could become a critical source of heating for cities if properly located, according to the boss of one of Europe’s leading energy transition companies. Kim Fausing, chief executive of Danfoss, a privately owned Danish company that provides heat pumps and data center cooling systems, told the Financial Times that Frankfurt could have all its heating needs met by excess heat generated by data centers by the end of this decade.
MOBILITY
Toyota, NTT To Invest $3.26 Billion In AI Self-Driving Technology
Toyota Motor and Japan’s Nippon Telegraph and Telephone (NTT) will invest 500 billion yen ($3.26 billion) in research and development to create artificial intelligence software to improve self-driving. The automaker and the Tokyo-headquartered telecommunications major are planning to develop automotive software which will use AI to anticipate accidents and take control of the vehicle, according to newspaper reports.
ARTIFICIAL INTELLIGENCE
Scikit-learn Practitioner Certification Program Launched
As machine learning continues to grow at an unprecedented pace, so does the demand for professionals who can validate their expertise with recognized credentials. To meet this need, France’s Probabl is launching the first official global scikit-learn practitioner certification program, developed in partnership with members of the scikit-learn core team, industry leaders, and expert trainers. The certification program is designed around three levels (Associate, Professional, and Expert), each tailored to different stages of a data science career. Scikit-learn positions itself as an open-source alternative to proprietary AI platforms. With over 80 million downloads each month, scikit-learn’s role as middleware makes it fundamental to nearly a million projects and more than 15,000 structured packages. Probabl said Inria, the French national research institute for digital science and technology, which has supported the creation of scikit-learn since its launch, will keep certifications connected to the latest research advances in the field, while Artefact, an AI and data consultancy with expertise in industrial applications of machine learning and machine learning’s impact on business, will help ensure certifications are relevant to industry practices and needs.
To access more of The Innovator’s News In Context stories click here.