LLMs today suffer from inaccuracies at scale, but that doesn’t mean you should cede competitive ground by waiting to adopt generative AI.
Every enterprise technology has a purpose or it wouldn't exist. Generative AI's enterprise purpose is to produce human-usable output from technical, business, and language data rapidly and at scale to drive productivity, efficiency, and business gains. But this primary function of generative AI, producing a confident, fluent answer to nearly any prompt, is also the source of large language models' (LLMs) biggest barrier to enterprise adoption: so-called "hallucinations."
Why do hallucinations happen at all? Because, at their core, LLMs are complex statistical matching systems. They analyze billions of data points to determine patterns and predict the most likely response to any given prompt. But while these models may impress us with the usefulness, depth, and creativity of their answers, seducing us into trusting them each time, they are far from reliable. New research from Vectara found that chatbots can "invent" new information up to 27% of the time. In an enterprise setting, where question complexity can vary greatly, that number climbs even higher. A recent benchmark from data.world's AI Lab using real business data found that standalone LLMs answered even basic business queries accurately only 25.5% of the time. On intermediate and expert-level queries, which are still well within the bounds of typical, data-driven enterprise questions, accuracy dropped to zero percent!
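To see why this happens mechanically, consider a deliberately simplified sketch in Python. The prompt, tokens, and probabilities below are invented for illustration; a real LLM samples from a learned distribution over an enormous vocabulary, but the key point is the same: nothing in the sampling step verifies truth.

```python
import random

# Hypothetical next-token probabilities after the prompt "Our Q3 revenue was".
# All four continuations are statistically plausible; only one is correct.
next_token_probs = {
    "$4.2M": 0.31,  # correct (in this invented scenario)
    "$4.8M": 0.27,  # plausible but wrong
    "$3.9M": 0.24,  # plausible but wrong
    "flat":  0.18,  # grammatical, but uninformative
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample a continuation in proportion to its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Roughly 7 times in 10, this toy "model" confidently completes the
# sentence with a wrong or empty answer -- a hallucination in miniature.
print("Our Q3 revenue was", sample_next_token(next_token_probs))
```

A production LLM is vastly more sophisticated, but the principle holds: it optimizes for plausibility, not truth.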
The tendency to hallucinate may be inconsequential for individuals playing around with ChatGPT for small or novelty use cases. But in enterprise deployment, hallucinations present a systemic risk. The consequences range from inconvenient (a service chatbot sharing irrelevant information in a customer interaction) to catastrophic (reporting an incorrect figure in an SEC filing).
As it stands, generative AI is still a gamble for the enterprise. However, it's also a necessary one. As we learned at OpenAI's first developer conference, 92% of Fortune 500 companies are using OpenAI APIs. The potential of this technology in the enterprise is so transformative that the path forward is resoundingly clear: start adopting generative AI now, knowing that the rewards come with serious risks. The alternative, insulating yourself from those risks, means swiftly falling behind the competition. The productivity lift is now so obvious that failing to capture it could threaten an enterprise's very survival. So, faced with this illusion of choice, how can organizations integrate generative AI into their workflows while simultaneously mitigating risk?
First, you need to prioritize your data foundation. Like any modern enterprise technology, generative AI solutions are only as good as the data they're built on top of, and according to Cisco's recent AI Readiness Index, intention is outpacing ability, particularly on the data front. Cisco found that while 84% of companies worldwide believe AI will have a significant impact on their business, 81% lack the data centralization needed to leverage AI tools to their full potential, and only 21% say their network has "optimal" latency to support demanding AI workloads. It's a similar story with data governance: just three out of ten respondents currently have comprehensive AI policies and protocols, and only four out of ten have systematic processes for AI bias and fairness corrections.
As benchmarking demonstrates, LLMs already have a hard enough time retrieving factual answers reliably. Combine that with poor data quality, weak data centralization and management capabilities, and limited governance policies, and the risk of hallucinations (and their accompanying consequences) skyrockets. Put simply, companies with a strong data architecture have better, more accurate information available to them and, by extension, their AI solutions are equipped to make better decisions. Working with a data catalog or evaluating internal governance and data-entry processes may not feel like the most exciting part of adopting generative AI. But it's those considerations (data governance, lineage, and quality) that could make or break the success of a generative AI initiative. A strong data foundation not only enables organizations to deploy enterprise AI solutions faster and more responsibly, but also allows them to keep pace with the market as the technology evolves.
Second, you need to build an AI-educated workforce. Research suggests that techniques like advanced prompt engineering can help identify and mitigate hallucinations. Other methods, such as fine-tuning, have been shown to dramatically improve LLM accuracy, even to the point of outperforming larger, more advanced general-purpose models. However, employees can only deploy these tactics if they're empowered with the latest training and education to do so. And let's be honest: most employees aren't. We are just over the one-year mark since the launch of ChatGPT on November 30, 2022!
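To make the first of those techniques concrete, here is a minimal prompt-engineering sketch in Python. The template wording and the churn figures are our own invention, not a standard; the pattern itself (constrain the model to supplied context and give it explicit permission to abstain) is one commonly taught approach to reducing hallucinations.

```python
# One hallucination-mitigation prompt pattern: ground the model in
# supplied context and allow it to say it doesn't know. Template text
# and data are illustrative only; adapt them to your model and domain.

GROUNDED_PROMPT = """You are an analyst answering questions for our enterprise.
Use ONLY the context below. If the context does not contain the answer,
reply exactly: "I don't know based on the available data."

Context:
{context}

Question: {question}
Answer:"""

def build_grounded_prompt(context: str, question: str) -> str:
    """Fill the grounding template with retrieved context and a user question."""
    return GROUNDED_PROMPT.format(context=context, question=question)

# The context deliberately lacks FY2021 data, so a well-behaved model
# should abstain rather than invent a number.
prompt = build_grounded_prompt(
    context="FY2023 churn: 4.1%. FY2022 churn: 5.3%.",
    question="What was our churn in FY2021?",
)
print(prompt)  # send to any chat-style LLM endpoint
```

Patterns like this are teachable, which is exactly why investing in workforce education pays off.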
When a major vendor such as Databricks or Snowflake releases new capabilities, organizations flock to webinars, conferences, and workshops to ensure they can take advantage of the latest features. Generative AI should be no different. Create a culture in 2024 where educating your team on AI best practices is the default: for example, provide stipends for AI-specific L&D programs or bring in an outside training consultant, as we've done at data.world with Rachel Woods, who serves on our Advisory Board and founded and leads The AI Exchange. We also promoted Brandon Gadoci, our first data.world employee outside of me and my co-founders, to be our VP of AI Operations. The staggering lift we've already had in our internal productivity is nothing short of inspirational (I wrote about it in this three-part series). Brandon just reported yesterday that we've seen an astounding 25% increase in our team's productivity through the use of our internal AI tools across all job roles in 2023! Adopting this type of culture will go a long way toward ensuring your organization is equipped to understand, recognize, and mitigate the threat of hallucinations.
Third, you need to stay on top of the burgeoning AI ecosystem. As with any new paradigm-shifting tech, AI is surrounded by a proliferation of emerging practices, software, and processes to minimize risk and maximize value. As transformative as LLMs may become, the wonderful truth is that we’re just at the start of the long arc of AI’s evolution.
Technologies once foreign to your organization may become critical. The benchmark mentioned above found that backing an LLM with a knowledge graph, a decades-old architecture for contextualizing data in three dimensions (mapping and relating data much like a human brain works), improved response accuracy by 300%! Likewise, technologies like vector databases and retrieval-augmented generation (RAG) have risen to prominence for their ability to help address LLMs' hallucination problem; a simplified sketch of the pattern follows below. Long-term, the ambitions of AI extend far beyond the APIs of today's major LLM providers, so remain curious and nimble in your enterprise AI investments.
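Here is that heavily simplified RAG sketch in Python. A toy word-overlap score stands in for the embedding model and vector database a production system would use (and a knowledge-graph-backed system would retrieve structured context instead); the documents and figures are invented.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve relevant
# facts first, then constrain the LLM's answer to those facts.

DOCUMENTS = [  # stand-in for your governed enterprise data
    "Q3 2023 revenue was $4.2M, up 12% year over year.",
    "Headcount at the end of Q3 2023 was 214 employees.",
    "The enterprise data catalog migration finished in August 2023.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by words shared with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(question: str) -> str:
    """Stuff the top-ranked facts into a grounded prompt for the LLM."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    return (
        "Answer using ONLY the context below; otherwise say \"I don't know.\"\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_rag_prompt("What was revenue in Q3 2023?"))
```

In production, the retriever would query a vector database (or traverse a knowledge graph) rather than count shared words, but the accuracy gains come from the same principle: grounding generation in trusted, governed data.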
Like any new technology, generative AI solutions are not perfect, and their tendency to hallucinate poses a very real threat to their current viability for widespread enterprise deployment. However, these hallucinations shouldn't stop organizations from experimenting with and integrating these models into their workflows. Quite the opposite, in fact; as AI pioneer and Wharton entrepreneurship professor Ethan Mollick so eloquently put it, "…understanding comes from experimentation." Rather, the risk hallucinations pose should act as a forcing function for enterprise decision-makers to recognize what's at stake, take steps to mitigate that risk accordingly, and reap the early benefits of LLMs in the process. 2024 is the year your enterprise should take the leap.