In the rapidly evolving landscape of technology, Artificial Intelligence (AI) and Machine Learning (ML) have emerged as pivotal forces driving innovation, efficiency, and competitive advantage across industries. For Chief Technology Officers, IT Directors, Tech Project Managers, and Tech Product Managers, understanding and integrating these technologies into business strategies is no longer optional; it’s imperative.
It’s not a surprise, Large language Models (LLMs) like ChatGPT could do more than chat.
We will explore step by step strategies to prevent data distortion, enhance operational efficiency, and better use your company’s resources.
You already know that Large Language Models (LLMs) such as ChatGPT, Gemini, Mistral, etc… have emerged as powerful tools that can automate tasks and enhance customer service. As a business decision-maker, understanding the capabilities and limitations of LLMs can help you make informed decisions about their implementation.
The first step in leveraging LLMs is to identify a task that can be automated to improve customer service or relieve employees of repetitive tasks. For instance, LLMs can be used to automate information retrieval in documents, write reports, or process customer requests.
Once you have identified a business case, the next step is to manually evaluate this with ChatGPT (or Gemini) to estimate if the current reasoning capabilities of generative AI are sufficient to meet the need.
You can create a list of sample inputs and evaluate the accuracy of the responses generated by ChatGPT.
Let’s say you want to automate the dispatching of emails your company receives.
You have to get some emails and test if an online LLM is able to sort them and prepare an answer.
Here is a sample prompt:
Our company is organized into several departments, each with distinct functions and areas of responsibility. Here's a brief overview:
Sales Department: This team is responsible for generating new business leads, negotiating deals, and maintaining relationships with existing clients.Marketing Department: This department focuses on promoting our products and services, managing our brand reputation, and conducting market research.
Customer Service Department: This team handles inquiries, complaints, and support requests from our customers.
Human Resources Department: This department manages employee relations, recruitment, training, and benefits.
Technical Support Department: This team provides assistance with technical issues related to our products and services.
Finance Department: This department is responsible for managing the company's financial resources, including budgeting, forecasting, and financial reporting.
We want you to categorize the following email. I would like you to analyze the content of the email to determine the most suitable department. Here are some guidelines to help you with your analysis:
Look for keywords and phrases that indicate the email is related to sales, marketing, customer service, HR, technical support, or finance.
Consider the tone and context of the email. For example, an email that expresses frustration with a product may be best suited for the customer service department.
If an email contains multiple topics, categorize it based on the most prominent or urgent issue.
Part 3: Once you have analyzed the email and determined the most suitable department, please append your recommendations. Please ensure that your recommendations are accurate and consistent, as this will help us to improve our response times and provide better service to our customers.
Part 4: Write a model of response for the department
Here is the email:
```
Hi,
I hope this email finds you well. I am writing to request time off from work for the upcoming holiday season. Specifically, I would like to take the following days off:
Monday, December 20th
Tuesday, December 21st
Wednesday, December 22nd
Thursday, December 23rd
Friday, December 24th
I understand that this is a busy time of year for the company, and I apologize for any inconvenience my absence may cause. However, I have accrued enough vacation time to cover these days, and I believe that taking this time off will allow me to come back refreshed and ready to tackle the new year.
Please let me know if there are any issues with my request, or if you need any additional information from me. Thank you for your time and consideration.
Best regards, [Employee Name]
```
The model answer:
Based on the content of the email, I would recommend categorizing it under the
__Human Resources Department.__
The email contains keywords such as "time off," "vacation time," and "absence,"
which are related to employee relations and benefits, which fall under the
responsibilities of the HR department. The tone of the email is polite and
professional, and the context indicates that the employee is requesting
time off for personal reasons.
__Here is a model response for the HR department:__
Dear [Employee Name],
Thank you for your email and for providing ample notice regarding your time off request for the upcoming holiday season. We appreciate your consideration for the busy time of year and your accrued vacation time to cover your absence.
I am pleased to inform you that your request for time off from Monday, December 20th to Friday, December 24th has been approved. Please ensure that you have communicated your absence to your team and made any necessary arrangements to cover your responsibilities during your time off.
If you have any further questions or concerns, please do not hesitate to reach out to the Human Resources Department.
Thank you for your hard work and dedication to the company. We hope that you have a relaxing and enjoyable holiday season.
Best regards,
[Your Name] Human Resources Department.
⚠️ Don’t hesitate to iterate with your prompting until you get the intended form of response you want to reproduce. This could take some time to get used to it. Don’t stop at first failure, nor first success. Don’t hesitate to restart from scratch. You could even challenge the model to write a prompt for you.
⚠️ Don’t forget to test edge cases, be sure to evaluate at least for 80% of your use case.
It’s time to expand your test. Previously you proved that it worked with few examples, now try again with more data to measure the accuracy.
Creating a dataset is essential to evaluate a process. You can use this dataset to test the model, ensuring that it meets your business needs. The dataset should be representative of the real-world scenarios that your business will encounter. Ideally, one should use code to create a reproducible evaluation chain. For example, by calling OpenAI’s API from a list of questions and automatically comparing expected answers.
With a ChatGPT subscription if you look at Explore GPTs you can also try Data Analyst to upload an Excel file and interact with the AI on it.
- Compile an Email Dataset: Start by assembling an Excel file containing 100 sample emails that your company might receive.
- Draft a Detailed Prompt
In this case you can structure your prompt in three segments:
- Part 1: Detail the various departments within your company, outlining their specific functions and areas of responsibility.
- Part 2: Introduce the dataset to the model, instructing it to analyze the content of each email to determine the most suitable department.
- Part 3: Direct the model to append its recommendations in a new column within your Excel file, effectively categorizing each email.
- Execute and Evaluate: Utilize the prompt to task the model with identifying the correct recipient department for each email. Following the model’s processing, review its suggestions to assess accuracy and relevance.
Before considering going further you can manually rate each answer and compute the average to evaluate if the result is good enough for this use case. In our example, remember that the use case is a <human>(email) to <machine> (routing & proposed answer) to <human> (department) workflow, so an error can be tolerated : the human could modify the answer, or a department can reroute an email to another… If it happens on ten emails in a hundred it can be good enough.
You can fastrack a production ready solution by using an API provided by an external provider.
You can use OpenAI API or others for your MVP, but there are several factors that you should consider, including:
- All the Data you provide to an external API or chat is recorded somewhere
- You should anonymize your data even if the service provider claims that it is not using your data…
- Risk of industrial secret leakage: If you are outside of the US, be aware that OpenAI is subject to the Cloud Act.
- Speed limitations: It often takes several seconds to obtain a complete response from OpenAI, which may not be fast enough for certain use cases.
- Call limitations: The number of calls per second are limited, as well as maximum monthly expenses
- Environmental impact: Large generalist models have a significant environmental impact, and this should be taken into account when considering their use.
- Cost variation: ie OpenAI APIs are subject to cost variation, which can impact your budget.
- Difficulty in asserting a competitive advantage: It can be challenging to assert a competitive advantage when using OpenAI APIs, as they are available to all businesses.
- Stability: LLM private models like Gemini, Mistral, Claude2, GPT4 … are not always stable and you should consider monitoring the quality and stability of the answers provided. You also have to add rail guards to protect your service quality and you & your customers from hazardous behaviors coming from in and out. Problems can occur from the input or the output.
To avoid some of these pitfalls, you can turn to open-source models such as LLAMA or Mistral. These open-source alternatives offer several advantages:
- Privacy and Security: Self hosted models, reduce the risk of industrial secret leakage.
- Customization: You can fine-tune open-source models to better suit your specific business needs.
- Lower Costs: Open-source models are often less expensive than proprietary solutions, especially when considering the limitations on the number of calls and monthly expenses.
- Environmental Impact: Open-source models are smaller and can be optimized for specific use cases, potentially reducing their environmental footprint. You could measure it with CodeCarbon.
- Competitive Advantage: By customizing an open-source model, you can create a unique solution that sets your business apart from competitors.
Now you have automated the routing of the email, let’s improve the quality of the answer. A way to do it is to add company documents to the capability of the model. This will allow the model to find answers in your document instead of his “memory”.
Customizing responses from a LLM with company data will create a more accurate and tailored experience for users.
You can’t send all company data within the prompt. That’s why Retrieval Augmented Generation (RAG) is useful, it’s a technique that combines information retrieval from a database and generation capabilities of a LLM. By using RAG, you can improve the accuracy of responses. And you could tell to the user which documents have been used for the answer.
RAG technique can be simply presented by this formula:
<LLM trained with billion of data> + <Your prompt> + <Your company dataset> = Responses aligned with your context
RAG is often done with a vector database as it works in most cases, here is how to create the database:
- Split your documents by shorts chapters
- Convert chapters to vectors using an embedding model. The vector on the same subjects will be near in the n-dimensional spaces. Typical vector is an array of 1,024 floats values. Think of it like if each value represents a characteristic, like color, size, gender… It’s not hard coded, the model finds the value by himself in training.
- Store them in a vector database
When you receive an email, you will use RAG like this:
- Convert the email of your customer to a vector
- Query the database with this vector to retrieve the 10 nearest vectors of paragraphs
- Take the text of these paragraphs and add them to the prompt
- Ask the LLM for an answer
- The answer will be based on the data provided in the prompt
If you want to learn more, read Retrieval Augmented Generation (RAG)
Now your answer will be using your data, so it helps prevent what is called hallucination.
ℹ️ Model Hallucination is not an easy problem to manage. Because the “memory” of a LLM is more like a human memory (compressed representation of the world) than a computer’s exact memory. And models are trained to help you so they will try to, even when they don’t know the answer, misleading information will be presented as fact. RAG helps cope with this problem by providing relevant data to the model.
RAG is really good for unstructured data, but sometimes you have a better way to answer the question like tabular data with pricing for each product, or you may even want to compute taxes, or looking for a slot in an agenda to arrange a meeting. Let’s see how to do that with function calling.
Function calling is a way to allow interaction between a LLM and your enterprise API, like:
- Salesforce, SAP for your ERP
- Service Now or other ticketing services
- Agendas
- Invoice, pricing
- Custom API to do anything in your company
- Third party API
Function calling is an essential feature that allows you to use APIs without exposing them to the outside world. This feature opens up many possibilities beyond simple chat applications. For instance, you can integrate specialized internal services or tools into the LLM, making it more versatile and valuable for your business. You can take a mail from a customer requesting a price, send it to the LLM to turn it into a parameter to call your pricing API, then use the API answer to ask the LLM back to write the answer to the customer.
Given the request:
“
Hello,
I really like your company. I would like to order you a solar panel mounting rail, what would be the price ?
Best regards
“
You send the request to the LLM, with the definitions of the API that exist in your company:
{
"type": "function",
"function": {
"name": "multiply",
"description": "Get product price.",
"parameters": {
"type": "object",
"properties": {
"product_name": {
"description": "Name of the product",
"type": "string"
},
"required": [ "product_name" ]
}
}
}
So the LLM extract the product name from the mail and give you the JSON to make the API call:
{
"product_name": "solar panel mounting rail"
}
It’s up to you to call the API, so it is totally secured : the LLM never knows where your API is, just what it can do.
The answer of the API could be sent back to the LLM to build a natural language answer.
Can you answer this email given that the price for a “solar panel mounting rail” is $10 without a VAT of 5% ? “Hello, I really like your company. I would like to order you a solar panel mounting rail, what would be the price ? Best regards Your customer “
The answer will be:
Hello,
Thank you for your interest in our company and for considering our solar panel mounting rail. The price for the mounting rail is 10 before taxes, with a VAT of 5%, so $10.50 taxes included.
Please let me know if you have any other questions or if you would like to proceed with the order.
Best regards,
So you now have a system that can use your internal services to better prepare answers for your customers. That’s a game changer if you have already invested in APIs.
We just saw that we may call a LLM more than once for a single task, let see that in more detail.
It’s important to note that a single prompt is often not enough for complex tasks. Your project will likely require breaking down the task into multiple prompts that will chain together and combine several techniques.
For exemple https://360learning.com/ build a platform to help building online courses with AI from a single text document as input. Their pipelines make use of 9 prompts, used for 30 OpenAI calls, and RAG to achieve their goal. A first prompt asks for a resume of the document, a second asks for a plan for an online course from the resume, then RAG is used to retrieve each part of the document from the title, and so on.
Here is some slides of their presentation:
Video source : https://www.youtube.com/watch?v=1Eyc2GypnF4 (in French)
They are using LangChain, a framework that helps to create these types of LLM pipelines.
ℹ️ You probably heard of “AI Agents”: they are just a way to combine prompts, but without writing them in advance. An agent is a call to a LLM to get a list of tasks. Then, make a call to LLM for each task, and so on. It works best with giving the ability to the LLM to call external tools like browsing the web using functions like we saw before.
Now you have a powerful pipeline, but how to improve the model itself to have faster and better answers ? You can fine tune a model.
Fine-tuning can often improve the model’s performance and reduce its size while maintaining equal performance, because you could use smaller models, like Mistral-7B, or even Phi-2.
Very few companies could afford to train a LLM from scratch because it requires a huge dataset and hundreds of GPUs, almost 2 millions GPU hours for Llama2–70B for example. But you can take an already pre-trained model and fine-tune it, only an afternoon of fine-tuning is needed in most cases.
The drawback is that you have to build a training dataset with hundreds of questions and answers.
It’s a new technique to combine multiple models in one. The result is a big model, called Mixture of Experts (MoE), with better capabilities than a single of the same size. The easiest way to do that is with MergeKit.
This could help you if it’s difficult to decide which model to use : with MoE, it’s the model who decides which one to use.
Customizing responses from LLMs with company data and API create a more accurate and tailored experience for users. Fine-tuning can improve the performance, and breaking down tasks into multiple prompts can help tackle complex tasks.
While all of this may seem complex and reserved for specialists, abundant documentation and numerous libraries are available to facilitate implementation. Popular libraries include HuggingFace, Langchain, HayStack, Axolotl and so on…
However, don’t forget the cost of integration. As with any project, there is a significant cost associated with moving from a functional prototype to a fully industrialized solution within an existing IT system. You will often discover that the process of your company is more complex than expected. Or that the data needs a bit of cleaning to be processed.
While large language models offer many advantages, don’t neglect the benefits of “older” machine learning techniques like random forest or DistiliBert. These techniques can still provide values, including faster processing, easier integration into existing tools, no need for GPUs, better explainability, and lower costs.
We hope that this article provides a view on how to include LLM in your software architecture.
Article written in february 2024 by Benoît Courty, data scientist, with the help of Stéphane Van-Bosterhaudt, CEO of UpScale.
More readings on the subject: