Language models are AI computational models that can generate natural human language. That’s no easy feat.
These models are trained as probabilistic machine learning models: they learn a probability distribution over words and use it to predict which words are most suitable to generate next in a phrase sequence, in an attempt to emulate human intelligence. The scientific focus of language models has been twofold: generating fluent natural language, and exhibiting something resembling human intelligence.
In terms of exhibiting human intelligence, today’s bleeding-edge AI models in Natural Language Processing (NLP) have not quite passed the Turing Test. (A machine passes the Turing Test if it is impossible to discern whether a communication originates from a human or a computer.)
What is particularly interesting is that we are getting pretty close to this marker: certainly with the hyped Large Language Models (LLMs) and the promising, though less hyped, SLMs. (SLM can stand for either Small Language Model or Short Language Model.)
If you’ve followed the hype, then you’re likely familiar with LLMs such as ChatGPT. These generative AIs are hugely interesting across academic, industrial and consumer segments. That’s primarily due to their ability to perform relatively complex interactions in the form of speech communication.
Currently, LLM tools are being used as an intelligent machine interface to knowledge available on the Internet. LLMs distill relevant information from the Internet data used to train them and present concise, consumable knowledge to the user. This is an alternative to typing a query into a search engine, reading through thousands of Web pages and piecing together a concise and conclusive answer yourself.
Indeed, ChatGPT was the first major consumer-facing application of LLMs; before it, the technology was largely confined to research-oriented systems such as OpenAI’s GPT models and Google’s BERT.
Recent iterations, including but not limited to ChatGPT, have been trained and engineered on programming scripts. Developers use ChatGPT to write complete program functions – assuming they can specify the requirements and limitations via the text user prompt adequately.
(Concerned about security in your LLMs? Learn how to defend against the OWASP Top 10 for LLMs.)
The three main types of NLP models are symbolic NLP, statistical NLP and neural NLP.
So how do Large Language Models work? Let’s review the key steps in generating natural language using LLMs.
The idea is to develop a mathematical model whose parameters assign the highest probability to true predictions.
In the context of a language model, these predictions are the distribution of natural language data. The goal is to use the learned probability distribution of natural language for generating a sequence of phrases that are most likely to occur based on the available contextual knowledge, which includes user prompt queries.
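To make this concrete, here is a minimal sketch in Python of how a learned distribution over next words turns into a generation choice. The four-word vocabulary and the raw scores are invented for illustration; a real model learns them from vast training data:

```python
import numpy as np

# Toy next-word model: given a context such as "the cat", assign a score to
# each candidate next word. Vocabulary and scores are made up for this sketch.
vocab = ["sat", "ran", "slept", "flew"]
scores = np.array([2.1, 1.3, 0.9, -1.5])  # unnormalized model outputs (logits)

# Softmax converts raw scores into a probability distribution over the vocabulary.
probs = np.exp(scores) / np.exp(scores).sum()

# Generation then samples from this distribution, or greedily picks the
# most probable word, as below.
next_word = vocab[int(np.argmax(probs))]  # "sat" has the highest score
```

A real LLM repeats this step token by token, feeding each chosen word back in as context for the next prediction.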
To learn the complex relationships between words and sequential phrases, modern language models such as ChatGPT and BERT rely on the so-called Transformers based deep learning architectures. The general idea of Transformers is to convert text into numerical representations weighed in terms of importance when making sequence predictions.
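The heart of a Transformer can be sketched in a few lines. The toy example below computes scaled dot-product self-attention with NumPy over randomly initialized vectors; real models use learned embeddings and projection weights, many attention heads, and dozens of stacked layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three tokens, each represented as a 4-dimensional vector. In a real
# Transformer these come from learned embeddings; here they are random.
seq_len, d_model = 3, 4
x = rng.normal(size=(seq_len, d_model))

# Query, key and value projection matrices (learned in practice, random here).
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, d_model))
w_v = rng.normal(size=(d_model, d_model))

q, k, v = x @ w_q, x @ w_k, x @ w_v

# Scaled dot-product attention: each token weighs every token by relevance,
# then takes a weighted average of their value vectors.
scores = q @ k.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ v  # same shape as the input: one vector per token
```

The `weights` matrix is the "importance" weighting mentioned above: each row is a probability distribution saying how much one token attends to the others.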
Language models are heavily fine-tuned and engineered for specific task domains. Another important use case of engineering language models is to eliminate bias and unwanted language outcomes such as hate speech and discrimination.
The process involves adjusting model parameters by:
Training the model on domain-specific knowledge.
Initializing model parameters based on pretrained data.
Monitoring model performance.
Further tuning model hyperparameters.
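The four steps above can be sketched as a tiny gradient-descent loop. The linear model and synthetic data below are purely illustrative stand-ins for a real language model and its domain corpus:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step: initialize model parameters (standing in for pretrained weights).
w = rng.normal(size=3)

# Step: domain-specific training data (synthetic here).
x = rng.normal(size=(8, 3))
y = x @ np.array([1.0, -2.0, 0.5])  # target outputs

learning_rate = 0.1  # step: a hyperparameter to tune further
for _ in range(200):
    pred = x @ w
    loss = np.mean((pred - y) ** 2)  # step: monitor model performance
    grad = 2 * x.T @ (pred - y) / len(y)
    w -= learning_rate * grad        # adjust parameters toward the domain data
```

Real fine-tuning follows the same loop, just with billions of parameters, a cross-entropy loss over tokens, and far more sophisticated optimizers.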
Both SLM and LLM follow similar concepts of probabilistic machine learning for their architectural design, training, data generation and model evaluation.
Now, let’s discuss what differentiates SLM and LLM technologies.
Perhaps the most visible difference between the SLM and LLM is the model size.
LLMs such as ChatGPT (GPT-4) purportedly contain 1.76 trillion parameters.
Open-source SLMs such as Mistral 7B contain around 7 billion model parameters.
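Those parameter counts translate directly into hardware requirements. As a back-of-envelope calculation — assuming 16-bit weights (2 bytes per parameter) and ignoring activations and other runtime overhead — the memory needed just to hold the weights is:

```python
# Rough memory footprint of model weights alone, at 2 bytes per parameter
# (16-bit floats). Actual runtime memory is higher.
BYTES_PER_PARAM = 2

def weight_memory_gb(n_params: float) -> float:
    return n_params * BYTES_PER_PARAM / 1e9

llm_gb = weight_memory_gb(1.76e12)  # GPT-4's purported parameter count -> ~3,520 GB
slm_gb = weight_memory_gb(7e9)      # Mistral 7B -> ~14 GB
```

At roughly 14 GB, a 7B-parameter model fits on a single high-end consumer GPU, while the purported GPT-4 weights would need a large multi-GPU cluster just to load.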
The difference comes down to the model architecture and its training process. ChatGPT uses full self-attention in its decoder-only Transformer, whereas Mistral 7B uses sliding window attention, which allows for more efficient training and inference in a decoder-only model.
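The difference between the two attention schemes is easy to visualize as boolean masks. In this sketch, a causal mask lets each token attend to every earlier token, while a sliding-window mask restricts attention to the most recent tokens (Mistral 7B reportedly uses a 4,096-token window; a window of 3 is used here purely for readability):

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Full causal attention: token i may attend to all tokens 0..i.
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # Sliding-window attention: token i may only attend to the last
    # `window` tokens, i.e. tokens (i - window + 1)..i.
    m = causal_mask(n)
    for i in range(n):
        m[i, : max(0, i - window + 1)] = False  # drop tokens beyond the window
    return m

full = causal_mask(6)
windowed = sliding_window_mask(6, window=3)
```

The windowed mask has far fewer `True` entries, which is where the efficiency comes from: attention cost grows with the window size rather than with the full sequence length.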
SLMs are trained on data from specific domains. They may lack holistic contextual information spanning multiple knowledge domains but are likely to excel in their chosen domain.
The goal of an LLM, on the other hand, is to emulate human intelligence on a wider level. It is trained on larger data sources and is expected to perform relatively well across all domains, compared to a domain-specific SLM.
That means LLMs are also more versatile and can be adapted, improved and engineered for better performance on downstream tasks such as programming.
Training an LLM is a resource-intensive process that requires GPU compute resources in the cloud at scale. Training ChatGPT from scratch reportedly requires several thousand GPUs, whereas the Mistral 7B SLM can be run on a local machine with a decent GPU. Even so, training a 7B-parameter model still takes several computing hours across multiple GPUs.
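A common rule of thumb — an approximation from the research community, not a figure from this article — is that training a Transformer takes roughly 6 × N × D floating-point operations for N parameters and D training tokens. With assumed, illustrative throughput numbers, the gap between model sizes becomes concrete:

```python
# Rule-of-thumb training cost: ~6 * N * D FLOPs for N parameters, D tokens.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Illustrative only: a 7B-parameter model trained on 1 trillion tokens.
flops = training_flops(7e9, 1e12)

# Assuming a GPU sustains ~1e14 usable FLOP/s, estimate the GPU time needed.
gpu_seconds = flops / 1e14
gpu_days = gpu_seconds / 86400  # thousands of GPU-days for one training run
```

Even under these generous assumptions, a single GPU would take years; spreading the work across hundreds or thousands of GPUs is what makes training tractable, and it is why even "small" models are usually trained in the cloud and only run (not trained) locally.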
LLMs tend to be biased. That’s because they are not always adequately fine-tuned and because they train on raw data that’s openly accessible and published on the Internet. Because of that source, the training data may:
Underrepresent or misrepresent certain groups or ideas.
Be labeled erroneously.
Further complexity emerges elsewhere: language itself introduces its own bias, depending on a variety of factors such as dialect, geographic location, and grammar rules. Another common issue is that the model architecture itself can inadvertently enforce a bias, which may go unnoticed.
Since the SLM trains on relatively smaller domain-specific data sets, the risk of bias is naturally lower when compared to LLMs.
The smaller model size of the SLM means that users can run the model on their local machines and still generate data within acceptable time.
An LLM requires multiple parallel processing units to generate data, and as the number of concurrent users accessing the LLM grows, model inference tends to slow down.
The answer entirely depends on the use case for your language models and the resources available to you. In a business context, an LLM may be better suited as a chat agent for your call centers and customer support teams.
In most function-specific use cases, an SLM is likely to excel.
Consider the use cases in medical, legal and financial domains. Each application here requires highly specialized and proprietary knowledge. An SLM trained in-house on this knowledge and fine-tuned for internal use can serve as an intelligent agent for domain-specific use cases in highly regulated and specialized industries.
See an error or have a suggestion? Please let us know by emailing ssg-blogs@splunk.com.
This posting does not necessarily represent Splunk's position, strategies or opinion.