Your Generative AI CookBook
The Beginners Guide to Understanding Generative AI Lingo
There has been a lot of buzz around AI, specifically generative AI (GenAI), over the last couple of years, and it grows louder each year. The thing with buzzes, or trends as you might call them, is that there is always a bunch of words (buzzwords) that characterize them, and GenAI is no different. In this article, I explain the common, “most likely to encounter” terminologies. The content is based on my personal study and does not cover the entirety of GenAI terminology. Understanding these concepts might be the key to feeling more comfortable learning about, or listening to conversations about, Generative Artificial Intelligence. Let’s get into it.
Artificial Intelligence
My simplest and favorite definition of AI is from a TED talk by Mustafa Suleyman, the CEO of Microsoft AI and the co-founder and former head of applied AI at DeepMind. He describes AI as a clever piece of software that has read most of the text on the open internet and can talk to you about anything you want, almost like a human.
Generative Artificial Intelligence
GenAI is a subset of artificial intelligence that focuses on creating new, original content, like text, images, audio, video, or code, by learning patterns from already existing data. Unlike traditional AI, which typically classifies, predicts, or recommends based on input, generative AI produces new data that mimics the nature of the data it was trained on.
Large Language Models
AI is a broad concept. Large Language Models (LLMs), on the other hand, are a specific type of Artificial Intelligence that specializes in processing, understanding, and generating human-like text. LLMs are trained on massive amounts of text data, and because of this, they can perform various tasks, from predicting what word comes next in a sentence to translating languages, understanding code, summarizing text, and much more.
How exactly do LLMs work?
LLMs are built on a framework called Transformers. You can think of transformers as the foundation upon which LLMs are built. Another way to think about it: transformers are like the engine that powers a car, while LLMs are specific car models built on that engine, each with its own design, features, and purpose. You can also picture transformers as super-smart brains for computers that understand language. They can read a whole sentence at once and work out how all the words relate to each other and how the different parts of the sentence fit together to make sense.

Transformers work on the principle of self-attention. What self-attention does for a transformer is help it zero in on the most important parts of whatever input it receives. It determines the relationships between different words and phrases in a sentence. For example, in the sentence “The tiger jumped out of a tree to get a drink because it was thirsty,” “the tiger” and “it” refer to the same thing, so we expect these two words to be strongly connected. Self-attention helps transformers make that connection just like our brains do.
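To make self-attention a little more concrete, here is a minimal sketch in Python (using NumPy) of scaled dot-product attention, the core operation inside a transformer. The tiny token list and the random vectors are made up for illustration; a real model learns all of these values from data.

```python
import numpy as np

# Toy sentence and made-up embedding vectors (a real model learns these).
tokens = ["the", "tiger", "was", "thirsty", "it"]
rng = np.random.default_rng(0)
d = 8                                   # embedding size (tiny, for illustration)
x = rng.normal(size=(len(tokens), d))   # one vector per token

# In a transformer, queries, keys, and values are learned projections of x.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scaled dot-product attention: each token scores every other token,
# the scores become weights via softmax, and the output is a weighted
# mix of the value vectors.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ V

# The row of weights for "it" shows how much attention it pays to each word.
print(dict(zip(tokens, weights[tokens.index("it")].round(2))))
```

With random, untrained weights the attention pattern is meaningless; after training, the weights learn to connect related words such as “it” and “the tiger.”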
How do transformers learn from processed text?
This is where training the model comes in. Transformers are trained on mountains of text. Training works like a feedback loop: we feed the model batches of text, it tries to predict what comes next, we check its answers, and based on how far off it was, the model tweaks its parameters to make better predictions. So it’s constantly learning and improving.
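As a rough illustration of that feedback loop, here is a minimal next-token training sketch in Python with PyTorch. The tiny model and toy data are placeholders I made up; real LLM training uses far larger models, datasets, and infrastructure, but the loop has the same shape: predict, measure the error, nudge the parameters.

```python
import torch
import torch.nn as nn

# Toy "corpus": token IDs 0..9 repeating; a real model sees billions of tokens.
data = torch.arange(10).repeat(50)          # [0, 1, ..., 9, 0, 1, ...]
inputs, targets = data[:-1], data[1:]       # the target is always the next token

vocab_size, embed_dim = 10, 16
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

# The feedback loop: predict, check the answer, adjust the parameters.
for step in range(200):
    logits = model(inputs)                  # (num_tokens, vocab_size)
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("final loss:", loss.item())
print("prediction after token 3:", logits[inputs == 3][0].argmax().item())  # hopefully 4
```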
Input Preparation and Data Representation for LLMs
Preparing input for an LLM involves a series of processing and representation steps:
- Normalization: Where required, the text is standardized, for example by removing redundant whitespace, stray characters, and other inconsistencies.
- Tokenization involves breaking down the text data into bite-sized chunks and mapping them to integer token IDs.
- Embeddings convert the chunks into numerical representations that the transformer can understand and work with. There are different types of embeddings, with text embeddings being the most widely used, especially for natural language processing. Other kinds of embeddings are image embeddings, multi-modal embeddings, etc.
- Vector Database is the building that houses the embeddings: it is designed to store, manage, and query them. It differs from a traditional database for structured data in that the latter is not optimized for vector search. Picking the right vector database depends on several factors, including the size of your data, the kinds of queries you will run, budget, security needs, etc.
- Vector Search uses the created embeddings to search by meaning rather than by exact keywords. For example, a search for “lamb” can also surface results about “sheep,” because the two words are closely related in meaning and their embeddings sit close together (see the sketch after this list).
- Positional Encoding adds information about the position of each token in the sequence to help the transformer understand word order.
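To tie a few of these steps together, here is a minimal, self-contained sketch in Python. The tiny vocabulary, the hand-rolled whitespace tokenizer, and the random embedding matrix are all stand-ins I made up for illustration; real systems use learned tokenizers (such as BPE), trained embedding models, and a proper vector database.

```python
import numpy as np

# --- Tokenization: map words to integer token IDs (toy whitespace tokenizer) ---
corpus = ["the lamb grazed in the field",
          "the sheep stayed near the barn",
          "stock prices fell sharply today"]
vocab = {word: idx for idx, word in
         enumerate(sorted({w for doc in corpus for w in doc.split()}))}
tokenize = lambda text: [vocab[w] for w in text.split() if w in vocab]

# --- Embeddings: turn token IDs into vectors (random here; learned in practice) ---
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(len(vocab), 32))
def embed(text):
    ids = tokenize(text)
    return embedding_matrix[ids].mean(axis=0)   # average the token vectors

# --- Vector search: find the stored document closest to the query in vector space ---
doc_vectors = np.stack([embed(doc) for doc in corpus])   # a vector DB would store these
def search(query):
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return corpus[int(np.argmax(sims))]

# Most likely returns the first document, since it shares the most tokens with the query.
print(search("the lamb grazed"))
```

Note that with random vectors the search only reflects shared tokens; capturing the lamb/sheep kind of semantic similarity requires embeddings produced by a trained model.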
Fine-Tuning LLMs
Fine-tuning is the process of taking a pre-trained, general-purpose language model (LLM) and turning it into a specialized tool that excels at a specific task. Fine-tuning is typically far less costly and more efficient than pre-training from scratch.
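As one illustration of what fine-tuning can look like in practice, here is a sketch using the Hugging Face transformers and datasets libraries to adapt a small pre-trained model to sentiment classification. The model name, dataset, and hyperparameters are just example choices for the sketch, not recommendations.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a small pre-trained model and attach a 2-class classification head.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the raw text so the model can consume it.
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)
tokenized = dataset.map(tokenize, batched=True)

# Fine-tune: keep the pre-trained weights and adjust them on the new task.
args = TrainingArguments(output_dir="finetuned-sentiment", num_train_epochs=1,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=tokenized["test"].select(range(500)))
trainer.train()
```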
Prompt Engineering
Once a model has been trained (and possibly fine-tuned), the way we interact with it is through prompts. Prompt engineering involves crafting effective instructions (prompts) to guide the LLM and get the desired result. With prompting, it’s not just about what we ask or the instructions given to the LLM, but how well we ask/instruct it. There are different prompting techniques (illustrated after the list below). Some of them are:
- In zero-shot prompting, the LLM is just given a task description and uses its existing knowledge to generate a response.
- In few-shot prompting, the LLM is given a few examples to steer it in the right direction.
- In chain-of-thought prompting, we walk the LLM through the problem step by step, like you’re teaching it to think things through logically. You can think of it as breaking a complex problem into smaller bits the model can work through. This prompting approach is super helpful for problems that require multi-step reasoning, like math or logic questions.
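Here is a small illustration of the three techniques as plain prompt strings in Python. The tasks, examples, and wording are made up; the point is just to show how the structure of the prompt changes.

```python
# Zero-shot: only a task description, no examples.
zero_shot = ("Classify the sentiment of this review as positive or negative:\n"
             "'The battery died after two days.'")

# Few-shot: a handful of worked examples steer the model toward the expected format.
few_shot = """Classify the sentiment of each review as positive or negative.
Review: 'Absolutely loved it, five stars.' -> positive
Review: 'Broke within a week.' -> negative
Review: 'The battery died after two days.' ->"""

# Chain of thought: ask the model to reason step by step before answering.
chain_of_thought = """A shop sells pens at 3 for $2. How much do 12 pens cost?
Let's think step by step: first work out how many groups of 3 pens are in 12,
then multiply by the price per group, and finally state the total cost."""

for name, prompt in [("zero-shot", zero_shot), ("few-shot", few_shot),
                     ("chain of thought", chain_of_thought)]:
    print(f"--- {name} ---\n{prompt}\n")
```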
Retrieval Augmented Generation
One key application of embeddings is retrieval. Embeddings are used to supercharge LLMs through retrieval-augmented generation (RAG). RAG combines information retrieval with LLMs’ generative capability. It gives LLMs access to data outside of their own training data through a retriever system powered by embeddings and vector search.
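Here is a minimal sketch of the RAG flow in Python, reusing the toy embed-and-search idea from earlier. The documents, the embed helper, and the call_llm stub are all hypothetical placeholders; a real system would use a trained embedding model, a vector database, and an actual LLM API.

```python
import numpy as np

# Knowledge base the model was NOT trained on (made-up internal documents).
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday to Friday.",
    "Premium users get priority email support.",
]

def embed(text):
    # Placeholder: give each word a deterministic pseudo-random vector and average them.
    # A real system would call a trained embedding model here.
    vecs = [np.random.default_rng(abs(hash(w)) % (2**32)).normal(size=32)
            for w in text.lower().split()]
    return np.mean(vecs, axis=0)

doc_vectors = np.stack([embed(d) for d in documents])   # stored in a vector DB in practice

def retrieve(query, k=2):
    # Find the k documents whose embeddings are closest to the query (cosine similarity).
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

def call_llm(prompt):
    # Stub standing in for a real LLM API call.
    return f"(model answer based on a prompt of {len(prompt)} characters)"

question = "What is the refund policy for returns?"
context = "\n".join(retrieve(question))
augmented_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(call_llm(augmented_prompt))
```

The key idea is the augmented prompt: the retrieved text is placed in front of the question, so the model can answer from information it was never trained on.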
AI Agents
An AI agent is an application that autonomously performs tasks. It does this by perceiving its environment, making decisions, and acting to achieve its goals. The key difference between models and agents is that models are limited to the data they are trained on, which means their knowledge is quite static. Agents, on the other hand, are dynamic. They can leverage external information, sources, and services to perform their tasks. Agents are essentially more flexible and powerful.
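To make that concrete, here is a bare-bones sketch of an agent loop in Python. The call_llm stub, the single get_weather tool, and the decision format are all hypothetical; real agent frameworks handle planning, tool schemas, and error handling far more robustly.

```python
# A bare-bones "observe -> decide -> act" loop with one external tool.

def get_weather(city):
    # Hypothetical external tool; a real agent might call a weather API here.
    return f"It is 22°C and sunny in {city}."

TOOLS = {"get_weather": get_weather}

def call_llm(task, observations):
    # Stub standing in for a real LLM call that decides the next action.
    if not observations:
        return {"action": "get_weather", "argument": "Lagos"}
    return {"action": "finish", "argument": f"Answer: {observations[-1]}"}

def run_agent(task, max_steps=5):
    observations = []
    for _ in range(max_steps):
        decision = call_llm(task, observations)          # decide what to do next
        if decision["action"] == "finish":
            return decision["argument"]                  # act: return the final answer
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["argument"]))  # act, then observe the result
    return "Gave up after too many steps."

print(run_agent("What is the weather in Lagos right now?"))
```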
This article covers one aspect of my personal study of AI. In subsequent articles, I’ll share the outcomes of my study and exploration of AI applications.
Until next time.