Generative AI has added a whole new collection of terms to the technology landscape, and as with every new and evolving technology, there is a fair bit of confusion about what these terms mean. Here is my ever-evolving list of terms, with short definitions to help you understand what they really mean.
-
Ada - Short for “adaptive”, as in adaptive optimizers such as AdaGrad and Adam (Adaptive Moment Estimation), which adjust the learning rate for each parameter during training.
-
Attention Mechanism - A component in neural networks, especially Transformers, that allows the model to focus on specific parts of the input data. For example, when translating a sentence from English to French, attention helps the model concentrate on relevant English words while generating each French word.
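As a rough illustration of the idea, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The vectors and the helper names (`softmax`, `attention`) are made up for this example and are not taken from any particular library.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each key by how well it matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy inputs (illustrative numbers only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
```

The keys that point in the same direction as the query receive the largest weights, which is the "focus" the definition describes.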
-
Autoencoder - A type of neural network used for unsupervised learning. It encodes input data into a compressed representation and then decodes it to recreate the input.
-
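Backpropagation - The algorithm used to train neural networks. It works out how much each weight contributed to the error by propagating gradients backward through the layers, so that an optimizer such as gradient descent can adjust the weights to reduce the error.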
-
Beam Search - A search algorithm used in sequence prediction tasks. It keeps track of a fixed number of the best partial solutions (sequences) to improve the quality of generated sequences.
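To make this concrete, here is a small sketch of beam search over a hypothetical toy "model". The `NEXT` table of next-token probabilities is invented for illustration; a real model would compute these probabilities.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_width, max_len=5):
    beams = [(0.0, ["<s>"])]  # each beam is (log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":
                candidates.append((logp, seq))  # finished sequences carry over
                continue
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [tok]))
        # Keep only the best `beam_width` partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams

greedy = beam_search(beam_width=1)[0]
best = beam_search(beam_width=2)[0]
```

With a beam width of 1 (greedy search) the search commits to "the" because it is the most likely first word, while a beam width of 2 keeps "a" alive and finds the more probable full sequence "a cat".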
-
Bias (in AI) - When an AI model has pre-existing inclinations due to its training data. It can result in unfair or incorrect predictions.
-
CLIP - Contrastive Language-Image Pre-training. A model from OpenAI trained to match images with their text descriptions; it is often used to steer image generation toward a text prompt.
-
Denoising - A process where the model is trained to reconstruct its input data from a corrupted version of it. This helps the model learn to focus on essential features and ignore noise.
-
Diffusion model - A generative model that turns random noise into data such as images through iterative refinement, removing a little noise at each step.
-
Embedding - A vector representation of words or items that encodes semantic meaning. Embeddings are how words are fed into language models as input.
-
Epoch - One full cycle of passing the entire dataset through a neural network during training.
-
Few-shot learning - Using a small labeled dataset to adapt a model to a new task or dataset.
-
Fine-tuning - The process of taking a pre-trained model and training it further on a specific dataset to adapt it to a particular task.
-
Generative Adversarial Network (GAN) - A type of AI model that consists of two networks – a generator and a discriminator. The generator tries to produce fake data, while the discriminator attempts to differentiate between real and fake data. Over time, the generator improves its ability to produce convincing fakes.
-
Generative AI - A subset of AI techniques that are used to create content, such as images, text, or music. They learn from existing data to generate new, previously unseen samples.
-
Gradient Descent - An optimization algorithm that adjusts the parameters of a model iteratively to minimize the loss function.
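A minimal sketch of the idea, assuming a one-parameter model and a made-up loss f(w) = (w - 3)^2 whose gradient is 2(w - 3). The function name and default values are chosen only for this example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        # Move against the gradient, scaled by the learning rate.
        w -= learning_rate * grad(w)
    return w

# Loss f(w) = (w - 3)^2 has its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
```

Each step shrinks the distance to the minimum, so `w_min` ends up very close to 3.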
-
Hallucination - In the context of AI language models like GPT, hallucination refers to the model generating information that isn’t accurate or isn’t based on its training data. It “imagines” details that aren’t factual.
-
Latent Space - In the context of generative models, it’s the abstract space in which representations of data live. Generative models often navigate and sample this space to produce new content.
-
Loss Function - A mathematical function that quantifies how well the AI model’s predictions match the actual data. Training aims to minimize this value.
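Two common loss functions can be sketched in a few lines: mean squared error for numeric predictions and cross-entropy for predicted probabilities. The function names below are illustrative, not a specific library's API.

```python
import math

def mean_squared_error(y_true, y_pred):
    # Average of squared differences between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_index, probs):
    # Negative log of the probability assigned to the correct class.
    return -math.log(probs[true_index])

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
confident = cross_entropy(0, [0.9, 0.1])
unsure = cross_entropy(0, [0.5, 0.5])
```

In both cases a smaller value means the predictions are closer to the actual data; here the confident correct prediction gets a lower loss than the unsure one.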
-
Neural Network - Computational systems inspired by the structure of biological neural networks. They consist of layers of interconnected nodes (neurons) and are used for various machine learning tasks.
-
Overfitting - When an AI model learns the training data too well, including its noise and outliers, making it perform poorly on new, unseen data.
-
Perplexity - A measurement of how well a language model predicts a sample of text. Lower perplexity means the model assigned higher probability to the text, which indicates better prediction.
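Perplexity is usually computed as the exponential of the average negative log-probability the model assigned to each token. A small sketch, assuming the per-token probabilities are already available as a list (the numbers are made up):

```python
import math

def perplexity(token_probs):
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

uniform = perplexity([0.25, 0.25, 0.25, 0.25])
better = perplexity([0.9, 0.8, 0.95, 0.7])
```

A model that gives every token a probability of 0.25 has a perplexity of about 4, as if it were guessing among four equally likely options at each step.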
-
Prompt engineering - Designing the text prompts fed to language models to produce better results.
-
Prompt Templates - Structured prompts or questions given to a model to guide its responses. For example, instead of asking “tell me about X,” a prompt template might be “Provide a brief summary of X highlighting its main features.”
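In code, a prompt template is often just a string with placeholders. A minimal sketch using Python's built-in string formatting (the names `SUMMARY_TEMPLATE` and `build_prompt` are made up for this example):

```python
SUMMARY_TEMPLATE = "Provide a brief summary of {topic} highlighting its main features."

def build_prompt(topic):
    # Fill the placeholder with the subject the user asked about.
    return SUMMARY_TEMPLATE.format(topic=topic)

prompt = build_prompt("vector databases")
```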
-
RAG (Retrieval-Augmented Generation) - An approach combining retrieval (searching through a database of information) and generation (producing new content). For instance, when asked a question, RAG might search for relevant passages and then use those passages to generate a coherent answer.
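The retrieval half of this can be sketched with a toy example. The documents, the word-overlap scoring, and the function names below are all assumptions made for illustration; real systems typically retrieve with embeddings and then send the prompt to a language model, a step left out here.

```python
# Toy document store (illustrative content only).
DOCUMENTS = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "The Great Wall of China was built over many centuries.",
]

def words(text):
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(question, documents, k=1):
    # Rank documents by how many words they share with the question.
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, documents):
    # Put the retrieved passages in front of the question.
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "Where is the Eiffel Tower?"
rag_prompt = build_rag_prompt(question, DOCUMENTS)
```

The resulting prompt contains the passage about the Eiffel Tower, so the generation step can ground its answer in retrieved text instead of relying only on what the model memorized.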
-
Regularization - Techniques used in training to prevent overfitting, like adding a penalty to the loss function.
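One common form of that penalty is L2 regularization, which adds the sum of squared weights to the loss. A minimal sketch, with a made-up function name and a hypothetical strength parameter `lam`:

```python
def l2_penalized_loss(data_loss, weights, lam=0.01):
    # Larger weights are penalized more, nudging the model toward simpler fits.
    penalty = lam * sum(w * w for w in weights)
    return data_loss + penalty

total = l2_penalized_loss(1.0, [3.0, 4.0], lam=0.1)
```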
-
Small Language Model - A Small Language Model is a language model with far fewer parameters than a large language model, often trained on a smaller or more focused dataset. Small language models have a more constrained knowledge capacity compared to large models, but can still produce surprisingly coherent text. The key advantages of small language models are that they require less compute to train and run, making them more accessible and easier to deploy in applications.
-
Softmax - A function that turns a list of scores into probabilities that sum to 1. Language models use it for next-token prediction.
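A minimal sketch of softmax in plain Python; the input scores are arbitrary example values.

```python
import math

def softmax(scores):
    # Subtracting the max does not change the result but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The outputs are all positive, sum to 1, and keep the same ordering as the input scores.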
-
Temperature - A parameter that can be adjusted when sampling from the model’s output distribution. A higher temperature makes the output more random, while a lower temperature makes it more deterministic.
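Temperature is typically applied by dividing the model's scores by it before the softmax step. A small sketch under that assumption, with example scores and an illustrative function name:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # temperature must be greater than 0; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, temperature=0.5)
neutral = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=2.0)
```

At the lower temperature the top token takes most of the probability, while at the higher temperature the distribution flattens and less likely tokens get sampled more often.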
-
Token - An individual unit of text, such as a word, subword, or punctuation mark. Tokens are the inputs and outputs of language models.
-
Tokenization - The process of converting input data (like text) into tokens, which are smaller chunks, such as words or subwords. For instance, the sentence “ChatGPT is great!” might be tokenized into [“ChatGPT”, “is”, “great”, “!”].
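The word-and-punctuation split in that example can be reproduced with a simple regular expression. This is only a sketch: production tokenizers usually use learned subword schemes, so their output for the same sentence will differ.

```python
import re

def simple_tokenize(text):
    # Match runs of word characters, or any single non-space punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("ChatGPT is great!")
# tokens == ['ChatGPT', 'is', 'great', '!']
```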
-
Top-k Sampling - A decoding strategy where the model selects the next word/token from the top k most likely candidates instead of considering the entire vocabulary.
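A minimal sketch of the filtering step, assuming the model's next-token probabilities are given as a dictionary. The tokens, numbers, and function name are invented for this example.

```python
import random

def top_k_filter(probs, k):
    # Keep the k most likely tokens and renormalize so they sum to 1.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "emu": 0.05}
filtered = top_k_filter(probs, k=2)

# Sample the next token from the filtered distribution.
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
```

With k = 2 only "cat" and "dog" remain, so the unlikely tokens can never be sampled.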
-
Top-p Sampling (Nucleus Sampling) - Another decoding strategy where the model chooses the next word/token from the smallest set of most likely candidates whose cumulative probability reaches p. Unlike Top-k sampling, the number of candidates adapts to how confident the model is.
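A small sketch of nucleus filtering, again assuming the next-token probabilities are given as a dictionary. The tokens, numbers, and function name are invented for this example.

```python
def top_p_filter(probs, p):
    # Walk tokens from most to least likely until their total probability reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize the kept tokens so they sum to 1.
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "emu": 0.05}
narrow = top_p_filter(probs, p=0.6)
wide = top_p_filter(probs, p=0.9)
```

With p = 0.6 two tokens are enough to cover the threshold, while p = 0.9 needs three, so the size of the candidate set depends on the shape of the distribution and not on a fixed count.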
-
Transfer Learning - A machine learning method where a model pre-trained on one task is reused, typically by fine-tuning, for a different but related task. This often reduces the amount of required data and training time.
-
Transformer Architecture - A neural network architecture that uses self-attention mechanisms to weigh how relevant each part of the input is to every other part. It is particularly successful in natural language processing tasks. Models like GPT (Generative Pre-trained Transformer) use this architecture.
-
Transformer - A type of neural network architecture based on attention mechanisms, commonly used in large language models like GPT-3.
-
Variational Autoencoder (VAE) - A type of autoencoder that adds probabilistic constraints to the encoding process, so that sampling from its latent space generates new data similar to the training data.
-
Vector Database - A vector database is a database system optimized for storing and querying vector representations of objects, like numeric embeddings. It provides efficient similarity searches across high-dimensional vector data.
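The core operation, similarity search, can be sketched with a brute-force in-memory store. The `TinyVectorStore` class and the three-dimensional vectors are made up for illustration; real vector databases use approximate indexes to stay fast on millions of high-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Hypothetical toy store: compares the query against every stored vector."""

    def __init__(self):
        self.items = []

    def add(self, key, vector):
        self.items.append((key, vector))

    def search(self, query, k=1):
        # Rank stored vectors by similarity to the query, most similar first.
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query, item[1]),
            reverse=True,
        )
        return [key for key, _ in ranked[:k]]

store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("dog", [0.8, 0.2, 0.1])
store.add("car", [0.0, 0.1, 0.9])
nearest = store.search([1.0, 0.0, 0.0], k=2)
```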
-
Zero-shot, One-shot, Few-shot Learning - Approaches where models are adapted to, or perform, a task with few or no examples. In a “zero-shot” scenario, the model has not seen any examples of the task. In “one-shot”, it has seen just one example, and in “few-shot”, a limited number of examples.
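With language models, these examples are often supplied directly in the prompt and not through training. A minimal sketch of building such prompts, using a made-up sentiment task and a hypothetical `build_few_shot_prompt` helper:

```python
def build_few_shot_prompt(examples, query):
    # Each example shows the model an input together with the desired output.
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # The final block leaves the answer blank for the model to complete.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

examples = [("I loved it", "positive"), ("Waste of time", "negative")]
zero_shot = build_few_shot_prompt([], "Great film")
one_shot = build_few_shot_prompt(examples[:1], "Great film")
few_shot = build_few_shot_prompt(examples, "Great film")
```

The only difference between the three prompts is how many worked examples come before the final question.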