Large Language Model on Jinying Tech Blog

LLM Abbreviations Glossary
https://chejinying.com/tech/posts/llm/abbreviations/
Thu, 22 Jan 2026 11:00:00 +0800

<p>A quick reference for common abbreviations in the LLM (Large Language Model) domain.</p>

<h1 id="training--techniques">Training & Techniques</h1>
<table>
<thead> <tr> <th>Abbreviation</th> <th>Full Name</th> <th>What It Is</th> </tr> </thead>
<tbody>
<tr> <td><strong>SFT</strong></td> <td>Supervised Fine-Tuning</td> <td>Fine-tuning a pre-trained model on curated prompt-response pairs</td> </tr>
<tr> <td><strong>RL</strong></td> <td>Reinforcement Learning</td> <td>Learning by trial and error, guided by reward signals</td> </tr>
<tr> <td><strong>RLHF</strong></td> <td>Reinforcement Learning from Human Feedback</td> <td>RL where human rankings of outputs guide training, typically through a learned reward model</td> </tr>
<tr> <td><strong>DPO</strong></td> <td>Direct Preference Optimization</td> <td>Simpler alternative to RLHF that trains directly on preference pairs, with no separate reward model</td> </tr>
<tr> <td><strong>GRPO</strong></td> <td>Group Relative Policy Optimization</td> <td>RL technique that scores each output relative to a group of sampled outputs; used in reasoning models (DeepSeek)</td> </tr>
<tr> <td><strong>PPO</strong></td> <td>Proximal Policy Optimization</td> <td>Popular policy-gradient RL algorithm, widely used in RLHF</td> </tr>
<tr> <td><strong>LoRA</strong></td> <td>Low-Rank Adaptation</td> <td>Memory-efficient fine-tuning that trains small low-rank update matrices while the base weights stay frozen</td> </tr>
<tr> <td><strong>QLoRA</strong></td> <td>Quantized LoRA</td> <td>LoRA on top of a 4-bit quantized base model, for even less memory</td> </tr>
</tbody>
</table>

<h1 id="architecture--models">Architecture & Models</h1>
<table>
<thead> <tr> <th>Abbreviation</th> <th>Full Name</th> <th>What It Is</th> </tr> </thead>
<tbody>
<tr> <td><strong>LLM</strong></td> <td>Large Language Model</td> <td>Models such as GPT, Claude, and Llama</td> </tr>
<tr> <td><strong>NLP</strong></td>
<td>Natural Language Processing</td> <td>Field of AI dealing with human language</td> </tr>
<tr> <td><strong>RNN</strong></td> <td>Recurrent Neural Network</td> <td>Older sequence architecture that predates Transformers</td> </tr>
<tr> <td><strong>LSTM</strong></td> <td>Long Short-Term Memory</td> <td>Gated RNN that handles longer sequences</td> </tr>
<tr> <td><strong>GRU</strong></td> <td>Gated Recurrent Unit</td> <td>Simplified version of LSTM</td> </tr>
<tr> <td><strong>MLP</strong></td> <td>Multilayer Perceptron</td> <td>Basic fully-connected neural network</td> </tr>
<tr> <td><strong>GPT</strong></td> <td>Generative Pre-trained Transformer</td> <td>OpenAI’s family of decoder-only Transformer models</td> </tr>
<tr> <td><strong>MoE</strong></td> <td>Mixture of Experts</td> <td>Architecture where only some “experts” activate per token</td> </tr>
<tr> <td><strong>MQA</strong></td> <td>Multi-Query Attention</td> <td>Attention variant where all query heads share a single key-value head</td> </tr>
<tr> <td><strong>GQA</strong></td> <td>Grouped-Query Attention</td> <td>Middle ground between MHA and MQA: groups of query heads share a key-value head</td> </tr>
<tr> <td><strong>MHA</strong></td> <td>Multi-Head Attention</td> <td>Standard attention where each head has its own queries, keys, and values</td> </tr>
</tbody>
</table>

<h1 id="applications--deployment">Applications & Deployment</h1>
<table>
<thead> <tr> <th>Abbreviation</th> <th>Full Name</th> <th>What It Is</th> </tr> </thead>
<tbody>
<tr> <td><strong>RAG</strong></td> <td>Retrieval-Augmented Generation</td> <td>Combining LLMs with external knowledge retrieval</td> </tr>
<tr> <td><strong>API</strong></td> <td>Application Programming Interface</td> <td>Programmatic interface; hosted LLMs are typically accessed through one over the internet</td> </tr>
<tr> <td><strong>VRAM</strong></td> <td>Video Random Access Memory</td> <td>GPU memory; limits how large a model can be loaded and run</td> </tr>
<tr> <td><strong>MCP</strong></td> <td>Model Context Protocol</td> <td>Standard for connecting LLMs to external tools</td> </tr>
<tr> <td><strong>A2A</strong></td> <td>Agent-to-Agent Protocol</td> <td>Standard for agent
interoperability</td> </tr>
</tbody>
</table>

<h1 id="evaluation--benchmarks">Evaluation & Benchmarks</h1>
<table>
<thead> <tr> <th>Abbreviation</th> <th>Full Name</th> <th>What It Is</th> </tr> </thead>
<tbody>
<tr> <td><strong>MMLU</strong></td> <td>Massive Multitask Language Understanding</td> <td>Popular multiple-choice benchmark covering 57 subjects</td> </tr>
<tr> <td><strong>CoT</strong></td> <td>Chain-of-Thought</td> <td>Prompting technique for step-by-step reasoning</td> </tr>
<tr> <td><strong>PRM</strong></td> <td>Process Reward Model</td> <td>Model that scores intermediate reasoning steps</td> </tr>
</tbody>
</table>

<h1 id="quantization--optimization">Quantization & Optimization</h1>
<table>
<thead> <tr> <th>Abbreviation</th> <th>Full Name</th> <th>What It Is</th> </tr> </thead>
<tbody>
<tr> <td><strong>GGUF</strong></td> <td>GPT-Generated Unified Format</td> <td>Single-file model format used by llama.cpp, commonly for quantized models</td> </tr>
<tr> <td><strong>GPTQ</strong></td> <td>GPT Quantization</td> <td>Post-training weight quantization method</td> </tr>
<tr> <td><strong>AWQ</strong></td> <td>Activation-aware Weight Quantization</td> <td>Weight quantization that uses activation statistics to protect the most important weights</td> </tr>
<tr> <td><strong>FP16</strong></td> <td>16-bit Floating Point</td> <td>Half-precision number format (2 bytes per value)</td> </tr>
<tr> <td><strong>FP32</strong></td> <td>32-bit Floating Point</td> <td>Single-precision number format, often called full precision (4 bytes per value)</td> </tr>
<tr> <td><strong>INT8</strong></td> <td>8-bit Integer</td> <td>Low-precision integer format (1 byte per value)</td> </tr>
</tbody>
</table>

<h1 id="data--preprocessing">Data & Preprocessing</h1>
<table>
<thead> <tr> <th>Abbreviation</th> <th>Full Name</th> <th>What It Is</th> </tr> </thead>
<tbody>
<tr> <td><strong>BoW</strong></td> <td>Bag-of-Words</td> <td>Text representation that counts words and ignores their order</td> </tr>
<tr> <td><strong>TF-IDF</strong></td> <td>Term Frequency-Inverse Document Frequency</td> <td>Weights a term by its frequency in a document, discounted by how many documents contain it</td> </tr>
</tbody>
</table>
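
<p>To make the last two entries concrete, here is a minimal sketch of BoW and TF-IDF in plain Python. It is an illustration under stated assumptions, not a reference implementation: the helper names <code>bag_of_words</code> and <code>tf_idf</code> are hypothetical, tokenization is a simple lowercase whitespace split, and the weighting is one common variant (tf = count / document length, idf = ln(N / df)). Libraries often use smoothed or normalized variants instead.</p>

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df),
    where df is the number of documents that contain the term.
    """
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for bag in bags:
        df.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in bag.items()}
        )
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)

# Same words in a different order give the same bag.
print(bag_of_words("the cat sat") == bag_of_words("sat the cat"))
# A term that occurs in every document gets weight 0 under this variant.
print(scores[0]["the"])
# A term unique to one document outweighs a term shared by two documents.
print(scores[1]["dog"] > scores[1]["sat"])
```

<p>With this toy corpus, "the" appears in all three documents, so its idf is ln(3/3) = 0, while rarer terms such as "dog" receive the largest weights.</p>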