llama.cpp

The C/C++ engine powering local AI — lightning-fast inference that Ollama and LM Studio build on.

About llama.cpp

llama.cpp is the foundational C/C++ library for running quantized LLMs on consumer hardware. Created by Georgi Gerganov, it powers tools like Ollama and LM Studio behind the scenes. It supports the GGUF model format and GPU offloading, and runs on virtually any platform.

Key Features

  • C/C++ for maximum performance
  • GGUF quantization format
  • GPU offloading (CUDA, Metal, Vulkan)
  • Server mode with OpenAI-compatible API (see the example after this list)
  • Runs on everything from Raspberry Pi to servers
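
As a rough illustration of the server mode noted above, the sketch below sends a chat request to a locally running llama.cpp server through the standard OpenAI Python client. The binary name, flags, port, model path, and model name are assumptions for a typical recent build rather than fixed values.

```python
# Minimal sketch: query llama.cpp's OpenAI-compatible server endpoint.
# Assumes a server is already running locally, started with something like:
#   llama-server -m ./model.gguf --port 8080 -ngl 99
# (binary name, flags, model path, and port are illustrative assumptions).
from openai import OpenAI

# llama-server does not require an API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-needed")

response = client.chat.completions.create(
    model="local-model",  # the server answers with whatever model it was started with
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the endpoint mirrors the OpenAI chat-completions schema, existing OpenAI-based tooling can often be pointed at the local server by changing only the base URL.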

Pros & Cons

Pros

+ Among the fastest local inference engines

+ Runs on virtually any hardware

+ Foundation of the local AI ecosystem

Cons

- Command-line focused; no polished graphical interface

- Requires compilation for best performance

- Steep learning curve for beginners

Use Cases

  • Building local AI applications
  • Maximum-performance local inference
  • Embedded AI in apps
  • Research and benchmarking

Pricing

Open Source

Free and open-source. MIT license.

Who It's For

  • C/C++ developers
  • ML engineers
  • Embedded systems developers
  • Performance enthusiasts

Details

  • Company: ggml.org
  • Founded: 2023