Hugging Face's production-ready toolkit for deploying LLMs. Optimized for high throughput with tensor parallelism, quantization, and Flash Attention.
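This description matches Hugging Face's Text Generation Inference (TGI). A minimal client sketch using `huggingface_hub.InferenceClient`, assuming a TGI server is already running locally (the endpoint URL and prompt are placeholders, not part of the original entry):

```python
from huggingface_hub import InferenceClient

# Assumes a TGI server was launched separately (e.g. via its Docker image)
# and is listening on this local endpoint (placeholder URL).
client = InferenceClient("http://localhost:8080")

# text_generation() sends the prompt to the server's generate route.
print(client.text_generation("Explain tensor parallelism in one sentence.", max_new_tokens=64))
```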
Free, open-source alternative to the OpenAI API. Run LLMs, generate images, audio, and more locally or on-prem with no GPU required.
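This entry matches LocalAI, which exposes OpenAI-compatible routes, so the official `openai` Python client can point at it. A hedged sketch; the port, model name, and prompt below are assumptions that depend on how the server is configured:

```python
from openai import OpenAI

# base_url assumes a LocalAI server on its default port; api_key is unused locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder: maps to whichever local model the server serves under this name
    messages=[{"role": "user", "content": "Summarize what a local OpenAI-compatible API is good for."}],
)
print(resp.choices[0].message.content)
```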
LLM inference in pure C/C++. Run LLaMA and other models on consumer hardware with CPU and GPU support. The engine behind many local AI apps.
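This is llama.cpp's tagline. Rather than the C/C++ API itself, a short sketch using the llama-cpp-python bindings, assuming a quantized GGUF checkpoint has already been downloaded (the file path and prompt are placeholders):

```python
from llama_cpp import Llama

# model_path is a placeholder; any GGUF-format checkpoint works here.
llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

# Calling the instance runs completion on CPU (or GPU if the bindings were built with offloading).
out = llm("Q: What does quantization trade away for smaller memory use? A:", max_tokens=64)
print(out["choices"][0]["text"])
```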
Desktop app to discover, download, and run local LLMs. User-friendly GUI for running open-source models on your computer.
High-throughput and memory-efficient inference and serving engine for LLMs. Uses PagedAttention to manage the KV cache for fast model serving.
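The tagline is vLLM's. A minimal offline-inference sketch with its Python API; the model name and sampling settings are illustrative, not prescribed by the entry:

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; any Hugging Face causal LM that fits in GPU memory works.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.8, max_tokens=128)

# generate() batches prompts and schedules them with PagedAttention under the hood.
outputs = llm.generate(["What does PagedAttention optimize?"], params)
print(outputs[0].outputs[0].text)
```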
Run large language models locally. Get up and running with LLaMA, Mistral, Gemma, and other open models with a single command.
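This entry matches Ollama. Once a model has been pulled (e.g. `ollama pull llama3`), the local daemon's REST API can be called directly; the model name and prompt below are assumptions:

```python
import requests

# Ollama's daemon listens on port 11434 by default; "llama3" assumes that model
# has already been pulled locally.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why run models locally?", "stream": False},
)
print(resp.json()["response"])
```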