Hugging Face's production-ready toolkit for deploying LLMs. Optimized for high throughput with tensor parallelism, quantization, and Flash Attention.
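This description matches Hugging Face's Text Generation Inference (TGI). A minimal client sketch using `huggingface_hub.InferenceClient`, assuming a TGI server is already running locally (the endpoint URL and prompt are placeholders, not part of the original entry):

```python
from huggingface_hub import InferenceClient

# Assumes a TGI server was launched separately (e.g. via its Docker image)
# and is listening on this local endpoint (placeholder URL).
client = InferenceClient("http://localhost:8080")

# text_generation() sends the prompt to the server's generate route.
print(client.text_generation("Explain tensor parallelism in one sentence.", max_new_tokens=64))
```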
Free, open-source alternative to the OpenAI API. Run LLMs, generate images, audio, and more locally or on-prem with no GPU required.
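This entry matches LocalAI, which exposes OpenAI-compatible routes, so the official `openai` Python client can point at it. A hedged sketch; the port, model name, and prompt below are assumptions that depend on how the server is configured:

```python
from openai import OpenAI

# base_url assumes a LocalAI server on its default port; api_key is unused locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder: maps to whichever local model the server serves under this name
    messages=[{"role": "user", "content": "Summarize what a local OpenAI-compatible API is good for."}],
)
print(resp.choices[0].message.content)
```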
LLM inference in pure C/C++. Run LLaMA and other models on consumer hardware with CPU and GPU support. The engine behind many local AI apps.
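This is llama.cpp's tagline. Rather than the C/C++ API itself, a short sketch using the llama-cpp-python bindings, assuming a quantized GGUF checkpoint has already been downloaded (the file path and prompt are placeholders):

```python
from llama_cpp import Llama

# model_path is a placeholder; any GGUF-format checkpoint works here.
llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

# Calling the instance runs completion on CPU (or GPU if the bindings were built with offloading).
out = llm("Q: What does quantization trade away for smaller memory use? A:", max_tokens=64)
print(out["choices"][0]["text"])
```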
Desktop app to discover, download, and run local LLMs. User-friendly GUI for running open-source models on your computer.
High-throughput and memory-efficient inference and serving engine for LLMs. Uses PagedAttention to manage the KV cache for fast model serving.
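The tagline is vLLM's. A minimal offline-inference sketch with its Python API; the model name and sampling settings are illustrative, not prescribed by the entry:

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; any Hugging Face causal LM that fits in GPU memory works.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.8, max_tokens=128)

# generate() batches prompts and schedules them with PagedAttention under the hood.
outputs = llm.generate(["What does PagedAttention optimize?"], params)
print(outputs[0].outputs[0].text)
```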
Run large language models locally. Get up and running with LLaMA, Mistral, Gemma, and other open models with a single command.
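This entry matches Ollama. Once a model has been pulled (e.g. `ollama pull llama3`), the local daemon's REST API can be called directly; the model name and prompt below are assumptions:

```python
import requests

# Ollama's daemon listens on port 11434 by default; "llama3" assumes that model
# has already been pulled locally.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why run models locally?", "stream": False},
)
print(resp.json()["response"])
```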