Embedding Models

| Name | Parameters (M) | Max input tokens | Dimensions | Strengths |
| --- | --- | --- | --- | --- |
| intfloat/e5-small-v2 | 33.4 | 512 | 384 | Good quality, low latency |
| mixedbread-ai/mxbai-embed-large-v1 | 335 | 512 | 1024 | High quality, higher latency |
| Alibaba-NLP/gte-base-en-v1.5 | 137 | 8192 | 768 | Supports up to 8,192 input tokens |
| Alibaba-NLP/gte-large-en-v1.5 | 434 | 8192 | 1024 | Highest quality, up to 8,192 input tokens |
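The trade-off in the table is quality versus latency versus input window. As one way to make that concrete, here is a minimal sketch of a selection helper driven by the specs above; the numbers come from the table, but the helper itself (`pick_embedding_model` and its smallest-model-that-fits policy) is an illustration, not part of any library:

```python
# Embedding model specs from the table above:
# parameters in millions, max input tokens, embedding dimensions.
EMBEDDING_MODELS = {
    "intfloat/e5-small-v2": {"params_m": 33.4, "max_tokens": 512, "dims": 384},
    "mixedbread-ai/mxbai-embed-large-v1": {"params_m": 335, "max_tokens": 512, "dims": 1024},
    "Alibaba-NLP/gte-base-en-v1.5": {"params_m": 137, "max_tokens": 8192, "dims": 768},
    "Alibaba-NLP/gte-large-en-v1.5": {"params_m": 434, "max_tokens": 8192, "dims": 1024},
}

def pick_embedding_model(required_tokens: int) -> str:
    """Return the smallest (roughly lowest-latency) model whose input
    window fits `required_tokens`; raise if no model is large enough."""
    candidates = [
        (spec["params_m"], name)
        for name, spec in EMBEDDING_MODELS.items()
        if spec["max_tokens"] >= required_tokens
    ]
    if not candidates:
        raise ValueError(f"no model accepts {required_tokens} input tokens")
    return min(candidates)[1]
```

For example, `pick_embedding_model(400)` selects `intfloat/e5-small-v2`, while `pick_embedding_model(4000)` falls through to `Alibaba-NLP/gte-base-en-v1.5`, the smaller of the two long-input models.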

Instruct Models

| Name | Parameters (B) | Active Parameters (B) | Context size | Strengths |
| --- | --- | --- | --- | --- |
| meta-llama/Llama-3.2-1B-Instruct | 1 | 1 | 128k | Lowest latency |
| meta-llama/Llama-3.2-3B-Instruct | 3 | 3 | 128k | Low latency |
| meta-llama/Meta-Llama-3.1-405B-Instruct | 405 | 405 | 128k | Highest quality |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 70 | 70 | 128k | High quality |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8 | 8 | 128k | Low latency |
| microsoft/Phi-3-mini-128k-instruct | 3.8 | 3.8 | 128k | Low latency |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 46.7 | 12.9 | 32k | High quality (MoE) |
| mistralai/Mistral-7B-Instruct-v0.2 | 7 | 7 | 32k | Low latency |
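Context size determines how much prompt plus expected output a model can accept in one request. A minimal sketch of a pre-flight check, using the rough heuristic of about 4 characters per token (an assumption; real tokenizer counts vary by model and text) and reading "k" as 1,024 (also an assumption):

```python
# Context windows for a few models from the table above, in tokens,
# assuming "128k" means 128 * 1024 and "32k" means 32 * 1024.
CONTEXT_TOKENS = {
    "meta-llama/Meta-Llama-3.1-70B-Instruct": 128 * 1024,
    "microsoft/Phi-3-mini-128k-instruct": 128 * 1024,
    "mistralai/Mixtral-8x7B-Instruct-v0.1": 32 * 1024,
}

def fits_context(model: str, prompt: str, reserved_output: int = 512) -> bool:
    """Estimate whether `prompt` plus `reserved_output` tokens of
    generation fits the model's context window, using ~4 chars/token."""
    approx_prompt_tokens = len(prompt) // 4 + 1
    return approx_prompt_tokens + reserved_output <= CONTEXT_TOKENS[model]
```

A 200,000-character document (~50k tokens by this estimate) would overflow Mixtral's 32k window but fit comfortably in a 128k-context Llama 3.1 model; an exact check would use the model's own tokenizer instead of the heuristic.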

Summarization Models

| Name | Parameters (M) | Max input tokens | Strengths |
| --- | --- | --- | --- |
| google/pegasus-xsum | 568 | 512 | Abstractive summarization |