Models

We offer two generative model tiers and two embedding model tiers, each optimized for different use cases. Generative models support Student and Teacher roles for automatic optimization. Learn more about model roles.

Generative Models

📘 Note on initial costs
New deployments use Teacher pricing during an initial grace period while the platform learns your usage patterns. Once optimization completes, requests automatically transition to the lower-cost Student model. Learn more about the grace period.

Jackal

Many developers choose Parlant as a friendly, full-featured Conversational AI framework, even when they're not building compliance-sensitive agents (Parlant's most distinctive strength).

For use cases that aren't mission-critical, accuracy is often a should-have rather than an absolute must. When there are no serious legal or financial repercussions if the agent occasionally falls short on instruction-following fidelity, some accuracy can be traded for lower cost. Typical use cases include education, AI copilots, lead generation, and similar applications.

Jackal is ideal for these scenarios. It provides a balance between accuracy and cost. It's more accurate than generic, off-the-shelf models at similar price points, thanks to its Parlant-specific specialization.

If your agent doesn't handle particularly sensitive use cases, Jackal delivers excellent value.

Pricing

Role	Input (per 1M tokens)	Output (per 1M tokens)
Student	$0.30	$2.50
Teacher	$0.60	$3.00
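
As a rough illustration of how the table above translates into spend, the following sketch estimates the cost of a workload under each role. The workload size and the `estimate_cost` helper are hypothetical and only serve the example; the prices are the ones listed above.

```python
from decimal import Decimal

# Jackal prices in USD per 1M tokens, copied from the pricing table above.
JACKAL_PRICES = {
    "student": {"input": Decimal("0.30"), "output": Decimal("2.50")},
    "teacher": {"input": Decimal("0.60"), "output": Decimal("3.00")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(role: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of a workload for the given role."""
    price = JACKAL_PRICES[role]
    return (
        Decimal(input_tokens) / TOKENS_PER_UNIT * price["input"]
        + Decimal(output_tokens) / TOKENS_PER_UNIT * price["output"]
    )


# Hypothetical workload: 10M input tokens and 2M output tokens.
teacher_cost = estimate_cost("teacher", 10_000_000, 2_000_000)
student_cost = estimate_cost("student", 10_000_000, 2_000_000)

print(f"Teacher (grace period): ${teacher_cost:.2f}")
print(f"Student (after optimization): ${student_cost:.2f}")
```

For this assumed workload the estimate is $12.00 at Teacher pricing and $8.00 at Student pricing. Actual bills depend on your real token counts and on how long the grace period lasts.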

Rate Limits

See Jackal tier rate limits for RPM, TPM, and TPD limits by usage tier.

Bison

Many companies use Parlant for its unique strength in managing compliance-sensitive use cases, such as financial services, healthcare, large-scale proactive customer service, and similar domains.

In these scenarios, accuracy is crucial. Mishaps on the agent's part can lead to financial, legal, or reputational damage.

Bison was created for these use cases. While still providing a better price point than generic off-the-shelf models, it is based on a larger model that handles more nuance and complexity.

This leads to higher precision in tasks such as guideline matching, tool calling, and response generation.

Choose Bison when your application handles sensitive, high-stakes decisions where accuracy is non-negotiable.

Pricing

Role	Input (per 1M tokens)	Output (per 1M tokens)
Student	$0.90	$5.00
Teacher	$1.50	$15.00
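
To get a feel for how much the transition from Teacher to Student pricing matters on Bison, this small sketch computes the relative price reduction for input and output tokens. The `student_discount` helper is a hypothetical name used for illustration; the prices come from the table above.

```python
# Bison prices in USD per 1M tokens, copied from the pricing table above.
BISON_PRICES = {
    "student": {"input": 0.90, "output": 5.00},
    "teacher": {"input": 1.50, "output": 15.00},
}


def student_discount(kind: str) -> float:
    """Fractional price reduction of Student relative to Teacher."""
    teacher = BISON_PRICES["teacher"][kind]
    student = BISON_PRICES["student"][kind]
    return (teacher - student) / teacher


for kind in ("input", "output"):
    print(f"{kind}: {student_discount(kind):.0%} cheaper on Student")
```

With the listed prices, Student input tokens are 40% cheaper and output tokens about 67% cheaper than Teacher, so output-heavy workloads benefit most once optimization completes.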

Rate Limits

See Bison tier rate limits for RPM, TPM, and TPD limits by usage tier.

Embedding Models

Embedding models generate vector representations for semantic search, similarity matching, and retrieval tasks. Unlike generative models, they do not use the Teacher/Student optimization system.

You generally don't need to choose between embedding tiers in Parlant. When you use EmcieService, the framework automatically selects the embedding model for each task: high-fidelity embeddings where precision matters, and lower-fidelity embeddings where speed is sufficient.

Parlant uses embeddings relatively lightly and embedding models are inexpensive, so we leave this decision to the framework.

Jackal Embedding

The cost-optimized embedding tier, used by Parlant for tasks where retrieval speed takes priority over maximum precision.

Pricing

$0.01 per 1M tokens

Bison Embedding

The high-fidelity embedding tier, used by Parlant when retrieval accuracy is important or when working with more complex, nuanced content.

Pricing
$0.12 / 1M tokens
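
For a sense of scale, the sketch below estimates what embedding a given number of tokens would cost on each tier. The token volume and the `embedding_cost` helper are assumptions made for illustration; only the per-token prices come from this page.

```python
# Embedding prices in USD per 1M tokens, as listed on this page.
EMBEDDING_PRICES = {"jackal": 0.01, "bison": 0.12}


def embedding_cost(tier: str, tokens: int) -> float:
    """Return the estimated USD cost of embedding `tokens` tokens on a tier."""
    return tokens / 1_000_000 * EMBEDDING_PRICES[tier]


# Hypothetical volume: 50M tokens embedded over a billing period.
for tier in EMBEDDING_PRICES:
    print(f"{tier}: ${embedding_cost(tier, 50_000_000):.2f}")
```

Under this assumption the estimate is $0.50 on Jackal Embedding and $6.00 on Bison Embedding, which suggests why tier selection can reasonably be left to the framework.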

Rate Limits

See Embedding model rate limits for RPM, TPM, and TPD limits by usage tier.