Model Tiers

Choosing between Jackal and Bison tiers based on your cost and accuracy requirements

When you configure your Parlant agent to use Emcie, one of the first decisions you'll make is which model tier to use. This page explains what model tiers are, why they exist, and how to choose the right one for your use case.

The Relationship Between Model Size and Cost

At the heart of modern language models are parameters: these are the numerical weights that encode the model's knowledge and reasoning capabilities. A model with more parameters can, in general, capture more nuance and handle more complex tasks. However, more parameters come at a cost.

Every time a model processes a token, it must perform computations across all of its parameters. A model with 7 billion parameters requires far less compute per token than one with 70 billion parameters. Since cloud inference is ultimately priced by compute consumption, this creates a direct relationship: larger models cost more to run.
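As a rough rule of thumb (a generic approximation for dense transformer models, not Emcie's pricing formula), a forward pass costs about two floating-point operations per parameter per token, so relative inference cost scales roughly linearly with parameter count:

# Back-of-the-envelope estimate: a dense transformer's forward pass
# costs roughly 2 FLOPs per parameter per generated token.
# This is a generic approximation, not Emcie's pricing formula.

def flops_per_token(n_params: int) -> int:
    return 2 * n_params

small = flops_per_token(7_000_000_000)    # ~1.4e10 FLOPs/token
large = flops_per_token(70_000_000_000)   # ~1.4e11 FLOPs/token

print(f"A 70B model needs ~{large / small:.0f}x the compute per token of a 7B model.")
# -> A 70B model needs ~10x the compute per token of a 7B model.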

This presents a fundamental tradeoff. Larger models tend to produce more accurate and nuanced outputs, but they cost more per request. Smaller models are cheaper, but they may struggle with tasks that require sophisticated reasoning or fine-grained understanding. The challenge, then, is finding the right balance between cost and accuracy for your particular use case.

Optimization Opportunities in Parlant

Parlant's architecture creates a unique opportunity for cost optimization. When your agent generates a response, Parlant doesn't make a single monolithic request to a language model. Instead, it orchestrates multiple internal NLP requests, each serving a different purpose in the response generation pipeline.

Some of these requests are relatively simple. For instance, classifying whether a coarse, clear-cut observation applies to a conversation's current state may not require the full reasoning power of a large model. Other requests are more demanding, such as evaluating whether a nuanced guideline applies to the current context, or determining exactly where a customer stands within a complex conversational journey.

This heterogeneity is the key to a good optimization strategy. If all requests required the same level of complexity, there would be little room for optimization. But because different requests have different complexity profiles, there's an opportunity to match each request with an appropriately sized model. Simpler tasks can be handled by smaller, cheaper models without any loss in quality, while complex tasks can be routed to larger models that have the capacity to handle them correctly.

Tiers Instead of Models

This is why Emcie doesn't offer a single "model" that you select for your agent. Instead, Emcie offers model tiers.

A model tier is a class of foundation models spanning a range of sizes, each of which the platform enhances with model-specific prompt optimizations as well as fine-tuning (SFT and RLVR).

When you select a tier, you're not choosing a single model, but rather a family of models that the platform will draw from based on the nature of each request.

For simpler Parlant tasks, the platform typically selects a smaller model from within that tier. For more complex tasks, it selects a larger one.

This approach is what allows Emcie to optimize costs automatically. You simply choose the tier that reflects your overall priorities, and the platform handles the rest.
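Conceptually, you can picture tier-based selection like the sketch below. The tier names match Emcie's, but the model identifiers, task names, and routing logic are purely illustrative; the real selection happens inside the platform and is not exposed as an API:

# Illustrative sketch only: model names and routing logic are hypothetical.
# Each tier is a family of models; the platform picks one per request.
TIERS = {
    "jackal": ["jackal-0.3b", "jackal-4b", "jackal-20b"],
    "bison": ["bison-20b", "bison-70b", "bison-180b"],
}

# Hypothetical complexity scores for Parlant's internal request types.
TASK_COMPLEXITY = {
    "observation_check": 0,   # simple: coarse yes/no classification
    "guideline_matching": 1,  # moderate: nuanced condition evaluation
    "journey_tracking": 2,    # complex: mapping dialogue to a state diagram
}

def select_model(tier: str, task: str) -> str:
    """Pick a model from the tier's family, sized to the task's complexity."""
    family = TIERS[tier]
    return family[TASK_COMPLEXITY[task]]

print(select_model("jackal", "observation_check"))  # -> jackal-0.3b
print(select_model("bison", "journey_tracking"))    # -> bison-180b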

What Accuracy Means in Practice

Understanding the tradeoff between tiers requires understanding what "accuracy" means in practical terms for a Parlant agent.

Consider guideline matching. Parlant evaluates a set of guidelines to determine which ones apply to the current conversational context. With a lower-cost tier, the models may occasionally miss subtle nuances in when a guideline should activate. If your guidelines have fine-grained conditions or edge cases that depend on careful interpretation of context, you may find that the lower tier doesn't catch every case with the same level of precision.
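To make "fine-grained conditions" concrete, consider two hypothetical guidelines whose activation conditions differ only in a subtle contextual detail. Distinguishing them reliably is exactly where a larger model earns its cost (the condition and action strings below are invented for illustration):

# Hypothetical guidelines with subtly different activation conditions.
# A smaller model may conflate these two; a larger one is more likely
# to track the distinction turn by turn.
guidelines = [
    ("the customer asks about refund eligibility",
     "explain the refund policy"),
    ("the customer asks about refund eligibility after already receiving a refund",
     "escalate to a human agent"),
]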

Journey state recognition presents a similar challenge. Parlant tracks where customers are within conversational journeys, which are essentially state diagrams that guide the flow of interaction. Determining the current state requires understanding what has happened in the conversation and mapping it to the correct position in the journey. Lower tiers may occasionally misjudge this, especially in complex journeys with nuanced state transitions or states that have subtle distinctions between them.
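For example, a checkout journey might contain states that are easy to confuse (this journey and its states are invented for illustration):

# Hypothetical journey states with subtle distinctions between them.
# Misjudging "awaiting_payment_confirmation" vs. "payment_failed" sends
# the conversation down the wrong branch of the journey.
checkout_journey = [
    "collecting_cart_details",
    "awaiting_payment_confirmation",  # customer says they paid; not yet verified
    "payment_failed",                 # payment attempted and declined
    "order_confirmed",
]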

These differences may not matter for every use case. If your agent handles straightforward interactions where edge cases are rare, a lower model tier will likely perform well. Conversely, if reliability is paramount, the higher tier provides the precision you need while still benefiting from the same Parlant-specific optimizations.

The Two Tiers

Emcie offers two model tiers: Jackal and Bison.

Jackal

Jackal is the cost-optimized tier. The models within Jackal range from approximately 0.3 billion to 20 billion active parameters.

Many developers choose Parlant simply because it's a friendly framework, even when they're not building compliance-sensitive agents. For these use cases, accuracy is a should-have rather than an absolute must-have: there are no critical legal or financial repercussions if the agent occasionally falls short on instruction-following fidelity. Typical examples include education, AI copilots, lead generation, and similar applications.

Jackal is ideal for these scenarios. It balances cost against accuracy: thanks to its Parlant-specific specialization, it is more accurate than the vast majority of off-the-shelf models at similar price points.

Bison

Bison is the accuracy-optimized tier. The models within Bison range from approximately 20 billion to 180 billion active parameters.

Many companies use Parlant for compliance-sensitive use cases—financial services, healthcare, large-scale proactive customer service, and similar domains. In these scenarios, accuracy is crucial. Mishaps on the agent's part can lead to financial, legal, or reputational damage.

Bison was created for these use cases. While still providing a better price point than generic off-the-shelf models, it is based on larger models that handle more nuance and complexity. This leads to higher precision in tasks such as guideline matching, tool calling, and response generation.

Configuring Your Model Tier

You configure your model tier by setting the EMCIE_MODEL_TIER environment variable before starting your Parlant server.

export EMCIE_MODEL_TIER="jackal"

To use the higher tier instead:

export EMCIE_MODEL_TIER="bison"

If you don't set this variable, Emcie's NLP Service in Parlant defaults to Jackal. This setting is typically configured once at deployment time and applies to all requests from your agent.
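If you prefer to set the tier from your application's startup code rather than the shell, standard environment-variable handling works. For example, in Python (a minimal sketch; the variable just needs to be set before the Parlant server starts):

import os

# Must be set before the Parlant server (and its Emcie NLP service) starts.
# setdefault lets a deployment-level export override this in-code fallback.
os.environ.setdefault("EMCIE_MODEL_TIER", "bison")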

Embedding Models

Each tier also includes embedding models for vector generation, used in semantic search and similarity operations. Jackal includes a cost-efficient embedding model with 1536 dimensions, while Bison includes a higher-fidelity embedding model with 3072 dimensions.
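The dimension difference has a concrete storage and bandwidth cost. As a rough calculation (assuming float32 storage, a common default for vector stores, not a statement about Emcie's internals):

# Rough storage footprint per embedding vector at float32 (4 bytes/dim).
# Generic arithmetic, not Emcie-specific figures.
BYTES_PER_FLOAT32 = 4

jackal_dims, bison_dims = 1536, 3072
print(f"Jackal embedding: {jackal_dims * BYTES_PER_FLOAT32 / 1024:.0f} KiB per vector")  # -> 6 KiB
print(f"Bison embedding: {bison_dims * BYTES_PER_FLOAT32 / 1024:.0f} KiB per vector")    # -> 12 KiB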

You don't need to choose between embedding tiers yourself. When using EmcieService in Parlant, the framework automatically selects the appropriate embedding model based on the needs of each task—using high-fidelity embeddings where precision matters and lower-fidelity embeddings where speed is sufficient. Parlant uses embeddings relatively lightly, and since embedding models are very inexpensive, we leave this decision to the framework.