Model Roles

Understanding the Teacher/Student architecture and how Emcie optimizes your agent's costs

Within each model tier, Emcie employs a Teacher/Student architecture to optimize your agent's inference costs while preserving quality. This page explains how this architecture works, what the different roles mean, and how to configure them for your deployment.

The Optimization Challenge

Large language models are powerful, but they are expensive to run at scale. The straightforward alternative, using smaller models, often sacrifices the accuracy and nuance that customer-facing applications require. This creates a tension: you want the quality of large models at the cost of small ones.

We've seen this tension come up many times in the Parlant community on both GitHub and Parlant's Discord.

Our solution lies in specialization. A small language model (SLM) that has been specifically trained or optimized for your particular use case can often match the performance of a much larger general-purpose model on that specific task.

This is the essence of distillation: transferring the knowledge and behavior of a larger "teacher" model into a smaller, more efficient "student" model.

Emcie automates this entire process. You deploy your agent, and Emcie handles the Parlant-specific optimizations behind the scenes, gradually transitioning your workload from expensive teacher models to cost-efficient student models without any manual intervention.

How Emcie Optimizes Your Agent

Emcie applies two complementary optimization techniques. Both use the teacher model's outputs as a learning signal, and both incorporate domain-specific principles for Parlant and the conversational context.

1. Dynamic Prompt Optimization

The first technique involves optimizing the prompts used for each of Parlant's internal NLP tasks. Parlant's default prompts are designed to be generic; they work reliably across a wide range of use cases out of the box.

However, generic prompts leave room for improvement when you know more about the specific use case.

By observing how the teacher model handles your actual traffic, the platform learns what your agent needs to specialize for. It then generates optimized prompts tailored to your use case and the target student models. These optimized prompts help smaller models perform tasks that would otherwise require larger ones.

Importantly, this technique does not modify any model weights. It simply provides a more specialized, and often compressed, set of instructions to the models used within the tier, taking into account model-specific biases for different tasks.

2. SLM Distillation (Optional)

The second technique goes further by fine-tuning SLMs directly on your agent's specific output expectations.

When you enable SLM training in your settings, and the platform detects that additional accuracy is needed, it may train a custom model that encodes the behavioral patterns learned from the teacher.

Training these expectations into the model's weights yields more accurate and consistent results than prompt optimization alone. This is particularly valuable for agents with strict or nuanced behavioral requirements, or cases where the student model cannot fully close the gap with the teacher's quality using prompt optimizations alone.
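At its core, distillation of this kind means supervised fine-tuning of the student on the teacher's recorded completions. As a rough illustration (the function and data shapes below are hypothetical, not Emcie APIs), the grace-period log can be turned into training examples like this:

```python
# Illustrative sketch only: hypothetical helper, not an Emcie API.
# Distillation here means fine-tuning the student to reproduce the
# teacher's completions collected during the grace period.

def build_sft_examples(grace_period_log):
    """Turn (prompt, teacher_completion) pairs into training examples."""
    return [
        {"input": prompt, "target": completion}
        for prompt, completion in grace_period_log
    ]

examples = build_sft_examples([
    ("Classify intent: 'Where is my order?'", "intent=order_status"),
])
```

The key idea is that the teacher's outputs, not hand-labeled data, define the training targets.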

The Grace Period

Cost and speed optimizations are only worthwhile when a sufficient level of behavioral correctness is maintained.

When you first deploy your agent with Emcie, or when you make changes to your agent's configuration, there is no usage data yet. Before it can create effective optimizations, however, Emcie needs to observe how the teacher model handles your actual traffic and use its completions as a reference point.

This observation phase is called the grace period. During the grace period, the platform routes all requests to the teacher model within your selected tier. As the teacher processes these requests, the platform records the completions and learns from them. These completions become the training signal for subsequent optimization.

flowchart LR
    A[User] -->|Message| B[Parlant]
    B -->|LLM Request| C[Emcie]
    C --> D[Teacher Model]
    D x@-->|High-Quality Completion| F[Dataset]
    D -.->|LLM Response| B
    B -.->|Reply| A

    x@{ animate: true }
    A@{ shape: manual-input }
    D@{ shape: div-rect }
    F@{ shape: cyl }

    style B fill:#006a49,color:#fff,stroke-width:0px
    style C fill:#4c2efe,color:#fff,stroke-width:0px

The grace period typically spans a few hundred conversations, though the exact duration depends on the diversity and volume of your traffic. The more varied the interactions during this period, the more robust the resulting optimizations will be.
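The routing behavior during and after the grace period can be sketched roughly as follows. All names here are hypothetical for illustration, not Emcie APIs:

```python
# Illustrative sketch of grace-period routing (hypothetical names).

def route_request(request, grace_period_active, teacher, student, dataset):
    if grace_period_active:
        # All traffic goes to the teacher; its completions are recorded
        # as the training signal for later optimization.
        completion = teacher(request)
        dataset.append((request, completion))
    else:
        # After optimization, the cost-efficient student serves traffic.
        completion = student(request)
    return completion

log = []
reply = route_request(
    "hello",
    grace_period_active=True,
    teacher=lambda r: f"teacher:{r}",
    student=lambda r: f"student:{r}",
    dataset=log,
)
```

Note that during the grace period the dataset grows with every request, which is why traffic diversity during this phase matters.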

Teacher and Student Roles

The two roles within each tier serve distinct purposes in the optimization lifecycle.

Teacher

The teacher is the larger, more capable model within the tier. It serves as the source of truth for optimization. Its outputs define what quality looks like for your use case. During grace periods, the teacher handles all requests, producing the high-quality completions that Emcie uses to train and validate the student.

Because the teacher is a larger model, it has a higher price per token. However, this cost is an investment: the data collected during the grace period enables the optimizations that will dramatically reduce your costs once the student takes over.

Student

The student is the smaller, cost-optimized model. After the grace period, once Emcie has generated optimized prompts and optionally fine-tuned the student on your specific patterns, requests transition to the student model.

The student costs significantly less per token than the teacher. Despite this cost reduction, the student is specifically optimized for your use case, allowing it to match the teacher's quality on the tasks your agent actually performs.

The Optimization Lifecycle

The full optimization lifecycle proceeds through several stages, managed entirely and seamlessly by the platform.

flowchart LR
    A[Deploy<br/>Teacher] --> B[Grace Period<br/>Teacher]
    B --> C[Training Platform]
    C --> D[Validation]
    D --> E[Optimized<br/>Student]
    E -.->|Continuous Monitoring| D

When you first deploy, all requests go to the teacher. As traffic flows through, the platform collects data and eventually trains an optimized student configuration. Before transitioning requests to the student, the platform validates that the student achieves acceptable quality on held-out examples. Once validation passes, requests begin routing to the student, and your costs decrease.

The platform continues to monitor the student's performance over time. If issues are detected, such as model drift or an agent configuration change, it can adjust accordingly.
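Conceptually, the validation gate compares the student against the teacher on held-out examples before traffic is switched over. The names and the agreement threshold below are hypothetical, not values Emcie documents:

```python
# Illustrative validation gate (hypothetical names and threshold).

def student_ready(held_out, student, teacher, agree_threshold=0.95):
    """Promote the student only if it matches the teacher's outputs
    on a sufficient fraction of held-out examples."""
    matches = sum(student(x) == teacher(x) for x in held_out)
    return matches / len(held_out) >= agree_threshold

# A student that perfectly reproduces the teacher passes the gate.
ready = student_ready(["a", "b", "c", "d"], student=str.upper, teacher=str.upper)
```

In practice the comparison would be a task-specific quality metric rather than exact string equality, but the gating logic is the same: no promotion until validation passes.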

Handling Configuration Changes

One of Emcie's most valuable capabilities is its intelligent handling of configuration changes. When you update your Parlant agent—modifying a guideline, adjusting a journey, or changing any other configuration element—the platform detects the change and responds appropriately.

Rather than invalidating all optimizations and starting from scratch, it analyzes what specifically changed. Only the entities affected by the change need to go through a new grace period. Entities that were not modified can continue using their existing optimizations.

flowchart LR
    A[Configuration Change Detected] --> B[Analyze What Changed]
    B --> C{Entity Modified?}
    C -->|Guideline A: Modified| D[Teacher<br/>New Grace Period]
    C -->|Journey B: Unchanged| E[Student<br/>Already Optimized]
    C -->|Guideline C: Unchanged| F[Student<br/>Already Optimized]

This selective approach minimizes the cost impact of configuration changes. You can iterate on your agent's behavior without repeatedly paying for full grace periods across your entire configuration.
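The selective approach amounts to a per-entity diff of the configuration. The entity names and the comparison below are illustrative, not how Emcie represents configuration internally:

```python
# Illustrative sketch: only entities whose configuration changed
# re-enter the grace period (hypothetical representation).

def entities_needing_grace_period(old_config, new_config):
    return {
        name for name, value in new_config.items()
        if old_config.get(name) != value
    }

changed = entities_needing_grace_period(
    old_config={"guideline_a": "v1", "journey_b": "v1", "guideline_c": "v1"},
    new_config={"guideline_a": "v2", "journey_b": "v1", "guideline_c": "v1"},
)
```

Only `guideline_a` would re-enter a grace period here; the unchanged journey and guideline keep their existing optimizations.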

Data Privacy

Understanding how your data is handled is important when using an optimization platform.

For Prompt Optimizations

When Emcie optimizes prompts for your agent, your data undergoes deep de-identification and obfuscation before being processed. The data is stored on compliant systems (view our Trust Center). The resulting optimization artifacts (the improved prompts) are not used for training any models and remain specific to your account. Your data stays yours and can be removed at any time upon request.

For SLM Distillation

If you choose to enable SLM training, you agree to share the same de-identified data in accordance with our Data Processing Agreement. This data sharing enables the fine-tuning process that produces custom models optimized for your specific use case. In exchange, you receive the benefit of models trained specifically on your patterns, achieving higher accuracy while still paying the chosen tier's price point.

For more information about our security practices and compliance certifications, visit our Trust Center.

Configuring Model Role

You configure the model role by setting the EMCIE_MODEL_ROLE environment variable. There are three options, each suited to different scenarios.
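For example, an application or deployment script might read and validate the variable before startup. Only the variable name and its three values come from this page; the validation code itself is just a sketch:

```python
import os

# The three roles documented for EMCIE_MODEL_ROLE.
VALID_ROLES = {"auto", "teacher", "student"}

# "auto" is the default when the variable is unset.
role = os.environ.get("EMCIE_MODEL_ROLE", "auto")
if role not in VALID_ROLES:
    raise ValueError(f"Invalid EMCIE_MODEL_ROLE: {role!r}")
```

Failing fast on an invalid value avoids silently running production traffic under an unintended role.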

Auto (Default)

Auto mode is the recommended setting for production deployments. In this mode, Emcie manages role transitions dynamically, using the teacher during grace periods and transitioning to the student once optimization is complete. It tracks which entities require re-optimization after configuration changes and handles transitions seamlessly.

export EMCIE_MODEL_ROLE="auto"

With auto mode, you don't need to think about role management. The platform handles everything automatically.

Teacher (Recommended for Development)

During development, it often makes sense to force all requests to use the teacher model. This ensures you see the highest-quality outputs while building and testing your agent's configuration. You want to validate that your guidelines, journeys, and other settings work correctly before any optimizations are applied on top of them.

export EMCIE_MODEL_ROLE="teacher"

This setting is more expensive than the student role, but the clarity it provides during development is valuable. Once you're confident in your configuration, you can switch to auto mode for production.

Student (For Controlled Production)

In some scenarios, you may want to guarantee that production traffic always uses the student model, ensuring the lowest possible cost. This is particularly useful when you have already completed the grace period in a staging environment, and are sure that your configuration won't change dynamically in production.

The workflow is as follows: deploy your agent to staging with auto mode enabled, run simulated or canary traffic through it until the grace period completes and optimization finishes, then deploy to production with the student role forced. Because the optimization was already completed in staging, you know the student is ready, and you can be confident that production will always run at the lower cost point.

export EMCIE_MODEL_ROLE="student"

Be careful with this setting: forcing student mode before optimization is complete will result in degraded quality. Only use it when you have verified that optimization has finished.

Development and Production Workflow

Putting these pieces together, here is a recommended workflow for taking an agent from development to production.

During development, set the model role to teacher. This lets you iterate on your agent with full confidence that you're seeing the best possible outputs. Focus on getting the behavior right without worrying about optimization.

When you're ready to prepare for production, deploy to a staging environment with the model role set to auto. Run simulated traffic or route a portion of real traffic through staging. Allow the grace period to complete and optimization to finish. Monitor the transition and verify that the student performs acceptably.

For production, you have two options. You can use auto mode, which provides dynamic management and handles any future configuration changes gracefully. Alternatively, if you want guaranteed lowest cost and have completed the grace period in staging, you can force student mode to ensure all production traffic uses the optimized path.

Whichever approach you choose, Emcie's role system ensures that you maintain control over the cost-quality tradeoff while benefiting from automated optimization.