Getting Started

Get up and running with Emcie's NLP Service for optimized Parlant inference

Emcie is an auto-optimizing inference platform for Parlant agents.

Why Emcie

Running AI agents in production is expensive. Large language models (LLMs) deliver excellent results but their per-token costs add up quickly at scale. Small language models (SLMs) are cheaper, but often lack the accuracy needed for customer-facing applications.

Emcie addresses this challenge by automatically optimizing your Parlant agent's inference: it distills bespoke SLMs and applies dynamic prompt optimizations tailored to your use case.

This allows you to deliver Parlant agents with LLM-level accuracy at significantly reduced costs.

How It Works

Emcie uses a Teacher/Student approach. When you first deploy, your agent runs on a larger "Teacher" model that produces high-quality results for each of Parlant's NLP tasks.

flowchart LR
    B[Parlant] -->|LLM Request| C[Emcie]
    C --> D[Large Teacher Model]
    D x@-->|Learn High-Quality Completion| F[Dataset]
    D -->|Response| B

    x@{ animate: true }
    D@{ shape: div-rect }
    F@{ shape: cyl }

    style B fill:#006a49,color:#fff,stroke-width:0px
    style D fill:#666,color:#fff,stroke:#fff,stroke-width:1px
    style F fill:#ddd,color:#333,stroke:#aaa,stroke-width:1px
    style C fill:#4c2efe,color:#fff,stroke-width:0px
During Optimization

The training platform records these interactions, using the Teacher model's completions to learn how to properly handle your agent's actual usage patterns. Once enough data has been gathered, the platform distills a cost-efficient "Student" configuration (an SLM and/or optimized prompts, as the case may be) and seamlessly transitions your requests to this execution path, while maintaining the accuracy of completions.

flowchart LR
    B[Parlant] -->|LLM Request| C[Emcie]
    C --> D[Small Student Model]
    F[Dataset] x@-->|Teach High-Quality Completion| D
    D -->|Response| B

    F@{ shape: cyl }
    x@{ animate: true }

    style B fill:#006a49,color:#fff,stroke-width:0px
    style C fill:#4c2efe,color:#fff,stroke-width:0px
    style F fill:#ddd,color:#333,stroke:#aaa,stroke-width:1px
    style D fill:#666,color:#fff,stroke:#fff,stroke-width:1px
After Optimization

Once requests are handled by the Student model, operational expenses drop dramatically: typically by at least 5x, and often by up to 10x, compared to off-the-shelf LLMs.

Learn More About Optimization Methods

As mentioned above, Emcie applies two types of optimizations:

1. Dynamic Prompt Optimizations

Since Parlant's default prompts are designed to be generic, they work well "off the shelf" across a wide range of use cases. The flip side is that generic prompts leave a lot of room for use-case-specific optimizations.

Generally speaking, the more you know about a use case, the more you can optimize it, as you have a clearer understanding of what isn't required. This is the core of Emcie's optimization approach.

By learning from your agent's actual usage patterns, Emcie gains a dynamic understanding of what your agent's underlying models need to specialize for. It then optimizes prompts for the different tasks that Parlant runs, specialized for your use case, and updates these optimized versions automatically as you change your agent's configuration or as customer usage patterns drift.

2. Automated SLM Distillation (Optional)

For even better results, you can enable SLM training in your settings. When enabled, the training platform may fine-tune a small language model on your agent's specific output expectations, if it finds that prompt optimizations alone cannot achieve the same level of precision as the Teacher model.

Training these expectations directly into the model's weights tends to yield more accurate and consistent results than prompt optimizations alone. This added layer of training is especially valuable for agents with strict or nuanced behavioral requirements.

We recommend turning this option on if you find that accuracy or consistency can be improved.

📘

Info

Your data undergoes deep de-identification and obfuscation, and is stored on a SOC 2 compliant system. See our Trust Center for more information.


Importantly, both optimizations happen automatically in the background. You don't need to manage training, validation, or model switching. Emcie handles it seamlessly for you.

Prerequisites

  • Parlant SDK installed (pip install parlant)
  • An API key (get one on the API Keys page)

Quick Setup

1. Set Your API Key

export EMCIE_API_KEY="your-api-key-here"
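
If you'd like to fail fast when the key is missing (rather than at the first request), you can check the environment at startup. This is an optional sketch; only the EMCIE_API_KEY variable name comes from this page:

import os

# Optional sanity check: fail at startup if the key wasn't exported
# in this shell or session.
if not os.environ.get("EMCIE_API_KEY"):
    raise RuntimeError("EMCIE_API_KEY is not set; see the API Keys page")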

Optional: Select Model Tier

The default model tier is Jackal, which is ideal for most use cases where accuracy is important but not legally or financially critical—such as education, AI copilots, and lead generation.

For compliance-sensitive applications—financial services, healthcare, large-scale customer service—where mishaps can lead to real damage, we recommend using Bison:

export EMCIE_MODEL_TIER="bison"

Learn more about Models.
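
For local development, you may prefer to set both variables from Python before starting the server, rather than exporting them in your shell. A minimal sketch (the variable names are the ones shown above; the key value is a placeholder):

import os

# Equivalent to the shell exports above. setdefault() keeps any value
# that was already exported in the environment.
os.environ.setdefault("EMCIE_API_KEY", "your-api-key-here")
os.environ.setdefault("EMCIE_MODEL_TIER", "bison")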

2. Configure Parlant to Use Emcie

In your Parlant application, use p.NLPServices.emcie as your NLP service:

import parlant.sdk as p

...

async with p.Server(
    nlp_service=p.NLPServices.emcie,  # <<< Change here
    # Other arguments...
) as server:
    ...
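
For context, here is how that change fits into a minimal, runnable Parlant entry point. This is a sketch: the agent's name and description are placeholders, and everything except nlp_service=p.NLPServices.emcie is ordinary Parlant server setup:

import asyncio
import parlant.sdk as p

async def main() -> None:
    # The only Emcie-specific argument is nlp_service; the rest is
    # standard Parlant setup.
    async with p.Server(nlp_service=p.NLPServices.emcie) as server:
        await server.create_agent(
            name="Support Agent",  # placeholder
            description="Answers customer questions",  # placeholder
        )

if __name__ == "__main__":
    asyncio.run(main())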

3. Run Your Agent

That's it. Your Parlant agent now uses Emcie for inference.

During the first few hundred conversations, Emcie collects data from the Teacher model and trains a Student configuration optimized for your specific use case. Once ready, your requests automatically switch to the cost-efficient Student model, with no additional action required on your part.

📘

Understanding initial costs

During this initial grace period, you'll be charged at Teacher rates (shown in Models). This is expected, as the platform needs high-quality training data before it can optimize. Once the grace period completes, costs drop significantly as traffic shifts to the Student model. Learn more about the optimization lifecycle.

Next Steps