
Getting Started

This quick-start guide will help you build and run your first encoderfile in under 10 minutes.

Prerequisites

encoderfile CLI Tool

You need the encoderfile CLI tool installed:

  • Pre-built binary (recommended) - Fastest setup for Linux/macOS users

curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh

Python with Optimum

Exporting models to ONNX requires Python 3.13+ with Optimum and ONNX Runtime installed:

pip install "optimum[onnxruntime]" onnxruntime
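Before moving on, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical, and it assumes the installer placed an executable named `encoderfile` on your PATH.

```python
import shutil
import sys
from importlib.util import find_spec


def check_prerequisites(min_python=(3, 13)):
    """Return a dict mapping each prerequisite to True when it is satisfied."""
    return {
        # This guide states Python 3.13+ for the export step.
        "python": sys.version_info[:2] >= tuple(min_python),
        # ASSUMPTION: the CLI is installed as `encoderfile` on PATH.
        "encoderfile_cli": shutil.which("encoderfile") is not None,
        # True when the packages from the pip command above are importable.
        "optimum": find_spec("optimum") is not None,
        "onnxruntime": find_spec("onnxruntime") is not None,
    }
```

Calling `check_prerequisites()` and printing the result shows which item, if any, still needs attention.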

The following resources cover ONNX Runtime, the Hugging Face models it supports, and how to export a Hugging Face model to ONNX:

  • https://onnxruntime.ai/huggingface

  • https://huggingface.co/docs/optimum-onnx/onnx/usage_guides/export_a_model

  • https://huggingface.co/docs/transformers/serialization#onnx

Your First Encoderfile

Let's build a sentiment analysis model as an example.

Step 1: Export Model to ONNX

Export a HuggingFace model to ONNX format:
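As a minimal sketch of the export step, the snippet below builds and runs an `optimum-cli export onnx` command from Python. The model ID and the output directory `./sentiment-model` are assumptions chosen for illustration, since this page does not name them; substitute your own.

```python
import subprocess

# ASSUMPTION: example sentiment model; this guide does not name the one it uses.
DEFAULT_MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def build_export_command(
    model_id=DEFAULT_MODEL_ID,
    task="text-classification",
    output_dir="./sentiment-model",  # hypothetical output directory
):
    """Build the argument list for `optimum-cli export onnx`."""
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        "--task", task,
        output_dir,
    ]


def export_model(**kwargs):
    """Run the export; raises CalledProcessError if optimum-cli fails."""
    subprocess.run(build_export_command(**kwargs), check=True)
```

With the defaults, `export_model()` is equivalent to running `optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english --task text-classification ./sentiment-model` in a shell. It needs network access to download the model.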

This creates a directory containing the files the build step requires: model.onnx, tokenizer.json, and config.json.

Available task types:

  • feature-extraction - For embedding models

  • text-classification - For sequence classification

  • token-classification - For NER/token tagging

Step 2: Create Configuration File

Create sentiment-config.yml:

Key fields:

  • name - Model identifier (used in API responses)

  • path - Path to the model directory with ONNX weights

  • model_type - embedding, sequence_classification, or token_classification

  • output_path - Where to output the binary (optional, defaults to ./<name>.encoderfile)
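The key fields above can be sketched as a small generator for the configuration file. This is an illustration under stated assumptions, not the project's reference layout: the top-level `encoderfile:` key is an assumption about how the fields are nested, the function name `render_config` is hypothetical, and the task-to-`model_type` mapping is inferred from the two lists in this guide.

```python
# Export task (Step 1) -> config model_type (Step 2), inferred from this guide.
TASK_TO_MODEL_TYPE = {
    "feature-extraction": "embedding",
    "text-classification": "sequence_classification",
    "token-classification": "token_classification",
}


def render_config(name, path, task, output_path=None):
    """Render the config file text for the given model."""
    model_type = TASK_TO_MODEL_TYPE[task]  # KeyError on an unknown task
    lines = [
        "encoderfile:",  # ASSUMPTION: top-level key; verify against the config reference
        f"  name: {name}",
        f"  path: {path}",
        f"  model_type: {model_type}",
    ]
    # output_path is optional; omit it to use the default location.
    if output_path is not None:
        lines.append(f"  output_path: {output_path}")
    return "\n".join(lines) + "\n"
```

For the sentiment example, writing `render_config("sentiment-analyzer", "./sentiment-model", "text-classification", "./build/sentiment-analyzer.encoderfile")` to sentiment-config.yml gives a file with all four fields set; the model path here is the hypothetical export directory.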

Step 3: Build the Binary

Build your encoderfile by passing the configuration file to the CLI:

encoderfile build -f sentiment-config.yml

Note: If you built the CLI from source, use: ./target/release/encoderfile build -f sentiment-config.yml

The binary is written to the configured output_path; in this example that is ./build/sentiment-analyzer.encoderfile.

Step 4: Run the Server

Start your encoderfile server:

You should see:

Step 5: Make Predictions

Test with curl:

Expected response:

Quick Examples

Embedding Model

Token Classification (NER)

Common Tasks

Server Configuration

Custom ports:

HTTP only (disable gRPC):

gRPC only (disable HTTP):

CLI Inference

Run inference without starting a server:

Using Pre-Exported Models

Some HuggingFace models already have ONNX weights:
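One way to check whether a repository already ships ONNX weights is to list its files and filter for the `.onnx` extension. The sketch below assumes the `huggingface_hub` package is installed (it is not among this guide's listed prerequisites) and uses its `list_repo_files` function; the helper names are hypothetical.

```python
def find_onnx_files(repo_files):
    """Return the paths ending in .onnx from a list of repository file paths."""
    return sorted(f for f in repo_files if f.lower().endswith(".onnx"))


def repo_onnx_files(repo_id):
    """List ONNX files in a Hugging Face Hub repository (needs network access)."""
    # Imported lazily so find_onnx_files works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return find_onnx_files(list_repo_files(repo_id))
```

An empty result from `repo_onnx_files("some-org/some-model")` suggests the model still needs to be exported as described in Step 1.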

Troubleshooting

ONNX Export Fails

  • Check model compatibility (must be encoder-only)

  • Try a different task type

  • Check the model's HuggingFace page for known issues

Build Fails

  • Ensure the model directory has model.onnx, tokenizer.json, and config.json

  • Verify the model type matches the architecture

  • See our guide on building for detailed troubleshooting
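The first check in this list is easy to automate. Here is a minimal sketch that reports which of the three required files are missing from a model directory; the function name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the checklist above.
REQUIRED_FILES = ("model.onnx", "tokenizer.json", "config.json")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has everything the checklist names; otherwise the returned names say what to re-export or copy in.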

Server Won't Start

  • Check if ports are already in use

  • Try different ports with --http-port and --grpc-port

  • Check file permissions: chmod +x ./build/my-model.encoderfile
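The port and permission checks above can be scripted with the standard library. This is a sketch with hypothetical helper names; it assumes the server listens on the local machine over TCP, and it leaves the port numbers to you because this page does not state the defaults.

```python
import os
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def is_executable(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

If `port_in_use(...)` returns True for a port you intended to use, pick another one with --http-port or --grpc-port. If `is_executable("./build/my-model.encoderfile")` returns False, apply the chmod command above.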

Inference Errors

  • Check input format matches the expected schema

  • Verify the server is running

  • Check server logs for error messages

Next Steps
