Introduction

Deploy Encoder Transformers as self-contained, single-binary executables.
Encoderfile packages transformer encoders—and their classification heads—into a single, self-contained executable.
Replace fragile, multi-gigabyte Python containers with lean, auditable binaries that have zero runtime dependencies. Written in Rust and built on ONNX Runtime, Encoderfile ensures strict determinism and high performance for financial platforms, content moderation pipelines, and search infrastructure.
Why Encoderfile?
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures. It is designed for environments where compliance, latency, and determinism are non-negotiable.
Zero Dependencies: No Python, no PyTorch, no network calls. Just a fast, portable binary.
Smaller Footprint: Binaries are measured in megabytes, not the gigabytes required for standard container deployments.
Protocol Agnostic: Runs as a REST API, gRPC microservice, CLI tool, or MCP Server out of the box.
Compliance-Friendly: Deterministic and offline-safe, making it ideal for strict security boundaries.
Note for Windows users: Pre-built binaries are not available for Windows. See our guide on building from source for instructions.
Use Cases
Microservices
Run as a standalone gRPC or REST service on localhost or in production.
AI Agents
Register as an MCP Server to give agents reliable classification tools.
Batch Jobs
Use the CLI mode (infer) to process text pipelines without spinning up servers.
Edge Deployment
Deploy sentiment analysis, NER, or embeddings anywhere without Docker or Python.
Supported Models
Encoderfile supports encoder-only transformers for:
Token Embeddings - Clustering, embeddings (BERT, DistilBERT, RoBERTa)
Sequence Classification - Sentiment analysis, topic classification
Token Classification - Named Entity Recognition, PII detection
Sentence Embeddings - Semantic search, clustering
See our guide on building from source for detailed instructions on building the CLI tool.
Generation models (GPT, T5) are not supported. See CLI Reference for complete model type details.
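As a rough illustration of how the supported model types differ in what they return, the sketch below uses toy numbers in plain Python. It assumes mean pooling for sentence embeddings and a softmax followed by an argmax for the classification heads. These are common conventions, not confirmed Encoderfile behavior, and every function name here is hypothetical.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens into one sentence vector."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Pick the highest-probability label for one sequence or one token."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy values, not real model output.
token_embeddings = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]]  # last row is padding
attention_mask = [1, 1, 0]

# Sentence embedding: one vector per input text.
sentence_vector = mean_pool(token_embeddings, attention_mask)

# Sequence classification: one label per input text.
label, score = classify([0.0, 2.0], ["NEGATIVE", "POSITIVE"])

# Token classification: the same step applied once per token.
ner_labels = ["O", "PER"]
tags = [classify(row, ner_labels)[0] for row in [[3.0, 0.0], [0.0, 3.0]]]
```

Token embeddings are the unpooled per-token rows themselves, which is why they suit clustering at the token level while the pooled vector suits semantic search.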
Quick Start
1. Install CLI
Download the pre-built CLI tool:
Or build from source (see Building Guide).
2. Export Model & Build
Export a HuggingFace model and build it into a binary:
See the Building Guide for detailed export options and configuration.
3. Run & Test
Start the server and make predictions:
See the API Reference for complete endpoint documentation.
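For readers who want to call a running server from code, here is a minimal client sketch using only the Python standard library. The host, port, "/predict" route, and "inputs" field are assumptions made for illustration; check the API Reference for the actual endpoint and payload shape.

```python
import json
import urllib.request

# Hypothetical values: the host, port, route, and payload field below are
# placeholders, not taken from the Encoderfile API Reference.
BASE_URL = "http://localhost:8080"
PREDICT_PATH = "/predict"


def build_predict_request(texts, base_url=BASE_URL, path=PREDICT_PATH):
    """Build (but do not send) a JSON POST request carrying a batch of texts."""
    body = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, timeout=10.0):
    """Send the request to a running server and decode the JSON reply."""
    request = build_predict_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; predict() does.
request = build_predict_request(["Encoderfile makes deployment simple."])
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before any server is running.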
Next Steps: Try the Token Classification Cookbook for a complete walkthrough.
How It Works
Encoderfile compiles your model into a self-contained binary by embedding ONNX weights, tokenizer, and config directly into Rust code. The result is a portable executable with zero runtime dependencies.
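The idea of embedded assets can be pictured with a small Python analogy. This is not the Rust mechanism Encoderfile uses, and the asset names and contents are hypothetical stand-ins. It only shows the consequence of embedding: the program reads its model, tokenizer, and config from memory it already carries instead of from disk or the network.

```python
import json

# Stand-ins for the embedded assets. In a compiled binary these would be byte
# arrays baked into the executable; the names and contents are hypothetical.
EMBEDDED_ASSETS = {
    "model.onnx": b"\x08\x07fake-onnx-bytes",
    "tokenizer.json": b'{"model": {"vocab": {"[PAD]": 0, "hello": 1}}}',
    "config.json": b'{"model_type": "sequence_classification"}',
}


def load_asset(name):
    """Return an asset from memory; nothing is read from disk or the network."""
    try:
        return EMBEDDED_ASSETS[name]
    except KeyError:
        raise FileNotFoundError(f"asset not embedded: {name}") from None


# Everything the program needs at startup is already inside it.
config = json.loads(load_asset("config.json"))
vocab = json.loads(load_asset("tokenizer.json"))["model"]["vocab"]
```

Because the assets travel inside the artifact, a missing file can only surface at build time, never at deployment time.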
Documentation
Getting Started
Installation & Setup - Complete setup guide from installation to first deployment
Building Guide - Export models and configure builds
Tutorials
Token Classification (NER) - Build a Named Entity Recognition system
Transforms Guide - Custom post-processing with Lua scripts
Python Library
Building with Python - Build encoderfiles programmatically with the Python package
Python API Reference - Complete reference for all classes and functions
Reference
CLI Reference - Full documentation for build, serve, and infer commands
API Reference - REST, gRPC, and MCP endpoint specifications
Community & Support
GitHub Issues - Report bugs or request features
Contributing Guide - Learn how to contribute
Code of Conduct - Community guidelines
Standard builds of Encoderfile require glibc at runtime because of ONNX Runtime. See this issue for progress on building Encoderfile for musl Linux.