
CLI Reference

Overview

Encoderfile provides two command-line tools:

  1. cli - Rust-based build tool, invoked as `encoderfile build`, for creating encoderfile binaries from ONNX models

  2. encoderfile - Rust-based runtime binary (`<model>.encoderfile`) for serving models and running inference

Build Tool: cli

The build command (`encoderfile build`) packages Hugging Face transformer models with ONNX weights into self-contained executable binaries. Each build is driven by a YAML configuration file.

build

Validates a model configuration and builds a self-contained Rust binary with embedded model assets.

Usage

```shell
# If you haven't installed the CLI tool yet, build it first:
cargo build --bin encoderfile --release

# Then run it:
./target/release/encoderfile build -f <config.yml> [OPTIONS]

# Or install it to your system:
cargo install --path encoderfile --bin encoderfile
encoderfile build -f <config.yml> [OPTIONS]
```

Options

| Option | Short | Type | Required | Description |
|---|---|---|---|---|
| - | -f | Path | Yes | Path to the YAML configuration file |
| --output-dir | - | Path | No | Override the output directory set in the config |
| --cache-dir | - | Path | No | Override the cache directory set in the config |
| --no-build | - | Flag | No | Generate project files without building |
| --base-binary-path | - | Path | No | Use a custom local base binary |
| --platform | - | String | No | Target platform for the output binary (e.g., aarch64-apple-darwin, x86_64-unknown-linux-gnu). Equivalent to Cargo's --target. Defaults to the platform of the machine running the build. |
| --runtime-version | - | String | No | Override the default encoderfile runtime version |
| --no-download | - | Flag | No | Disable downloading of the base binary |
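The --platform and --output-dir options can be combined to produce binaries for several targets. The sketch below is a minimal example of that. The names build_for_platforms and ENCODERFILE_CLI are hypothetical and are introduced here only for illustration. The sketch assumes two things: that a base binary for each target can be downloaded (so --no-download is not set), and that --output-dir overrides the output location from the config, as the table describes.

```shell
# Hypothetical helper (not part of encoderfile): build one binary per target
# platform, writing each into build/<target>/.
# ENCODERFILE_CLI is an assumed override for the build tool's path;
# it defaults to `encoderfile` on PATH.
build_for_platforms() {
  config=$1
  shift
  for target in "$@"; do
    "${ENCODERFILE_CLI:-encoderfile}" build -f "$config" \
      --platform "$target" \
      --output-dir "build/$target" || return 1
  done
}

# Usage:
#   build_for_platforms config.yml aarch64-apple-darwin x86_64-unknown-linux-gnu
```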

Configuration File Format

Create a YAML configuration file (e.g., config.yml) with the following structure:
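The full schema is not reproduced on this page, so the snippet below is only a sketch that writes a minimal config.yml. This page names just two keys, name and output_path. The top-level encoderfile key and the path and model_type keys are assumptions, so check them against the configuration reference before relying on them.

```shell
# Write a minimal config.yml.
# ASSUMPTION: the top-level `encoderfile` key and the `path` and `model_type`
# key names are illustrative; only `name` and `output_path` are named on this page.
cat > config.yml <<'EOF'
encoderfile:
  # Model name.
  name: my-model
  # Directory holding model.onnx, tokenizer.json and config.json (assumed key).
  path: ./models/my-model
  # embedding | sequence_classification | token_classification (assumed key).
  model_type: embedding
  # Where the finished binary is written.
  output_path: ./build/my-model.encoderfile
EOF
```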

Model Types

  • embedding - For models using AutoModel or AutoModelForMaskedLM

    • Outputs: last_hidden_state with shape [batch_size, sequence_length, hidden_size]

  • sequence_classification - For models using AutoModelForSequenceClassification

    • Outputs: logits with shape [batch_size, num_labels]

  • token_classification - For models using AutoModelForTokenClassification

    • Outputs: logits with shape [batch_size, num_tokens, num_labels]

Examples

Build an embedding model:

Create embedding-config.yml:
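The following is a minimal sketch. It assumes the exported model files (model.onnx, tokenizer.json, config.json) are in ./models/my-embedder. That directory, the model name, and every key except name and output_path are hypothetical.

```shell
# ASSUMPTION: key names other than `name` and `output_path` are not confirmed
# by this page; the model directory and name are placeholders.
cat > embedding-config.yml <<'EOF'
encoderfile:
  name: my-embedder
  path: ./models/my-embedder
  # Embedding model: outputs last_hidden_state.
  model_type: embedding
  output_path: ./build/my-embedder.encoderfile
EOF
```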

Build with `encoderfile build -f embedding-config.yml`.

Build a sentiment classifier:

Create sentiment-config.yml:
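The following is a minimal sketch. Its name and output path match the binary used in the complete workflow at the end of this page (./build/sentiment-analyzer.encoderfile). The ./models/sentiment-analyzer directory and all keys except name and output_path are assumptions.

```shell
# ASSUMPTION: key names other than `name` and `output_path` are not confirmed
# by this page; ./models/sentiment-analyzer is a placeholder directory.
cat > sentiment-config.yml <<'EOF'
encoderfile:
  name: sentiment-analyzer
  path: ./models/sentiment-analyzer
  # Sequence classifier: outputs logits with shape [batch_size, num_labels].
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
EOF
```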

Build with `encoderfile build -f sentiment-config.yml`.

Build a NER model with transform:

Create ner-config.yml:

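Build with `encoderfile build -f ner-config.yml`.

Generate project files without building:

`encoderfile build -f config.yml --no-build`

Override the output directory set in the config:

`encoderfile build -f config.yml --output-dir <dir>`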

Build Process

The build command performs the following steps:

  1. Loads configuration - Parses the YAML config file

  2. Validates model files - Checks for required files:

    • model.onnx - ONNX model weights (or path specified in config)

    • tokenizer.json - Tokenizer configuration (or path specified in config)

    • config.json - Model configuration (or path specified in config)

  3. Validates ONNX model - Checks the ONNX model structure and compatibility

  4. Embeds assets - Appends embedded artifacts to a pre-built base binary

  5. Outputs binary - Copies the binary to the specified output path
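You can also check step 2 by hand before running a build. The sketch below assumes all three files use their default names and sit in a single directory, so paths overridden in the config are not handled. check_model_dir is a hypothetical helper and is not part of encoderfile.

```shell
# Hypothetical pre-flight check mirroring step 2: report any of the three
# default model files missing from a directory. Returns non-zero if any are missing.
check_model_dir() {
  dir=$1
  missing=0
  for f in model.onnx tokenizer.json config.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_model_dir ./models/my-model && encoderfile build -f config.yml
```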

Output

Upon successful build, you'll find the binary at the path specified in output_path.

If output_path is not specified, the binary defaults to:

For example, with name: my-model and output_path: ./build/my-model.encoderfile, the binary is written to ./build/my-model.encoderfile.

This binary is completely self-contained and includes:

  • ONNX model weights (embedded at build time)

  • Tokenizer configuration (embedded)

  • Model metadata (embedded)

  • Full inference runtime

Requirements

Before building, ensure you have:

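  • Valid model files: model.onnx, tokenizer.json, and config.json (or the paths specified in the config)

If you are compiling the encoderfile CLI from source, make sure you also have:

  • A Rust toolchain, including Cargo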

Troubleshooting

Error: "No such file: model.onnx"

Error: "Could not locate model config at path"

Error: "No such directory"

Error: "Cannot locate cache directory"


version

Prints the encoderfile version.

Usage

`encoderfile version`

Output


Runtime Binary: encoderfile

After building with the cli tool, the resulting .encoderfile binary provides inference capabilities.

Architecture

The runtime CLI is built with the following components:

  • Server Mode: Hosts models via HTTP and/or gRPC endpoints

  • Inference Mode: Performs one-off inference operations from the command line

  • Model-type routing: Detects the model type (embedding, sequence classification, or token classification) and routes requests to the matching inference path

Commands

serve

Starts the encoderfile server with HTTP and/or gRPC endpoints for model inference.

Usage

`<model>.encoderfile serve [OPTIONS]`

Options

| Option | Type | Default | Description |
|---|---|---|---|
| --grpc-hostname | String | `[::]` | Hostname/IP address for the gRPC server |
| --grpc-port | String | `50051` | Port for the gRPC server |
| --http-hostname | String | `0.0.0.0` | Hostname/IP address for the HTTP server |
| --http-port | String | `8080` | Port for the HTTP server |
| --disable-grpc | Boolean | `false` | Disable the gRPC server |
| --disable-http | Boolean | `false` | Disable the HTTP server |

Examples

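Start both HTTP and gRPC servers (default):

`./build/my-model.encoderfile serve`

Start only the HTTP server:

`./build/my-model.encoderfile serve --disable-grpc`

Start only the gRPC server:

`./build/my-model.encoderfile serve --disable-http`

Custom ports and hostnames:

`./build/my-model.encoderfile serve --http-hostname 127.0.0.1 --http-port 9000 --grpc-port 50052`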

Notes

  • At least one server type (HTTP or gRPC) must be enabled

  • The server will display a banner upon successful startup

  • Both servers run concurrently using async tasks


infer

Performs inference on input text using the configured model. The model type is automatically detected based on configuration.

Usage

`<model>.encoderfile infer <INPUTS>... [OPTIONS]`

Arguments

| Argument | Required | Description |
|---|---|---|
| `<INPUTS>` | Yes | One or more text strings to process |

Options

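| Option | Type | Default | Description |
|---|---|---|---|
| -f, --format | Enum | `json` | Output format (currently only JSON is supported) |
| -o, --out-dir | String | None | Output file path (despite the flag's name, this is a file, not a directory); if omitted, results are printed to stdout |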

Model Types

The inference behavior depends on the model type configured:

1. Embedding Models

Generates vector embeddings for input text.

Example:

`./build/my-model.encoderfile infer "The quick brown fox jumps over the lazy dog."`

With normalization disabled:

2. Sequence Classification Models

Classifies entire sequences (e.g., sentiment analysis, topic classification).

Example:

`./build/my-model.encoderfile infer "I love this product!"`

3. Token Classification Models

Labels individual tokens (e.g., Named Entity Recognition, Part-of-Speech tagging).

Example:

`./build/my-model.encoderfile infer "Apple is headquartered in Cupertino, California."`

Output Formats

Currently, only JSON format is supported (--format json). The output structure varies by model type:

Embedding Output

Sequence Classification Output

Token Classification Output

Saving Output to File

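Save results to a file:

`./build/my-model.encoderfile infer "Encoderfile makes deployment simple." -o results.json`

Process multiple inputs and save:

`./build/my-model.encoderfile infer "First input" "Second input" "Third input" -o results.json`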

Configuration

The CLI relies on configuration to determine:

  • Model type (Embedding, SequenceClassification, TokenClassification)

  • Model path and parameters

  • Server settings

The model type, model assets, and metadata are embedded when the binary is built. Server settings come from the serve options. Refer to the main encoderfile configuration documentation for details.

Error Handling

The CLI will return appropriate error messages for:

  • Invalid configuration (e.g., both servers disabled)

  • Missing required arguments

  • Model loading failures

  • Inference errors

  • File I/O errors

Examples

Basic Inference Workflow

Server Workflow

Batch Processing
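Because infer accepts one or more `<INPUTS>`, you can send a whole batch in a single call. The sketch below assumes a text file with one input per line. infer_lines and the ENCODERFILE variable are hypothetical names introduced here.

```shell
# Hypothetical helper: send every non-empty line of a text file to a single
# `infer` call and write the JSON results to a file.
# ENCODERFILE is an assumed variable holding the path to your built binary.
ENCODERFILE="${ENCODERFILE:-./build/my-model.encoderfile}"

infer_lines() {
  input_file=$1
  output_file=$2
  set --  # reuse the positional parameters as the list of inputs
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines; each remaining line becomes one <INPUTS> argument.
    if [ -n "$line" ]; then
      set -- "$@" "$line"
    fi
  done < "$input_file"
  # infer requires at least one input.
  [ "$#" -gt 0 ] || { echo "no inputs in $input_file" >&2; return 1; }
  "$ENCODERFILE" infer "$@" -o "$output_file"
}

# Usage: infer_lines reviews.txt results.json
```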

Custom Server Configuration
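The serve options can be wrapped in a small script for repeatable setups. The following is a minimal sketch of an HTTP-only server bound to the loopback interface. The function name serve_http_local and the ENCODERFILE and HTTP_PORT variables are hypothetical. The flags and the 8080 default come from the serve options table.

```shell
# Hypothetical wrapper: serve over HTTP only, bound to the loopback interface,
# on the port in HTTP_PORT (falls back to 8080, the documented default).
# ENCODERFILE is an assumed variable holding the path to your built binary.
serve_http_local() {
  "${ENCODERFILE:-./build/my-model.encoderfile}" serve \
    --disable-grpc \
    --http-hostname 127.0.0.1 \
    --http-port "${HTTP_PORT:-8080}"
}

# Usage: HTTP_PORT=9000 serve_http_local
```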

Troubleshooting

Both servers cannot be disabled

Error: "Cannot disable both gRPC and HTTP"

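Solution: Enable at least one server type by passing at most one of --disable-grpc and --disable-http, for example:

`./build/my-model.encoderfile serve --disable-grpc`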

Output not appearing

If output isn't visible, check that:

  1. You are not unintentionally sending output to a file (via -o/--out-dir or a shell redirect)

  2. You have write permission for the path passed to --out-dir

  3. The model is correctly configured

Model type detection

The CLI automatically detects model type from configuration. If inference behaves unexpectedly:

  1. Verify your model configuration

  2. Ensure the model type matches your use case

  3. Check model compatibility

Complete Workflow Example

Here's a complete workflow from model export to deployment:

Step 1: Export Model to ONNX

Step 2: Create Configuration File

Create sentiment-config.yml:

Step 3: Build Encoderfile Binary

Running `encoderfile build -f sentiment-config.yml` creates ./build/sentiment-analyzer.encoderfile.

Step 4: Run Inference

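Option A: Start the server, then send requests over HTTP or gRPC:

`./build/sentiment-analyzer.encoderfile serve`

Option B: Run inference directly from the CLI:

`./build/sentiment-analyzer.encoderfile infer "I love this product!"`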

Step 5: Deploy

Command Reference Summary

| Command | Tool | Purpose |
|---|---|---|
| `./target/release/encoderfile build -f config.yml` | encoderfile | Build a self-contained binary from an ONNX model |
| `./target/release/encoderfile version` | encoderfile | Print version information |
| `<model>.encoderfile serve` | encoderfile | Start HTTP/gRPC inference server |
| `<model>.encoderfile infer` | encoderfile | Run single inference from the command line |
| `<model>.encoderfile mcp` | encoderfile | Start MCP server |

Additional Resources
