
API Reference

Overview

Encoderfile exposes three interfaces for model inference:

  • HTTP REST API - JSON-based HTTP endpoints (default port: 8080)

  • gRPC API - Protocol Buffer-based RPC service (default port: 50051)

  • MCP (Model Context Protocol) - Integration with MCP-compatible systems

The available endpoints depend on the model type your encoderfile was built with:

  • embedding - Extract token embeddings from text

  • sequence_classification - Classify entire text sequences (e.g., sentiment analysis)

  • token_classification - Classify individual tokens (e.g., Named Entity Recognition)


HTTP REST API

All endpoints return JSON on success. Errors are returned with an appropriate HTTP status code and a plain-text error message (see Error Handling).

Common Endpoints

These endpoints are available for all model types:

GET /health

Health check endpoint to verify the server is running.

Response:

Status Codes:

  • 200 OK - Server is healthy

Example:
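A minimal sketch using only the Python standard library, assuming the server runs locally on the default port 8080. The helper names `health_request` and `is_healthy` are hypothetical, and because the response body is not described here, the check relies only on the 200 status code.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: local server on the default HTTP port


def health_request(base_url=BASE_URL):
    """Build (but do not send) a GET /health request."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def is_healthy(base_url=BASE_URL, timeout=2.0):
    """True if GET /health answers 200 OK, False on any HTTP or connection error."""
    try:
        with urllib.request.urlopen(health_request(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError
        return False
```

A deployment script could, for example, poll `is_healthy()` until it returns `True` before routing traffic to the server.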


GET /model

Returns metadata about the loaded model.

Response:

Fields:

  • model_id (string) - The model identifier specified during build

  • model_type (string) - Type of model loaded

  • id2label (object, optional) - Label mappings for classification models (not present for embedding models)

Status Codes:

  • 200 OK - Successful

Example:
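A minimal Python sketch that fetches the metadata and prints a one-line summary. The helper names are hypothetical. It assumes the `id2label` keys are class indices serialized as strings (JSON object keys are always strings), a common convention for classification models that this page does not state explicitly.

```python
import json
import urllib.request


def get_model_metadata(base_url="http://localhost:8080", timeout=5.0):
    """GET /model and decode the JSON body into a dict."""
    with urllib.request.urlopen(f"{base_url}/model", timeout=timeout) as resp:
        return json.load(resp)


def describe_model(meta):
    """One-line summary; id2label is absent for embedding models."""
    id2label = meta.get("id2label")
    if not id2label:
        return f"{meta['model_id']} ({meta['model_type']}, no labels)"
    # assumption: keys are class indices serialized as strings ("0", "1", ...)
    labels = [id2label[key] for key in sorted(id2label, key=int)]
    return f"{meta['model_id']} ({meta['model_type']}, labels: {', '.join(labels)})"
```

With a server running, `print(describe_model(get_model_metadata()))` prints the summary.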

Example Response:


GET /openapi.json

Returns the OpenAPI specification for the API.

Response:

  • OpenAPI 3.0 JSON specification

Status Codes:

  • 200 OK - Successful

Example:
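A small Python sketch that downloads the specification and lists its operations, a quick way to see which endpoints a running encoderfile serves. The helpers are hypothetical; the keys treated as operations are the HTTP methods that OpenAPI 3.0 allows in a Path Item, so entries such as `parameters` are skipped.

```python
import json
import urllib.request

# Operation keys permitted in an OpenAPI 3.0 Path Item object
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url="http://localhost:8080", timeout=5.0):
    """GET /openapi.json and decode it."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_operations(spec):
    """Sorted 'METHOD /path' strings for every operation in the spec."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:
                ops.append(f"{key.upper()} {path}")
    return sorted(ops)
```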


Embedding Models

POST /predict

Generate embeddings for input text sequences.

Request Body:

Fields:

  • inputs (array of strings, required) - Text sequences to embed

  • normalize (boolean, required) - Whether to L2-normalize the embeddings

  • metadata (object, optional) - Custom key-value pairs to include in response

Response:

Response Fields:

  • results (array) - One result per input sequence

    • embeddings (array) - One embedding per token in the sequence

      • embedding (array of floats) - The embedding vector

      • token_info (object, optional) - Information about the token

        • token (string) - The token text

        • token_id (integer) - The token's vocabulary ID

        • start (integer) - Character offset where token starts

        • end (integer) - Character offset where token ends

  • model_id (string) - The model identifier

  • metadata (object, optional) - Custom metadata from request

Status Codes:

  • 200 OK - Successful

  • 422 Unprocessable Entity - Invalid input

  • 500 Internal Server Error - Server error

Example:
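A minimal Python sketch that builds the request and reads per-token vectors from the response, assuming a local server on port 8080. The helper names are hypothetical. `mean_pool` is an optional client-side step, not something Encoderfile performs: the endpoint returns one vector per token, and averaging them is one common way to get a single vector per sequence. Checking the norm assumes `normalize: true` normalizes each token vector individually.

```python
import json
import math
import urllib.request


def embedding_request(texts, normalize=True, metadata=None, base_url="http://localhost:8080"):
    """Build (but do not send) a POST /predict request for an embedding model."""
    body = {"inputs": list(texts), "normalize": normalize}
    if metadata is not None:
        body["metadata"] = metadata  # included in the response's metadata field
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_vectors(response, index=0):
    """Per-token embedding vectors for the index-th input sequence."""
    return [tok["embedding"] for tok in response["results"][index]["embeddings"]]


def mean_pool(vectors):
    """Client-side mean pooling: average the token vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def l2_norm(vector):
    return math.sqrt(sum(x * x for x in vector))


# Sending the request (requires a running server):
# with urllib.request.urlopen(embedding_request(["Hello world"]), timeout=30) as resp:
#     response = json.load(resp)
# sentence_vector = mean_pool(token_vectors(response))
```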

Example Response:


Sequence Classification Models

POST /predict

Classify entire text sequences.

Request Body:

Fields:

  • inputs (array of strings, required) - Text sequences to classify

  • metadata (object, optional) - Custom key-value pairs to include in response

Response:

Response Fields:

  • results (array) - One result per input sequence

    • logits (array of floats) - Raw model outputs before softmax

    • scores (array of floats) - Probability scores after softmax (sum to 1.0)

    • predicted_index (integer) - Index of the highest-scoring class

    • predicted_label (string, optional) - Label corresponding to the predicted index (if model has label mappings)

  • model_id (string) - The model identifier

  • metadata (object, optional) - Custom metadata from request

Status Codes:

  • 200 OK - Successful

  • 422 Unprocessable Entity - Invalid input

  • 500 Internal Server Error - Server error

Example:
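A Python sketch that builds the request and checks a result against the relationships described above: `scores` should equal the softmax of `logits`, and `predicted_index` should be the position of the largest score. Encoderfile computes these fields itself, so the check is only illustrative; the helper names and the tolerance are hypothetical choices.

```python
import json
import math
import urllib.request


def classification_request(texts, metadata=None, base_url="http://localhost:8080"):
    """Build (but do not send) a POST /predict request for a sequence classification model."""
    body = {"inputs": list(texts)}
    if metadata is not None:
        body["metadata"] = metadata
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def check_result(result, tol=1e-4):
    """True if scores match softmax(logits) and predicted_index is the argmax."""
    expected = softmax(result["logits"])
    if len(result["scores"]) != len(expected):
        return False
    scores_ok = all(math.isclose(s, e, abs_tol=tol) for s, e in zip(result["scores"], expected))
    argmax = max(range(len(expected)), key=expected.__getitem__)
    return scores_ok and result["predicted_index"] == argmax


# With a running server:
# with urllib.request.urlopen(classification_request(["I love this!"]), timeout=30) as resp:
#     for r in json.load(resp)["results"]:
#         print(r.get("predicted_label"), max(r["scores"]))
```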

Example Response:


Token Classification Models

POST /predict

Classify individual tokens in text sequences.

Request Body:

Fields:

  • inputs (array of strings, required) - Text sequences to process

  • metadata (object, optional) - Custom key-value pairs to include in response

Response:

Response Fields:

  • results (array) - One result per input sequence

    • tokens (array) - One classification per token

      • token_info (object) - Information about the token

        • token (string) - The token text

        • token_id (integer) - The token's vocabulary ID

        • start (integer) - Character offset where token starts

        • end (integer) - Character offset where token ends

      • logits (array of floats) - Raw model outputs before softmax

      • scores (array of floats) - Probability scores after softmax (sum to 1.0)

      • label (string) - The predicted label for this token

      • score (float) - The probability score for the predicted label

  • model_id (string) - The model identifier

  • metadata (object, optional) - Custom metadata from request

Status Codes:

  • 200 OK - Successful

  • 422 Unprocessable Entity - Invalid input

  • 500 Internal Server Error - Server error

Example:
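The request body has the same `inputs` and optional `metadata` fields as for sequence classification. The Python sketch below shows the response side: it pairs each labeled token with the text it covers using the `start`/`end` offsets. It assumes `end` is exclusive and that offsets count characters, as the field descriptions say; if spans look misaligned on non-ASCII text, check whether the offsets are byte-based instead. Treating `"O"` as the "no entity" label follows the common BIO tagging convention and is an assumption about your model's labels; the helper names are hypothetical.

```python
def labeled_spans(text, result, skip_label="O"):
    """(surface text, label, score) for each token whose label is not skip_label."""
    spans = []
    for tok in result["tokens"]:
        if tok["label"] == skip_label:
            continue
        info = tok["token_info"]
        # assumption: character offsets with an exclusive end, so slicing recovers the token text
        spans.append((text[info["start"]:info["end"]], tok["label"], tok["score"]))
    return spans


def spans_for_response(inputs, response, skip_label="O"):
    """results[i] corresponds to inputs[i] (one result per input), so zip them together."""
    return [labeled_spans(text, result, skip_label) for text, result in zip(inputs, response["results"])]
```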

Example Response:


gRPC API

The gRPC API provides the same functionality as the HTTP REST API using Protocol Buffers. Three services are defined; the one you can call depends on the model type your encoderfile was built with.

Connection Details

  • Default hostname: [::] (all interfaces)

  • Default port: 50051

  • Protocol: gRPC (HTTP/2)

Service Definitions

All proto files are located in encoderfile/proto/.

Common Service Methods

All three services implement the following method:

GetModelMetadata

Returns metadata about the loaded model.

Request: Empty (GetModelMetadataRequest)

Response:
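A Python sketch that assembles a `grpcurl` command for `GetModelMetadata` on whichever service your model type provides. It only builds the argument list; you could run it with `subprocess.run(args)` if `grpcurl` is installed. It assumes a plaintext (non-TLS) server on `localhost:50051` with gRPC reflection enabled; without reflection, grpcurl also needs `-import-path` and `-proto` flags pointing at the files in `encoderfile/proto/`. The helper names are hypothetical.

```python
import json
import shlex

# Service names from this page; the one you can call depends on the model type.
SERVICES = {
    "embedding": "encoderfile.Embedding",
    "sequence_classification": "encoderfile.SequenceClassification",
    "token_classification": "encoderfile.TokenClassification",
}


def grpcurl_args(model_type, method, payload=None, target="localhost:50051"):
    """Build (not run) a grpcurl argument list for <service for model_type>/<method>."""
    args = ["grpcurl", "-plaintext"]  # assumption: the gRPC port does not use TLS
    if payload is not None:
        args += ["-d", json.dumps(payload)]
    args += [target, f"{SERVICES[model_type]}/{method}"]
    return args


def as_shell(args):
    """Quote the argument list for pasting into a POSIX shell."""
    return shlex.join(args)


# GetModelMetadataRequest is empty, so send an empty JSON object:
# print(as_shell(grpcurl_args("embedding", "GetModelMetadata", {})))
```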


Embedding Service

Service: encoderfile.Embedding

Predict

Generate embeddings for input text sequences.

Request:

Response:

Example (grpcurl):
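A Python sketch that builds the `grpcurl` call for `encoderfile.Embedding/Predict`. The request message is not reproduced on this page, so the JSON field names `inputs` and `normalize` are an assumption based on the HTTP request; check the proto files in `encoderfile/proto/` before relying on them. The sketch also assumes a plaintext server on `localhost:50051` with gRPC reflection enabled, and `embedding_predict_command` is a hypothetical helper.

```python
import json
import shlex


def embedding_predict_command(texts, normalize=True, target="localhost:50051"):
    """Return a shell-quoted grpcurl command (not executed) for Embedding/Predict."""
    # assumption: proto field names match the HTTP JSON fields
    payload = {"inputs": list(texts), "normalize": normalize}
    args = [
        "grpcurl",
        "-plaintext",  # assumption: no TLS on the gRPC port
        "-d", json.dumps(payload),
        target,
        "encoderfile.Embedding/Predict",
    ]
    return shlex.join(args)
```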


Sequence Classification Service

Service: encoderfile.SequenceClassification

Predict

Classify entire text sequences.

Request:

Response:

Example (grpcurl):


Token Classification Service

Service: encoderfile.TokenClassification

Predict

Classify individual tokens in text sequences.

Request:

Response:

Example (grpcurl):


gRPC Error Codes

gRPC errors use standard status codes:

| Status Code | HTTP Equivalent | Description |
| --- | --- | --- |
| INVALID_ARGUMENT | 422 | Invalid input provided |
| INTERNAL | 500 | Internal server error or configuration error |


MCP (Model Context Protocol)

Encoderfile supports the Model Context Protocol, allowing integration with MCP-compatible systems.

Connection Details

  • Endpoint: /mcp

  • Transport: Streamable HTTP (the only MCP transport supported)

  • Port: Same as HTTP server (default: 8080)

MCP Tools

Each model type exposes a single tool via MCP:

Embedding Models

Tool: run_encoder

Description: "Performs embeddings for input text sequences."

Parameters: Same as HTTP EmbeddingRequest

Returns: Same as HTTP EmbeddingResponse


Sequence Classification Models

Tool: run_encoder

Description: "Performs sequence classification of input text sequences."

Parameters: Same as HTTP SequenceClassificationRequest

Returns: Same as HTTP SequenceClassificationResponse


Token Classification Models

Tool: run_encoder

Description: "Performs token classification of input text sequences."

Parameters: Same as HTTP TokenClassificationRequest

Returns: Same as HTTP TokenClassificationResponse


MCP Server Information

When connected, the MCP server provides:

  • Protocol Version: 2025-06-18

  • Capabilities: Tools only

  • Server Info: Build environment details

MCP Usage Example

To use with an MCP client:
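In practice an MCP client library or application performs the handshake for you. The Python sketch below only shows the raw Streamable HTTP messages such a client sends to `/mcp`: an `initialize` request, the `notifications/initialized` notification, and a `tools/call` request for `run_encoder`. It follows the MCP 2025-06-18 specification; the client name, message ids, and helper names are hypothetical, and reading the reply (plain JSON or a `text/event-stream` stream) is left out.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8080/mcp"  # /mcp on the HTTP port (default 8080)
PROTOCOL_VERSION = "2025-06-18"


def jsonrpc(method, params=None, msg_id=None):
    """A JSON-RPC 2.0 message; without msg_id it is a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if msg_id is not None:
        msg["id"] = msg_id
    return msg


def initialize_msg(msg_id=1):
    return jsonrpc("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
    }, msg_id)


def initialized_msg():
    return jsonrpc("notifications/initialized")


def run_encoder_msg(arguments, msg_id=2):
    """tools/call for run_encoder; arguments use the HTTP request fields for your model type."""
    return jsonrpc("tools/call", {"name": "run_encoder", "arguments": arguments}, msg_id)


def mcp_post(message, session_id=None):
    """Build (but do not send) a Streamable HTTP POST carrying one JSON-RPC message."""
    headers = {
        "Content-Type": "application/json",
        # the server may reply with a JSON body or with an SSE stream
        "Accept": "application/json, text/event-stream",
    }
    if message.get("method") != "initialize":
        # the 2025-06-18 spec requires this header on requests after initialization
        headers["MCP-Protocol-Version"] = PROTOCOL_VERSION
    if session_id is not None:
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(MCP_URL, data=json.dumps(message).encode("utf-8"),
                                  headers=headers, method="POST")
```

Send the `initialize` request first; if its response carries an `Mcp-Session-Id` header, pass that value as `session_id` on every later request, starting with `initialized_msg()`.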


Error Handling

Error Types

Encoderfile uses three error types:

| Error Type | HTTP Status | gRPC Status | MCP Error Code | Description |
| --- | --- | --- | --- | --- |
| InputError | 422 Unprocessable Entity | INVALID_ARGUMENT | INVALID_REQUEST | Invalid input data |
| InternalError | 500 Internal Server Error | INTERNAL | INTERNAL_ERROR | Runtime error |
| ConfigError | 500 Internal Server Error | INTERNAL | INTERNAL_ERROR | Configuration error |

Error Response Format

HTTP REST

Errors return a plain text error message with the appropriate status code:
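A Python sketch that turns a 4xx/5xx response into an exception carrying the status code and the plain-text message, with the status mapped back to the error types in the table above. Because `InternalError` and `ConfigError` both map to 500, the status alone cannot tell them apart. The exception class and helper names are hypothetical.

```python
import urllib.error
import urllib.request

# From the error-type table on this page: 500 covers both InternalError and ConfigError.
STATUS_TO_ERROR_TYPE = {422: "InputError", 500: "InternalError or ConfigError"}


class EncoderfileHTTPError(Exception):
    """Hypothetical client-side exception with the status code and plain-text message."""

    def __init__(self, status, message):
        kind = STATUS_TO_ERROR_TYPE.get(status, "unexpected error")
        super().__init__(f"HTTP {status} ({kind}): {message}")
        self.status = status
        self.message = message


def from_http_error(err):
    """Convert urllib's HTTPError; the body is plain text, so decode it rather than parse JSON."""
    return EncoderfileHTTPError(err.code, err.read().decode("utf-8", errors="replace").strip())


def send(req, timeout=30.0):
    """Send a prepared Request and return the raw body, raising EncoderfileHTTPError on 4xx/5xx."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        raise from_http_error(err) from err
```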

gRPC

Errors return a Status object:

MCP

Errors return an MCP error object:
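MCP messages are JSON-RPC 2.0, so an error reply carries an `error` object with a numeric `code` and a `message`. The Python sketch below assumes the `INVALID_REQUEST` and `INTERNAL_ERROR` names in the table correspond to the standard JSON-RPC codes -32600 and -32603; the helper name is hypothetical.

```python
# Standard JSON-RPC 2.0 codes; assumption: the MCP error names above use these values.
JSONRPC_ERROR_NAMES = {-32600: "INVALID_REQUEST", -32603: "INTERNAL_ERROR"}


def mcp_error(response):
    """Return (name, message) if a decoded JSON-RPC response carries an error, else None."""
    err = response.get("error")
    if err is None:
        return None
    code = err.get("code")
    return JSONRPC_ERROR_NAMES.get(code, f"code {code}"), err.get("message", "")
```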


Client Examples

Python (HTTP)

Python (gRPC)

JavaScript (HTTP)

Go (gRPC)

cURL (HTTP)


Rate Limiting & Performance

Batching

All endpoints support batch processing by providing multiple inputs in a single request:

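Batching is generally more efficient than sending many single-input requests, because per-request overhead such as connection handling and serialization is paid once per batch rather than once per input.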
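When you have more texts than you want to send in one request, a simple approach is to split them into fixed-size batches. The Python sketch below is a hypothetical helper; the default batch size of 32 is an arbitrary starting point to tune against the memory and latency considerations under Best Practices, and it assumes results come back in input order so that per-batch outputs can be concatenated.

```python
def batched_payloads(texts, batch_size=32, **extra_fields):
    """Yield /predict request bodies with at most batch_size inputs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    texts = list(texts)
    for start in range(0, len(texts), batch_size):
        body = {"inputs": texts[start:start + batch_size]}
        body.update(extra_fields)  # e.g. normalize=True for embedding models
        yield body
```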

Concurrency

Encoderfile uses async I/O and can handle multiple concurrent requests. How many it can serve concurrently in practice depends on:

  • Available system resources (CPU, memory)

  • Model size and complexity

  • Input sequence length
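A Python sketch of bounding client-side concurrency with a thread pool, so several requests are in flight at once without flooding the server. `max_workers=4` is an arbitrary default to tune against the factors listed above, and `make_sender` and `predict_concurrently` are hypothetical helpers. `ThreadPoolExecutor.map` returns results in the same order as the inputs.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_sender(base_url="http://localhost:8080", timeout=30.0):
    """Return a function that POSTs one JSON body to /predict and decodes the reply."""
    def send(body):
        req = urllib.request.Request(
            f"{base_url}/predict",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    return send


def predict_concurrently(bodies, send, max_workers=4):
    """Call send(body) for every body with at most max_workers in flight; order is preserved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, bodies))
```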

Best Practices

  1. Batch requests when processing multiple texts

  2. Reuse connections (HTTP keep-alive, gRPC channel pooling)

  3. Set appropriate timeouts for long sequences

  4. Monitor memory usage with large batches or long sequences

  5. Use gRPC for high-throughput scenarios (lower overhead than HTTP/JSON)

