API Reference
Overview
Encoderfile provides three API interfaces for model inference:
HTTP REST API - JSON-based HTTP endpoints (default port: 8080)
gRPC API - Protocol Buffer-based RPC service (default port: 50051)
MCP (Model Context Protocol) - Integration with MCP-compatible systems
The available endpoints depend on the model type your encoderfile was built with:
embedding - Extract token embeddings from text
sequence_classification - Classify entire text sequences (e.g., sentiment analysis)
token_classification - Classify individual tokens (e.g., Named Entity Recognition)
HTTP REST API
Successful responses are JSON. Errors return the appropriate HTTP status code with a plain-text error message (see Error Handling).
Common Endpoints
These endpoints are available for all model types:
GET /health
Health check endpoint to verify the server is running.
Response:
Status Codes:
200 OK - Server is healthy
Example:
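A minimal Python sketch of a client-side health probe, using only the standard library. It assumes the server is reachable at http://localhost:8080 (the default HTTP port); check_health and its 2-second timeout are illustrative choices, not part of Encoderfile.

```python
import http.client
import urllib.request


def check_health(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers 200 OK, False otherwise (hypothetical helper).

    urllib's URLError and HTTPError, as well as socket timeouts, derive from OSError;
    http.client.HTTPException covers malformed HTTP replies.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False
```

Calling check_health() in a readiness loop before sending /predict traffic avoids errors while the model is still loading.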
GET /model
Returns metadata about the loaded model.
Response:
Fields:
model_id (string) - The model identifier specified during build
model_type (string) - Type of model loaded
id2label (object, optional) - Label mappings for classification models (not present for embedding models)
Status Codes:
200 OK - Successful
Example:
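Because JSON object keys are always strings, a client that maps a predicted class index back to a label has to convert the index before looking it up in id2label. The sketch below assumes id2label maps stringified indices to label strings (the example response is not reproduced here); get_model_info and label_for are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional


def get_model_info(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict[str, Any]:
    """Fetch GET /model and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url}/model", timeout=timeout) as resp:
        return json.load(resp)


def label_for(model_info: dict[str, Any], index: int) -> Optional[str]:
    """Map a class index to its label, or None if there is no mapping.

    Assumes id2label keys are stringified indices ("0", "1", ...), since JSON
    object keys are always strings. Embedding models omit id2label entirely.
    """
    id2label = model_info.get("id2label") or {}
    return id2label.get(str(index))
```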
Example Response:
GET /openapi.json
Returns the OpenAPI specification for the API.
Response:
OpenAPI 3.0 JSON specification
Status Codes:
200 OK - Successful
Example:
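Since the available endpoints depend on the model type, a client can discover them at runtime by walking the paths object of the OpenAPI document. A minimal sketch; fetch_openapi and list_operations are hypothetical helpers, and only the standard OpenAPI 3.0 document structure is assumed.

```python
import json
import urllib.request
from typing import Any

# Operation keys allowed in an OpenAPI 3.0 Path Item object.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict[str, Any]:
    """Download the OpenAPI document served at GET /openapi.json."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_operations(spec: dict[str, Any]) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared under the spec's "paths" object."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:  # skip "parameters", "summary", etc.
                ops.append((key.upper(), path))
    return sorted(ops)
```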
Embedding Models
POST /predict
Generate embeddings for input text sequences.
Request Body:
Fields:
inputs (array of strings, required) - Text sequences to embed
normalize (boolean, required) - Whether to L2-normalize the embeddings
metadata (object, optional) - Custom key-value pairs to include in the response
Response:
Response Fields:
results (array) - One result per input sequence
  embeddings (array) - One embedding per token in the sequence
    embedding (array of floats) - The embedding vector
    token_info (object, optional) - Information about the token
      token (string) - The token text
      token_id (integer) - The token's vocabulary ID
      start (integer) - Character offset where the token starts
      end (integer) - Character offset where the token ends
model_id (string) - The model identifier
metadata (object, optional) - Custom metadata from the request
Status Codes:
200 OK - Successful
422 Unprocessable Entity - Invalid input
500 Internal Server Error - Server error
Example:
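A sketch of the request body for an embedding model, built as a Python dict and serialized with json.dumps. The field names come from the field list above; embedding_payload and its client-side type checks are illustrative assumptions (the server's own 422 response remains the authoritative validation).

```python
import json
from typing import Any, Optional


def embedding_payload(
    inputs: list[str],
    normalize: bool,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /predict on an embedding model (hypothetical helper).

    inputs and normalize are required; metadata is optional and echoed back.
    """
    if not all(isinstance(s, str) for s in inputs):
        raise TypeError("inputs must be a list of strings")
    if not isinstance(normalize, bool):
        raise TypeError("normalize must be a boolean")
    body: dict[str, Any] = {"inputs": list(inputs), "normalize": normalize}
    if metadata is not None:
        body["metadata"] = metadata
    return body


# Serialized body to send with Content-Type: application/json.
print(json.dumps(embedding_payload(["The quick brown fox"], normalize=True)))
# {"inputs": ["The quick brown fox"], "normalize": true}
```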
Example Response:
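When consuming the response, note that embeddings are returned per token, nested inside one result per input, and that token_info may be absent. The sketch below flattens a decoded response into (input index, token, vector) rows and includes an L2-norm helper for checking that normalize=true produced unit-length vectors. The helper names are hypothetical.

```python
import math
from typing import Any, Iterator, Optional


def iter_token_vectors(response: dict[str, Any]) -> Iterator[tuple[int, Optional[str], list[float]]]:
    """Yield (input_index, token_text, vector) for every token embedding in a response.

    token_info is optional, so token_text may be None.
    """
    for i, result in enumerate(response["results"]):
        for emb in result["embeddings"]:
            info = emb.get("token_info") or {}
            yield i, info.get("token"), emb["embedding"]


def l2_norm(vector: list[float]) -> float:
    """Euclidean length; close to 1.0 for each vector when the request set normalize=true."""
    return math.sqrt(sum(x * x for x in vector))
```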
Sequence Classification Models
POST /predict
Classify entire text sequences.
Request Body:
Fields:
inputs (array of strings, required) - Text sequences to classify
metadata (object, optional) - Custom key-value pairs to include in the response
Response:
Response Fields:
results (array) - One result per input sequence
  logits (array of floats) - Raw model outputs before softmax
  scores (array of floats) - Probability scores after softmax (sum to 1.0)
  predicted_index (integer) - Index of the highest-scoring class
  predicted_label (string, optional) - Label corresponding to the predicted index (present if the model has label mappings)
model_id (string) - The model identifier
metadata (object, optional) - Custom metadata from the request
Status Codes:
200 OK - Successful
422 Unprocessable Entity - Invalid input
500 Internal Server Error - Server error
Example:
Example Response:
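The response documents scores as the softmax of logits and predicted_index as the position of the highest score. A short sketch that recomputes both on the client, for example to sanity-check results or apply a custom threshold. It uses the usual max-subtraction trick for numerical stability; the default tolerance assumes the server may compute in single precision, and check_result is a hypothetical helper.

```python
import math
from typing import Any


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def check_result(result: dict[str, Any], tol: float = 1e-4) -> bool:
    """Verify that scores match softmax(logits) and predicted_index points at the top score."""
    expected = softmax(result["logits"])
    scores = result["scores"]
    if len(expected) != len(scores):
        return False
    if any(abs(a - b) > tol for a, b in zip(expected, scores)):
        return False
    best = max(range(len(scores)), key=scores.__getitem__)
    return result["predicted_index"] == best
```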
Token Classification Models
POST /predict
Classify individual tokens in text sequences.
Request Body:
Fields:
inputs (array of strings, required) - Text sequences to process
metadata (object, optional) - Custom key-value pairs to include in the response
Response:
Response Fields:
results (array) - One result per input sequence
  tokens (array) - One classification per token
    token_info (object) - Information about the token
      token (string) - The token text
      token_id (integer) - The token's vocabulary ID
      start (integer) - Character offset where the token starts
      end (integer) - Character offset where the token ends
    logits (array of floats) - Raw model outputs before softmax
    scores (array of floats) - Probability scores after softmax (sum to 1.0)
    label (string) - The predicted label for this token
    score (float) - The probability score for the predicted label
model_id (string) - The model identifier
metadata (object, optional) - Custom metadata from the request
Status Codes:
200 OK - Successful
422 Unprocessable Entity - Invalid input
500 Internal Server Error - Server error
Example:
Example Response:
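Because start and end are character offsets into the original input, a client can slice the input text to recover each token's surface form, which is handy for NER. This sketch assumes the model uses "O" for tokens outside any entity (common for NER models, but check your model's id2label) and skips zero-length spans; labeled_spans is a hypothetical helper, and it does not merge multi-token entities.

```python
from typing import Any


def labeled_spans(text: str, result: dict[str, Any], ignore: str = "O") -> list[tuple[str, str, float]]:
    """Return (surface_text, label, score) for each token whose label is not `ignore`.

    `result` is one element of the response's results array; `text` is the
    input string that produced it.
    """
    spans = []
    for tok in result["tokens"]:
        info = tok["token_info"]
        if tok["label"] == ignore or info["start"] == info["end"]:
            continue  # outside-entity label, or a zero-length span
        spans.append((text[info["start"]:info["end"]], tok["label"], tok["score"]))
    return spans
```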
gRPC API
The gRPC API provides the same functionality as the HTTP REST API using Protocol Buffers. Three services are defined, one per model type; the service you can call depends on the model type your encoderfile was built with.
Connection Details
Default hostname: [::] (all interfaces)
Default port: 50051
Protocol: gRPC (HTTP/2)
Service Definitions
All proto files are located in encoderfile/proto/.
Common Service Methods
All three services implement these methods:
GetModelMetadata
Returns metadata about the loaded model.
Request: Empty (GetModelMetadataRequest)
Response:
Embedding Service
Service: encoderfile.Embedding
Predict
Generate embeddings for input text sequences.
Request:
Response:
Example (grpcurl):
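A sketch that assembles a grpcurl invocation for any of the three services, as an argument list suitable for subprocess.run. It assumes the gRPC request fields share the HTTP names (inputs, normalize, metadata) and that the server is reached without TLS. Whether the server exposes gRPC reflection is not stated here; if it does not, pass the proto directory and file (the file name is a placeholder you must take from encoderfile/proto/). grpcurl_predict_argv is a hypothetical helper.

```python
import json
from typing import Any, Optional


def grpcurl_predict_argv(
    service: str,
    request: dict[str, Any],
    target: str = "localhost:50051",
    proto_dir: Optional[str] = None,
    proto_file: Optional[str] = None,
) -> list[str]:
    """Build a grpcurl command line that calls <service>/Predict over plaintext HTTP/2."""
    argv = ["grpcurl", "-plaintext"]
    if proto_dir and proto_file:
        # Only needed when the server does not offer gRPC reflection (assumption).
        argv += ["-import-path", proto_dir, "-proto", proto_file]
    argv += ["-d", json.dumps(request), target, f"{service}/Predict"]
    return argv
```

For example, subprocess.run(grpcurl_predict_argv("encoderfile.SequenceClassification", {"inputs": ["Great product!"]}), check=True) would call the sequence classification service; the same helper covers encoderfile.Embedding and encoderfile.TokenClassification.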
Sequence Classification Service
Service: encoderfile.SequenceClassification
Predict
Classify entire text sequences.
Request:
Response:
Example (grpcurl):
Token Classification Service
Service: encoderfile.TokenClassification
Predict
Classify individual tokens in text sequences.
Request:
Response:
Example (grpcurl):
gRPC Error Codes
gRPC errors use standard status codes:
INVALID_ARGUMENT (HTTP equivalent: 422) - Invalid input provided
INTERNAL (HTTP equivalent: 500) - Internal server error or configuration error
MCP (Model Context Protocol)
Encoderfile supports the Model Context Protocol (MCP), allowing integration with MCP-compatible systems.
Connection Details
Endpoint: /mcp
Transport: HTTP-based MCP protocol (Streamable HTTP only)
Port: Same as HTTP server (default: 8080)
MCP Tools
Each model type exposes a single tool via MCP:
Embedding Models
Tool: run_encoder
Description: "Performs embeddings for input text sequences."
Parameters: Same as HTTP EmbeddingRequest
Returns: Same as HTTP EmbeddingResponse
Sequence Classification Models
Tool: run_encoder
Description: "Performs sequence classification of input text sequences."
Parameters: Same as HTTP SequenceClassificationRequest
Returns: Same as HTTP SequenceClassificationResponse
Token Classification Models
Tool: run_encoder
Description: "Performs token classification of input text sequences."
Parameters: Same as HTTP TokenClassificationRequest
Returns: Same as HTTP TokenClassificationResponse
MCP Server Information
When connected, the MCP server provides:
Protocol Version: 2025-06-18
Capabilities: Tools only
Server Info: Build environment details
MCP Usage Example
To use with an MCP client:
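MCP messages are JSON-RPC 2.0. Under the Streamable HTTP transport, a client POSTs each message to the endpoint (here /mcp). It first performs an initialize / notifications/initialized handshake and then invokes tools with tools/call. The sketch below builds these messages with the standard library without sending them. The header and method names come from the MCP specification; mcp_post and the client name are hypothetical. In practice an MCP SDK performs this handshake for you.

```python
import json
import urllib.request
from typing import Any, Optional

PROTOCOL_VERSION = "2025-06-18"  # version reported by the encoderfile MCP server


def mcp_post(
    base_url: str,
    message: dict[str, Any],
    session_id: Optional[str] = None,
    protocol_version: Optional[str] = None,
) -> urllib.request.Request:
    """Wrap one JSON-RPC 2.0 message as a Streamable HTTP POST to /mcp (not sent)."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept either a JSON body or an SSE stream in reply.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:  # from the Mcp-Session-Id header of the initialize response, if any
        headers["Mcp-Session-Id"] = session_id
    if protocol_version:  # sent on requests after initialization
        headers["MCP-Protocol-Version"] = protocol_version
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(message).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# 1. Handshake: initialize, then the initialized notification (no id).
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
    },
}
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

# 2. Call the single tool; arguments mirror the HTTP request body (embedding model shown).
call_run_encoder = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "run_encoder", "arguments": {"inputs": ["hello world"], "normalize": True}},
}
```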
Error Handling
Error Types
Encoderfile uses three error types, each mapped to a status in every interface:
InputError
  HTTP: 422 Unprocessable Entity
  gRPC: INVALID_ARGUMENT
  MCP: INVALID_REQUEST
  Meaning: Invalid input data
InternalError
  HTTP: 500 Internal Server Error
  gRPC: INTERNAL
  MCP: INTERNAL_ERROR
  Meaning: Runtime error
ConfigError
  HTTP: 500 Internal Server Error
  gRPC: INTERNAL
  MCP: INTERNAL_ERROR
  Meaning: Configuration error
Error Response Format
HTTP REST
Errors return a plain text error message with the appropriate status code:
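Because the error body is plain text rather than JSON, a client should read it as a string. The sketch below converts urllib's HTTPError into a typed exception using the status mapping from the error types above. Note that InternalError and ConfigError both map to 500, so the status code alone cannot tell them apart. The class and function names are hypothetical.

```python
import urllib.error


class EncoderfileHTTPError(Exception):
    """Typed wrapper around an HTTP error returned by the server (hypothetical)."""

    def __init__(self, status: int, kind: str, message: str) -> None:
        super().__init__(f"{status} {kind}: {message}")
        self.status = status
        self.kind = kind        # best guess at the server-side error type
        self.message = message  # plain-text body returned by the server


def classify_http_error(err: urllib.error.HTTPError) -> EncoderfileHTTPError:
    """Map a urllib HTTPError onto the documented error types."""
    body = err.read().decode("utf-8", errors="replace").strip()
    if err.code == 422:
        kind = "InputError"  # fix the request; resending it unchanged will fail again
    elif err.code == 500:
        kind = "InternalError or ConfigError"  # both share 500
    else:
        kind = "unknown"
    return EncoderfileHTTPError(err.code, kind, body)
```

Typical use: wrap urllib.request.urlopen(...) in try/except urllib.error.HTTPError as err, then raise classify_http_error(err) from err.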
gRPC
Errors return a Status object:
MCP
Errors return an MCP error object:
Client Examples
Python (HTTP)
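A minimal standard-library client that posts any of the request bodies described above to /predict and decodes the JSON reply. It assumes the default address http://localhost:8080. predict is a hypothetical helper name, and the 30-second timeout is an arbitrary starting point for long sequences.

```python
import json
import urllib.request
from typing import Any


def predict(base_url: str, payload: dict[str, Any], timeout: float = 30.0) -> dict[str, Any]:
    """POST a JSON payload to /predict and return the decoded JSON response.

    Raises urllib.error.HTTPError on 422/500; the error body is plain text.
    """
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For a sequence classification model, predict("http://localhost:8080", {"inputs": ["I love this!"]}) returns a dict whose results list holds one entry per input.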
Python (gRPC)
JavaScript (HTTP)
Go (gRPC)
cURL (HTTP)
Rate Limiting & Performance
Batching
All endpoints support batch processing by providing multiple inputs in a single request:
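For large corpora, a client can split inputs into fixed-size batches and send one request per batch. This trades per-request overhead against the memory cost of very large batches. A sketch of the splitting step; batched is a hypothetical helper, and the batch size is an assumption to tune for your model and hardware. On Python 3.12+, itertools.batched offers the same behavior but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(texts: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` texts, each usable as one request's inputs."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(texts)
    while chunk := list(islice(it, size)):
        yield chunk
```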
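Batching is generally more efficient than sending one request per input, because per-request overhead (connection handling, serialization) is paid once for the whole batch.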
Concurrency
Encoderfile uses async I/O and can handle multiple concurrent requests. The exact concurrency limit depends on:
Available system resources (CPU, memory)
Model size and complexity
Input sequence length
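Since the server's practical concurrency limit depends on these factors, it is usually safer to cap the number of in-flight requests on the client side and tune the cap empirically. A sketch using a thread pool, since threads suit I/O-bound HTTP calls. run_concurrently is a hypothetical helper, send is any callable that submits one batch (for instance a wrapper around POST /predict), and max_workers=4 is an arbitrary default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def run_concurrently(
    send: Callable[[list[str]], Any],
    batches: Iterable[list[str]],
    max_workers: int = 4,
) -> list[Any]:
    """Send batches with at most `max_workers` requests in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, batches))
```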
Best Practices
Batch requests when processing multiple texts
Reuse connections (HTTP keep-alive, gRPC channel pooling)
Set appropriate timeouts for long sequences
Monitor memory usage with large batches or long sequences
Use gRPC for high-throughput scenarios (lower overhead than HTTP/JSON)
See Also
CLI Documentation - Command-line interface reference
Getting Started - Getting started guide
Contributing Guide - Development setup