> For the complete documentation index, see [llms.txt](https://docs.mozilla.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.mozilla.ai/llamafile/whisperfile/server.md).

# Server

The whisper-server provides an HTTP API for speech-to-text transcription. Audio files are passed to the inference model via HTTP requests. MP3, FLAC, and OGG files are automatically converted to WAV format.

## Usage

Build and run the server with a model:

```bash
.cosmocc/4.0.2/bin/make -j8 o//whisperfile
o//whisperfile/whisper-server -m models/whisper-tiny.en-q5_1.bin
```

The server accepts the following options:

```
whisper-server options:
  -m FNAME, --model FNAME     Path of Whisper model weights
  --host ADDR                 Hostname or IP address to bind to (default: 127.0.0.1)
  --port PORT                 Port number (default: 8080)
  -l LANG, --language LANG    Default spoken language ('auto' for auto-detect)
  -tr, --translate            Translate audio into English text
  -t N, --threads N           Number of threads to use during computation
  -ng, --no-gpu               Disable GPU acceleration
  --gpu VALUE                 Select GPU backend (auto, apple, amd, nvidia, disable)
  --log-disable               Suppress logging output
```

Run `whisper-server --help` for the complete list of options.

> \[!WARNING] **Do not run the server with administrative privileges and ensure it's operated in a sandbox environment, especially since it involves risky operations like accepting user file uploads. Always validate and sanitize inputs to guard against potential security threats.**

## HTTP Endpoints

### GET /health

Returns server health status as JSON. Returns HTTP 503 if the model is still loading.

```bash
curl http://localhost:8080/health
```

Response when ready (HTTP 200):

```json
{"status": "ok"}
```

Response while model is loading (HTTP 503):

```json
{"status": "loading model"}
```

### POST /inference

Transcribe an audio file. Send as multipart/form-data with the audio file in a field named "file".

Optional form fields:

* `response_format` - Output format: json, text, srt, vtt, verbose\_json (default: json)
* `language` - Spoken language or 'auto' for detection
* `translate` - Set to 'true' to translate to English
* `temperature` - Sampling temperature
* `prompt` - Initial prompt for the model

Example:

```bash
curl http://localhost:8080/inference \
  -F "file=@whisper.cpp/samples/jfk.wav" \
  -F "response_format=json"
```

Response (HTTP 200):

```json
{"text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."}
```

### POST /load

Load a different model at runtime.

```bash
curl http://localhost:8080/load \
  -F "model=/path/to/model.bin"
```

Response (HTTP 200):

```
Load was successful!
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.mozilla.ai/llamafile/whisperfile/server.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.