# Using GPUs

GPU acceleration is most beneficial for the medium and large models. The tiny model is already fast on CPU, so the speedup there is minimal.

Pass `--gpu auto` to let whisperfile detect and use the best available GPU on your system. If no supported GPU is found, it falls back to CPU silently:

```bash
whisperfile -m models/ggml-medium.en.bin -f audio.wav --gpu auto
```

You can also target a specific backend:

* `--gpu apple` — Apple Metal (macOS, works on Apple Silicon and AMD GPUs)
* `--gpu nvidia` — NVIDIA CUDA (requires CUDA Toolkit to be installed)
* `--gpu amd` — AMD ROCm (requires ROCm to be installed on Linux)
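
When the backend is chosen by a script or a config value, it can help to validate the name before invoking whisperfile. The following is a minimal sketch, not part of whisperfile: `gpu_flag` is a hypothetical helper that maps the backend names listed above (plus `none`, for the `--no-gpu` flag covered below) to the matching command-line flag.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with whisperfile): map a backend name
# to the flag documented on this page, rejecting anything else.
gpu_flag() {
  case "$1" in
    auto|apple|nvidia|amd) printf -- '--gpu %s\n' "$1" ;;  # documented --gpu values
    none)                  printf -- '--no-gpu\n' ;;       # CPU only
    *) echo "unknown GPU backend: $1" >&2; return 1 ;;
  esac
}
```

Usage, leaving the substitution unquoted so the flag and its value split into two arguments: `whisperfile -m models/ggml-medium.en.bin -f audio.wav $(gpu_flag nvidia)`.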

To disable GPU acceleration entirely:

```bash
whisperfile -m models/ggml-medium.en.bin -f audio.wav --no-gpu
```
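
To check whether GPU acceleration pays off for a particular model, one option is to time the same transcription with `--gpu auto` and with `--no-gpu`. The sketch below assumes bash; `elapsed` is a hypothetical helper that uses bash's built-in `SECONDS` counter, so it only has whole-second resolution.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds.
elapsed() {
  local start=$SECONDS
  "$@" >/dev/null 2>&1 || return   # propagate the command's failure status
  echo $(( SECONDS - start ))
}
```

Usage: compare `elapsed whisperfile -m models/ggml-medium.en.bin -f audio.wav --gpu auto` with `elapsed whisperfile -m models/ggml-medium.en.bin -f audio.wav --no-gpu`. Run each a couple of times, since the first run may include one-off startup costs.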

## Troubleshooting

**`ggml_backend_load_best: search path does not exist` warnings**

These are benign. They appear when whisperfile searches for GPU backend libraries and doesn't find them, usually because no GPU is present or configured. Transcription continues on CPU. To suppress them, redirect stderr (note that this also hides every other diagnostic message, including real errors):

```bash
whisperfile -m models/ggml-medium.en.bin -f audio.wav 2>/dev/null
```
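
If discarding all of stderr is too blunt, a narrower option is to filter out only the benign warning and let everything else through. This is a sketch under the assumption that each warning line contains the exact text quoted in the heading above; `filter_backend_warnings` is a hypothetical helper, and the process substitution in the usage line requires bash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop the benign backend-search warnings from a
# stream and pass every other line through unchanged.
filter_backend_warnings() {
  # grep exits 1 when nothing is left to print; treat that as success.
  grep -v 'ggml_backend_load_best: search path does not exist' || true
}
```

Usage: `whisperfile -m models/ggml-medium.en.bin -f audio.wav 2> >(filter_backend_warnings >&2)`. Standard output, and therefore the transcription itself, is untouched.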

