Add support for realtime API #3714

mudler · 2024-10-01T20:29:04Z

Is your feature request related to a problem? Please describe.

OpenAI just extended their API with realtime support over WebSockets:
https://openai.com/index/introducing-the-realtime-api/?s=09

Describe the solution you'd like

LocalAI should support backends with voice capabilities and introduce an API endpoint that is compatible with OpenAI clients.

Ideally it should also support function calling, as OpenAI does:

Under the hood, the Realtime API lets you create a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling, which makes it possible for voice assistants to respond to user requests by triggering actions or pulling in new context. For example, a voice assistant could place an order on behalf of the user or retrieve relevant customer information to personalize its responses.

It seems the Chat Completions API is also going to get audio input/output, but the API specs are not available yet:

Audio in the Chat Completions API will be released in the coming weeks, as a new model gpt-4o-audio-preview. With gpt-4o-audio-preview, developers can input text or audio into GPT-4o and receive responses in text, audio, or both.

Describe alternatives you've considered

Additional context

#3602
#3722

API docs: https://platform.openai.com/docs/guides/realtime https://platform.openai.com/docs/api-reference/realtime-client-events/session-update

https://github.com/tmc/grpc-websocket-proxy
https://github.com/openconfig/grpctunnel

https://github.com/mudler/LocalAI/tree/feat/realtime

open source models that can handle realtime speech:


mudler · 2024-10-02T09:11:10Z

A good candidate VAD library: https://github.com/snakers4/silero-vad/tree/master/examples/go

mattkanwisher · 2024-10-02T15:01:25Z

I started looking at stubbing out the api, it's mostly just json, curious why you are suggesting the grpc-websocket-proxy?
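As a rough illustration of the "mostly just JSON" point: each message carries a `type` field, so a stub server can decode a small envelope first and dispatch on it. This is a minimal sketch, not the code from any branch. The event names (`session.update`, `input_audio_buffer.append`, `response.create`) and the base64-encoded `audio` field follow my reading of the linked API docs; the handler behaviour and the returned strings are hypothetical placeholders.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// envelope captures only the field needed to route an incoming event.
type envelope struct {
	Type string `json:"type"`
}

// handleClientEvent decodes one client message and returns a short
// description of what a stub server would do with it.
func handleClientEvent(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", fmt.Errorf("invalid event JSON: %w", err)
	}

	switch env.Type {
	case "session.update":
		return "update session configuration", nil
	case "input_audio_buffer.append":
		// The audio chunk is assumed to arrive base64-encoded.
		var ev struct {
			Audio string `json:"audio"`
		}
		if err := json.Unmarshal(raw, &ev); err != nil {
			return "", fmt.Errorf("invalid append event: %w", err)
		}
		pcm, err := base64.StdEncoding.DecodeString(ev.Audio)
		if err != nil {
			return "", fmt.Errorf("invalid base64 audio: %w", err)
		}
		return fmt.Sprintf("buffer %d audio bytes", len(pcm)), nil
	case "response.create":
		return "start generating a response", nil
	default:
		return "", fmt.Errorf("unsupported event type %q", env.Type)
	}
}

func main() {
	msgs := []string{
		`{"type":"session.update","session":{}}`,
		`{"type":"input_audio_buffer.append","audio":"AAEC"}`,
		`{"type":"response.create"}`,
	}
	for _, m := range msgs {
		action, err := handleClientEvent([]byte(m))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(action)
	}
}
```

The WebSocket transport itself is left out on purpose, since the Go standard library does not ship a WebSocket implementation and the choice of library is a separate decision.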

mudler · 2024-10-02T15:07:32Z

I started looking at stubbing out the api, it's mostly just json, curious why you are suggesting the grpc-websocket-proxy?

I was digging a bit into projects that interface with gRPC and WebSockets. I was just adding some code/notes here to pick up again later. It is a very preliminary search, but it might be useful as a reference or for getting some ideas.

mudler · 2024-10-03T16:43:48Z

I started looking at stubbing out the api

Are you going to open up a PR? I was about to start playing with it as well, but if you are already taking a stab at it I'd go with #3670 instead :)

Update: Opened #3722 with what I had lying around. Now gonna have a look at vLLM first 🥽

thiswillbeyourgithub · 2024-10-12T13:25:12Z

A good candidate VAD library: https://github.com/snakers4/silero-vad/tree/master/examples/go

Heard about that one btw : https://github.com/wavey-ai/mel-spec?tab=readme-ov-file

Also copy-pasting some possibly useful resources I linked in another repo, as 4o will not be the only model to support this, if we want to not rely too much on OpenAI's code:

I saw on Hacker News that Agents by LiveKit, which was used to make the OpenAI Realtime API as well as Cerebras voice, seems to be open source.

They have tons of demos and code on their github. I think there must be a llama-omni implementation somewhere that would be a killer feature for open-webui!

Here's a particularly interesting demo that connects stt + llm + tts: https://github.com/livekit/agents/blob/main/examples/voice-pipeline-agent/minimal_assistant.py
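For readers who want the shape of such a chained pipeline without reading the Python demo, here is a minimal Go sketch. It is not LiveKit's API and not LocalAI code: the `Transcriber`, `Generator`, and `Synthesizer` interfaces and the fake implementations are hypothetical, chosen only to show how one turn flows from audio in to audio out.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stage interfaces; real backends would sit behind these.
type Transcriber interface {
	Transcribe(audio []byte) (string, error)
}

type Generator interface {
	Generate(prompt string) (string, error)
}

type Synthesizer interface {
	Synthesize(text string) ([]byte, error)
}

// Pipeline chains speech-to-text, a language model, and text-to-speech.
type Pipeline struct {
	STT Transcriber
	LLM Generator
	TTS Synthesizer
}

// Respond runs one turn: user audio in, assistant audio out.
func (p Pipeline) Respond(audio []byte) ([]byte, error) {
	text, err := p.STT.Transcribe(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := p.LLM.Generate(text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	out, err := p.TTS.Synthesize(reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return out, nil
}

// Fake stages so the sketch runs without any model.
type echoSTT struct{}

func (echoSTT) Transcribe(audio []byte) (string, error) { return string(audio), nil }

type upperLLM struct{}

func (upperLLM) Generate(prompt string) (string, error) { return strings.ToUpper(prompt), nil }

type bytesTTS struct{}

func (bytesTTS) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func main() {
	p := Pipeline{STT: echoSTT{}, LLM: upperLLM{}, TTS: bytesTTS{}}
	out, err := p.Respond([]byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints "HELLO"
}
```

A chained design like this processes each turn sequentially, whereas speech-native models such as the ones mentioned in this thread handle audio end to end, so the two approaches would likely need different backends behind the same endpoint.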

I opened an issue to ask for a demo for Llama-Omni, and also for Kyutai's Moshi model. There's also Modal's Moshi implementation: https://github.com/modal-labs/quillman

fofsinx · 2024-11-02T02:51:51Z

I'm trying to build a server implementation based on the OpenAI spec for their Realtime API.

https://github.com/iamharshdev/OLlamaGate

mudler · 2024-11-06T13:27:44Z

There is a WIP branch over here : #3722

Contributions and feedback are always welcome!

mudler added the enhancement and roadmap labels · Oct 1, 2024

mudler mentioned this issue · Oct 2, 2024: chore: get model also from query #3716 (Merged)

This was referenced · Oct 18, 2024: Support audio input and output in /chat/completions #3877 (Open); feat: Realtime API support #3722 (Draft)
