[Live API] Session resumption becomes unstable after prior audio+video use; audio/text-only resumes work #2290

### Description

I am seeing what looks like a model-side or Live API session-resumption issue when using `google-genai` with `gemini-3.1-flash-live-preview`.

Session resumption works reliably for audio/text-only sessions. However, if a session has ever included both microphone audio and video frames, the resumed session later fails with a `1007` (invalid argument) close/error, even when:

- reconnect itself succeeds
- the mic/video are turned off before reconnect
- the resumed session initially appears healthy
- the first post-resume utterance may even produce a normal response or tool call

This makes it look less like a bad reconnect boundary on the client and more like resumed multimodal session state becoming invalid/unstable after prior audio+video use.

### Environment

- OS: Windows
- Python: 3.12.10
- SDK: `google-genai` `1.30`
- Model: `gemini-3.1-flash-live-preview`
- Transport: Live API over WebSocket via `client.aio.live.connect(...)`
- Input types used:
  - audio: `audio/pcm;rate=16000`
  - video: `image/jpeg`
  - text via `send_realtime_input(text=...)`
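
For reference when sizing audio chunks: the Live API documentation describes input audio as raw 16-bit little-endian PCM, so `audio/pcm;rate=16000` works out to 32,000 bytes per second, assuming mono capture (mono is an assumption here). The helper names below are hypothetical; this is a sketch of the arithmetic and a float-to-PCM conversion, not code from the app described in this issue.

```python
import array
import sys

SAMPLE_RATE_HZ = 16_000  # matches "audio/pcm;rate=16000"
BYTES_PER_SAMPLE = 2     # 16-bit samples; mono is assumed


def chunk_bytes(duration_ms: int) -> int:
    """Bytes in a mono 16-bit PCM chunk of the given duration."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def floats_to_pcm16le(samples: list[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian int16 PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = array.array("h", (int(round(s * 32767)) for s in clipped))
    if sys.byteorder == "big":
        pcm.byteswap()  # force little-endian regardless of host byte order
    return pcm.tobytes()
```

For example, a 100 ms chunk is `chunk_bytes(100)`, i.e. 1,600 samples or 3,200 bytes.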

### Minimal config

I connect roughly like this:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment


async def run_session(connect_handle: str | None = None) -> None:
    # connect_handle is None for a fresh session, or the latest resumable handle
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.AUDIO],
        system_instruction=types.Content(parts=[types.Part(text="...")]),
        input_audio_transcription=types.AudioTranscriptionConfig(),
        output_audio_transcription=types.AudioTranscriptionConfig(),
        realtime_input_config=types.RealtimeInputConfig(
            turn_coverage="TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO",
        ),
        context_window_compression=types.ContextWindowCompressionConfig(
            sliding_window=types.SlidingWindow(),
        ),
        session_resumption=types.SessionResumptionConfig(handle=connect_handle),
    )

    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config,
    ) as session:
        ...
```

And realtime inputs are sent using:

```python
# Also tested with all three sends serialized behind an asyncio.Lock()

await session.send_realtime_input(
    audio=types.Blob(data=pcm_chunk, mime_type="audio/pcm;rate=16000")
)

await session.send_realtime_input(
    video=types.Blob(data=jpeg_bytes, mime_type="image/jpeg")
)

await session.send_realtime_input(text="...")
```
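
A minimal sketch of the `asyncio.Lock()` serialization mentioned above, assuming a hypothetical `SerializedSender` wrapper around the `session` yielded by `client.aio.live.connect(...)`. It also counts sends per input kind, which makes it easy to state whether a given session ever carried video before a resume. The pause flag models "stopping new realtime input during handoff"; dropping inputs while paused is an assumption, not necessarily what the app in this issue does.

```python
import asyncio
from collections import Counter
from typing import Any


class SerializedSender:
    """Serializes send_realtime_input calls and can pause them for a handoff.

    `session` is any object with an async `send_realtime_input(**kwargs)`
    method, such as the session from `client.aio.live.connect(...)`.
    """

    def __init__(self, session: Any) -> None:
        self._session = session
        self._lock = asyncio.Lock()
        self._paused = False
        self.sent = Counter()     # successful sends per input kind
        self.dropped = Counter()  # inputs discarded while paused

    def pause(self) -> None:
        """Stop forwarding new input, e.g. right before a reconnect."""
        self._paused = True

    async def send(self, **kwargs: Any) -> bool:
        """Forward one realtime input; return False if it was dropped."""
        kind = ",".join(sorted(kwargs))  # e.g. "audio", "video", "text"
        async with self._lock:  # one send in flight at a time
            if self._paused:
                self.dropped[kind] += 1
                return False
            await self._session.send_realtime_input(**kwargs)
            self.sent[kind] += 1
            return True
```

Usage would look like `await sender.send(audio=types.Blob(data=pcm_chunk, mime_type="audio/pcm;rate=16000"))`, with `sender.pause()` called before starting the reconnect; `sender.sent["video"] > 0` then tells you whether the pre-resume session carried video.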

### Steps to reproduce
1. Start a Live API session with audio output enabled.
2. Send microphone audio and video frames in the same session.
3. Wait for a `GoAway` message (or otherwise trigger a reconnect), keeping the latest resumable handle.
4. Before reconnecting, stop the mic and video so no video is active at the moment of reconnect.
5. Reconnect using `SessionResumptionConfig(handle=...)`; the connection opens successfully.
6. After reconnect, send only microphone audio.
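
Steps 3 to 5 depend on keeping the newest handle that the server marks as resumable. The sketch below assumes the `google-genai` message shape (`LiveServerMessage.session_resumption_update` with `new_handle` and `resumable`, and `LiveServerMessage.go_away` with `time_left`); `ResumptionState` and `track_resumption` are hypothetical names, and attribute access is duck-typed so the logic can be exercised without a live connection.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResumptionState:
    handle: Optional[str] = None   # newest handle the server marked resumable
    go_away_time_left: Any = None  # last GoAway.time_left seen, if any


def track_resumption(msg: Any, state: ResumptionState) -> ResumptionState:
    """Update state from one LiveServerMessage-like object.

    Only accepts `new_handle` when `resumable` is true, so a
    non-resumable update never overwrites a good handle.
    """
    update = getattr(msg, "session_resumption_update", None)
    if update is not None and update.resumable and update.new_handle:
        state.handle = update.new_handle
    go_away = getattr(msg, "go_away", None)
    if go_away is not None:
        state.go_away_time_left = go_away.time_left
    return state
```

In the receive loop this would be called once per message, and the reconnect would then pass `types.SessionResumptionConfig(handle=state.handle)`.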

### Expected behavior
The resumed session should remain valid after reconnect, and post-resume audio input should continue working regardless of whether the pre-reconnect session had previously included video.

### Actual behavior
If the session had previously included audio+video, the resumed session later fails with an invalid-argument error/close (1007), even though:

* the reconnect itself succeeds
* the resumed session can initially accept input
* the first post-resume voice input may produce a valid response or tool call

In contrast, if the session used only voice+text before reconnect, resumption works correctly.

### Key observations
The following patterns were consistent across repeated tests:

* Audio/text-only sessions resume successfully.
* Audio-only after reconnect can work.
* A session that had ever used video+audio before reconnect is much more likely to fail after resume.
* Turning video off before reconnect does not avoid the failure if video+audio had been used earlier in the same session.
* Adding a fixed delay between disconnect and reconnect did not resolve it.
* Changing `turn_coverage` did not resolve it.
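
"Much more likely to fail" is easier to act on with counts. A hypothetical tally helper, assuming each repro run is recorded as a `(condition, failed_after_resume)` pair such as `("audio+video", True)`; no actual run data from this issue is shown here.

```python
from collections import defaultdict


def failure_rates(runs: list[tuple[str, bool]]) -> dict[str, tuple[int, int, float]]:
    """Summarize repro runs per condition.

    Returns {condition: (failures, total_runs, failure_rate)}.
    """
    tally: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for condition, failed in runs:
        tally[condition][0] += int(failed)  # failures
        tally[condition][1] += 1            # total runs
    return {c: (f, n, f / n) for c, (f, n) in tally.items()}
```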

### Mitigations already tried
I already tried all of the following on the client side:

* stopping new realtime input during handoff
* sending `audio_stream_end=True` before closing the old session
* draining pending input queues before reconnect
* pausing microphone capture during reconnect
* delaying reconnect by 5 seconds
* using `TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO`
* ensuring reconnect uses the latest frozen resumable handle
* disabling live input during the reconnect boundary
* using `asyncio.Lock()` to serialise audio, text, and video frame inputs

These changes improved reconnect hygiene but did not resolve the failure pattern above.
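
One possible ordering of the handoff mitigations above, as a sketch. `hand_off` and the `reconnect` callable are hypothetical; `reconnect` is assumed to open a new `client.aio.live.connect(...)` with `SessionResumptionConfig(handle=...)`. "Draining" is interpreted here as discarding queued-but-unsent inputs, which may differ from what was actually tested.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def hand_off(
    old_session: Any,
    pending: asyncio.Queue,
    frozen_handle: Optional[str],
    reconnect: Callable[[Optional[str]], Awaitable[Any]],
) -> Any:
    """Signal end of audio, discard unsent input, then resume with the handle."""
    # 1. Tell the old session the audio stream has ended.
    await old_session.send_realtime_input(audio_stream_end=True)
    # 2. Discard anything captured but not yet sent (assumed meaning of "drain").
    while not pending.empty():
        pending.get_nowait()
        pending.task_done()
    # 3. Reconnect with the handle snapshot taken by the caller.
    return await reconnect(frozen_handle)
```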

### Why this looks model/service-side instead of client-side
The strongest signal is:

* sessions that only used voice+text resume fine
* sessions that had previously used video+audio later fail after resume, even when video is already off before reconnect

That suggests the resumable handle itself may be valid, but the resumed multimodal session state may become unstable once the prior session history included video+audio.

### Logs / diagnostics
At reconnect time, the session opens successfully and I receive normal session-resumption updates / reconnect status. The failure occurs only after resumed interaction continues.

Representative symptom:

* reconnect succeeds
* post-resume audio is sent
* later the session closes/fails with a 1007 invalid argument style error

If helpful, I can provide a sanitized full event/log trace, but I wanted to first confirm whether this is a known limitation or bug in Live API resumption for sessions that previously carried both audio and video.
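
If a sanitized trace is wanted, one low-effort format is a payload-free JSON Lines log with times relative to the (re)connect. A hypothetical `ResumeTrace` helper, assuming callers pass only metadata (input kind, byte size, close reason) and never audio, image, or text content:

```python
import json
import time
from typing import Any, Callable


class ResumeTrace:
    """Payload-free event trace around a session (re)connect."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._t0 = clock()  # time of the (re)connect
        self.events: list[dict[str, Any]] = []

    def log(self, event: str, **fields: Any) -> None:
        """Record an event with seconds elapsed since the (re)connect."""
        elapsed = round(self._clock() - self._t0, 3)
        self.events.append({"t": elapsed, "event": event, **fields})

    def to_jsonl(self) -> str:
        """One JSON object per line, keys sorted for stable diffs."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)
```

For example, `trace.log("send", kind="audio", size=len(pcm_chunk))` per input and `trace.log("closed", error=repr(exc))` when the session fails, then attach `trace.to_jsonl()`.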

### Questions
* Is session resumption currently expected to be reliable for sessions that previously used both audio and video?
* Is there any known limitation where resumed sessions should avoid multimodal history or prior video input?
* Does `gemini-3.1-flash-live-preview` have any current caveat around resuming a session that earlier had combined audio+video traffic?

