Vertex AI Gemini generateContent w/ audio input doesn't work #14745

Closed
BoyuanLong opened this issue Apr 21, 2025 · 4 comments · Fixed by #14747

@BoyuanLong

Description

Hi,

I'm using the Vertex AI generateContent API to call Gemini (2.0 Flash, Flash-Lite, etc.) with audio input, and it fails intermittently. It throws an internalError that is hard to work around in application code.

My code is something like this:

    var parts: [PartsRepresentable] = []
    parts.append(InlineDataPart(data: audio, mimeType: "audio/mp3"))
    ...
    try await model.generateContent(prompt, parts)

And it will throw something like this most of the time:

internalError(underlying: Swift.DecodingError.keyNotFound(CodingKeys(stringValue: "tokenCount", intValue: nil), Swift.DecodingError.Context(codingPath: [CodingKeys(stringValue: "usageMetadata", intValue: nil), CodingKeys(stringValue: "promptTokensDetails", intValue: nil), _CodingKey(stringValue: "Index 0", intValue: 0)], debugDescription: "No value associated with key CodingKeys(stringValue: \"tokenCount\", intValue: nil) (\"tokenCount\").", underlyingError: nil)))

I believe there's an issue in how the server counts audio tokens: the tokenCount field is sometimes missing from the AUDIO entry in promptTokensDetails, and the SDK fails to decode the response as a result.
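
For illustration, here is a minimal sketch of how a decoder could tolerate the missing field by using `decodeIfPresent` with a fallback. `TokensDetails` is a hypothetical type that mirrors the `promptTokensDetails` entries in the log below; it is not the SDK's actual model, and treating a missing count as 0 is an assumption.

```swift
import Foundation

// Hypothetical stand-in for one entry of `promptTokensDetails`;
// not the Firebase SDK's actual type.
struct TokensDetails: Decodable {
  let modality: String
  let tokenCount: Int

  enum CodingKeys: String, CodingKey {
    case modality
    case tokenCount
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    modality = try container.decode(String.self, forKey: .modality)
    // Assumption: an absent `tokenCount` is treated as 0 instead of failing.
    tokenCount = try container.decodeIfPresent(Int.self, forKey: .tokenCount) ?? 0
  }
}

// Same shape as the logged response: the AUDIO entry has no `tokenCount`.
let json = """
[
  { "modality": "AUDIO" },
  { "modality": "TEXT", "tokenCount": 675 },
  { "modality": "IMAGE", "tokenCount": 2322 }
]
"""

let details = try JSONDecoder().decode([TokensDetails].self, from: Data(json.utf8))
for detail in details {
  print(detail.modality, detail.tokenCount)
}
```

With a synthesized `Decodable` conformance and a non-optional `tokenCount`, the same input would throw `DecodingError.keyNotFound`, which matches the error above.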

Thank you for taking a look, and please let me know if there's anything we could do as a short term fix :)

Reproducing the issue

No response

Firebase SDK Version

11.11

Xcode Version

16.3

Installation Method

Swift Package Manager

Firebase Product(s)

VertexAI

Targeted Platforms

iOS

Relevant Log Output

11.11.0 - [FirebaseVertexAI][I-VTX003000] JSON response: {
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "{\n  \"has_animal\": true,\n  \"thoughts\": \"Dog: 'Ugh, more human legs in my face. A dog deserves a better view. 🙄'\"\n}"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.56113096383901739
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 2997,
    "candidatesTokenCount": 39,
    "totalTokenCount": 3036,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "AUDIO"
      },
      {
        "modality": "TEXT",
        "tokenCount": 675
      },
      {
        "modality": "IMAGE",
        "tokenCount": 2322
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 39
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "createTime": "2025-04-21T05:23:14.199357Z",
  "responseId": "QtYFaL2VDImThMIPgdDpqAQ"
}

@google-oss-bot

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

@andrewheard
Contributor

Hi @BoyuanLong, thank you for the detailed report. This does look like a backend issue that we should handle in the SDKs until the backend is fixed. Do you happen to have an MP3 file you could share that frequently triggers this issue? If not, I'll try some of my own.

@BoyuanLong
Author

BoyuanLong commented Apr 21, 2025

Thank you for the quick response.

Re: mp3 file

You can find sample files at https://file-examples.com/index.php/sample-audio-files/sample-mp3-download/.
Instructions: pick a file, click [Download Sample MP3 File] (it redirects to a page with the audio file), then right-click and choose Save As to download it.

@Blickwinkel1107

Hi. The issue still persists. I'm using the latest package, 11.12.0, and the response JSON is still missing the tokenCount field for the AUDIO modality.
Example output:

11.12.0 - [FirebaseVertexAI][I-VTX003000] JSON response: {
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "{\n  \"has_animal\": false,\n  \"pets\": []\n}"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -1.809538419668873e-05
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 3008,
    "candidatesTokenCount": 18,
    "totalTokenCount": 3026,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "IMAGE",
        "tokenCount": 2322
      },
      {
        "modality": "AUDIO"
      },
      {
        "modality": "TEXT",
        "tokenCount": 686
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 18
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-lite",
  "createTime": "2025-05-20T04:50:27.060206Z",
  "responseId": "EwosaK7WA_-MhMIPmdvrOQ"
}

Error message:

Error sending to Gemini: internalError(underlying: Swift.DecodingError.keyNotFound(CodingKeys(stringValue: "tokenCount", intValue: nil), Swift.DecodingError.Context(codingPath: [CodingKeys(stringValue: "usageMetadata", intValue: nil), CodingKeys(stringValue: "promptTokensDetails", intValue: nil), _CodingKey(stringValue: "Index 1", intValue: 1)], debugDescription: "No value associated with key CodingKeys(stringValue: \"tokenCount\", intValue: nil) (\"tokenCount\").", underlyingError: nil)))
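
The index in the coding path ("Index 1" here, "Index 0" in the original report) follows the position of the AUDIO entry in `promptTokensDetails`. The sketch below reproduces the failure outside the SDK, assuming strict models with a required `tokenCount`. The `Strict*` types are hypothetical and only mirror the logged JSON; they are not the SDK's actual types.

```swift
import Foundation

// Hypothetical strict models mirroring the logged JSON; not the SDK's types.
struct StrictTokensDetails: Decodable {
  let modality: String
  let tokenCount: Int  // required, so a missing key fails decoding
}

struct StrictUsageMetadata: Decodable {
  let promptTokensDetails: [StrictTokensDetails]
}

struct StrictResponse: Decodable {
  let usageMetadata: StrictUsageMetadata
}

// Trimmed copy of the logged response; the AUDIO entry is at index 1.
let payload = """
{
  "usageMetadata": {
    "promptTokensDetails": [
      { "modality": "IMAGE", "tokenCount": 2322 },
      { "modality": "AUDIO" },
      { "modality": "TEXT", "tokenCount": 686 }
    ]
  }
}
"""

var missingKey: String? = nil
var failingIndex: Int? = nil

do {
  _ = try JSONDecoder().decode(StrictResponse.self, from: Data(payload.utf8))
} catch let DecodingError.keyNotFound(key, context) {
  // Record which key was missing and which array element it belonged to.
  missingKey = key.stringValue
  failingIndex = context.codingPath.last?.intValue
  print("Missing \(key.stringValue) at", context.codingPath.map { $0.stringValue })
} catch {
  print("Unexpected error:", error)
}
```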
