Issues with Capitalization and Punctuation in Transcribed Audios #815

Open
RomanLeo2003 opened this issue Apr 29, 2024 · 0 comments

After transcribing several audio files with the medium model, I noticed that some of the transcriptions lack capitalization and punctuation. For example:

Transcribed text with punctuation and capitalization: "Produces, for example, a Renault headlight. They say, yes, yes, we produce it."

Transcribed text without punctuation and capitalization: "produces for example a renault headlight they say yes yes we produce it"
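
To locate the affected stretches in a long transcript, a rough check along these lines could be run over each segment's text. This is only a minimal sketch: the function name `looks_unpunctuated` and the `min_words` cutoff are hypothetical choices for illustration, not part of any library.

```python
import re


def looks_unpunctuated(text: str, min_words: int = 5) -> bool:
    """Heuristic: True if the text has no uppercase letters and no common punctuation.

    Very short texts are skipped (hypothetical cutoff), since a few words
    without punctuation are not a reliable signal.
    """
    words = text.split()
    if len(words) < min_words:
        return False
    has_upper = any(ch.isupper() for ch in text)
    has_punct = re.search(r"[.,!?;:]", text) is not None
    return not has_upper and not has_punct


# The two example transcriptions from this report.
good = "Produces, for example, a Renault headlight. They say, yes, yes, we produce it."
bad = "produces for example a renault headlight they say yes yes we produce it"

print(looks_unpunctuated(good))  # False
print(looks_unpunctuated(bad))   # True
```

Logging the start time of every segment that gets flagged would show whether the problem begins at a specific point in the audio and where it recovers.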

I suspect the issue is caused by some state that accumulates in the model during transcription (a cache or something similar). It seems to occur with certain types of content, but I am not sure. Sometimes the problem also fixes itself after a few minutes of audio. My questions are:

Why does this happen?
How can I fix it?

I use this configuration of parameters:

vad_parameters = {
    'threshold': 0.5,
    'min_speech_duration_ms': 400,
    'max_speech_duration_s': float("inf"),
    'min_silence_duration_ms': 400,
    'window_size_samples': 1024,
    'speech_pad_ms': 750
}

hallucination_silence_threshold = 0.8
model_size = "medium"
compute_type = "float16"
beam_size = 8
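
For reproduction, the settings above can be collected into one place. The sketch below assumes they are passed to faster-whisper's `WhisperModel` and `transcribe`; the helper name `build_transcribe_kwargs` is hypothetical, and `vad_filter=True` and `word_timestamps=True` are assumptions that are not stated above (the hallucination threshold is expected to need word timestamps to have any effect).

```python
def build_transcribe_kwargs() -> dict:
    """Collect the decoding and VAD settings from this report into keyword arguments."""
    vad_parameters = {
        "threshold": 0.5,
        "min_speech_duration_ms": 400,
        "max_speech_duration_s": float("inf"),
        "min_silence_duration_ms": 400,
        "window_size_samples": 1024,
        "speech_pad_ms": 750,
    }
    return {
        "beam_size": 8,
        "vad_filter": True,  # assumption: VAD is enabled, since vad_parameters is set
        "vad_parameters": vad_parameters,
        "hallucination_silence_threshold": 0.8,
        "word_timestamps": True,  # assumption: needed for the threshold above
    }


kwargs = build_transcribe_kwargs()
print(sorted(kwargs))

# Assumed usage with faster-whisper (not executed here, "audio.wav" is a placeholder):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("medium", compute_type="float16")
#   segments, info = model.transcribe("audio.wav", **kwargs)
```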