Transcription API accuracy benchmark: 95.1% accuracy rate (No.1)

Salad Transcription API Accuracy Benchmark: How it outperforms Deepgram, Assembly AI & AWS

Published: March 27, 2025

SaladCloud

Benchmarking Salad Transcription APIs: Salad Transcription and Transcription Lite

We recently completed extensive accuracy benchmarks of our two transcription APIs, Salad Transcription API and Transcription Lite. Our goal was to measure their accuracy across multiple languages using widely recognized, publicly available datasets, and to compare it against existing transcription solutions. In this post, we break down the methodology, workflow, and results of the benchmark. For readers who want to reproduce the results, we also provide publicly available scripts.

Overview of our AI Transcription APIs

At Salad, we provide two AI transcription APIs offering different features and capabilities to the market.

Our two main speech-to-text APIs are:

Salad Transcription API: Delivers the No. 1 transcription accuracy in the market at the lowest cost ($0.16 per audio hour). It includes standard transcription features such as speaker identification, timestamps, and captions, as well as LLM-driven features such as summarization, multilingual translation, and insights such as sentiment analysis and classification.
Transcription Lite: Offers faster, lower-latency transcription with standard accuracy and includes essential features such as timestamps, speaker diarization, and captions. Pricing starts at $0.03 per audio hour, which is also the lowest in the industry among APIs with comparable features and accuracy.

For more information about all the features, check out our documentation.

Transcription accuracy benchmarking methodology

Accuracy is often the most critical factor when evaluating transcription services, particularly for professional applications. To assess our services fairly, we adopted a benchmarking approach similar to the one AssemblyAI used, relying on publicly available datasets. We initially focused on English-language datasets that AssemblyAI had already processed, so that we could compare directly against their results.

Datasets Used

We selected three datasets for our benchmarks:

Common Voice: An extensive, crowdsourced, multilingual collection of speech datasets provided by Mozilla. We used Common Voice Corpus 5.1, which features over 1 million validated English audio files, or more than 1,500 hours of speech.
Meanwhile Dataset: 64 segments from “The Late Show with Stephen Colbert,” published as part of OpenAI’s Whisper release.
TED-LIUM Dataset: A collection of English-language TED talk recordings. Note: We excluded segments without audible speech to ensure accuracy.

Workflow

Our benchmarking process included:

Audio Preprocessing: Audio samples were uploaded to Salad S4 storage.

Transcription: Audio files were transcribed using both the Salad Transcription API and Transcription Lite.

Normalization: Both the predicted transcripts and the ground truth were normalized using the open-source Whisper Normalizer, which standardizes punctuation, capitalization, and formatting so that minor formatting differences do not affect the accuracy results.

Below are examples of how transcripts were adjusted:

Original:

Truth: “everybody talks about happiness these days”
Result: “ Everybody talks about happiness these days.”

After Normalization:

Truth: “everybody talks about happiness these days”
Result: “everybody talks about happiness these days”

Original:

Truth: “i had somebody count the number of books with happiness in the title published in the last five years”
Result: “ I had somebody count the number of books with happiness in the title published in the last five years.”

After Normalization:

Truth: “i had somebody count the number of books with happiness in the title published in the last 5 years”
Result: “i had somebody count the number of books with happiness in the title published in the last 5 years”
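
The casing and punctuation part of this step can be illustrated with a minimal sketch. It is not the Whisper Normalizer used in the benchmark: `simple_normalize` is a hypothetical helper written for this example, and it does not convert number words to digits (“five” to “5”) or unify spelling variants the way the examples above show.

```python
import re
import string

# Hypothetical stand-in for the Whisper Normalizer (assumption for illustration).
# It only lowercases, replaces punctuation with spaces, and collapses whitespace.
# Apostrophes are kept so contractions such as "don't" stay in one piece.
_PUNCTUATION = string.punctuation.replace("'", "")
_PUNCT_TABLE = str.maketrans({ch: " " for ch in _PUNCTUATION})


def simple_normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    lowered = text.lower()
    without_punct = lowered.translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", without_punct).strip()


truth = "everybody talks about happiness these days"
result = " Everybody talks about happiness these days."

# After normalization the two strings are identical, so they would
# contribute no word errors.
print(simple_normalize(truth) == simple_normalize(result))
```

Applying the same function to both the ground truth and the API output is what keeps the comparison fair: any rule that changes one side changes the other in the same way.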

Accuracy Evaluation: We calculated Word Error Rate (WER) for each file, using the JiWER library, to objectively compare transcription accuracy across datasets. The average WER was then determined for each dataset. You can find all our benchmarking scripts here: https://github.com/SaladTechnologies/salad-transcription-accuracy-benchmarks

Here is an example of the per-file output:

[
    {
        "truth": "The other man, dressed casually, watches the multicoloured radioactive clouds advance upon them.",
        "result": " The other man, dressed casually, watches the multicolored radioactive clouds advance upon them.",
        "wer": 0.0
    },
    {
        "truth": "The Dutch outnumbered the Spanish army, but were caught off-guard by the Spanish attack.",
        "result": " The Dutch outnumbered the Spanish army but were caught off guard by the Spanish attack.",
        "wer": 0.0
    },
    {
        "truth": "When Alvin was a little boy, he loved to watch Bud Spencer and Terence Hill.",
        "result": " When Alvin was a little boy, he loved to watch Bud Spencer and Terrence Hill.",
        "wer": 0.06666666666666667
    },
    {
        "truth": "Capobianco wrote four novels jointly with William Barton.",
        "result": " Capo Bianco wrote four novels jointly with William Barton.",
        "wer": 0.25
    },
    {
        "truth": "Denise hoovered the rug.",
        "result": " Denise, who was the rug?",
        "wer": 0.5
    }
]
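
WER is the number of word substitutions, deletions, and insertions needed to turn the transcript into the ground truth, divided by the number of words in the ground truth. The benchmark computes it with JiWER; the sketch below is a plain-Python illustration of the same metric, not the benchmark code, and `word_error_rate` is a name chosen for this example. The inputs are assumed to be already normalized.

```python
from statistics import mean


def word_error_rate(truth: str, result: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = truth.split()
    hyp = result.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Normalized versions of three entries from the sample output above.
pairs = [
    ("when alvin was a little boy he loved to watch bud spencer and terence hill",
     "when alvin was a little boy he loved to watch bud spencer and terrence hill"),
    ("capobianco wrote four novels jointly with william barton",
     "capo bianco wrote four novels jointly with william barton"),
    ("denise hoovered the rug",
     "denise who was the rug"),
]

per_file = [word_error_rate(t, r) for t, r in pairs]
print(per_file)        # one WER per file
print(mean(per_file))  # average WER, as reported per dataset
```

“Capo Bianco” counts as two errors (one substitution and one insertion) against eight reference words, which gives the 0.25 shown in the sample.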

Benchmark results: Word Error Rate (WER) for English

Dataset	Salad Transcription API	Salad Transcription Lite API	AssemblyAI Universal	Amazon Transcribe	Google Latest-long	Microsoft Azure Batch v3.1	Deepgram Nova 2	OpenAI Whisper
Common Voice	4.90%	18.70%	6.67%	8.98%	17.59%	7.81%	12.43%	8.83%
Meanwhile	4.30%	16.70%	4.77%	7.27%	11.67%	6.73%	5.56%	9.75%
TED-LIUM	4.20%	8.20%	7.21%	9.12%	11.69%	9.27%	8.98%	7.30%

Our benchmarks show that the Salad Transcription API achieved the lowest WER of all the services compared on each of the three English datasets.

Expanding our benchmarks to more languages

After comparing our transcription APIs against all major competitors, we expanded our benchmarking efforts to include additional datasets and languages. Our goal is to measure performance across all languages and identify areas for further improvement.

The following table presents our latest benchmark results, showing accuracy and Word Error Rate (WER) for Salad Transcription API (Full API) and Transcription Lite across multiple languages. Accuracy is reported as 100% minus WER.

Dataset	Sub-dataset	Language	Full API Accuracy	Lite Accuracy	Full API WER	Lite WER
TED-LIUM	tedlium	English	95.8%	91.8%	4.2%	8.2%
Meanwhile	Meanwhile	English	95.7%	83.3%	4.3%	16.7%
CommonVoice	cv-corpus-5.1-2020-06-22	English	95.1%	81.3%	4.9%	18.7%
CommonVoice	cv-corpus-20.0-delta-2024-12-06	English	93.1%	78.1%	6.9%	21.9%
CommonVoice	cv-corpus-8.0-2022-01-19	Portuguese	92%	55%	8%	45%
CommonVoice	cv-corpus-10.0-delta-2022-07-04	French	92%	54.3%	8%	45.7%
CommonVoice	cv-corpus-12.0-delta-2022-12-07	Spanish	94%	58.2%	6%	42.8%
CommonVoice	cv-corpus-14.0-delta-2023-06-23	Spanish	96.8%	79.5%	3.2%	20.5%
CommonVoice	cv-corpus-16.1-delta-2023-12-06	Spanish	95.7%	70.9%	4.3%	29.1%
CommonVoice	cv-corpus-13.0-delta-2023-03-09	German	96.3%	71.1%	3.7%	28.9%
CommonVoice	cv-corpus-20.0-2024-12-06	Hindi	84%	0% (translates to Eng)	16%	100%
CommonVoice		Italian	93.3%	54%	6.7%	46%
CommonVoice		Russian	96.4%	60%	3.6%	40%
CommonVoice	cv-corpus-17.0-2024-03-15	Hebrew	84.2%	12%	15.8%	88%
CommonVoice	cv-corpus-19.0-2024-09-13	Kazakh	51%	0%	49%	100%
CommonVoice	cv-corpus-9.0-2022-04-27	Urdu	78.8%	8.3%	21.2%	91.7%
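
The accuracy columns follow directly from the WER columns. The sketch below shows the conversion for a few Full API rows from the table; `accuracy_from_wer` is a name chosen for this example, and flooring the result at 0% is an assumption made here because WER can exceed 100% when a transcript contains many inserted words.

```python
def accuracy_from_wer(wer_percent: float) -> float:
    """Convert a WER percentage to an accuracy percentage (100 - WER)."""
    # Floor at 0 (assumption): insertions can push WER above 100%.
    return max(0.0, round(100.0 - wer_percent, 1))


# Full API WER values taken from the table above.
full_api_wer = {
    "TED-LIUM (English)": 4.2,
    "Meanwhile (English)": 4.3,
    "Common Voice 5.1 (English)": 4.9,
    "Common Voice 13.0 delta (German)": 3.7,
}

for dataset, wer in full_api_wer.items():
    print(f"{dataset}: WER {wer}% -> accuracy {accuracy_from_wer(wer)}%")
```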

Salad Transcription API performs exceptionally well in English and major European languages, with high accuracy in English, Spanish, German, French, Portuguese, Italian, and Russian.

However, there is room for improvement in certain languages, particularly Thai, Kazakh, Hebrew, Hindi, and Urdu. Transcription Lite is optimized for speed and currently performs best in English, its base language.

Industry-Leading Pricing

While accuracy is a very important factor in choosing a transcription service, cost is just as important, especially for large-scale applications. Salad’s Transcription APIs are not only among the most accurate but also the most affordable compared to competitors.

Pricing Breakdown

Salad Transcription API: Just $0.16 per audio hour
Transcription Lite: Just $0.03 per audio hour

This makes Salad Transcription API the cheapest high-accuracy solution on the market, and Transcription Lite one of the most cost-effective near-real-time transcription services available.
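
To put the per-hour prices in context, here is a rough cost sketch for a large batch of audio. It assumes flat pricing at the listed rates with no volume discounts or minimums, which the post does not confirm (Transcription Lite is described as starting from $0.03); the 10,000-hour workload and the `transcription_cost` helper are made up for illustration.

```python
from decimal import Decimal

# Listed prices per audio hour, in US dollars (flat rate is an assumption).
PRICE_PER_AUDIO_HOUR = {
    "Salad Transcription API": Decimal("0.16"),
    "Transcription Lite": Decimal("0.03"),
}


def transcription_cost(service: str, audio_hours: int) -> Decimal:
    """Estimated cost in dollars for a given number of audio hours."""
    return PRICE_PER_AUDIO_HOUR[service] * audio_hours


audio_hours = 10_000  # hypothetical workload
for service in PRICE_PER_AUDIO_HOUR:
    cost = transcription_cost(service, audio_hours)
    print(f"{service}: ${cost:,.2f} for {audio_hours:,} audio hours")
```

`Decimal` is used instead of floating point so that the dollar amounts stay exact.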

Key Takeaways from Our Benchmarks

Our benchmarking process, comparing Salad Transcription API and Salad Transcription Lite against major transcription providers and across multiple languages, has revealed several insights:

1. Leading accuracy in Transcription

Salad Transcription API consistently outperformed other transcription providers, achieving the lowest Word Error Rate (WER) on all three English datasets tested.
In European languages such as Spanish, German, French, Portuguese, and Italian, our model also maintained accuracy levels above 90%.

2. Challenges in low-resource languages

Some languages, particularly Hindi, Kazakh, Thai, and Hebrew, had lower accuracy, highlighting areas where further improvements are needed.

3. Transcription Lite accuracy vs speed

While Transcription Lite offers near-real-time transcription, its accuracy is lower than that of Salad Transcription API, particularly for non-English languages.
It remains a great option for English-language audio when users need fast, timestamped speech-to-text processing at a lower cost.

Next Steps

Expanding our dataset coverage to include more datasets and languages.

Improving transcription for non-English languages, particularly in low-resource languages.

We will continue updating our benchmarks and improving our transcription models to provide the best value, accuracy, and performance in the market. Stay tuned for more updates!

SaladCloud

SaladCloud is the world’s largest distributed cloud computing network with 11,000+ daily GPUs and 450,000 GPUs contributing compute, all at the lowest cost in the market.