segment_id	system_id	doc_id	context	source	target	src_lang	tgt_lang	category	severity	annotator_id	correction	comment	timestamp
52eb12ed-1509-4114-aeb6-09f9ecb3f67a	translategemma-12b	case-study-asr		Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty	Huab chaws kev siv lus hauv kev ntsuas lus hauv ntau yam lus: Ib txoj kev <v>siv peev xwm uas muaj tus nqi qis</v>, uas tuaj yeem kho kom haum rau qhov nyuaj ntawm kev ntsuas lus.	en	hmn	Accuracy/Mistranslation	Major		Xaiv lus rau ASR ua siv rau ntau hom lus: Ib txoj kev pheej yig ua pauv raws seb txoj kev ntsuas lus nyuaj npaum cas.	unnatural use of words	2026-02-01T00:02:37.119Z
52eb12ed-1509-4114-aeb6-09f9ecb3f67a	translategemma-12b	case-study-asr		Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty	<v>Huab chaws </v>kev siv lus hauv kev ntsuas lus hauv ntau yam lus: Ib txoj kev siv peev xwm uas muaj tus nqi qis, uas tuaj yeem kho kom haum rau qhov nyuaj ntawm kev ntsuas lus.	en	hmn	Accuracy/Mistranslation	Critical		Xaiv lus rau ASR ua siv rau ntau hom lus: Ib txoj kev pheej yig ua pauv raws seb txoj kev ntsuas lus nyuaj npaum cas.	These 2 words together mean nothing intelligible	2026-02-01T00:05:42.318Z
52eb12ed-1509-4114-aeb6-09f9ecb3f67a	translategemma-12b	case-study-asr		Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty	Huab chaws kev siv lus hauv kev ntsuas lus hauv ntau yam lus: Ib txoj kev siv peev xwm uas muaj tus nqi qis, uas <v>tuaj yeem</v> kho kom haum rau qhov nyuaj ntawm kev ntsuas lus.	en	hmn	Accuracy/Addition	Major		Xaiv lus rau ASR ua siv rau ntau hom lus: Ib txoj kev pheej yig ua pauv raws seb txoj kev ntsuas lus nyuaj npaum cas.	Unnecessary	2026-02-01T00:08:25.781Z
52eb12ed-1509-4114-aeb6-09f9ecb3f67a	translategemma-12b	case-study-asr		Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty	Huab chaws kev siv lus hauv kev ntsuas lus hauv <v>ntau yam lus</v>: Ib txoj kev siv peev xwm uas muaj tus nqi qis, uas tuaj yeem kho kom haum rau qhov nyuaj ntawm kev ntsuas lus.	en	hmn	Fluency/Inconsistency	Minor		Xaiv lus rau ASR ua siv rau ntau hom lus: Ib txoj kev pheej yig ua pauv raws seb txoj kev ntsuas lus nyuaj npaum cas.	Awkward use of this phrase for "multilingual"	2026-02-01T00:10:50.456Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov <v>qhia lus uas siv tshuaj yeeb </v>(automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Completely wrong translation	2026-02-01T00:46:46.464Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas <v>tuaj yeem qhia ntau yam lus</v> tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Major		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	result of literal English to Hmong translation that completely disregards actual meaning of these words in Hmong	2026-02-01T00:54:50.066Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo <v>ntau</v>, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Addition	Major		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	unnecessary	2026-02-01T00:55:15.956Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv <v>tuaj yeem qhia ntau yam lus</v> siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Addition	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Repeat of error 2	2026-02-01T00:56:19.599Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib <v>qho</v> qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Fluency/Grammar	Major		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	wrong use of the article "qho"	2026-02-01T00:58:03.603Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li <v>qhia</v> hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Terminology	Minor		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Not fitting for use with "Duab"	2026-02-01T00:59:15.668Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). <v>Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe</v> (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	This is pretty much gibberish resulting from an attempt at direct English to Hmong translation for each word. No Hmong person reading these words will understand what idea is being conveyed.	2026-02-01T01:02:25.028Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. <v>Ua ib qho piv txwv</v>, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Fluency/Inconsistency	Minor		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Not the typical way to say "for example" in Hmong.	2026-02-01T01:04:20.017Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm <v>ntawv los ntawm</v> 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Addition	Major		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Unnecessary	2026-02-01T01:05:00.285Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 <v>teej</v> ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Wrong word for "hour"	2026-02-01T01:05:39.358Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, <v>uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus</v>. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Fluency/Grammar	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Again, this is a direct English to Hmong attempt that fails to convey the idea. It also reads like a run-on sentence in Hmong. Too many ideas trying to be captured in one sentence.	2026-02-01T01:10:58.876Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov <v>tshuaj yeeb</v> ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Wrong word for "advances"	2026-02-01T01:14:19.274Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov <v>tseem ceeb</v>. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Wrong word for "challenges"	2026-02-01T01:15:30.348Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov <v>kev txawj ntseeg</v> sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.		2026-02-01T01:16:15.533Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov <v>qauv lus </v>sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.		2026-02-01T01:16:44.796Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj <v>kev ua haujlwm zoo tshwj xeeb</v> rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Fluency/Inconsistency	Major		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Unnatural use of words for "state of the art"	2026-02-01T01:18:48.470Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv <v>rau</v>, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	en	hmn	Fluency/Grammar	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Wrong word to use with the previous word to mean "Moreover"	2026-02-01T01:19:49.033Z
bd23f9e4-a925-49af-8228-f16c3661aca1	translategemma-12b	case-study-asr		Multilingual automatic speech recognition (ASR) models have gained significant attention for their ability to recognize multiple languages using a single model [1, 2, 3, 4], as illustrated in Figure 1(a). Recent advances have led to impressive performance in various languages through large-scale supervised or self-supervised pre-training [3, 5, 6, 7, 8, 9, 10, 11, 12]. For example, Whisper [6] is trained on 680,000 hours of weakly multilingual data, enabling it to generalize effectively across standard ASR benchmarks, while USM [9] leverages 12 million hours of unlabeled data to achieve robust cross-lingual performance. Despite these advances, the application of multilingual ASR systems with a single model still faces significant challenges. Phonetic differences, syntactic variations, and vocabulary disparities across languages make it difficult to achieve consistent universal state-of-the-art (SOTA) performance. Moreover, imbalances in training data between high-resource and low-resource languages further limit the single-model solutions.	Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, <v>qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua</v>.	en	hmn	Accuracy/Mistranslation	Critical		Cov qhia lus uas siv tshuaj yeeb (automatic speech recognition, ASR) uas tuaj yeem qhia ntau yam lus tau txais kev pom zoo ntau, vim tias lawv tuaj yeem qhia ntau yam lus siv ib qho qauv xwb [1, 2, 3, 4], raws li qhia hauv Duab 1(a). Cov kev txhim kho tshiab no tau ua kom muaj kev ua haujlwm zoo ntau hauv ntau yam lus los ntawm kev kawm ntawv loj ntau los ntawm kev kawm ntawv uas muaj kev saub npe (supervised) lossis kev kawm ntawv uas tsis muaj kev saub npe (self-supervised) [3, 5, 6, 7, 8, 9, 10, 11, 12]. Ua ib qho piv txwv, Whisper [6] tau kawm ntawv los ntawm 680,000 teej ntawm cov ntaub ntawv uas muaj ntau yam lus, uas ua rau nws tuaj yeem siv tau zoo hauv ntau yam kev sim ASR, thaum USM [9] siv 12 lab teej ntawm cov ntaub ntawv uas tsis muaj kev saub npe los ua kom muaj kev ua haujlwm zoo hauv ntau yam lus. Txawm hais tias muaj cov kev txhim kho no, kev siv cov tshuaj yeeb ASR uas muaj ntau yam lus thiab siv ib qho qauv xwb tseem muaj qhov tseem ceeb. Cov kev txawj ntseeg sib txawv, cov qauv lus sib txawv, thiab cov lus siv sib txawv hauv ntau yam lus ua rau nws nyuaj los ua kom muaj kev ua haujlwm zoo tshwj xeeb rau txhua yam lus. Ntxiv rau, qhov sib txawv ntawm cov ntaub ntawv kawm ntawv ntawm cov lus uas muaj ntau cov ntaub ntawv thiab cov lus uas muaj me ntsis cov ntaub ntawv ntxiv ua rau qhov kev daws teb siv ib qho qauv xwb nyuaj dua.	Long chain of gibberish resulting from direct English to Hmong translation. No Hmong person would be able to understand what idea is being conveyed here.	2026-02-01T01:21:33.882Z
2ae4990b-6ac9-4b80-9737-3cc8227c6a19	translategemma-12b	case-study-asr		A common strategy to address these challenges is to use a language identification (LID) model that first detects the language of the input speech before invoking the corresponding SOTA ASR model for transcription, as shown in Figure 1(b). However, this two-stage approach has its drawbacks. Many SOTA models are commercial [12] and incur usage fees based on the volume of processing, making this method costly. Additionally, an incorrect LID prediction may trigger the wrong model, further affecting the user experience [13].	Ib txoj kev uas feem ntau siv los pab cov teeb meem no yog siv ib qho kev <v>kos</v> lus (language identification, LID) kom pom tias lus twg tau hais ua ntej, ces siv qhov kos lus uas zoo tshaj plaws (state-of-the-art, SOTA) kom sau lus, raws li qhia hauv Duab 1(b). Tab sis, txoj kev no muaj qee qhov tsis zoo. Ntau qhov kos lus zoo tshaj plaws yog khoom lag luam [12] thiab muaj tus nqi siv, raws li qhov ntau npaum li cas koj siv, uas ua rau txoj kev no tsis muaj txiaj. Tsis tas li ntawd, yog hais tias LID pom lus tsis raug, nws yuav ua rau siv qhov kos lus tsis raug, uas yuav ua rau tsis zoo rau cov neeg siv.	en	hmn	Accuracy/Mistranslation	Critical			wrong word for "identification"	2026-02-01T01:23:07.153Z
2ae4990b-6ac9-4b80-9737-3cc8227c6a19	translategemma-12b	case-study-asr		A common strategy to address these challenges is to use a language identification (LID) model that first detects the language of the input speech before invoking the corresponding SOTA ASR model for transcription, as shown in Figure 1(b). However, this two-stage approach has its drawbacks. Many SOTA models are commercial [12] and incur usage fees based on the volume of processing, making this method costly. Additionally, an incorrect LID prediction may trigger the wrong model, further affecting the user experience [13].	Ib txoj kev uas feem ntau siv los pab cov teeb meem no yog siv ib qho kev kos lus (language identification, LID) kom pom tias lus twg tau hais ua ntej, ces siv qhov kos lus uas zoo tshaj plaws (state-of-the-art, SOTA) kom sau lus, raws li <v>qhia</v> hauv Duab 1(b). Tab sis, txoj kev no muaj qee qhov tsis zoo. Ntau qhov kos lus zoo tshaj plaws yog khoom lag luam [12] thiab muaj tus nqi siv, raws li qhov ntau npaum li cas koj siv, uas ua rau txoj kev no tsis muaj txiaj. Tsis tas li ntawd, yog hais tias LID pom lus tsis raug, nws yuav ua rau siv qhov kos lus tsis raug, uas yuav ua rau tsis zoo rau cov neeg siv.	en	hmn	Fluency/Inconsistency	Minor			Not the appropriate verb to use with "Duab"	2026-02-01T01:25:28.380Z
2ae4990b-6ac9-4b80-9737-3cc8227c6a19	translategemma-12b	case-study-asr		A common strategy to address these challenges is to use a language identification (LID) model that first detects the language of the input speech before invoking the corresponding SOTA ASR model for transcription, as shown in Figure 1(b). However, this two-stage approach has its drawbacks. Many SOTA models are commercial [12] and incur usage fees based on the volume of processing, making this method costly. Additionally, an incorrect LID prediction may trigger the wrong model, further affecting the user experience [13].	Ib txoj kev uas feem ntau siv los pab cov teeb meem no yog siv ib qho kev kos lus (language identification, LID) kom pom tias lus twg tau hais ua ntej, <v>ces siv qhov kos lus uas zoo tshaj plaws (state-of-the-art, SOTA) kom sau lus</v>, raws li qhia hauv Duab 1(b). Tab sis, txoj kev no muaj qee qhov tsis zoo. Ntau qhov kos lus zoo tshaj plaws yog khoom lag luam [12] thiab muaj tus nqi siv, raws li qhov ntau npaum li cas koj siv, uas ua rau txoj kev no tsis muaj txiaj. Tsis tas li ntawd, yog hais tias LID pom lus tsis raug, nws yuav ua rau siv qhov kos lus tsis raug, uas yuav ua rau tsis zoo rau cov neeg siv.	en	hmn	Accuracy/Mistranslation	Critical			Inaccurate translation of what's trying to be conveyed in the English text	2026-02-01T01:26:55.552Z
2ae4990b-6ac9-4b80-9737-3cc8227c6a19	translategemma-12b	case-study-asr		A common strategy to address these challenges is to use a language identification (LID) model that first detects the language of the input speech before invoking the corresponding SOTA ASR model for transcription, as shown in Figure 1(b). However, this two-stage approach has its drawbacks. Many SOTA models are commercial [12] and incur usage fees based on the volume of processing, making this method costly. Additionally, an incorrect LID prediction may trigger the wrong model, further affecting the user experience [13].	Ib txoj kev uas feem ntau siv los pab cov teeb meem no yog siv ib qho kev kos lus (language identification, LID) kom pom tias lus twg tau hais ua ntej, ces siv qhov kos lus uas zoo tshaj plaws (state-of-the-art, SOTA) kom sau lus, raws li qhia hauv Duab 1(b). Tab sis, txoj kev no muaj qee qhov tsis zoo. Ntau qhov <v>kos lus zoo tshaj plaws </v>yog khoom lag luam [12] thiab muaj tus nqi siv, raws li qhov ntau npaum li cas koj siv, uas ua rau txoj kev no tsis muaj txiaj. Tsis tas li ntawd, yog hais tias LID pom lus tsis raug, nws yuav ua rau siv qhov kos lus tsis raug, uas yuav ua rau tsis zoo rau cov neeg siv.	en	hmn	Accuracy/Mistranslation	Critical			Inaccurate translation for "SOTA models"	2026-02-01T01:31:16.426Z
2ae4990b-6ac9-4b80-9737-3cc8227c6a19	translategemma-12b	case-study-asr		A common strategy to address these challenges is to use a language identification (LID) model that first detects the language of the input speech before invoking the corresponding SOTA ASR model for transcription, as shown in Figure 1(b). However, this two-stage approach has its drawbacks. Many SOTA models are commercial [12] and incur usage fees based on the volume of processing, making this method costly. Additionally, an incorrect LID prediction may trigger the wrong model, further affecting the user experience [13].	Ib txoj kev uas feem ntau siv los pab cov teeb meem no yog siv ib qho kev kos lus (language identification, LID) kom pom tias lus twg tau hais ua ntej, ces siv qhov kos lus uas zoo tshaj plaws (state-of-the-art, SOTA) kom sau lus, raws li qhia hauv Duab 1(b). Tab sis, txoj kev no muaj qee qhov tsis zoo. Ntau qhov kos lus zoo tshaj plaws yog khoom lag luam [12] thiab muaj tus nqi siv, raws li qhov ntau npaum li cas koj siv, <v>uas ua rau txoj kev no tsis muaj txiaj</v>. Tsis tas li ntawd, yog hais tias LID pom lus tsis raug, nws yuav ua rau siv qhov kos lus tsis raug, uas yuav ua rau tsis zoo rau cov neeg siv.	en	hmn	Fluency/Grammar	Critical			When placed at this part of the sentence, the entire sentence no longer makes any sense.	2026-02-01T01:32:32.093Z
2ae4990b-6ac9-4b80-9737-3cc8227c6a19	translategemma-12b	case-study-asr		A common strategy to address these challenges is to use a language identification (LID) model that first detects the language of the input speech before invoking the corresponding SOTA ASR model for transcription, as shown in Figure 1(b). However, this two-stage approach has its drawbacks. Many SOTA models are commercial [12] and incur usage fees based on the volume of processing, making this method costly. Additionally, an incorrect LID prediction may trigger the wrong model, further affecting the user experience [13].	Ib txoj kev uas feem ntau siv los pab cov teeb meem no yog siv ib qho kev kos lus (language identification, LID) kom pom tias lus twg tau hais ua ntej, ces siv qhov kos lus uas zoo tshaj plaws (state-of-the-art, SOTA) kom sau lus, raws li qhia hauv Duab 1(b). Tab sis, txoj kev no muaj qee qhov tsis zoo. Ntau qhov kos lus zoo tshaj plaws yog khoom lag luam [12] thiab muaj tus nqi siv, raws li qhov ntau npaum li cas koj siv, uas ua rau txoj kev no tsis muaj txiaj. Tsis tas li ntawd, yog hais tias LID pom lus tsis raug, nws yuav ua rau siv qhov kos lus tsis raug, <v>uas yuav ua rau tsis zoo rau cov neeg siv</v>.	en	hmn	Fluency/Inconsistency	Critical			This is placed here because the translation is trying to replicate English sentence structures with Hmong. If you try to structure your Hmong sentences like English sentences, the meaning will be lost or extremely confusing.	2026-02-01T01:34:32.859Z
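The English source annotated in the rows above describes a two-stage pipeline: a language identification (LID) model first detects the language of the input speech, then the matching per-language SOTA ASR model is invoked. The Python sketch below is a minimal illustration of that flow; every name in it (two_stage_transcribe, lid_model, asr_models) is a hypothetical placeholder, not an API from the paper.

```python
def two_stage_transcribe(audio, lid_model, asr_models):
    """Stage 1: detect the language; stage 2: invoke the matching ASR model.

    lid_model  -- callable returning a language code such as "en"
    asr_models -- dict mapping language code -> per-language ASR callable
    """
    lang = lid_model(audio)        # stage 1: LID prediction
    asr = asr_models.get(lang)     # stage 2: pick the corresponding SOTA model
    if asr is None:
        # A wrong LID prediction routes to the wrong (or a missing) model,
        # the failure mode the source paragraph warns about.
        raise LookupError(f"no ASR model registered for {lang!r}")
    return asr(audio)              # each SOTA call incurs a usage fee

```

Both drawbacks named in the source show up directly in this structure: every input pays for a SOTA invocation, and the output is only as reliable as the LID prediction.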
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	<v>Yog vim lawv cov kev txwv txhim</v>, peb tau pom ib txoj kev tsis zoo uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm kuaj suab, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Fluency/Grammar	Major			I understand what's being said but the translation is not natural	2026-02-01T01:37:35.158Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev <v>tsis zoo</v> uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm kuaj suab, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Accuracy/Mistranslation	Critical			means "not good", instead of "alternative"	2026-02-01T01:39:28.304Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev tsis zoo uas siv cov qauv <v>sib txawv raws li qhov tseeb ntawm suab uas tau los</v>. Hauv cov haujlwm kuaj suab, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T01:40:17.152Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev tsis zoo uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm <v>kuaj suab</v>, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Accuracy/Mistranslation	Critical				2026-02-01T01:41:57.464Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev tsis zoo uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm kuaj suab, <v>qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau</v>. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Accuracy/Mistranslation	Critical			gibberish 	2026-02-01T01:42:47.232Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev tsis zoo uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm kuaj suab, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv <v>cov lus qab zib me</v>, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Accuracy/Mistranslation	Critical			wrong translation for "simple vocabulary"	2026-02-01T01:43:40.727Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev tsis zoo uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm kuaj suab, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua<v> kom muaj qhov tsis raug ntawm cov lus (WER) qis</v>. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug?	en	hmn	Fluency/Grammar	Critical				2026-02-01T01:45:45.701Z
175fd16f-70fc-4d14-a117-0ea4171fd81d	translategemma-12b	case-study-asr		Motivated by these limitations, we propose an alternative strategy that selectively invokes models based on the complexity of the input speech. In ASR tasks, the recognition difficulty varies significantly. Under clean acoustic conditions with simple vocabulary, both the SOTA and regular models typically yield low word error rates (WER). However, in noisy or acoustically challenging environments, the WER increases [14, 15, 16, 17], where robust SOTA models generally perform better [6]. This observation raises a key question: Can we distinguish between simple and complex speech inputs and adapt our ASR system accordingly?	Yog vim lawv cov kev txwv txhim, peb tau pom ib txoj kev tsis zoo uas siv cov qauv sib txawv raws li qhov tseeb ntawm suab uas tau los. Hauv cov haujlwm kuaj suab, qhov nyuaj ntawm kev kuaj suab muaj qhov sib txawv ntau. Thaum tsis muaj suab thiab siv cov lus qab zib me, cov qauv zoo tshaj plaws thiab cov qauv siv tau feem ntau ua kom muaj qhov tsis raug ntawm cov lus (WER) qis. Tab sis, thaum muaj suab thiab nyuaj rau kuaj suab, qhov WER nce [14, 15, 16, 17], thiab cov qauv zoo tshaj plaws uas muaj zog ua tau zoo dua [6]. Qhov no tau coj mus rau ib lo lus nug tseem ceeb: <v>Peb puas tuaj yeem sib txawv ntawm cov suab uas yooj yim thiab nyuaj, thiab siv peb cov tshuab kuaj suab kom raug</v>?	en	hmn	Fluency/Inconsistency	Major			Extremely difficult to understand	2026-02-01T01:48:48.475Z
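The source text of segment 175fd16f motivates routing by recognition difficulty rather than by language identity: easy inputs go to a cheap base model, hard inputs to the SOTA model. A hedged sketch of that idea, assuming some difficulty scorer is available; the scorer, the threshold, and all names here are illustrative assumptions, not the paper's method.

```python
def selective_transcribe(audio, base_model, sota_model, difficulty, threshold=0.5):
    """Invoke the costly SOTA model only when the input looks hard.

    difficulty -- callable returning a score in [0, 1]; higher means harder
    """
    if difficulty(audio) >= threshold:
        return sota_model(audio)   # noisy/complex speech: pay for robustness
    return base_model(audio)       # clean, simple speech: the base model suffices

```

On clean audio both models reach similar WER, so the cheap branch loses little; the SOTA budget is spent where the source says the WER gap is largest.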
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov <v>yeeb yis</v> qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Mistranslation	Critical			doesn't mean anything	2026-02-01T01:49:56.999Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model <v>muaj kev txhim kho loj heev hauv kev ntsuas lus </v>(WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T01:50:29.259Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv <v>pib ntawm </v>peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Mistranslation	Critical			odd translation for "base". No hmong person would understand this to mean "base".	2026-02-01T01:51:33.240Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv <v>txwv</v> nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Addition	Critical			unnecessary and changes the meaning of the phrase	2026-02-01T01:52:27.144Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog <v>kev siv qauv los xaus</v>, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T01:52:55.475Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog <v>qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau</v>. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T01:54:13.261Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. <v>Cov lus teb no</v> qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Accuracy/Mistranslation	Critical			Wrong translation for "These findings"	2026-02-01T01:54:43.672Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm <v>zoo heev</v> los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho.	en	hmn	Fluency/Inconsistency	Major			unnecessary	2026-02-01T01:55:25.659Z
99d9ff87-4e96-404b-a0dd-491fe532e851	translategemma-12b	case-study-asr		The results indicate that, due to the selective invocation of SOTA models, the SIMA model achieves significant WER reductions of 18.6%, 9.3%, and 28.2% relative to the base model on the three datasets. Furthermore, compared to the random invocation strategy, SIMA consistently delivers lower WER, with improvements of 6.6%, 4.2%, and 16.8%. Notably, the improvement on the FLEURS dataset is especially significant, as it is out-of-domain for the base model but in-domain for the LID-Top model. These findings convincingly demonstrate SIMA’s remarkable ability to precisely determine when to invoke the SOTA model, thereby optimizing overall ASR performance.	Cov yeeb yis qhia tias, vim tias siv cov qauv zoo tshaj plaws (SOTA) kom raug, SIMA model muaj kev txhim kho loj heev hauv kev ntsuas lus (WER), yog 18.6%, 9.3%, thiab 28.2% ntau dua qhov qauv pib ntawm peb qhov datasets. Tsis tas li ntawd, piv txwv nrog kev siv qauv los xaus, SIMA ib txwm muaj qhov WER qis dua, nrog kev txhim kho ntawm 6.6%, 4.2%, thiab 16.8%. Qhov tseem ceeb, kev txhim kho ntawm FLEURS dataset yog qhov tseem ceeb tshwj xeeb, vim tias nws tsis yog qhov uas qauv pib siv tau, tab sis qauv LID-Top siv tau. Cov lus teb no qhia meej tias SIMA muaj peev xwm zoo heev los txiav txim tias yuav siv qauv zoo tshaj plaws (SOTA) thaum twg, <v>uas pab txhim kho kev ua haujlwm ntawm kev ntsuas lus (ASR) tag nrho</v>.	en	hmn	Fluency/Grammar	Critical			Another case of trying to use English sentence structure with Hmong, leading to the sentence making no sense	2026-02-01T01:56:15.393Z
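The percentages in segment 99d9ff87 (18.6%, 9.3%, 28.2%) are relative WER reductions against the base model. A small worked example of how such a figure is conventionally computed; the WER values below are invented purely for illustration.

```python
def relative_wer_reduction(wer_base, wer_system):
    """Relative WER reduction of wer_system over wer_base, in percent."""
    return 100.0 * (wer_base - wer_system) / wer_base

# A hypothetical drop from 20.0% to 16.28% WER is an 18.6% relative reduction:
print(round(relative_wer_reduction(20.0, 16.28), 1))  # 18.6
```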
d6c14eeb-ef11-4628-a4f9-fa30e1f54b0d	translategemma-12b	case-study-asr		The invocation decision accuracy (ACC) and F1 scores are approximately 70%, supporting our hypothesis that SLLMs can effectively differentiate speech inputs based on complexity. Although SIMA exhibits a slight WER gap compared to LID-Top, it reduces invocation costs by approximately 0.51× across the three datasets, significantly lowering associated expenses.	Cov ntawv ceeb toom txog kev txiav txim (ACC) thiab cov ntawv sau F1 yog ze li 70%, uas qhia tias peb pom zoo tias cov <v>tshuaj yeeb</v> SLLM tuaj yeem paub zoo txog qhov sib txawv ntawm cov lus hais raws li qhov tseeb. Txawm hais tias SIMA muaj qhov sib txawv me ntsis hais txog qhov tseeb ntawm kev paub lus (WER) piv rau LID-Top, tab sis nws txo qhov ntau ntxiv rau kev siv (invocation costs) los ntawm ze li 0.51 zaug hauv peb qhov datasets, uas txo qhov ntau ntxiv rau kev siv nyob rau hauv kev ua haujlwm.	en	hmn	Accuracy/Mistranslation	Critical				2026-02-01T01:57:33.521Z
d6c14eeb-ef11-4628-a4f9-fa30e1f54b0d	translategemma-12b	case-study-asr		The invocation decision accuracy (ACC) and F1 scores are approximately 70%, supporting our hypothesis that SLLMs can effectively differentiate speech inputs based on complexity. Although SIMA exhibits a slight WER gap compared to LID-Top, it reduces invocation costs by approximately 0.51× across the three datasets, significantly lowering associated expenses.	Cov ntawv ceeb toom txog kev txiav txim (ACC) thiab cov ntawv sau F1 yog ze li 70%, uas qhia tias peb pom zoo tias cov tshuaj yeeb SLLM <v>tuaj yeem paub zoo txog qhov sib txawv ntawm cov lus hais raws li qhov tseeb.</v> Txawm hais tias SIMA muaj qhov sib txawv me ntsis hais txog qhov tseeb ntawm kev paub lus (WER) piv rau LID-Top, tab sis nws txo qhov ntau ntxiv rau kev siv (invocation costs) los ntawm ze li 0.51 zaug hauv peb qhov datasets, uas txo qhov ntau ntxiv rau kev siv nyob rau hauv kev ua haujlwm.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T01:58:16.889Z
d6c14eeb-ef11-4628-a4f9-fa30e1f54b0d	translategemma-12b	case-study-asr		The invocation decision accuracy (ACC) and F1 scores are approximately 70%, supporting our hypothesis that SLLMs can effectively differentiate speech inputs based on complexity. Although SIMA exhibits a slight WER gap compared to LID-Top, it reduces invocation costs by approximately 0.51× across the three datasets, significantly lowering associated expenses.	Cov ntawv ceeb toom txog kev txiav txim (ACC) thiab cov ntawv sau F1 yog ze li 70%, uas qhia tias peb pom zoo tias cov tshuaj yeeb SLLM tuaj yeem paub zoo txog qhov sib txawv ntawm cov lus hais raws li qhov tseeb. <v>Txawm hais tias SIMA muaj qhov sib txawv me ntsis hais txog qhov tseeb ntawm kev paub lus (WER) piv rau LID-Top, tab sis nws txo qhov ntau ntxiv rau kev siv (invocation costs) los ntawm ze li 0.51 zaug hauv peb qhov datasets, uas txo qhov ntau ntxiv rau kev siv nyob rau hauv kev ua haujlwm.</v>	en	hmn	Accuracy/Mistranslation	Critical			gibberish, unintelligible	2026-02-01T01:59:27.556Z
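Segment d6c14eeb treats the invocation decision as a binary classification (ACC and F1 around 70%) plus a cost ratio (roughly 0.51x the cost of always calling the SOTA model). A self-contained sketch of those three quantities on toy labels, where 1 means "the SOTA model should be invoked"; the helper and the data are assumptions for illustration only.

```python
def decision_metrics(y_true, y_pred):
    """ACC and F1 of the invoke decision, plus the fraction of SOTA calls."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    cost_ratio = sum(y_pred) / len(y_pred)  # SOTA invocations / total inputs
    return acc, f1, cost_ratio

print(decision_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

This toy case happens to land near the reported regime: ACC and F1 come out to about 0.67 with half of the inputs routed to the SOTA model.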
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov <v>kev kos </v>SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Accuracy/Mistranslation	Critical				2026-02-01T02:01:20.688Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, <v>tab sis </v>tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Fluency/Grammar	Critical			unnecessary preposition	2026-02-01T02:02:34.257Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev <v>txwv no ntsig</v> los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T02:04:02.960Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas <v>peb siv thaum pua tawm tuaj yeem ua haujlwm zoo</v>. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T02:04:48.777Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. <v>Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos</v>. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T02:05:57.624Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. <v>Hauv lub sijhawm ntxiv</v>, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Accuracy/Mistranslation	Critical			Wrong way to say "In future work"	2026-02-01T02:06:26.987Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA <v>kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.</v>	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T02:07:08.711Z
973ea86d-504b-4772-9d98-7b6046ba88d0	translategemma-12b	case-study-asr		Although the current SIMA model significantly improves WER, it still lags behind Whisper [6] on out-of-domain data, FLEURS [28]. This limitation stems from our initial hypothesis that the base SLLM model can effectively perform the invoke task. Our base SLLM model [29] is inherently weaker than specialized models such as Whisper because of the limitation of training data. In future work, we plan to adopt Whisper [6] as the base model and further refine the SIMA system to improve the ASR performance of the SOTA model.	Txawm hais tias qhov kev kos SIMA tam sim no zoo dua ntau, tab sis tseem tsis zoo ib yam nrog Whisper [6] rau cov ntaub ntawv uas tsis yog qhov tseem ceeb, FLEURS [28]. Qhov kev txwv no ntsig los ntawm peb lub tswv yim uas peb pib tias qhov kev kos SLLM uas peb siv thaum pua tawm tuaj yeem ua haujlwm zoo. Qhov kev kos SLLM uas peb siv thaum pua tawm [29] yog qhov tsis muaj zog ntau dua li cov kos tshwj xeeb xws li Whisper vim tias muaj kev txwv ntawm cov ntaub ntawv uas siv rau kev kos. Hauv lub sijhawm ntxiv, peb npaj yuav siv Whisper [6] <v>ua qhov kos uas peb siv thaum pua tawm thiab txuas ntxiv kho qhov kev kos SIMA </v>kom zoo dua qhov kev kos uas muaj peb kos zoo tshwj xeeb.	en	hmn	Accuracy/Mistranslation	Critical			gibberish	2026-02-01T02:07:37.139Z