Site Introduction

SEW-D-tiny

SEW-D by ASAPP Research
The base model is pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz. Note that this model should be fine-tuned on a downstream task, such as automatic speech recognition, speaker identification, intent classification, or emotion recognition.
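The 16 kHz requirement above can be checked before any audio reaches the model. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_sample_rate` and `is_model_ready` are hypothetical and not part of the SEW-D release, and the check only covers uncompressed WAV files.

```python
import wave

# The model card states that input speech must be sampled at 16 kHz.
EXPECTED_RATE = 16_000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_model_ready(path, expected_rate=EXPECTED_RATE):
    """Return True if the WAV file already matches the expected sample rate."""
    return wav_sample_rate(path) == expected_rate
```

A file for which `is_model_ready` returns False would need to be resampled to 16 kHz with an audio library of your choice before being passed to the model.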
Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi
Abstract
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.
The original model can be found at https://github.com/asappresearch/sew#model-checkpoints

Usage

See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by SEWDForCTC.
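The class swap described above might look like the following sketch. It assumes the Hugging Face `transformers` library (with PyTorch) is installed and that the checkpoint is published under the id `asapp/sew-d-tiny-100k`. The wrapper name `load_sewd_for_ctc` is hypothetical, and the vocabulary size and padding id must come from the tokenizer you build for your own dataset.

```python
# Hub id of the pretrained checkpoint (assumed from the page title).
CHECKPOINT = "asapp/sew-d-tiny-100k"


def load_sewd_for_ctc(vocab_size, pad_token_id):
    """Load the pretrained SEW-D encoder with a CTC head sized for your vocabulary."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import SEWDForCTC  # used in place of Wav2Vec2ForCTC

    return SEWDForCTC.from_pretrained(
        CHECKPOINT,
        vocab_size=vocab_size,        # number of tokens in your CTC vocabulary
        pad_token_id=pad_token_id,    # id of the padding (CTC blank) token
        ctc_loss_reduction="mean",    # average the CTC loss over the batch
    )
```

Calling `load_sewd_for_ctc(vocab_size=32, pad_token_id=0)` (placeholder values) downloads the checkpoint on first use. Because the base model has no fine-tuned output layer, the CTC head starts from random weights and needs training before it produces useful transcriptions.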

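Special Notice

The "asapp/sew-d-tiny-100k" entry provided by this site, Ai工具导航, comes from the internet. The site does not guarantee the accuracy or completeness of external links, and it does not control where those links point. When the page was indexed on 2025-10-05 21:03:38, its content was compliant and lawful. If the content later becomes non-compliant, contact the site administrator to have it removed; Ai工具导航 accepts no liability.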
