STT
From BC$ MobileTV Wiki
Speech-To-Text (commonly abbreviated as STT) is a method of generating a textual representation of an audio clip, voice message or live speech segment.
Tools
- Nuance -- Dragon Business Transcription: http://www.nuance.com/dragon/transcription-solutions/
- Nuance -- Dragon Medical Dictation: http://www.nuance.com/for-healthcare/by-solutions/speech-recognition/dragon-medical/
- Nuance -- Nina API: http://www.programmableweb.com/api/nuance-nina (move over Siri, here comes Nina[1])
Resources
- Mozilla -- Common Voice: https://voice.mozilla.org/en/datasets (multi-language audio STT training/testing dataset)[2]
- Mozilla -- DeepSpeech: https://github.com/mozilla/DeepSpeech | DOCS | EXAMPLES[3][4]
- iSpeech Dictation: http://www.ispeech.org/apps/dictation
- IfByPhone: http://ifbyphone.com
Tutorials
- Android -- Simple Speech-To-Text using built-in API: http://viralpatel.net/blogs/android-speech-to-text-api/
- Converting from Speech to Text with JavaScript: https://tutorialzine.com/2017/08/converting-from-speech-to-text-with-javascript
- Convert Speech to Text in Android Application: https://www.stacktips.com/tutorials/android/speech-to-text-in-android | SRC
- How Neural Networks Recognize Speech-to-Text: https://dzone.com/articles/how-to-train-a-neural-network-to-recognize-speech
- IBM Watson -- Speech to Text: https://speech-to-text-demo.ng.bluemix.net (web-based component with ability to recognize multiple speakers)
- IBM Watson -- Visual Recognition: https://www.ibm.com/watson/services/visual-recognition/ | DEMO
- IBM Watson team -- React Components library: https://watson-developer-cloud.github.io/react-components/
External Links
- Wikipedia: Speech-to-text reporter
- Wikipedia: Transcription (linguistics)
- Ifbyphone Voice-To-Text Transcription Gives Users a Choice: http://public.ifbyphone.com/about/press/voice-to-text-transcription
- Forget Clicks And Views, The Future Is All About Listening: http://www.mediapost.com/publications/article/292318/forget-clicks-and-views-the-future-is-all-about-l.html
- Google AI blog -- Announcing AudioSet - A Dataset for Audio Event Research: https://ai.googleblog.com/2017/03/announcing-audioset-dataset-for-audio.html
References
- [1] Move Over Siri, Nuance’s Nina Is Here For iOS: http://accesstechnews.blogspot.ca/2012/08/move-over-siri-nuances-nina-is-here-for.html
- [2] The Fisher Corpus -- a Resource for the Next Generations of Speech-to-Text: https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/lrec2004-fisher-corpus.pdf
- [3] Deep Speech -- Scaling up end-to-end speech recognition: https://arxiv.org/abs/1412.5567
- [4] DeepSpeech 0.6 -- Mozilla’s Speech-to-Text Engine Gets Fast, Lean, and Ubiquitous: https://hacks.mozilla.org/2019/12/deepspeech-0-6-mozillas-speech-to-text-engine/