Speech Recognition
From BC$ MobileTV Wiki
In computer science and information technology, Speech Recognition is the process of receiving a speaker's voice as audio input and converting it into a machine-readable format (typically text) that a software program, system, or platform can then act on. [1]
Put simply, Speech Recognition means "knowing or deducing what the speaker has said".
Specs
- Web Speech API Specification: http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
- JSpeech Grammar Format (JSGF): https://www.w3.org/TR/jsgf/
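A JSGF grammar is plain text: a `#JSGF V1.0;` header, a `grammar` name declaration, and rules whose alternatives are separated by `|`. As a minimal sketch, assuming a single public rule that lists word or phrase alternatives, the TypeScript below generates such a grammar. The helper `buildJsgfGrammar` and its name checks are hypothetical and deliberately stricter than the JSGF specification; strings of this shape are what MDN's Web Speech API examples pass to `SpeechGrammarList.addFromString()`.

```typescript
// Hypothetical helper (not from any library): builds a minimal JSGF V1.0 grammar
// containing one public rule whose alternatives are the given words or phrases.
function buildJsgfGrammar(grammarName: string, ruleName: string, alternatives: string[]): string {
  // Conservative identifier checks (an assumption; JSGF itself allows more characters).
  const ruleIdentifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  // Grammar names may be package-style, e.g. com.example.colors.
  const grammarIdentifier = /^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$/;
  if (!grammarIdentifier.test(grammarName) || !ruleIdentifier.test(ruleName)) {
    throw new Error("unsupported grammar or rule name");
  }

  const tokens = alternatives.map((a) => a.trim()).filter((a) => a.length > 0);
  if (tokens.length === 0) {
    throw new Error("a JSGF rule needs at least one alternative");
  }
  // ';' ends a rule and '|' separates alternatives, so neither may appear inside a phrase here.
  if (tokens.some((t) => /[;|]/.test(t))) {
    throw new Error("alternatives must not contain ';' or '|'");
  }

  return [
    "#JSGF V1.0;",                                   // header: format version
    `grammar ${grammarName};`,                       // grammar name declaration
    `public <${ruleName}> = ${tokens.join(" | ")};`, // public rule: alternatives separated by '|'
  ].join("\n");
}

const colorGrammar = buildJsgfGrammar("colors", "color", ["red", "green", "dark blue"]);
console.log(colorGrammar);
```

Run as-is, this prints three lines: the header, `grammar colors;`, and `public <color> = red | green | dark blue;`. A multi-word alternative such as `dark blue` is simply a token sequence, so the recognizer must hear both words in order.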
Web Speech API
- Web Speech API demo: https://www.google.com/intl/en/chrome/demos/speech.html
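The demo above runs on the browser's `SpeechRecognition` interface. As a minimal sketch, assuming a browser that exposes it (Chrome uses the `webkitSpeechRecognition` prefix), the TypeScript below starts continuous dictation and separates final results from interim ones. The interface names (`Recognizer`, `RecognitionEvent`, etc.) and the helpers `getRecognizerConstructor`, `collectTranscripts` and `startDictation` are hypothetical local declarations covering only the members used here (`lang`, `continuous`, `interimResults`, `onresult`, `onerror`, `start()`, `stop()`, `resultIndex`, `results`, `isFinal`, `transcript`); they are not the official DOM typings.

```typescript
// Minimal structural types for the Web Speech API members used below
// (declared locally because not every TypeScript DOM lib ships them).
interface RecognitionAlternative { readonly transcript: string; readonly confidence: number; }
interface RecognitionResult {
  readonly length: number;
  readonly isFinal: boolean;
  readonly [index: number]: RecognitionAlternative;
}
interface RecognitionResultList { readonly length: number; readonly [index: number]: RecognitionResult; }
interface RecognitionEvent { readonly resultIndex: number; readonly results: RecognitionResultList; }
interface Recognizer {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
  stop(): void;
}
type RecognizerConstructor = new () => Recognizer;

// Look up the constructor, falling back to Chrome's prefixed name.
function getRecognizerConstructor(): RecognizerConstructor | undefined {
  const g = globalThis as unknown as {
    SpeechRecognition?: RecognizerConstructor;
    webkitSpeechRecognition?: RecognizerConstructor;
  };
  return g.SpeechRecognition ?? g.webkitSpeechRecognition;
}

// Split the results carried by one event into finalized text and interim (still changing) text.
function collectTranscripts(event: RecognitionEvent): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  // Results before resultIndex were already reported by earlier events.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0].transcript; // first alternative: the recognizer's top hypothesis
    if (result.isFinal) finalText += best;
    else interimText += best;
  }
  return { finalText, interimText };
}

// Start continuous dictation; returns undefined when the API is unavailable.
function startDictation(onFinalText: (text: string) => void, lang = "en-US"): Recognizer | undefined {
  const Ctor = getRecognizerConstructor();
  if (!Ctor) return undefined;
  const recognizer = new Ctor();
  recognizer.lang = lang;
  recognizer.continuous = true;     // keep listening across pauses
  recognizer.interimResults = true; // also deliver partial hypotheses
  recognizer.onresult = (event) => {
    const { finalText } = collectTranscripts(event);
    if (finalText) onFinalText(finalText);
  };
  recognizer.onerror = (event) => console.error("recognition error:", event.error);
  recognizer.start();
  return recognizer;
}
```

In a page, `const recognizer = startDictation((text) => console.log(text));` begins listening (the browser asks for microphone permission) and `recognizer?.stop()` ends the session. Keeping `collectTranscripts` free of browser globals means it can be exercised with a plain object standing in for the event.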
Common Voice
- Mozilla - Common Voice: https://commonvoice.mozilla.org/en (initiative to help teach machines how real people speak)
Tools
- Rev.AI -- accurate Speech Recognition: https://www.rev.ai/
- Microsoft Speech API: http://msdn.microsoft.com/en-us/library/jj127860.aspx
- wikipedia: Microsoft Speech API
- Bing - Speech Search, Synthesis & Commands (for Windows 8 / Windows Phone users): http://www.bing.com/dev/en-us/speech
- Speech Recognition in Google "simulator": https://console.actions.google.com/project/speechrecognition-37401/simulator/
- Chrome extensions -- Speech Recognition Anywhere: https://chrome.google.com/webstore/detail/speech-recognition-anywhe/kdnnmhpmcakdilnofmllgcigkibjonof
- iSpeech: http://ispeech.org | iSpeech API Specification Version 2.0
- Speex - A Free Codec For Free Speech: http://www.speex.org/
- Asterisk, an open-source VoIP platform/server that can be used for call-in IVR or speech recognition over the phone: http://www.asterisk.org/downloads
- Nuance, which advertises the largest vocabulary (500,000+ words): http://www.nuance.com/mobilesearch/
- IBM ViaVoice, which seems easy to use and efficient (2007 award): http://www-306.ibm.com/software/pervasive/embedded_viavoice/
- Lumenvox, inexpensive for tests and trials (US$50 for a 500-word test engine): http://www.lumenvox.com/ (both voice recognition and speech-to-text)
- GotVoice - Mobile VoiceMail and Speech-to-Text: http://www.gotvoice.com
- Advanced Media - Japanese company whose AmiVoice software is specifically targeted at mobile phones: http://www.advanced-media.co.jp/ [1]
- 3M SyncStream: http://solutions.3m.com.au/wps/portal/3M/en_AU/HIS_AU/home/products-services/dictation-transcription/syncstream/
- FirstDraft: http://firstdraft.infraware.com/
- VoxReports: http://www.atirix.com/VoxReports.aspx
- SpeechQ (for Radiology): http://mmodal.com/products/speechq/
- MediSpeech: http://www.g2speech.com/solutions/medispeech.html
Nuance
- Nuance: http://www.nuance.com | Dragon NaturallySpeaking | DOWNLOAD | API[4]
- wikipedia: Dragon NaturallySpeaking[5]
- Nuance - Audio input device (compatibility checker): http://support.nuance.com/compatibility/
- Vansonic HS-GEN-C analog headset/microphone (with adaptor): http://www.ybsales.com/Nuance-HS-GEN-C-Dragon-Stereo-Communication-Headset.html[6]
Resources
- Mozilla dev docs -- SpeechRecognition API: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition (supported only in Chrome at the time of writing)
- Sphinx: https://cmusphinx.github.io/ | SAMPLES (Java-based OSS Speech Recognition toolkit)
- J.A.R.V.I.S. (Java-Speech-API): https://github.com/lkuza2/java-speech-api
- Java Speech API (JSR 113) implementation: https://sourceforge.net/projects/jsapi/
- Java Speech API - Frequently Asked Questions: https://www.oracle.com/technetwork/java/jsapifaq-135248.html#implementation
- Windows Speech Recognition commands: https://support.microsoft.com/en-us/help/12427/windows-speech-recognition-commands
- SpeechAPI: http://www.speechapi.com/
- Bing Speech API: https://www.microsoft.com/cognitive-services/en-us/speech-api[8][9]
- Nuance Dragon API: http://dragonmobile.nuancemobiledeveloper.com[10][11]
- Web-Accessible Multimodal Internet Applications (WAMI): http://wami.csail.mit.edu (Speech/Voice for Flash)
- Microphone activity (Flash): http://livedocs.adobe.com/flash/9.0/main/wwhelp/wwhimpl/common/html/wwhelp.htm?context=LiveDocs_Parts&file=00001864.html
- 10 Types of Microphones: http://electronics.howstuffworks.com/gadgets/audio-music/question309.htm
- Speech-enabled Mobile Web Apps via Nuance's new Dragon Speech Recognition API: http://dragonmobile.nuancemobiledeveloper.com/phpbb/viewtopic.php?f=21&t=416&p=1771#p1771
- wami -- a JavaScript API for speech recognition: https://code.google.com/p/wami/
- The SUMO Heavy 2019 Voice Commerce Survey: https://www.sumoheavy.com/the-sumo-heavy-2019-voice-commerce-survey/
Tutorials
- Java-Speech-Recognizer-Tutorial: https://github.com/goxr3plus/Java-Speech-Recognizer-Tutorial--Calculator | VIDEO #1 to VIDEO #5
- Basics of Java Speech Grammar Format (JSGF): https://puneetk.com/basics-of-java-speech-grammar-format-jsgf
- Building a language model: https://cmusphinx.github.io/wiki/tutoriallm/
- Use the Java Speech API (JSAPI): https://www.rgagnon.com/javadetails/java-use-java-speech-api.html
- JSAPI 1.0 Setup using FreeTTS: https://freetts.sourceforge.io/docs/jsapi_setup.html
- FreeTTS Programmer's Guide: https://freetts.sourceforge.io/docs/ProgrammerGuide.html
- Speech Recognition -- javax.speech.recognition: http://www.ling.helsinki.fi/kit/2004s/ctl310gen/L7-Speech/JSAPI/Recognition.html[12]
- Java Sound API -- Soundbanks: https://www.oracle.com/technetwork/java/soundbanks-135798.html
- Merapi AIR/Java Speech Recognition and Voice Control: http://javadz.wordpress.com/2009/06/14/42/
- HOW TO - Developing Dragon NaturallySpeaking applications that recognize speech: http://www.chant.net/support/knowledgebase/howtos/h071128.aspx
- How to Add Voice Interactivity to Your Site: http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/ (works only with Windows XP)
- How To Use Speech Recognition in Windows XP: http://support.microsoft.com/kb/306901
- How Electret Energy Harvester (and Microphones) Work: http://tikalon.com/blog/blog.php?article=2011/electret
- Speaking in Context - Designing Voice Interfaces: http://punchcut.com/perspectives/speaking-context-designing-voice-interfaces
- Chunked Transfer-Encoding in PHP With Guzzle: http://mtdowling.com/blog/2012/01/27/chunked-encoding-in-php-with-guzzle/
- How to backup and restore a Dragon User Profile: https://nuance.custhelp.com/app/answers/detail/a_id/15129/~/how-to-backup-and-restore-a-dragon-user-profile
- Dragon Backup location: http://www.pcspeak.com/hints/general/backup.shtml
- Speech Recognition & Text-to-Speech settings in Windows 7 (Setting speech options): http://windows.microsoft.com/en-us/windows/setting-speech-options#1TC=windows-7
- Making the Most of Cortana in Windows 10: http://www.dummies.com/how-to/content/making-the-most-of-cortana-in-windows-10.html
- Microsoft Retooling Cortana For Mobile Users: https://www.mediapost.com/publications/article/323613/microsoft-retooling-cortana-for-mobile-users.html
- Ask a UXpert -- Adobe XD - How to Prototype Voice Experiences that Delight Users: https://theblog.adobe.com/how-to-voice-ui-prototyping-delight-users/
- JavaScript Speech Recognition: https://davidwalsh.name/speech-recognition
External Links
- wikipedia: Speech Recognition
- wikipedia: Voice user interface (VUI)
- wikipedia: Telematics
- wikipedia: Chunked transfer encoding
- Deconstructing Google Mobile's Voice Search on the iPhone: http://waxy.org/2008/11/deconstructing_google_mobiles_voice_search_on_the_iphone
- Cheap, Easy Human-powered Audio Transcription with Amazon's Mechanical Turk: http://waxy.org/2008/09/audio_transcription_with_mechanical_turk/
- Japanese singer and songwriter: voice recognition, shaping and correction [2]
- HSTP: Hyperspeech Transfer Protocol: http://www.readwriteweb.com/archives/hstp_hyperspeech_transfer_protocol.php
- Speech Recognition for a Digital Video Library: http://www.informedia.cs.cmu.edu/documents/jasis96.pdf
- How to install and configure speech recognition in Windows XP: http://support.microsoft.com/kb/306537
- Siri creates legal woe for Apple: http://www.washingtonpost.com/business/technology/siri-creates-legal-woe-for-apple/2012/03/13/gIQAg8U59R_story.html?tid=pm_pop
- NTT Docomo, Japan’s biggest mobile carrier announces Siri competitor, ‘Shabette Concier’: http://www.zdnet.com/blog/asia/japans-biggest-mobile-carrier-announces-siri-competitor-8216shabette-concier/1188
- Speech Recognition AS3 Preview 2: http://vimeo.com/14025090
- Medical transcription and speech Recognition: http://www.ideamarketers.com/?articleid=2571684
- Medical Speech Recognition Software Streamlines Workflow: http://ezinearticles.com/?Medical-Speech-Recognition-Software-Streamlines-Workflow&id=3740865
- Doctors Use Speech Recognition Tools To Enhance Patient E-Health Records: http://www.informationweek.com/software/productivity-applications/doctors-use-speech-recognition-tools-to/207800986
- The Disadvantages of Medical Speech Recognition: http://www.ehow.com/info_8548829_disadvantages-medical-speech-recognition.html
- Unilever Is Right -- Voice Is The New Mobile: https://www.mediapost.com/publications/article/308502/unilever-is-right-voice-is-the-new-mobile.html
- How voice technology is transforming computing: https://www.economist.com/news/leaders/21713836-casting-magic-spell-it-lets-people-control-world-through-words-alone-how-voice
- Google Launching Amazon Echo Competitor Without Nest: https://www.mediapost.com/publications/article/276789/google-launching-amazon-echo-competitor-without-ne.html
- Making audio searchable with Cloud Speech: https://hackernoon.com/making-audio-searchable-with-cloud-speech-
- Smart Speaker Sales Projected To Grow 50% By 2019: https://www.mediapost.com/publications/article/323608/smart-speaker-sales-projected-to-grow-50-by-2019.html
- Speech Recognition Finally On The Map: https://www.mediapost.com/publications/article/323607/speech-recognition-finally-on-the-map.html
- 52% Of Consumers Worry About Passive Listening By Their Voice Assistants: https://www.mediapost.com/publications/article/340889/52-of-consumers-worry-about-passive-listening-by.html
- Google To Refine Privacy Settings For Voice Assistant: https://www.mediapost.com/publications/article/341077/google-to-refine-privacy-settings-for-voice-assist.html
- Doing more to protect your privacy with the Assistant: https://www.blog.google/products/assistant/doing-more-protect-your-privacy-assistant/
- The Year Ahead -- The Rise Of Voice-Enabled News, Media Consolidation And More Partnerships: https://www.mediapost.com/publications/article/345159/the-year-ahead-the-rise-of-voice-enabled-news-me.html
- COVID-19 may push retailers to use voice assistants instead of touch screens: https://retailwire.com/discussion/covid-19-may-push-retailers-to-use-voice-assistants-instead-of-touch-screens/
- Australian researchers collect children’s voice biometrics to help with speech impairments: https://www.biometricupdate.com/202101/australian-researchers-collect-childrens-voice-biometrics-to-help-with-speech-impairments
- How to Delete Google Assistant Recordings: https://www.howtogeek.com/713484/how-to-delete-google-assistant-recordings/
- Changelog Podcast -- 25 years of speech technology innovation: https://changelog.com/practicalai/133
References
- [1] wikipedia: Speech Recognition
- [2] Common Voice dataset tops 20,000 hours: https://hacks.mozilla.org/2022/04/common-voice-dataset-tops-20000-hours/
- [3] 37 Recognition APIs - AT&T Speech, Moodstocks and Rekognition: http://blog.programmableweb.com/2013/09/09/37-recognition-apis-att-speech-moodstocks-and-rekognition/
- [4] Nuance launches mobile developer program: Will all apps be speech enabled?: http://www.zdnet.com/blog/btl/nuance-launches-mobile-developer-program-will-all-apps-be-speech-enabled/43771
- [5] A Short History of Dragon Naturally Speaking Software: https://voicerecognition.com.au
- [6] Vansonic HS-GEN-C poor audio quality: https://www.knowbrainer.com/forums/forum/messageview.cfm?catid=6&threadid=18517
- [7] Samsung and LG smart TVs (unintentionally) share your voice data behind the fine print, use same solution by Nuance: https://www.consumerreports.org/cro/news/2015/02/who-s-the-third-party-that-samsung-and-lg-smart-tvs-are-sharing-your-voice-data-with/index.htm
- [8] Announcing the first Technical Preview of Microsoft Azure Stack: https://azure.microsoft.com/en-us/blog/announcing-the-first-technical-preview-of-microsoft-azure-stack/
- [9] Announcing the developer preview for Bing’s new Search APIs: https://blogs.msdn.microsoft.com/bingdevcenter/2016/03/15/announcing-the-developer-preview-for-bings-new-search-apis/
- [10] Nuance Nina API: http://www.programmableweb.com/api/nuance-nina
- [11] HTTP Services for Nuance Mobile Developer Program: http://dragonmobile.nuancemobiledeveloper.com/public/Help/HttpInterface/HTTP_Services_for_NDEV_v1.2_Silver_Version.pdf
- [12] javax.speech API (JavaDocs): https://docs.oracle.com/cd/E17802_01/products/products/java-media/speech/forDevelopers/jsapi-doc/javax/speech/package-summary.html
See Also
Voice Recognition | STT | HCI | Nuance | Apple | Amazon | Google | Microsoft | A11Y