Speech Recognition
From BC$ MobileTV Wiki
In computer science and information technology, speech recognition is the process of receiving a speaker's voice as audio input and rendering that input into a machine-readable format that a software program, system, or platform can act on. [1]
Speech recognition is best characterized as "knowing or deducing what the speaker has said".
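The second half of that definition, rendering the input into a machine-understandable format, can be illustrated with a small sketch. It assumes the audio has already been transcribed to text by some engine; none of the tools listed on this page are called, and the names `KNOWN_ACTIONS`, `Command`, and `render_transcript` are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary for illustration only; not taken from any real engine.
KNOWN_ACTIONS = {"play", "pause", "call", "search"}


@dataclass(frozen=True)
class Command:
    action: str  # verb the program should act on
    target: str  # remainder of the utterance, normalized


def render_transcript(transcript: str) -> Optional[Command]:
    """Turn recognized text into a structured command, or None if unknown."""
    # Lowercase, then replace anything that is not a letter, digit, or space.
    words = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower()).split()
    if not words or words[0] not in KNOWN_ACTIONS:
        return None
    return Command(action=words[0], target=" ".join(words[1:]))


if __name__ == "__main__":
    print(render_transcript("Call Mom!"))
    print(render_transcript("hello there"))
```

A real system would replace the fixed word set with a grammar or language model; the point here is only that the output is a structure a program can act on rather than raw text.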
Specs
- Web Speech API Specification: http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
Tools
- Microsoft Speech API: http://msdn.microsoft.com/en-us/library/jj127860.aspx
- wikipedia: Microsoft Speech API
- Bing - Speech Search, Synthesis & Commands (for Windows 8 / Windows Phone users): http://www.bing.com/dev/en-us/speech
- Google Chrome - WebSpeech API (DEMO): https://www.google.com/intl/en/chrome/demos/speech.html
- iSpeech: http://ispeech.org | iSpeech API Specification Version 2.0
- Speex - A Free Codec For Free Speech: http://www.speex.org/
- Asterisk - an open-source VoIP platform/server that can be used for call-in IVR or speech recognition by phone: http://www.asterisk.org/downloads
- Nuance features the largest vocabulary (500,000+ words): http://www.nuance.com/mobilesearch/
- IBM ViaVoice seems easy to use and efficient (2007 award): http://www-306.ibm.com/software/pervasive/embedded_viavoice/
- LumenVox - inexpensive for tests and trials (US$50 for a 500-word test engine): http://www.lumenvox.com/ (both voice recognition and speech-to-text)
- GotVoice - Mobile VoiceMail and Speech-to-Text: http://www.gotvoice.com
- Advanced Media - Japanese company with AmiVoice software, specially targeted for Mobile Phones: http://www.advanced-media.co.jp/ [1]
- 3M SyncStream: http://solutions.3m.com.au/wps/portal/3M/en_AU/HIS_AU/home/products-services/dictation-transcription/syncstream/
- FirstDraft: http://firstdraft.infraware.com/
- VoxReports: http://www.atirix.com/VoxReports.aspx
- SpeechQ (for Radiology): http://mmodal.com/products/speechq/
- MediSpeech: http://www.g2speech.com/solutions/medispeech.html
Resources
- SpeechAPI: http://www.speechapi.com/
- Nuance Dragon API: http://dragonmobile.nuancemobiledeveloper.com [3][4]
- Web-Accessible Multimodal Internet Applications (WAMI): http://wami.csail.mit.edu (Speech/Voice for Flash)
- Microphone activity (FLASH): http://livedocs.adobe.com/flash/9.0/main/wwhelp/wwhimpl/common/html/wwhelp.htm?context=LiveDocs_Parts&file=00001864.html
- 10 Types of Microphones: http://electronics.howstuffworks.com/gadgets/audio-music/question309.htm
- Speech-enabled Mobile Web Apps via Nuance's new Dragon Speech Recognition API: http://dragonmobile.nuancemobiledeveloper.com/phpbb/viewtopic.php?f=21&t=416&p=1771#p1771
- wami - a JavaScript API for speech recognition: https://code.google.com/p/wami/
Tutorials
- Merapi AIR/Java Speech Recognition and Voice Control: http://javadz.wordpress.com/2009/06/14/42/
- HOW TO - Developing Dragon NaturallySpeaking applications that recognize speech: http://www.chant.net/support/knowledgebase/howtos/h071128.aspx
- How to Add Voice Interactivity to Your Site: http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/ (works only with Windows XP)
- How To Use Speech Recognition in Windows XP: http://support.microsoft.com/kb/306901
- How Electret Energy Harvester (and Microphones) Work: http://tikalon.com/blog/blog.php?article=2011/electret
- Speaking in Context - Designing Voice Interfaces: http://punchcut.com/perspectives/speaking-context-designing-voice-interfaces
- Chunked Transfer-Encoding in PHP With Guzzle: http://mtdowling.com/blog/2012/01/27/chunked-encoding-in-php-with-guzzle/
- How to backup and restore a Dragon User Profile: https://nuance.custhelp.com/app/answers/detail/a_id/15129/~/how-to-backup-and-restore-a-dragon-user-profile
- Dragon Backup location: http://www.pcspeak.com/hints/general/backup.shtml
- Setting Speech Recognition & Text-To-Speech options in Windows 7 (Setting speech options): http://windows.microsoft.com/en-us/windows/setting-speech-options#1TC=windows-7
- Making the Most of Cortana in Windows 10: http://www.dummies.com/how-to/content/making-the-most-of-cortana-in-windows-10.html
External Links
- wikipedia: Speech Recognition
- wikipedia: Telematics
- wikipedia: Chunked transfer encoding
- Deconstructing Google Mobile's Voice Search on the iPhone: http://waxy.org/2008/11/deconstructing_google_mobiles_voice_search_on_the_iphone/
- A Short History of Dragon Naturally Speaking Software: https://voicerecognition.com.au
- Cheap, Easy Human-powered Audio Transcription with Amazon's Mechanical Turk: http://waxy.org/2008/09/audio_transcription_with_mechanical_turk/
- Japanese SINGER, SONG WRITER... voice recognition, shaping and correction [2]
- HSTP: Hyperspeech Transfer Protocol: http://www.readwriteweb.com/archives/hstp_hyperspeech_transfer_protocol.php
- Speech Recognition for a Digital Video Library: http://www.informedia.cs.cmu.edu/documents/jasis96.pdf
- How to install and configure speech recognition in Windows XP: http://support.microsoft.com/kb/306537
- Nuance launches mobile developer program: Will all apps be speech enabled?: http://www.zdnet.com/blog/btl/nuance-launches-mobile-developer-program-will-all-apps-be-speech-enabled/43771
- Siri creates legal woe for Apple: http://www.washingtonpost.com/business/technology/siri-creates-legal-woe-for-apple/2012/03/13/gIQAg8U59R_story.html?tid=pm_pop
- NTT Docomo, Japan’s biggest mobile carrier, announces Siri competitor, ‘Shabette Concier’: http://www.zdnet.com/blog/asia/japans-biggest-mobile-carrier-announces-siri-competitor-8216shabette-concier/1188
- Speech Recognition AS3 Preview 2: http://vimeo.com/14025090
- Medical transcription and speech Recognition: http://www.ideamarketers.com/?articleid=2571684
- Medical Speech Recognition Software Streamlines Workflow: http://ezinearticles.com/?Medical-Speech-Recognition-Software-Streamlines-Workflow&id=3740865
- Doctors Use Speech Recognition Tools To Enhance Patient E-Health Records: http://www.informationweek.com/software/productivity-applications/doctors-use-speech-recognition-tools-to/207800986
- The Disadvantages of Medical Speech Recognition: http://www.ehow.com/info_8548829_disadvantages-medical-speech-recognition.html
References
- [1] wikipedia: Speech Recognition
- [2] 37 Recognition APIs - AT&T Speech, Moodstocks and Rekognition: http://blog.programmableweb.com/2013/09/09/37-recognition-apis-att-speech-moodstocks-and-rekognition/
- [3] Nuance Nina API: http://www.programmableweb.com/api/nuance-nina
- [4] HTTP Services for Nuance Mobile Developer Program: http://dragonmobile.nuancemobiledeveloper.com/public/Help/HttpInterface/HTTP_Services_for_NDEV_v1.2_Silver_Version.pdf
See Also
Voice Recognition | STT | HCI