Rakta Papneja, a video journalist with the Network 18 group, has to record interviews, do video calls, and produce podcasts. For all that work, she uses transcription apps.
“Rather than painstakingly converting the voice to text, I use transcription apps to simplify my work,” she says. “The only downside is that sometimes it (the app) doesn’t catch the right words due to accents.”
Whether you’re a reporter interviewing people, a lawyer meeting clients, or an entrepreneur recording chats, you want to focus on the conversation without having to take notes or transcribe. Transcription apps help with that.
“Now more than ever, we’re all very busy — juggling family, work, friends, and whatever else life throws our way. I use Google Speech-to-Text transcription services for storytelling and recording anecdotes that others share. All it takes is voice and technology helps us to harness the power of our own voice,” says Saira Ahmed, a writer who lives in Gurugram.
How long does it take to transcribe an hour of audio? The industry standard is four hours of transcription time for one hour of clear audio, or a 4:1 ratio.
There are several transcription tools, and the best one for you will depend on your needs and budget. Google Speech-to-Text is a free tool that uses machine learning to transcribe audio files in real time. It is easy to use and can handle multiple languages and dialects. Dragon NaturallySpeaking is a paid tool that uses voice recognition technology; it too can handle multiple languages and dialects. Express Scribe is a good option for journalists, researchers, and other professionals who need to transcribe audio files quickly and accurately. Transcription tools from technology majors Google, Microsoft and Amazon offer free trials, often for the first 60 minutes, and then charge fees. Dragon Dictation by Nuance Communications, Voice Typing by Nazmain Apps, and Translate All by Asitis are free for life. Some apps have a one-time fee and others charge per audio file.
The speech-to-text industry is worth $2.2 billion and is expected to grow at a compound annual growth rate (CAGR) of almost 19 per cent, says Chandrashekar Mantha, partner, media & entertainment sector leader, Deloitte India. By 2026, its forecasted value is $5.4 billion. As voice-based devices and services expand, demand for transcription apps is growing. Transcribed data can be converted to actionable insights for business use.
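The growth figures above can be sanity-checked with a little compound-interest arithmetic. The sketch below is illustrative only: it takes the $2.2 billion, $5.4 billion and 19 per cent figures as quoted, and since the article does not state the base year, it works out how many years of compounding the forecast implies rather than assuming one.

```python
import math

# Figures as quoted in the article; the base year is not stated there.
current_value_bn = 2.2    # market size today, in $ billion
forecast_value_bn = 5.4   # forecast value for 2026, in $ billion
cagr = 0.19               # "almost 19 per cent"


def years_to_reach(start: float, target: float, rate: float) -> float:
    """Years of compounding at `rate` needed to grow `start` into `target`."""
    return math.log(target / start) / math.log(1 + rate)


def compound(start: float, rate: float, years: float) -> float:
    """Value of `start` after growing at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


implied_years = years_to_reach(current_value_bn, forecast_value_bn, cagr)
value_after_five = compound(current_value_bn, cagr, 5)

print(f"Implied horizon: {implied_years:.1f} years")
print(f"Value after five years at 19%: ${value_after_five:.2f} billion")
```

At the quoted rate, the forecast corresponds to roughly five years of growth, which is consistent with a 2026 target.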
“Generally, the accuracy of these apps is between 60-65 per cent but with further fine tuning based on the context to be derived from the voice recording, the accuracy can be improved to 80-90 per cent,” says Mantha.
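The article does not say how these accuracy percentages were measured. One common yardstick in speech recognition is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the app's output into the correct transcript, divided by the length of the correct transcript. The sketch below is a minimal illustration of that metric, not the method behind Mantha's figures, and the two sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said versus what an app produced.
spoken = "the app does not catch the right words"
transcribed = "the app does not cash the write words"

wer = word_error_rate(spoken, transcribed)
print(f"WER: {wer:.0%}, word accuracy: {1 - wer:.0%}")
```

Here two of the eight words are wrong, so the word accuracy works out to 75 per cent.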
Transcription apps usually have English as the default language and support German, Spanish, French, Italian, Portuguese, Mandarin and other languages.
The transcription process is simple: open the app, choose “select audio/video file”, pick the file from your phone or computer and upload it. Enter your email address. You’ll receive an email when your transcript is ready, usually within a few minutes. You can then download the transcript in your preferred format (Word Doc, PDF, TXT, SRT, or VTT).
“I can only rely on transcription apps if there is a seamless recording in one language by one person and if the quality of recording is also above average. But the minute there is background noise, or more than one speaker, I have to manually record it and there is no other way,” says Asmit Dagar, a PhD student at Delhi University.
Several factors shape transcription accuracy: audio recording quality, the number of speakers, background noise and accents. If the conversation is about a specialised field, some research may be required to confirm names and terms.
Companies engage professionals to make changes in real time while an app transcribes an audio file. Archana John, who has more than 12 years of experience in transcribing for companies and non-profits, says her work has become a “hybrid model”.
“For one hour of recording, I spent four hours on transcribing it. The payment per hour was usually between Rs 3,000-4,000 from the corporates. But for the last two years, I have been working on a hybrid model of speech to text engine along with manual intervention,” she says.
The legal firm that engages John has three other people editing and correcting a transcription app’s work. “A pure transcription app will not work because pronunciation may be different, there may be more than one speaker A, B, C and oftentimes the statements of speaker B are quoted in speaker A domain,” she says.
A language’s nuances and dialects shape the accuracy of transcription. “Language diversity in India is more than any other country and transcription apps face a bigger challenge here. We are home to over 30 regional languages with one million or more speakers with unique dialects across different states and regions. To add to that, Indians also combine two languages in their speech for example, Hindi and English (Hinglish), that further makes it difficult for the speech to text tools to give a reasonably accurate output,” says Deloitte’s Mantha.
| Company | App | Pricing |
| --- | --- | --- |
| Google | Cloud Speech-to-Text API | Free for up to 60 minutes, then $0.004-0.009 per 15 seconds |
| Amazon | Amazon Transcribe | Free for up to 60 minutes for 12 months, then $0.0004 per second |
| Microsoft | Azure Cognitive Services AI platform 'Transcribe in Word' | 5 audio hours free monthly, then $1-2.10 per audio hour |
| Nazmain Apps | Speech to Text Converter - Voice Typing | Free |
| Nuance Communications, Inc. | Dragon Dictation | Free |
| Speechlogger | Speechnotes - Speech to Text Notepad | $7.27 lifetime, $0.93 per month |
| Xenom Apps | Speech to Text | Free |
| Simple Seo Solutions | Voice Notebook | $3.30 lifetime |
| APK Kajal | WhatsMic Keyboard: Voice to Text Converter App | $9.25 lifetime or $1.85 per month |
| Otter.ai | Otter Voice Meeting Notes (for English) | $108.37 annually, $8.59 monthly |
| Pacific Fisher Group | Voice Notes | $2.51 lifetime for ads-free experience |
| SpeechTexter | Speech Texter - Speech to Text | Free |
| UX Apps | Write SMS by Voice | $2.25 lifetime for ads-free experience |
| Appezite Studio | Voice Typing Keyboard - Speech to Text | $1.12 lifetime for ads-free experience |
| Infinity Apps Sol | Translate All Text Voice Conversation | $4.76 for 1 month, $13.08 for 3 months, $25.11 for 6 months, $48.90 for 1 year |
Source: TechSci Research
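To see what the metered prices in the table mean in practice, here is a rough sketch that estimates the bill for five hours of audio on the Google and Amazon services. It uses only the rates and free allowances listed above. The rounding rule (partial billing units rounded up) and the idea that the free allowance is simply subtracted from total usage are assumptions made for illustration; actual billing terms may differ.

```python
import math


def usage_cost(audio_seconds: int, free_seconds: int,
               rate_per_unit: float, unit_seconds: int) -> float:
    """Estimated cost after the free allowance, billed in whole units.

    Assumption: partial units are rounded up; real billing rules may differ.
    """
    billable_seconds = max(0, audio_seconds - free_seconds)
    units = math.ceil(billable_seconds / unit_seconds)
    return units * rate_per_unit


audio = 5 * 60 * 60          # five hours of audio, in seconds
free_allowance = 60 * 60     # first 60 minutes free, per the table

# Google: $0.004-0.009 per 15 seconds after the free allowance.
google_low = usage_cost(audio, free_allowance, 0.004, 15)
google_high = usage_cost(audio, free_allowance, 0.009, 15)

# Amazon: $0.0004 per second after the free allowance.
amazon = usage_cost(audio, free_allowance, 0.0004, 1)

print(f"Google: ${google_low:.2f} to ${google_high:.2f}")
print(f"Amazon: ${amazon:.2f}")
```

On these assumptions, five hours of audio costs between $3.84 and $8.64 on Google's service and $5.76 on Amazon's, so for occasional users the per-second charges add up slowly.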