Journalist Amit Tyagi works for a leading digital news platform and has to engage with several people through the day over Zoom calls, telephone discussions and webinars. What makes his life simpler is a transcription service called Speechnotes. Says Tyagi, “I speak, it notes down as its name suggests. Whatever I speak gets written down automatically. In the process I save a great deal of time.”
Lawyer Lubna Yusuf uses Google's Live Transcribe and the Samsung Notes app, both of which came pre-installed on her Galaxy Note phone. Says Yusuf, “I use transcription services to record certain client meetings, especially the complex ones that I need to refer to again. It helps me track the details later. I also use it like a virtual typist or secretary to take down notes when I'm thinking aloud.”
People are aware of the power of voice and don’t want a single breath to go to waste. According to Bobble AI Data Intelligence, 18 per cent of users use ‘speech-to-text' daily. Says Rahul Prasad, Co-founder, Bobble AI, a conversation media platform: “There has been a massive increase in the adoption of transcription in the recent past. Bobble voice API enables developers to convert speech-to-text by using neural network models across diverse accents and dialects. Our ASR engine works with nine major Indian languages: English, Hindi, Bengali, Punjabi, Marathi, Gujarati, Kannada, Tamil, and Telugu.”
Man versus machine
Transcription services allow you to save time and energy. How much time does it take to manually transcribe an hour of audio? The industry standard is four hours of transcription time for one hour of clear audio, or a 4:1 ratio. That's four minutes of transcription time for every minute of recorded speech. But with voice-to-text technology, things change dramatically. Says Sri Lanka-based Inoka Dias, who runs coaching classes: “I use a transcription service called Temi to record my lectures which span 45 minutes to an hour and I get the lectures transcribed in less than an hour. A lot depends on the number of words. In coaching sessions, you do get spells of silence that can run up to a minute or two at a stretch, but you are charged per word. So far I’ve paid $7-9.” These apps are hugely popular with students and teachers globally. Says Dias: “I find them 90 per cent accurate. Only sometimes certain names and a few difficult words aren't transcribed properly, that too because of the accent. The best part is that Temi eliminates noise and gives a time stamp, so that makes life a lot easier.”
Table 1: Transcription voice-to-text application companies/developers
No. | Application | Company/developer | Type of subscription |
1 | Cloud Speech-to-Text API | Google | Up to 60 minutes free* |
2 | Amazon Transcribe | Amazon | After free trial of up to 60 minutes for 12 months, $0.0004 per second |
3 | Azure Cognitive Services AI platform 'Transcribe in Word' | Microsoft | 5 audio hours free per month** |
4 | Dragon Dictation | Nuance Communications, Inc. | Free |
5 | Speech to Text Converter - Voice Typing App | Nazmain Apps | Free |
6 | Speechnotes - Speech to Text Notepad | Speechlogger | $7.27 lifetime or 7 days free then $0.93 per month (Only extra feature and ads-free experience) |
7 | Speech to Text | Xenom Apps | Free |
8 | Voice Notebook - Continuous Speech to Text | Simple Seo Solutions | $3.30 for lifetime (For premium features and ads-free experience) |
9 | WhatsMic Keyboard: Voice to Text Converter App | APK Kajal | $9.25 for lifetime or $1.85 per month (For premium feature and ads-free experience) |
10 | Otter Voice Meeting Notes (For English) | Otter.ai | $108.37 annually or $8.59 monthly (For premium) |
11 | Translate All - Text, Voice & Camera Translator | Asitis | Free |
12 | Voice Notes | Pacific Fisher Group | $2.51 lifetime for ads-free experience |
13 | Speech Texter - Speech to Text | Speech Texter | Free |
14 | Write SMS by Voice | UX Apps | $2.25 lifetime for ads-free experience |
15 | Voice Typing Keyboard - Speech to Text Converter | Appezite Studio | $1.12 lifetime for ads-free experience |
16 | Translate All Text Voice Conversation Translator | Infinity Apps Sol | $4.76 for 1 month or $13.08 for 3 months or $25.11 for 6 months or $48.90 for 1 year (For unlimited feature and ads-free experience) |
17 | Voice Notes - Speech to Text Notes | Innovative World | Free |
18 | English Voice Typing: Voice to Text Converter | Solace Apps | Free |
19 | Temi | Rev | After first free transcript up to 45 minutes, $0.25 per audio minute |
*See Table 2; **See Table 3; Source: TechSci Research
What's available
Technology majors such as Google, Microsoft and Amazon all have such an offering. Initial trials are mostly free, after which you are charged by the length of the audio. Some continue to be free, such as Dragon Dictation by Nuance Communications, Nazmain Apps’ Voice Typing app, Xenom Apps' Speech to Text, Asitis’ Translate All, and Speech Texter. Others, such as WhatsMic and Speechlogger's Speechnotes, offer a one-time lifetime fee.
Says Prashanth Rao, Partner, Deloitte India: “Most operating systems within the phone have voice-to-text embedded in the OS. iOS calls this dictation. Apple is further enhancing iOS 14, which has just been released, by bringing translation from one language to another. Google has a cloud-based speech-to-text application programming interface (API) which any app developer can use to embed this feature into their apps. Amazon has a similar API-based service on cloud called Amazon Transcribe. You also have a bunch of voice-to-text apps on Play Store and iOS store.” There are innovations expected in this space. Says Rao: “We may soon be seeing collaborative tools such as MS Teams and Zoom coming up with a feature to enable users to capture the conversation as notes. A bunch of note-taking apps also are looking to add this feature to the app.”
Accuracy is paramount
The clearer the audio, the higher the accuracy. Imagine an audio clip with two speakers with distinctly different accents, using a few technical terms, industry jargon and niche brand names. Transcription accuracy can vary from 90 to 100 per cent, and that gap makes all the difference. Generally, a few factors affect accuracy. These include audio clarity, audio recording quality, number of speakers, background noise and regional accents. It also depends on the “coherence” of the speakers, that is, do they talk over each other? Do they speak quickly or slowly? Do they finish a thought before beginning the next sentence? If it’s a specialised field such as medical or legal, a certain amount of research may be required to double-check names, places and specialised terminology. Other challenges may be related to the use of short forms and out-of-dictionary words.
Speed is another crucial factor. Given enough time, we could all transcribe audio with close to 100 per cent accuracy, but these services are designed to take the manual labour out of transcription. From the moment you hit “upload” to the second the transcription is finished, the timer is running. Transcription apps are rated on how fast they can convert voice to text with maximum accuracy.
Table 2: Google Cloud speech-to-text API pricing
Feature | Standard models (all models except enhanced video and phone call), 0-60 minutes | Standard models, over 60 minutes up to 1 million minutes | Enhanced models (video and phone call), 0-60 minutes | Enhanced models, over 60 minutes up to 1 million minutes |
Speech recognition (without data logging, default) | Free | $0.006/15 seconds | Free | $0.009/15 seconds |
Speech recognition (with data logging opt-in) | Free | $0.004/15 seconds | Free | $0.006/15 seconds |
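To see what the rates in Table 2 mean for a real recording, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `google_stt_cost` is made up for this example, and the sketch assumes that billable audio is rounded up to whole 15-second blocks and that the free 60 minutes are simply deducted first. It also ignores usage beyond the 1 million minutes the table covers.

```python
import math
from decimal import Decimal

# USD per 15-second block, copied from Table 2.
# Key: (model family, data logging opted in?)
RATES_PER_15S = {
    ("standard", False): Decimal("0.006"),
    ("standard", True): Decimal("0.004"),
    ("enhanced", False): Decimal("0.009"),
    ("enhanced", True): Decimal("0.006"),
}
FREE_SECONDS = 60 * 60  # the first 60 minutes are free in Table 2


def google_stt_cost(audio_seconds, model="standard", data_logging=False):
    """Estimate the charge in USD for a given amount of audio.

    Assumption (not stated in the table): audio past the free allowance
    is billed in whole 15-second blocks, rounded up.
    """
    billable_seconds = max(0, audio_seconds - FREE_SECONDS)
    blocks = math.ceil(billable_seconds / 15)
    return blocks * RATES_PER_15S[(model, data_logging)]
```

Under these assumptions, two hours of audio on a standard model without data logging works out to one free hour plus 240 billable blocks, or $1.44.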
Transcription process
The process of transcribing an audio file is simple and logical. Visit the app or website, choose "Select Audio/Video File" to pick a file from your phone or computer, and upload it. Enter your email address. Within a few minutes, you'll receive an email telling you that your transcript is ready. You can then download the transcript in your preferred format, such as Word document, PDF, TXT, SRT or VTT.
Many websites allow you to place your first order for free. For example, with Temi your first transcript is free for files of 45 minutes or less. After that, Temi orders cost $0.25 per audio minute. Says Sebastian Lanser, Temi Support: “Once your first order has been placed, you'll be prompted to set up your account when you open your transcript in the delivery email. Once your account is set up, you'll have full access to the Temi Editor to review and edit your transcript as well as our various file formats for download.”
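The Temi pricing rule quoted above is simple enough to write down as a small Python sketch. The function name `temi_order_cost` is hypothetical, and two details are assumptions rather than anything Temi has confirmed here: that a first order longer than 45 minutes is charged in full, and that audio length is given in whole minutes.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.25")  # USD per audio minute, as quoted above
FREE_TRIAL_LIMIT_MINUTES = 45      # first transcript is free up to this length


def temi_order_cost(audio_minutes, first_order=False):
    """Estimate the price in USD of one transcription order.

    Assumption: a first order over the 45-minute limit is billed in full.
    """
    if first_order and audio_minutes <= FREE_TRIAL_LIMIT_MINUTES:
        return Decimal("0")
    return RATE_PER_MINUTE * audio_minutes
```

On these terms a one-hour recording that is not a first order comes to $15, while a 45-minute first order costs nothing.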
What about video-to-text?
The good thing is that video platforms have started offering this service, at least in their advanced versions.
Take the case of Zoom. The platform offers cloud recording transcripts in its business and enterprise plans. Says Lola Garcia Santos, Account Executive, Zoom Video Communications: “The business licence, which costs $199.90 per host licence a month or $1,999 per host licence a year, allows 300 participants per meeting and transcriptions on cloud recordings. Zoom has an ‘audio transcript option’ under cloud recording, to automatically transcribe the audio of a meeting or webinar recorded to the cloud. After this transcript is processed, it appears as a separate VTT text file in the list of recorded meetings. In addition, you have the option to display the transcript text within the video itself, similar to a closed-caption display.”
The transcript is divided into sections, each with a timestamp that shows how far into the recording that portion of the text was recorded. You can edit the text to more accurately capture the words, or to add capitalisation and punctuation, which are not captured by the transcript.
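For readers who want to work with such a transcript outside the platform, the timestamped sections of a VTT file can be read with a short Python script. The sketch below is a minimal illustration, not Zoom's own tooling: the names `Cue` and `parse_vtt` are invented for this example, and it handles only the basic WebVTT layout of blank-line-separated blocks with a `start --> end` timing line, ignoring styling and other optional features of the format.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start_seconds: float
    end_seconds: float
    text: str


# A WebVTT timing line looks like "00:00:16.239 --> 00:00:27.079".
# The hours field is optional in the format.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(content: str) -> List[Cue]:
    """Return the timestamped sections found in a WebVTT transcript."""
    cues = []
    # Sections are separated by blank lines; the "WEBVTT" header block
    # has no timing line and is skipped automatically.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                fields = match.groups()
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append(Cue(_to_seconds(*fields[:4]), _to_seconds(*fields[4:]), text))
                break
    return cues
```

Each `Cue` holds the start and end of a section in seconds along with its text, which makes it easy to jump to a point in the recording or to export the edited transcript in another form.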
Table 3: Microsoft Azure Cognitive Services AI platform 'Transcribe in Word' pricing
Tier | Feature | Pricing |
Free - Web/Container (one concurrent request) | Standard | 5 audio hours free per month |
Free - Web/Container (one concurrent request) | Custom | 5 audio hours free per month; endpoint hosting: 1 model free per month |
Free - Web/Container (one concurrent request) | Conversation Transcription Multichannel Audio | 5 audio hours free per month |
Standard - Web/Container (20 concurrent requests) | Standard | $1 per audio hour |
Standard - Web/Container (20 concurrent requests) | Custom | $1.40 per audio hour; endpoint hosting: $0.0538 per model per hour |
Standard - Web/Container (20 concurrent requests) | Conversation Transcription Multichannel Audio | $2.10 per audio hour |
For me-time too
These AI-powered transcription tools can be useful in one’s personal life as well as in professional work. Take the case of IT professional Hanif Sohrab, who uses voice-to-text apps when on the move, especially while jogging. Says Sohrab: “I use Microsoft OneNote to record my thoughts during my morning walks. I talk to my phone on what all needs to get done such as urgent emails, or WhatsApp replies that require immediate attention--or general thoughts that come to my head while walking, related to an article I may be writing, or some software logic that I was thinking about. I use Microsoft Cortana while at work in front of my personal computer (PC). In my experience, Microsoft OneNote app is fairly accurate whereas Microsoft Cortana, which is used on my PC, needs editing.”
Some people use transcription services for storytelling and capturing interesting anecdotes that others randomly share. Take the case of techie Sami Iqram who shares in his blog how he was at his friend's place when the friend’s grandmother narrated a story from her childhood. Says Iqram: “I could see that she was excited about sharing it with everyone but there was a problem—she narrated the story in Spanish, a language I don’t understand. I pulled out Google Translate to transcribe the speech as it was happening. As she was telling the story, the English translation appeared on my phone so that I could follow—it fostered a moment of understanding that would otherwise have been lost.”
All it takes is your voice. Transcription services nudge you to harness the power of your voice. Now more than ever, we’re all very busy—juggling family, work, friends, and whatever else life throws our way. These are tools that allow us to immerse and revel in the conversation at hand, the idea, the thought and “live” in the moment. Transcription services allow us to live mindfully.