On 20 August, Microsoft announced in a blog post that its conversational speech recognition system had reached a new low error rate of 5.1%, about the same as professional human transcribers who are able to listen to the content several times. According to the post, the technology will be used in services such as the Cortana virtual assistant and in software that translates PowerPoint presentations.
This is the highest accuracy the system has ever achieved. The company said the error rate now stands at 5.1%, down from last year's record of 5.9%.
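For readers unfamiliar with the metric, speech recognition error rates of this kind are usually reported as word error rate: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is only an illustration of that general definition, not Microsoft's evaluation code; the function name `word_error_rate` and the sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not Microsoft's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one of six reference words is missing from the transcript.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{rate:.1%}")
```

On this definition, an error rate of 5.1% means roughly five word errors for every hundred words in the reference transcript.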
The lower error rate has made Microsoft confident enough to claim that its conversational speech recognition system now matches the accuracy of professional human transcribers. According to Microsoft Artificial Intelligence and Research, the system achieves this accuracy even without human advantages such as listening to the audio multiple times.
In both 2016 and this year, the company tested the system on recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that speech researchers have used since the 1990s.
Microsoft Artificial Intelligence and Research set out to match the accuracy of human transcribers. Human transcribers can listen to a recording several times, a luxury that the conversational speech recognition system does not have.
Compared with last year's results, Microsoft researchers said they have lowered the error rate by about 12%. To make this progress, the company improved the system's neural-net-based acoustic and language models. The system can now also take in a whole conversation and adapt its results accordingly.
Microsoft mainly made contextual and predictive improvements that enable the system to judge which words and phrases are likely to come next. These improvements also strengthen Microsoft's machine learning capabilities, so the system can better mimic how people carry out conversations and predict the flow of words.
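To give a rough sense of what predicting the next word means, here is a deliberately tiny sketch. It uses simple word-pair (bigram) counts rather than the neural network language models the article describes, so it should be read as a stand-in for the idea only. The function names `train_bigrams` and `predict_next` and the sample sentences are invented for this illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows another in the sample sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up sample conversations, not data from the Switchboard corpus.
sample = ["how are you", "how are things", "how are you doing"]
model = train_bigrams(sample)
print(predict_next(model, "are"))
```

A real recognizer combines predictions like these with what it hears in the audio, and the models described in the article draw on much more context than the single preceding word used here.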
Microsoft's investment in long-term research is now paying dividends for its customers in products and services such as Presentation Translator, Microsoft Cognitive Services and Cortana. Microsoft said it is pleased that its research results are being used by so many people.