Microsoft Bing’s large-scale multilingual spelling correction models, collectively called Speller100, are rolling out worldwide with high precision and high recall in 100-plus languages. Bing says about 15% of queries submitted by users have misspellings, which can lead to incorrect answers and suboptimal search results.
To address this issue, Bing has built what it says is the most comprehensive spelling correction system ever made.
In A/B tests comparing queries with and without Speller100, Bing observed the following results:
The number of pages with no results fell by up to 30%.
The number of times users had to manually reformulate their query fell by 5%.
The number of times users clicked on a spelling suggestion increased from single digits to 67%.
The number of times users clicked on any item on the page went from single digits to 70%.
How did Bing accomplish this? Keep reading to learn more about Speller100.
Improving Spelling Correction in Bing Search Results
Spelling correction has long been a priority for Bing, and the search engine is taking it a step further with the inclusion of more languages from around the world.
“In order to make Bing more inclusive, we set out to expand our current spelling correction service to 100-plus languages, setting the same high bar for quality that we set for the original two dozen languages.”
The launch of Speller100 represents a significant step forward for Bing and is made possible by recent advances in AI.
The technology behind Speller100 is explained in the company’s recent blog post. Here are some key details of Bing’s new spelling correction technology.
Microsoft Bing’s Speller100 Technology
Bing credits zero-shot learning as an important advancement in AI which helps make Speller100 possible.
Zero-shot learning allows an AI model to accurately learn and correct spelling without any additional language-specific labeled training data. This is in contrast to traditional spelling correction solutions which have relied solely on training data to learn the spelling of a language.
Relying on training data is challenging when it comes to correcting the spelling of languages where there’s an inadequate amount of data. That’s the problem zero-shot learning is designed to solve.
“Imagine someone had taught you how to spell in English and you automatically learned to also spell in German, Dutch, Afrikaans, Scots, and Luxembourgish. That is what zero-shot learning enables, and it is a key component in Speller100 that allows us to expand to languages with very little to no data.”
Spelling Correction is Not Natural Language Processing
Bing makes the distinction that, although significant advancements have been made in natural language processing, spelling correction is a different task altogether.
All spelling errors can be categorized into two types:
Non-word error: Occurs when the word is not in the vocabulary for a given language.
Real-word error: Occurs when the word is valid but doesn’t fit in the larger context.
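To make the two categories concrete, here is a toy sketch in Python. It is not Bing’s method: the vocabulary, the list of known word pairs, and the classify function are all made up for illustration, and a real system would use a far richer model of context.

```python
# Toy illustration only: VOCAB, KNOWN_BIGRAMS and classify() are hypothetical.
VOCAB = {"i", "want", "a", "piece", "peace", "of", "cake"}
KNOWN_BIGRAMS = {
    ("i", "want"), ("want", "a"), ("a", "piece"),
    ("piece", "of"), ("of", "cake"),
}


def classify(tokens, i, vocab=VOCAB, bigrams=KNOWN_BIGRAMS):
    """Label the token at position i as a non-word error, a real-word error, or ok."""
    word = tokens[i]
    # Non-word error: the token is not in the vocabulary at all.
    if word not in vocab:
        return "non-word error"
    # Real-word error: the token is a valid word but does not fit its neighbours.
    prev_ok = i == 0 or (tokens[i - 1], word) in bigrams
    next_ok = i == len(tokens) - 1 or (word, tokens[i + 1]) in bigrams
    if not (prev_ok and next_ok):
        return "real-word error"
    return "ok"


print(classify("i want a peice of cake".split(), 3))  # non-word error
print(classify("i want a peace of cake".split(), 3))  # real-word error
```

The second case shows why a dictionary lookup alone is not enough: “peace” is spelled correctly, so only the surrounding words reveal the mistake.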
Bing has developed a deep learning approach, inspired by Facebook’s BART model, to correcting these spelling errors. However, it differs from BART in that spelling correction is framed as a character-level problem.
In order to address a character-level problem, Bing’s Speller100 model is trained using character-level mutations which mimic spelling errors.
Bing calls these “noise functions”:
“We have designed noise functions to generate common errors of rotation, insertion, deletion, and replacement.
The use of a noise function significantly reduced our demand on human-labeled annotations, which are often required in machine learning. This is quite useful for languages for which we have little or no training data.”
Noise functions allow Bing to train Speller100 to correct the spelling of languages for which there is not a large amount of misspelled query data available.
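As a rough illustration of the idea, the sketch below applies the four error types named in the quote to clean words, producing (misspelled, correct) pairs without any human labeling. Bing has not published its implementation, so everything here is an assumption: the function names, the Latin-only alphabet, the uniform random choices, and the reading of “rotation” as swapping two adjacent characters.

```python
import random

# Assumption: a lowercase Latin alphabet; a real system would cover each language's script.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def rotate(word, rng):
    """Swap two adjacent characters (our reading of "rotation")."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def insert(word, rng, alphabet=ALPHABET):
    """Insert one random character at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(alphabet) + word[i:]


def delete(word, rng):
    """Drop one character, leaving at least one behind."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def replace(word, rng, alphabet=ALPHABET):
    """Overwrite one character with a random one (which may match the original)."""
    if not word:
        return word
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(alphabet) + word[i + 1:]


def add_noise(word, rng, alphabet=ALPHABET):
    """Apply one randomly chosen noise function to a clean word."""
    op = rng.choice(["rotate", "insert", "delete", "replace"])
    if op == "rotate":
        return rotate(word, rng)
    if op == "insert":
        return insert(word, rng, alphabet)
    if op == "delete":
        return delete(word, rng)
    return replace(word, rng, alphabet)


def make_training_pairs(words, seed=0):
    """Turn clean words into (noisy, clean) pairs for a correction model."""
    rng = random.Random(seed)
    return [(add_noise(w, rng), w) for w in words]


for noisy, clean in make_training_pairs(["spelling", "correction", "search"]):
    print(noisy, "->", clean)
```

Because the clean word is known before the noise is applied, each pair comes with its own label, which is what removes the need for human annotation.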
Instead, Bing makes do with ordinary text extracted from web pages gathered through routine web crawling. There’s said to be enough text on the web to train models for hundreds of languages.
“This pretraining task proves to be a first solid step to solve multilingual spelling correction for 100-plus languages. It helps to reach 50% of correction recall for top candidates in languages for which we have zero training data.”
While this is a meaningful advancement, Bing says 50% recall is not good enough. That’s where zero-shot learning comes in.
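For readers unfamiliar with the metric, here is a small sketch of how recall for the top candidate could be measured. Bing does not spell out its exact definition, so this assumes recall means the share of misspelled queries whose top suggestion matches the intended spelling. The top1_recall function and the toy corrector are hypothetical.

```python
def top1_recall(examples, correct):
    """Share of misspelled queries for which the top candidate equals the intended spelling.

    examples: list of (query, intended) pairs.
    correct:  function returning the model's top candidate for a query.
    """
    # Only queries that actually contain a misspelling count toward recall.
    misspelled = [(q, gold) for q, gold in examples if q != gold]
    if not misspelled:
        return 0.0
    hits = sum(1 for q, gold in misspelled if correct(q) == gold)
    return hits / len(misspelled)


# Toy corrector that only knows two fixes; anything else is returned unchanged.
FIXES = {"helo": "hello", "wrold": "world"}
examples = [
    ("helo", "hello"),
    ("wrold", "world"),
    ("recieve", "receive"),
    ("adress", "address"),
    ("cat", "cat"),  # already correct, so it is excluded from the recall count
]
print(top1_recall(examples, lambda q: FIXES.get(q, q)))  # 0.5
```

In this toy case the corrector fixes two of the four misspelled queries, which corresponds to the 50% level Bing describes as not good enough.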
For languages with no training data, Bing uses zero-shot learning to target language families. This is based on the notion that most of the world’s languages are related to others.
“This orthographic, morphological, and semantic similarity between languages in the same group makes a zero-shot learning error model very efficient and effective…
Zero-shot learning makes learning spelling prediction for these low-resource or no-resource languages possible.”
Launching Speller100 in Bing is the first step in a larger effort to implement the technology in more Microsoft products.
Source: Microsoft Research Blog