
Thesaurus versus User Dictionary

By Ryan Weisenberger, Manager, Software Development

Words have multiple meanings, and people use different words to say the same thing. Nowhere is this more of a problem than on your website. One user may search for "cell phone," another for "mobile phone," a third for "cellular phone," and a fourth for "wireless phone." This is called the vocabulary problem.

Because a concept may have one name on your website and another in the user's mind, your search engine needs a tool to resolve the mismatch.

Ultraseek offers two ways to map one word to another: the thesaurus and the user dictionary. Which one you choose depends on what you are trying to accomplish.

Synonyms and the Thesaurus

The thesaurus presents users with synonyms for their search terms at search time. This teaches them an alternate term for the concept, one that may be more appropriate for your site, so the user learns the term your content actually uses.

To build your list of synonyms, run the Top Queries with No Results report and the Top Queries with No Clickthrough report under Activity > Reports. By looking carefully at the queries that returned no results, and at those that returned results the user never clicked, you can see which terms may need to be mapped to terms used in your content.
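If you copy the figures from those two reports into a script, a few lines can shortlist the queries worth reviewing. This is a minimal sketch, assuming you have transcribed each report row into a hypothetical `QueryStat` record; the field names and the `min_searches` cut-off are illustrative and are not an Ultraseek export format.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    query: str          # search terms as users typed them
    searches: int       # how often the query was run
    results: int        # hits the query returned
    clickthroughs: int  # result clicks that followed


def synonym_candidates(stats, min_searches=5):
    """Frequent queries that found nothing, or found results nobody clicked."""
    picked = [
        s for s in stats
        if s.searches >= min_searches
        and (s.results == 0 or s.clickthroughs == 0)
    ]
    # Most-searched first: mapping these helps the most users.
    picked.sort(key=lambda s: s.searches, reverse=True)
    return [s.query for s in picked]


# Made-up numbers for illustration only.
report = [
    QueryStat("cell phone", 120, 48, 30),
    QueryStat("mobile phone", 40, 0, 0),
    QueryStat("wireless phone", 12, 5, 0),
    QueryStat("cellular phone", 3, 0, 0),
]
print(synonym_candidates(report))  # ['mobile phone', 'wireless phone']
```

The threshold keeps one-off typos off the list; "cellular phone" is skipped here only because it was searched fewer than five times.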

Next, add those terms to the English-language thesaurus, the thesaurus_en.xml file in the /language directory. Terms that mean the same thing go together in a <set> element, for example:

<set>
<show>blueberry</show>
<show>bilberry</show>
<show>whortleberry</show>
</set>

You can also use the <noshow> element for a term that you do not want suggested as an alternative, but that should still bring up the other terms in the set when someone searches for it. After making these changes, restart Ultraseek.
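To make the <show>/<noshow> distinction concrete, here is a toy model of the lookup in Python. It is not Ultraseek's code; it only reproduces the behavior described in this post: <show> terms are offered as alternatives, and a <noshow> term triggers suggestions but is never offered itself. The <thesaurus> root element and the "bluberry" misspelling are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents: <set>/<show>/<noshow> follow the example above,
# but the <thesaurus> root element and the "bluberry" entry are made up.
THESAURUS_XML = """
<thesaurus>
  <set>
    <show>blueberry</show>
    <show>bilberry</show>
    <show>whortleberry</show>
    <noshow>bluberry</noshow>
  </set>
</thesaurus>
"""


def load_sets(xml_text):
    """Return one (show_terms, noshow_terms) pair per <set> element."""
    root = ET.fromstring(xml_text.strip())
    sets = []
    for s in root.iter("set"):
        show = [e.text.strip().lower() for e in s.findall("show")]
        noshow = [e.text.strip().lower() for e in s.findall("noshow")]
        sets.append((show, noshow))
    return sets


def suggestions(term, sets):
    """Alternatives to display for a search term, in file order."""
    term = term.strip().lower()
    offered = []
    for show, noshow in sets:
        if term in show or term in noshow:
            # Only <show> terms are offered, and never the term itself.
            for alt in show:
                if alt != term and alt not in offered:
                    offered.append(alt)
    return offered


sets = load_sets(THESAURUS_XML)
print(suggestions("bilberry", sets))  # ['blueberry', 'whortleberry']
print(suggestions("bluberry", sets))  # ['blueberry', 'bilberry', 'whortleberry']
```

A misspelling is a natural fit for <noshow>: you want it to lead users to the correct terms, but you never want to suggest it.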

Stemming and the User Dictionary

The user dictionary, on the other hand, tells the indexer that two words should be treated as the same word. This sounds a lot like synonyms, but there is a subtle difference: the user dictionary makes the connection between the terms invisible to the user, so in essence you are tricking them. They think they are searching for one term, but you give them results for another. Because this can be confusing, the user dictionary should not be used in place of the thesaurus.

So when should you use the user dictionary? It is meant to supplement Ultraseek's built-in linguistic capabilities. For example, the search engine can automatically map a plural word to its singular form, so a search for "geese" will match documents that contain "goose."
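A toy model shows why this mapping is invisible to the searcher, assuming it is applied to both the indexed text and the query: both sides are reduced to a common form before comparison, and the user never sees that happen. The `LEXICON` table is a hypothetical stand-in for Ultraseek's built-in linguistic data, which this post does not describe.

```python
import re

# Hypothetical stand-in for the engine's built-in linguistic data. Ultraseek's
# real lexicon is much larger and is not exposed like this.
LEXICON = {"geese": "goose", "mice": "mouse", "children": "child", "phones": "phone"}


def normalize(word):
    """Map a word to the form used for matching; unknown words pass through."""
    w = word.lower()
    return LEXICON.get(w, w)


def terms(text):
    """Normalized set of words in a query or document."""
    return {normalize(t) for t in re.findall(r"[a-z]+", text.lower())}


def matches(query, document):
    """True if every query word, after normalization, occurs in the document."""
    return terms(query) <= terms(document)


print(matches("geese", "A goose crossed the road"))      # True
print(matches("webservers", "Configure the webserver"))  # False: not in LEXICON
```

The second call fails only because "webservers" is missing from the lexicon, which is the kind of gap the user dictionary is meant to fill.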

Your site may use a word that has a plural form but does not appear in a standard dictionary; product names are a good example. In that case, enter the plural form in the user dictionary along with its singular counterpart, so that Ultraseek knows to treat the two words as the same.

To add a word to the English user dictionary, edit the en.usr file in the /language directory. Each entry uses the format WORD,ROOT:w. The entry below maps the plural "webservers" to its singular form, "webserver":

webservers,webserver:w

After making this change, restart Ultraseek and reindex your content; the change will not take full effect until the content has been reindexed.
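If you maintain many product names, it can help to generate and check en.usr entries with a script instead of by hand. This is a minimal sketch: `format_entry`, `parse_user_dict`, and `normalize` are hypothetical helpers, and the only thing taken from this post is the WORD,ROOT:w line format. The post does not explain the ":w" suffix or say whether the file allows comments, so the parser simply requires the suffix and skips blank lines.

```python
import re

# One entry per line in the WORD,ROOT:w format given above. The meaning of
# the ":w" suffix is not explained in the post, so it is treated as a fixed tag.
ENTRY = re.compile(r"^([^,\s]+),([^,\s:]+):w$")


def format_entry(word, root):
    """Build one en.usr line from a word and its root."""
    return f"{word},{root}:w"


def parse_user_dict(text):
    """Parse en.usr-style text into {word: root}; blank lines are skipped."""
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: expected WORD,ROOT:w, got {line!r}")
        mapping[m.group(1)] = m.group(2)
    return mapping


def normalize(word, mapping):
    """Swap a word for its user-dictionary root, if it has one."""
    return mapping.get(word, word)


# "widgetpros" is a hypothetical product name; the webservers entry is from the post.
entries = [("webservers", "webserver"), ("widgetpros", "widgetpro")]
text = "\n".join(format_entry(w, r) for w, r in entries)
print(text)
user_dict = parse_user_dict(text)
print(normalize("webservers", user_dict))  # webserver
```

Round-tripping the generated text through the parser catches malformed lines before you restart Ultraseek and start a reindex.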

Posted August 2, 2005 08:08 AM by editor
Category: Searching
