
Archive: Searching

Using Spelling Suggestions

By Yusuf Mohsinally,
Sr. QA Engineer

Your users have probably misspelled a search query on more than one occasion. It shouldn't matter.

For example, an employee at a semiconductor company who types "heaflouroethane" into an intranet search box should be asked if he meant to search for "hexafluoroethane."

Similarly, a user in the company's American manufacturing facility who searches for the same compound's "vapor pressure" should be asked if she also wanted to search for the term "vapour pressure", which is found in documents written at the company's R&D lab in the United Kingdom.

There's little doubt that spelling suggestions can be enormously helpful to users on your public website or intranet.

Ultraseek's spelling suggestions use dynamic context-sensitive algorithms that are based on documents that have been indexed. This means users receive alternate spellings that occur within the document base, which helps if a particular word has been misspelled within the documents themselves.

But, to avoid flooding a user with unhelpful suggestions, the suggested term must appear in at least three indexed documents.
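To make the three-document rule concrete, here is a minimal sketch in Python. It is not Ultraseek's algorithm, which is not described here: it assumes a toy index that is just a list of document strings, and it uses the standard library's `difflib` for similarity. The names `document_frequency`, `suggest`, and `MIN_DOCS` are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches

MIN_DOCS = 3  # threshold from the article: a term must appear in at least three documents


def document_frequency(docs):
    """Count, for each word, how many documents contain it."""
    df = Counter()
    for text in docs:
        df.update(set(text.lower().split()))
    return df


def suggest(query, docs, min_docs=MIN_DOCS):
    """Return the closest indexed word found in at least min_docs documents, or None."""
    df = document_frequency(docs)
    word = query.lower()
    # Only words that are common enough in the index may be suggested.
    candidates = [w for w, n in df.items() if n >= min_docs and w != word]
    matches = get_close_matches(word, candidates, n=1, cutoff=0.8)
    return matches[0] if matches else None


docs = [
    "hexafluoroethane safety data",
    "hexafluoroethane vapour pressure",
    "hexafluoroethane etching vapour",
]
print(suggest("heaflouroethane", docs))  # "hexafluoroethane" is in three documents
print(suggest("vapor", docs))            # "vapour" is in only two, so nothing is suggested
```

Because the candidates come from the indexed documents rather than a fixed dictionary, the suggestions follow whatever spellings the document base actually contains.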

If you would like to suggest known alternate terms that may be related to the user's search, you can use the thesaurus feature. For example, you may want to suggest the search term "Ultraseek" when a user searches for "Verity".

Or, use the QuickLinks feature to provide a link to the "best known" document for a particular search term.

Activating/Deactivating Spelling Suggestions

Spelling suggestions are active by default. To activate or deactivate the feature for a particular user interface style, select or clear the "Show Spelling Suggestions" checkbox under the Interface > Query tab in the admin interface.

Posted August 25, 2005 by editor

Thesaurus versus User Dictionary

By Ryan Weisenberger,
Manager, Software Development

Words have multiple meanings. People use different words to say the same thing. Nowhere is this more problematic than on your website. For example, one user may search for "cell phone," another may type in "mobile phone," a third may use "cellular phone," and a fourth may search for "wireless phone." This is called the vocabulary problem.

Since a concept may have one name on your website, but another name in the user's mind, you need a tool in your search engine to resolve the conflict.

Ultraseek has two ways of mapping one word to another. One is called the thesaurus, and the other is called the user dictionary. The method you choose should depend on what you are trying to accomplish.

Synonyms and the Thesaurus

The thesaurus presents the user with synonyms for their search terms at search time. This allows the user to learn an alternate term for the concept, one that may be more appropriate for the site. In this way you educate the user on the proper term for the concept.

To generate your list of synonyms, you should run a Top Queries with No Results report and a Top Queries with No Clickthrough report under Activity > Reports. By carefully looking at the terms that either did not return results, or returned results but the user did not click on any of them, you can see which terms may need to be mapped to other terms in your corpus.
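If you want to do the same triage on your own data, the idea behind the two reports can be sketched in Python. This is only an illustration: the log format (query, result count, click count) and the function names are assumptions made for the example, not something Ultraseek exports in this form.

```python
from collections import Counter


def no_results(results, clicks):
    """The query returned nothing."""
    return results == 0


def no_clickthrough(results, clicks):
    """The query returned results, but the user clicked none of them."""
    return results > 0 and clicks == 0


def top_queries(log, predicate, n=10):
    """Return the n most frequent queries whose log entries satisfy predicate."""
    counts = Counter(
        query.lower() for query, results, clicks in log if predicate(results, clicks)
    )
    return counts.most_common(n)


# Hypothetical log entries: (query, number of results, number of clicks)
log = [
    ("cell phone", 0, 0),
    ("cell phone", 0, 0),
    ("mobile phone", 12, 0),
    ("wireless phone", 0, 0),
    ("mobile phone", 12, 3),
]
print(top_queries(log, no_results))
print(top_queries(log, no_clickthrough))
```

Queries that show up in either list are candidates for a thesaurus entry that points users to the term your content actually uses.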

Now you can add those terms to the English language thesaurus. This is the thesaurus_en.xml file in the /language directory. Here is an example set of synonyms:

<set>
<show>blueberry</show>
<show>bilberry</show>
<show>whortleberry</show>
</set>

You can also use the <noshow> element for a term that should not itself be suggested as an alternative, but that should still display the other terms in the set when a user searches for it. After making these changes, you'll need to restart Ultraseek.
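The difference between <show> and <noshow> can be illustrated with a short Python sketch. It reflects the behavior as described in this article, not Ultraseek's implementation. The set below is the blueberry example with whortleberry changed to <noshow> purely for illustration, and `suggestions_for` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Example set, with whortleberry marked <noshow> for illustration only.
SET_XML = """
<set>
  <show>blueberry</show>
  <show>bilberry</show>
  <noshow>whortleberry</noshow>
</set>
"""


def suggestions_for(term, set_xml):
    """Return the <show> terms to offer when `term` is searched."""
    root = ET.fromstring(set_xml)
    term = term.lower()
    members = [el.text.strip().lower() for el in root if el.tag in ("show", "noshow")]
    if term not in members:
        return []  # the search term is not part of this synonym set
    # Only <show> terms are ever suggested, and never the searched term itself.
    return [
        el.text.strip()
        for el in root
        if el.tag == "show" and el.text.strip().lower() != term
    ]


print(suggestions_for("whortleberry", SET_XML))  # the two <show> terms
print(suggestions_for("blueberry", SET_XML))     # whortleberry is never suggested
```

In this sketch a search for "whortleberry" leads the user to the preferred terms, while a search for "blueberry" never points back to it.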

Stemming and the User Dictionary

The user dictionary, on the other hand, is used to tell the indexer that two words should be treated the same. While this sounds a lot like synonyms, there is a subtle difference. The user dictionary makes the connection between the terms invisible to the user, so you are in essence tricking them. They think they are searching for one term, but you give them the results from another. This can be a little confusing, and it should not be used instead of the thesaurus.

So when should you use the user dictionary? The user dictionary is meant to supplement the linguistic capabilities of Ultraseek. For example, the search engine can automatically map a plural word to its singular form. This way, a search for "geese" will match documents that contain "goose."

You may have a word on your site that can be plural, but does not appear in a standard dictionary. A good example of this is a product name. In that case, you can enter the plural form of the word in the user dictionary, along with its singular counterpart, so that Ultraseek knows to treat these words as the same.

To add a word to the English user dictionary, edit the en.usr file in the /language directory. The correct format is WORD,ROOT:w. The entry below maps the plural of webserver to the correct singular form.

webservers,webserver:w

After making this change, you must restart Ultraseek and reindex your content before it takes full effect.
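To show what the WORD,ROOT:w mapping accomplishes, here is a small Python sketch that reads entries in that format and reduces a term to its root. It is only an illustration of the idea: the helper names are hypothetical, the lowercasing is an assumption, and the meaning of the trailing w flag is not explained here, so the sketch simply requires it.

```python
def parse_user_dictionary(lines):
    """Build a word -> root mapping from lines in WORD,ROOT:w form."""
    roots = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry, _, flag = line.partition(":")
        word, comma, root = entry.partition(",")
        if not comma or flag != "w":
            continue  # skip anything that is not in WORD,ROOT:w form
        roots[word.lower()] = root.lower()
    return roots


def normalize(term, roots):
    """Map a term to its root so that both forms are treated as the same word."""
    term = term.lower()
    return roots.get(term, term)


roots = parse_user_dictionary(["webservers,webserver:w"])
print(normalize("webservers", roots))
print(normalize("webserver", roots))
```

Both calls produce the same root, which is why a search for either form can match documents containing the other. It also suggests why reindexing is required: documents indexed before the change would not have had the mapping applied.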

Posted August 02, 2005 by editor

Relevance and User Satisfaction

Search relevance is usually thought of as a statistic that measures whether the search results match the query. That is useful in the lab, but not as useful for a search installation.

When search is part of a site, we need to understand how it helps the users of that site. Can they find things quickly? Are they comfortable with the search?

Focusing on user satisfaction helps avoid manager-centered design, but you also need to know how the search engine helps your users. There are two main aspects of this: effectiveness and trust. You change different things to improve each of these.

In order to improve relevance, you must be very clear about what it is, and what it means to make it better. You might end up tweaking the engine, changing what content is indexed, adding editorial results (“Best Bets” or “Quick Links”), or changing the presentation.

I look at relevance two ways.

UI Effectiveness: Relevant results reduce the number of clicks before visitors reach their goal. With every click, you lose visitors, maybe as many as 10%.

Relevant results at the top mean fewer clicks. Ultraseek can measure the number of clicks per result page and report that. Fewer clicks is better, though zero clicks is not good, because it means the visitor left without visiting any results.

To put specific results at the top, use Quick Links. But make sure this is based on user behavior, not on the org chart or datasheets. Quick Links must be more relevant than the first result.

Transparency and Trust: When users have some clue about why the results are presented, they trust the engine more. This is a transparency issue, and I think it is the biggest advantage of passage-based summaries. The passages are the engine explaining, ‘this is why I’m showing you this document.’ It makes a huge difference in how comfortable visitors are.

Relevance also increases trust. Irrelevant Quick Links will decrease trust, so be careful.

By Walter Underwood,
Principal Software Architect

Posted June 22, 2005 by editor
