<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ultraseek Articles</title>
<link>/articles/</link>
<description></description>
<language>en</language>
<copyright>Copyright 2005</copyright>
<lastBuildDate>Fri, 24 Jun 2005 08:04:18 -0800</lastBuildDate>
<generator>http://www.movabletype.org/?v=3.17</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 

<item>
<title>Why Not Use &quot;All Terms&quot; Queries?</title>
<description><![CDATA[<p>Google, Yahoo!, and MSN all default to matching all of your search terms, but Ultraseek does not. Why? What do you say when your users want Ultraseek to &#8220;work like Google&#8221;?</p>

<p>In most cases, it is good for an enterprise engine to behave like the WWW engines because users can intuitively transfer their searching skills. But this is a case where doing the right thing is more expensive for the WWW engines and more reasonable for enterprises.</p>

<p>This is a scalability vs. usability tradeoff. Requiring all terms to match allows the engine to run faster, but risks giving no results at all if the user makes even a minor mistake. A spelling error on one word means no matches, and that means no hits (assuming that your pages have good spelling!). This can be a serious issue. About 10% of all search queries contain some sort of spelling error.</p>

<p><strong>Why is &#8220;all terms&#8221; matching faster?</strong>
It&#8217;s faster because the search engine needs to process significantly less data. For a multiple-term query, requiring all terms to match means fewer matching documents. The engine handles shorter lists of matches, which uses less memory and CPU. In a WWW search engine, defaulting to &#8220;all terms&#8221; requires less hardware for search, saving a lot of money in hardware costs.</p>

<p><strong>Why is &#8220;any term&#8221; matching better?</strong>
It&#8217;s better because people frequently misspell and mistype queries. With an &#8220;all terms&#8221; search, one mistake can mean no results at all. Most users are completely stymied when they don&#8217;t get results. They may try the same query again, or they may just leave.</p>

<p>With &#8220;any term&#8221; matching, a single error still gives results. It may even give the right results. A search for Watler Underwood will still match Underwood even though Watler is a misspelling. If there is only one Underwood on the site, the error doesn&#8217;t even need to be corrected.</p>
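
<p>The difference can be sketched with a toy inverted index. This is only an illustration, not how Ultraseek is implemented: the index contents, document numbers, and function names below are made up for the example.</p>

<pre>
```python
# Toy inverted index: term -> set of document ids (hypothetical data).
index = {
    "walter": {1, 2},
    "underwood": {2, 3},
}

def match_all_terms(index, terms):
    """Return documents that contain every query term (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def match_any_term(index, terms):
    """Return documents that contain at least one query term (set union)."""
    return set().union(*(index.get(term, set()) for term in terms))

# "watler" is a misspelling, so it has no entry in the index.
query = ["watler", "underwood"]
print("all terms:", match_all_terms(index, query))
print("any term:", match_any_term(index, query))
```
</pre>

<p>In this sketch the misspelled term empties the &#8220;all terms&#8221; result, while &#8220;any term&#8221; still returns the documents that mention Underwood.</p>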

<p><strong>What if your manager insists on &#8220;all terms&#8221;?</strong>
First, look at your search logs and reports. Use your percentage of &#8220;no hits&#8221; queries as a baseline (it should be around 10% or below). You can run an experiment: change the default to &#8220;all terms&#8221;, then check the new percentage of no hits. Or you can run an experiment with your top 100 or 200 queries. Choose the ones that are obvious misspellings, and try them in &#8220;all terms&#8221; mode to see if the results are worse.</p>
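
<p>Comparing the two percentages is simple arithmetic. Here is a minimal sketch, assuming you have already pulled the number of results for each query out of your logs. The counts below are invented for illustration and do not reflect any Ultraseek log format.</p>

<pre>
```python
def no_hits_rate(result_counts):
    """Percentage of queries that returned zero results."""
    if not result_counts:
        return 0.0
    zero = sum(1 for count in result_counts if count == 0)
    return 100.0 * zero / len(result_counts)

# Hypothetical result counts for the same ten queries under each default.
baseline_counts = [12, 0, 5, 3, 8, 1, 40, 7, 2, 9]
all_terms_counts = [4, 0, 0, 3, 2, 0, 11, 1, 0, 6]

print("baseline no hits %:", no_hits_rate(baseline_counts))
print("all terms no hits %:", no_hits_rate(all_terms_counts))
```
</pre>

<p>A clear jump in the second number is the kind of evidence to bring back to your manager.</p>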

<p>If you do change to &#8220;all terms&#8221; mode, you will depend heavily on the quality of the spelling suggestions. You will need to actively manage those, adding Quick Link suggestions for common misspellings that show up in your query logs.</p>

<p>To change the default to &#8220;all terms&#8221; mode, go to the Interface > Query page in the admin UI, choose the Style you want to modify, and change the &#8220;Default required search terms&#8221; setting.</p>

<p><em>Walter Underwood
Ultraseek Principal Software Architect</em></p>
]]></description>
<link>/articles/archives/2005/06/why_not_use_all.html</link>
<guid>/articles/archives/2005/06/why_not_use_all.html</guid>
<category>Usability</category>
<pubDate>Fri, 24 Jun 2005 08:04:18 -0800</pubDate>
</item>
<item>
<title>Relevance and User Satisfaction</title>
<description><![CDATA[<p>Search relevance is usually thought of as a statistic that measures whether the search results match the query. That is useful in the lab, but not as useful for a search installation.</p>

<p>When search is part of a site, we need to understand how it helps the users of that site. Can they find things quickly? Are they comfortable with the search?</p>

<p>Focusing on user satisfaction helps avoid manager-centered design, but you also need to know how the search engine helps your users. There are two main aspects of this: effectiveness and trust. You change different things to improve each of these.</p>

<p>In order to improve relevance, you must be very clear about what it is, and what it means to make it better. You might end up tweaking the engine, changing what content is indexed, adding editorial results (&#8220;Best Bets&#8221; or &#8220;Quick Links&#8221;), or changing the presentation.</p>

<p>I look at relevance in two ways.</p>

<p><strong>UI Effectiveness:</strong> Relevant results reduce the number of clicks before visitors reach their goal. With every click, you lose visitors, maybe as many as 10%.</p>
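
<p>To see how quickly that adds up, here is a rough sketch. It assumes the same 10% of the remaining visitors drop off at every click, which is a simplification for illustration and not a measured figure.</p>

<pre>
```python
def share_remaining(clicks, loss_per_click=0.10):
    """Fraction of visitors left after a number of clicks.

    Assumes the same share of visitors drops off at every click.
    """
    return (1.0 - loss_per_click) ** clicks

for clicks in (1, 2, 3, 5):
    print(clicks, "clicks:", round(100 * share_remaining(clicks)), "% remain")
```
</pre>

<p>Under that assumption, roughly 73% of visitors remain after three clicks and about 59% after five.</p>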

<p>Relevant results at the top mean fewer clicks. Ultraseek can measure the number of clicks per result page and report that. Fewer clicks is better, though zero clicks is not good, because it means the visitor left without visiting any results.</p>

<p>To put specific results at the top, use Quick Links. But make sure this is based on user behavior, not on the org chart or datasheets. Quick Links must be more relevant than the first result.</p>

<p><strong>Transparency and Trust:</strong> When users have some clue about why the results are presented, they trust the engine more. This is a transparency issue, and I think it is the biggest advantage of passage-based summaries. The passages are the engine explaining, &#8216;this is why I&#8217;m showing you this document.&#8217; It makes a huge difference in how comfortable visitors are.</p>

<p>Relevance also increases trust. Irrelevant Quick Links will decrease trust, so be careful.</p>

<p>By Walter Underwood
Principal Software Architect</p>
]]></description>
<link>/articles/archives/2005/06/relevance_and_u.html</link>
<guid>/articles/archives/2005/06/relevance_and_u.html</guid>
<category>Searching</category>
<pubDate>Wed, 22 Jun 2005 07:22:21 -0800</pubDate>
</item>
<item>
<title>Less is More: Why fewer documents means better search results</title>
<description><![CDATA[<p>When the right answer is not returned by the search engine, people tend to believe the engine has not found all of the content available on the network. The truth is that it may have discovered too much content.</p>

<p>For each valuable page of information on your network, there are at least 10 pages of useless information, such as log files, zip archives, and dynamic pages.</p>

<p>For example, do you need to index every page of an on-line calendar that goes to the year 2023? Probably not.</p>

<p>In fact, with this low-value content included, the user can be overwhelmed by the number of results that match their query term, especially if it is a common one.</p>

<p>One of our customers had 4.5 million pages in their index. They adjusted their spider to skip a lot of low-quality content, and ended up with 1.5 million pages. The quality of the search results improved dramatically. A smaller index, but better results.</p>

<p>To choose what should be in or out of your index, try some of your most popular queries. For each irrelevant result, ask yourself "should this be in the index?"</p>

<p>It is easy to control what gets indexed: adjust the filters on the Collections > Filters tab in the admin console. You can disallow content by matching its URL against a wildcard or regular expression pattern. The filters are matched from the top down, so make sure your specific disallow filters come before your more general allow filters.</p>
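
<p>The ordering rule can be illustrated with a small first-match-wins sketch. This is not Ultraseek's filter syntax or code: the patterns, the example URLs, and the behavior when no filter matches are all assumptions made for the example.</p>

<pre>
```python
from fnmatch import fnmatchcase

# Hypothetical filter list, checked from the top down.
FILTERS = [
    ("disallow", "*/calendar/*"),
    ("disallow", "*.zip"),
    ("allow", "http://intranet.example.com/*"),
]

def is_allowed(url, filters, default=False):
    """Return the decision of the first filter whose pattern matches the URL."""
    for action, pattern in filters:
        if fnmatchcase(url, pattern):
            return action == "allow"
    # Assumption: a URL that matches no filter is not indexed.
    return default

calendar_url = "http://intranet.example.com/calendar/2023/01.html"
guide_url = "http://intranet.example.com/docs/guide.html"

print(is_allowed(calendar_url, FILTERS))  # specific disallow matches first
print(is_allowed(guide_url, FILTERS))     # falls through to the general allow
```
</pre>

<p>If the general allow pattern were moved to the top of the list, it would match the calendar URL first and the disallow filters below it would never be reached.</p>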

<p>by Ryan Weisenberger<br />
Manager, Software Development</p>]]></description>
<link>/articles/archives/2005/06/less_is_more_wh.html</link>
<guid>/articles/archives/2005/06/less_is_more_wh.html</guid>
<category>Indexing</category>
<pubDate>Mon, 20 Jun 2005 07:39:48 -0800</pubDate>
</item>
<item>
<title>Welcome to the New Ultraseek.com</title>
<description><![CDATA[<p>Do you want to learn more about enterprise search? Would you like to improve your Ultraseek installation? Then you've come to the right place. This is a dedicated resource for the proud and growing Ultraseek user community. </p>

<p>Here, you'll find tips, tricks and proven best practices from those who know search engines best&#8212;the team of developers who build Ultraseek, the sales engineers and technical support managers who have worked with you to optimize your implementation, fellow customers and independent enterprise search engine experts.</p>

<p>Content on <a href="http://www.ultraseek.com">www.ultraseek.com</a> will be updated regularly with stories on how you can:</p>

<ul>
<li>Implement better search for your enterprise</li>
<li>Design sites to be searchable</li>
<li>Improve the quality of your Ultraseek search</li>
<li>Better understand your search users</li>
</ul>

<p>If you have a question or suggestion about the site, or would like to submit a story on how you are using Ultraseek, please email <a href="mailto:webmaster@ultraseek.com">webmaster@ultraseek.com</a>. We welcome your feedback and contributions to our site.</p>

<p>The Verity Team</p>
]]></description>
<link>/articles/archives/2005/06/welcome_to_the.html</link>
<guid>/articles/archives/2005/06/welcome_to_the.html</guid>
<category></category>
<pubDate>Thu, 16 Jun 2005 13:31:42 -0800</pubDate>
</item>


</channel>
</rss>