Archive: July 2005

Don't Reindex Every Week!
Walter Underwood, Principal Software Architect, Verity

If you have used other search engines, you probably had to manually configure your indexing schedule to make sure new content was found and indexed. This is not necessary with Ultraseek. Ultraseek has "continuous spidering with real-time indexing and adaptive revisit intervals." It sounds complicated, but it means that Ultraseek will automatically spider most pages at the right times.

The Ultraseek spider is always on, always ready to find URLs and index documents. This is called continuous spidering. When a new or changed document is found, it is immediately indexed and is available for the next query. This is called real-time indexing.

How does the spider decide when to revisit a URL to check for changes? It measures page change rates and adjusts to match them. This is called adaptive revisit intervals. For every URL it visits, the spider tracks how often it changes. It uses that information to choose a revisit interval. If a page changes every day, it is visited every day. If another one changes every week, it is visited weekly.

Let's think about a sample site, with press releases and a page listing recent press releases. The page listing the press releases will change frequently, and Ultraseek will visit it often, finding the new press releases promptly. Individual press releases won't change, so Ultraseek will adjust their revisit interval to the maximum, about one visit per month.

So, if you are planning to set up regular revisit schedules, don't do it right away. Let Ultraseek run for a while and adjust to your website. Then, when you want a really fresh index, integrate your publishing system with Add URL. That will get new pages into the index in a few seconds, not just once a week.
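To make the idea of adaptive revisit intervals concrete, here is a minimal sketch in Python. It is not Ultraseek's actual algorithm, which the article does not describe. The halve-on-change, double-on-no-change rule, the function name, and the one-hour minimum are all assumptions for illustration; only the roughly one-month maximum comes from the text above.

```python
from datetime import timedelta

# Bounds are assumptions for illustration. The article only says the
# maximum is "about one visit per month"; the minimum is invented.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=30)


def next_revisit_interval(current: timedelta, changed: bool) -> timedelta:
    """Return the next revisit interval for a single URL.

    Hypothetical rule: halve the interval when the page changed since
    the last visit, double it when it did not, and clamp the result to
    the range [MIN_INTERVAL, MAX_INTERVAL].
    """
    proposed = current / 2 if changed else current * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


# A press release that never changes drifts out to the maximum interval.
interval = timedelta(days=1)
for _ in range(6):
    interval = next_revisit_interval(interval, changed=False)
print(interval)
```

Under this rule a frequently changing listing page stays near the short end of the range, while static pages settle at the monthly maximum, which matches the press release example.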
Posted July 21, 2005 by editor
Keeping Your Index Fresh with Add URL
Walter Underwood, Principal Software Architect, Verity

Everyone wants their search index to be an accurate, timely reflection of their content. Ultraseek automatically revisits pages to find new URLs, and that is very effective, but some sites have even stronger requirements for how quickly documents need to be available in search results. This is called "index freshness." A stale index frequently misses new pages and holds old information, including pages that have already been deleted and old copies of pages that have changed since they were indexed. For maximum index freshness, use Ultraseek's Add URL feature for notifications of deleted, changed, or new URLs.

Add URL does what it says and a little more. You pass in a URL, and the spider adds it to the URLs it will crawl, putting it on the highest-priority queue. It also forces a revisit and reindex if the URL is already known to the spider. If a URL is new, it will be visited and indexed (if it isn't a duplicate). If the document at that URL is in the index and has changed, it will be reindexed. If there is no longer a document at that URL, the old document will be removed from the index.

Add URL has both a user and a programmatic interface. Users can access it from the help pages. Administrators can access it there, or on the URL tab of a spider collection. Programs can do exactly what the UI form does (send an HTTP POST or GET with the URL), or they can use the dedicated Java API or SOAP Web service.

Doing Add URL with HTTP is straightforward. Assuming that Ultraseek is installed at search.example.com, and that you are notifying it about changes at http://example.com/new.html, access this URL with a GET:

http://search.example.com/help/addurlgo.html?url=http://example.com/new.html

For Add URL with the XPA Java API, see SpiderCollection.addURL in the Javadoc shipped with the library. For the SOAP Web service, see the VisitURL operation in the Web services documentation.
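The HTTP form of Add URL can be sketched in Python as a small helper that builds the GET request URL. The path and the url parameter come from the example above; the function name is hypothetical, and percent-encoding the page URL is an assumption (the article shows it unencoded). Encoding should matter mostly when the page URL carries its own query string, assuming the server decodes the parameter in the usual way.

```python
from urllib.parse import urlencode

# Path taken from the article's example; the host is wherever
# Ultraseek is installed.
ADD_URL_PATH = "/help/addurlgo.html"


def build_add_url_request(search_host: str, page_url: str) -> str:
    """Build the GET URL that notifies Ultraseek about page_url.

    The page URL is percent-encoded so that characters such as '&' or
    '?' inside it are not mistaken for part of the outer query string.
    """
    query = urlencode({"url": page_url})
    return f"http://{search_host}{ADD_URL_PATH}?{query}"


request_url = build_add_url_request(
    "search.example.com", "http://example.com/new.html"
)
print(request_url)
```

To send the notification, the resulting string can be passed to `urllib.request.urlopen`, which issues a GET by default.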
The URL to be added is checked against the collection URL filters first, so the search administrator still has control over what is allowed in the collection. Web page authors and site webmasters can use Add URL to keep their pages fresh in the search index without making requests to the search administrator. This saves time for everyone.

Add URL usually updates the index very quickly, often in a few seconds. Sometimes, the document will have been visited and reindexed by the time the URL Status page is shown.

Make sure that the website is updated before you send the Add URL notification. We had one customer do the Add URL before pushing changes to the site, and even though it was only a two-second delay, Ultraseek visited the URL before the content arrived and got a 404 response from the Web server. A spider curfew or a suspended spider will prevent Add URL from taking effect immediately. So allow the spider to run and make sure the content is pushed first.

For the best possible index freshness, integrate Add URL into your publishing system. Right after publishing any change to your website, notify Ultraseek of all the URLs with changes through Add URL. Moments later, the search index will be updated and your changes will be published to both the site and the index, so that visitors can find the new pages through browsing or through search.
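The ordering rule above (push the content first, then notify) can be sketched as a small publishing hook in Python. Everything here is hypothetical: `publish_then_notify`, `publish`, and `notify` are illustrative names, not part of Ultraseek or any publishing system. The only point is that every notification is sent after the publish step has returned.

```python
from typing import Callable, Iterable, List


def publish_then_notify(
    changed_urls: Iterable[str],
    publish: Callable[[List[str]], None],
    notify: Callable[[str], None],
) -> List[str]:
    """Publish the changed pages, then notify the search engine.

    `publish` is assumed to return only once the content is live on
    the web server; `notify` is assumed to send one Add URL request.
    Notifying first risks the spider fetching a page that is not there
    yet and getting a 404.
    """
    urls = list(changed_urls)
    publish(urls)          # content goes live first
    for url in urls:
        notify(url)        # then each changed URL is announced
    return urls


# Demonstration with stand-in callables that only record the order.
events = []
publish_then_notify(
    ["http://example.com/new.html", "http://example.com/index.html"],
    publish=lambda urls: events.append(("publish", tuple(urls))),
    notify=lambda url: events.append(("notify", url)),
)
print(events)
```

In a real integration, `notify` would issue the Add URL GET or call the Java or SOAP interface, and deleted pages would be included in the list so the old documents are removed from the index.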
Posted July 19, 2005 by editor
'Implementation Went So Smoothly'

Marsha Luevane, Search Engine Manager at U.S. Department of Energy's National Renewable Energy Laboratory, sits down to discuss her organization's selection, use and longtime loyalty towards Ultraseek.

How long has the U.S. Department of Energy's National Renewable Energy Laboratory been using Ultraseek technology?

Where is Ultraseek used today in your organization?

Why did you choose Ultraseek?

What are some of your best Ultraseek search examples?
Ultraseek's Spell Suggest feature is also widely used because there are so many technical terms on www.nrel.gov and www.eere.energy.gov. For example, a user who types in the search query 'photovoltiacs' automatically gets the correct spelling suggested, 'photovoltaics.' This is an excellent feature.

How do you use Ultraseek's Reporting Manager capabilities?

How much work goes on behind the scenes to keep Ultraseek running smoothly at NREL?
Posted July 14, 2005 by editor
Learn Python in 10 Minutes or Less
Ryan Weisenberger

Python is the primary language used throughout Ultraseek. Python is only a few years old, but has quickly become robust and powerful. Having an understanding of Python can help you perform advanced customizations of the user interface, and easily augment Ultraseek’s behavior through

Step 1. Download the Python interpreter

Step 2. Run Python
After you have installed Python on your system, you should run it according to the Python installation instructions. As Python starts, it will display something like the following:

That

For example, enter

Step 3. Take the Python tutorial
Now that you have a fully functioning Python interpreter on your system, you should take the Python tutorial. This tutorial will guide you through the basics of Python, such as the syntax, flow control, and the use of modules.

Step 4. Read a book or two
At this point, you should have a strong enough grasp of the basic Python programming concepts to go meddle around in the Ultraseek user interface, or maybe even
Posted July 08, 2005 by editor
My Favorite Customer Problem
By Walter Underwood

We are concerned about all of the problems reported by our customers, but there is one problem I don't mind hearing about. At least once a month, we hear from a customer who has not had to touch their Ultraseek installation for months, and has forgotten the password. Sometimes, the search administrator changes jobs, and Ultraseek runs with no attention for a year or more.

We immediately explain how to set a new admin password, but I'm always happy that Ultraseek has been so reliable that it has needed no attention at all for months, maybe even years. We don't know the record for unattended operation, but one university ran for five years without any support calls or upgrading to a new version, so we assume it didn't have any serious problems.
Posted July 05, 2005 by editor