Release Notes for Inktomi Search Software 4.1
---------------------------------------------

Database Indexing
-----------------

Inktomi Search now supports a database collection to index content
stored in a relational database. On Windows NT or 2000, Inktomi Search
can access any ODBC data source. On Solaris and Linux, Oracle7 and
Oracle8 are currently the only supported databases.

ATTENTION: Before database content can be indexed, the administrator of
Inktomi Search must satisfy the following requirements:

On Windows NT or Windows 2000, an ODBC data source needs to be
configured in the ODBC Administrator available through the Control
Panel.

On Solaris or Linux, the Oracle Client provided by Oracle must be
installed, and the ORACLE_HOME environment variable must be set in
database/oracle_config under the Inktomi Search program directory.
Please read the comments in database/oracle_config for more
information.

Korean
------

Optional language support for Korean is now available. This includes
linguistic support, and localized search and help pages.

Thesaurus Improvements
----------------------

The single thesaurus.txt file has been replaced by multiple files in
XML. thesaurus_en.xml is for English, thesaurus_de.xml is for German,
and so on. The "la" parameter in the query pages sets the language for
the page, and the thesaurus for that language is used. The XML files
may use encodings of ISO-8859-1 or UTF-8.

The thesaurus now allows terms that will match a term in the query, but
will never be shown as a term. For example, the thesaurus might have
"Inktomi" and "Ultraseek" as terms which should match queries and
should be shown as matches. It might also have the misspelling
"Inkotmi" as a term which should match queries, but never be shown as
an alternative. To include that in the thesaurus_en.xml file, do:

    Inktomi Ultraseek Inkotmi

Thesaurus files are loaded as part of language initialization. To
reload them, restart Inktomi Search.

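The per-language file naming described above can be sketched in a few
lines of modern Python. This is only an illustration, not the code
Inktomi Search uses: the function name thesaurus_file_for_query and the
fallback to "en" when no "la" parameter is present are assumptions made
for the example.

```python
from urllib.parse import parse_qs

# Assumption: the notes do not say which thesaurus is used when the
# query has no "la" parameter; English is chosen here for illustration.
DEFAULT_LANGUAGE = "en"


def thesaurus_file_for_query(query_string):
    """Return the thesaurus file name implied by the "la" parameter."""
    params = parse_qs(query_string)
    # parse_qs returns a list of values per key; use the first one.
    la = params.get("la", [DEFAULT_LANGUAGE])[0]
    return "thesaurus_%s.xml" % la


print(thesaurus_file_for_query("la=de"))
```

With la=de the sketch selects thesaurus_de.xml, matching the naming
pattern given for English and German.
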
Import Sitemap
--------------

Import sitemap under topics now creates suggested starred URLs
automatically.

XML Mappings for Attributes
---------------------------

Values of XML attributes can now be searched. To map an attribute to a
fieldname for search, enter an ampersand (&) in the last field of the
mapping table. To map the title attribute in this element to the
"title:" field:

use this mapping in the XML mappings page of the server section of the
admin UI:

    Fieldname   Element   Attr. Name   Attr. Value
    title       article   title        &

The string "Content in Attributes" will be mapped to the title of the
document.

Wireless Markup Language
------------------------

Special support has been added for Wireless Markup Language (WML)
documents. WML documents do not have a reliable title, so heuristics
are used to build a title from various elements of the document. The
MIME media types (content-types) for WML have been added to the allowed
MIME types.

Upgrading Customizations
------------------------

Patches.py - If you are currently running Ultraseek Server 3.X or older
and have made customizations to your patches.py file, you will not be
able to use your old customized patches.py file verbatim. Changes have
been made to some parameters and return values of the procedures
patched in the patches.py file. If you have customized your patches.py
file, you should re-do your customizations in the patches.py file
included in this release. The new patches.py file is located in the
lib/python2.0 directory under the install directory. An existing
patches.py file located in lib/python1.6 will not be read.

If you are currently running Inktomi Search 4.0.X, you can use your
existing customizations to patches.py. To do so, you must copy your
patches.py file from the lib/python1.6 directory to the lib/python2.0
directory.

Changes
-------

New Python Version - Inktomi Search 4.1 is based on Python 2.0. If your
customizations have made use of code or features from an earlier
version of Python, we recommend updating your customizations.

When scheduling a new period of allowed operation, you will receive a
warning if you enter a time period with frequency 'once' that has
already passed.

The initial startup screens will now prompt the user for email
configuration information.

Handle an ENOBUFS error in the HTTP server.

Known Bugs
----------

Adobe Acrobat files with multibyte fonts (composite or CMap fonts for
Chinese, Japanese, or Korean) are not supported on any Unix platform.
The Unix versions of the Adobe PDF Library cannot extract text from
those files. If those files are indexed, the error log may contain the
message "The encoding (CMap) specified for a font is missing or
corrupted."

Adobe Acrobat files with single byte encodings that are not one of the
Adobe standard encodings may have some misinterpreted characters.
Central European fonts use Latin-2, which is not an Adobe standard
encoding. The Adobe PDF Library does not properly handle these
encodings.

RELEASE NOTES FOR THE LINUX RELEASE

SUPPORTED OS VERSION
--------------------

Inktomi Search has been tested on RedHat Linux 6.0, with a kernel
version of 2.2.5 and glibc 2.1.1.

C++ LIBRARY REQUIRED
--------------------

The C++ runtime library libstdc++-libc6.1 is required to run Inktomi
Search. If you are using RedHat Linux, this file is part of:

    Version   RPM Package
    -------   -----------
    6.X       libstdc++-2.9.0
    7.X       compat-libstdc++-6.2-2.9.0

REPORTING A BUG
---------------

When reporting a problem, be sure to include your Linux distribution,
the version of your kernel (uname -r), and the version of your glibc.
Send problem reports to software-bugs@ultraseek.com.

SUPPORT FOR POSTSCRIPT
----------------------

Inktomi Search will index PostScript files if you have installed
Ghostscript and it is on your path.

Patchlevel 1

Add Merant copyright notice to copyrights page.

Configure e-mail before the initial collection.

Don't blow up if we delete a collection that has already been deleted.

When installing 4.X, shut down Ultraseek 3.X properly when doing an
upgrade.

Define a magic hostname, thisserver, that will be replaced with the
actual hostname:port of the built-in web server when serving database
search results.

Keep track of our progress when scanning a database so we don't always
restart from the beginning when pyseekd restarts.

Index database views in addition to tables.

Restart will now make new oracle_config take effect.

Print contents of oracle_config file in the os status.

Patchlevel 2

Always use Korean language support to process Korean queries.

Japanese and Korean linguistic support upgraded to version 3.3. These
produce different stems, so Japanese and Korean content must be
re-indexed.

Kanji compound breaking enabled for Japanese.

Pass when database driver cannot return an encoding.

Distribute dbrec.html in the docs/db directory.

Ignore spaces in Japanese text.

Obey Content-Language in HTTP header and HTTP-EQUIV meta tag.

Decode ASCII-compatible encodings in URL params, for example,
ISO-2022.

In EUC-JP, use ASCII instead of JIS-Roman for code set 0. Character
0x5C is now backslash instead of Yen sign, and 0x7E is tilde instead of
overbar.

Update translations for Japanese and Norwegian localizations.

Enable term highlighting for Japanese and Korean.

Do not highlight stopwords in Japanese or Korean.

Do not highlight terms or show word scores for find similar queries.

CCE related topics are now calculated from documents that match rules
and individual URLs marked with stars. Previously, only rule matches
were used for related topics.

In CCE, add more options for showing number of related topics and
subtopics (10, 20, 50, all).

Join database fields with ';' to create body text.

Upgrade to OutsideIn 7.0 filters.

Support double-byte PDF 1.2 files.

Minor HTML updates for handicapped-accessibility: alt text for spacer
images, and content-language header in header.html.

Accept cookies whose values contain "=".

In CCE, when importing topics, handle URLs missing final slash.

Fix rare crash when handling documents in the HZ character encoding.

Update to Rosette 3.0.5.

Clean up summaries for MS Word documents.

Patchlevel 3

Fix handling of database records after rescan.

Patchlevel 4

Fix relevance bug in double pipe queries.

Added option to set indexing weights for remote anchor text in spider
collections. This was inadvertently removed in 4.0.

Allow mirroring of localization files from server.

Remove obsolete "quality weight" parameter. Reset quality weight to 1
when upgrading from a pre-4.0 installation.

Add spaces between contents of XML elements, even if the document
doesn't include spaces: AB will be indexed as "A B".

Handle illegal language specs in Content-Language HTTP headers and
HTTP-EQUIV meta tags.

Ignore illegal Unicode characters in UTF-8 encoding.

Check values in URL parameters, remove and log illegal ones.

Correct HTML markup in "special searches" help pages.

Add merge button to merge collection status page.

Catch OSErrors, and specially handle ENAMETOOLONG in built-in web
server.
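
The Patchlevel 4 change that adds spaces between the contents of XML
elements can be pictured with a short sketch in modern Python. This is
only an illustration under stated assumptions, not the indexer's code:
the function name indexable_text and the element names doc, a, and b
are hypothetical and do not come from these notes.

```python
import xml.etree.ElementTree as ET


def indexable_text(xml_source):
    """Join the text of all elements with single spaces."""
    root = ET.fromstring(xml_source)
    # itertext() yields each text fragment in document order.
    pieces = (piece.strip() for piece in root.itertext())
    # Drop whitespace-only fragments, then separate the rest with spaces
    # so adjacent elements never run together into one word.
    return " ".join(piece for piece in pieces if piece)


# Hypothetical document: two adjacent elements with no space between.
print(indexable_text("<doc><a>A</a><b>B</b></doc>"))  # prints: A B
```

Without the joining step, the two fragments would be concatenated into
the single token "AB" and neither "A" nor "B" would match as a word.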