Release Notes for Inktomi Search Software 4.1
---------------------------------------------
Database Indexing
-----------------
Inktomi Search now supports a database collection to index
content stored in a relational database. On Windows NT or
2000, Inktomi Search can access any ODBC datasource. On
Solaris and Linux, Oracle7 and Oracle8 are currently the
only supported databases.
ATTENTION: Before database content can be indexed, the
administrator of Inktomi Search must satisfy the following
requirements:
On Windows NT or Windows 2000, an ODBC data source must be
configured in the ODBC Administrator, which is available through
the Control Panel.
On Solaris or Linux, the Oracle Client provided by Oracle
must be installed, and the ORACLE_HOME environment variable must
be set in database/oracle_config under the Inktomi Search program
directory. Please read the comments in database/oracle_config for
more information.
Korean
------
Optional language support for Korean is now available. This
includes linguistic support and localized search and help pages.
Thesaurus Improvements
----------------------
The single thesaurus.txt file has been replaced by multiple
XML files: thesaurus_en.xml is for English, thesaurus_de.xml
is for German, and so on. The "la" parameter in the query pages
sets the language for the page, and the thesaurus for that
language is used. The XML files may be encoded in ISO-8859-1
or UTF-8.
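A minimal sketch of how a page's "la" value might select a
thesaurus file, and how to check that a file declares one of the
two allowed encodings before deploying it. The config_dir argument
and all function names are hypothetical; these notes do not say
where the thesaurus files live or how Inktomi Search validates them.

```python
import re
from pathlib import Path

# Encodings the release notes allow for thesaurus XML files.
ALLOWED_ENCODINGS = {"iso-8859-1", "utf-8"}

# Matches the encoding pseudo-attribute of an XML declaration.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def thesaurus_path(config_dir: Path, la: str) -> Path:
    """Hypothetical lookup: la='de' -> <config_dir>/thesaurus_de.xml."""
    return config_dir / f"thesaurus_{la}.xml"


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding, defaulting to UTF-8 as XML itself does."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def has_allowed_encoding(data: bytes) -> bool:
    """True if the file declares ISO-8859-1 or UTF-8 (or declares nothing)."""
    return declared_encoding(data) in ALLOWED_ENCODINGS
```

Aliases such as "latin-1" are rejected by this sketch; a stricter or
looser policy is a design choice left open here.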
The thesaurus now allows terms that match a term in the
query but are never shown as alternatives. For example, the
thesaurus might have "Inktomi" and "Ultraseek" as terms that
should match queries and be shown as matches. It might
also have the misspelling "Inkotmi" as a term that should
match queries but never be shown as an alternative. To include
that in the thesaurus_en.xml file, do:
Inktomi
Ultraseek
Inkotmi
Thesaurus files are loaded as part of language initialization.
To reload them, restart Inktomi Search.
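The difference between shown and match-only terms can be modeled
as follows. This is a hypothetical in-memory representation, not
the thesaurus file format or Inktomi Search's internal structure;
case-insensitive matching is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThesaurusEntry:
    """Hypothetical model of one thesaurus entry."""
    shown: List[str]                                  # match queries and may be offered
    hidden: List[str] = field(default_factory=list)  # match queries, never offered

    def matches(self, term: str) -> bool:
        t = term.casefold()
        return any(t == x.casefold() for x in self.shown + self.hidden)

    def alternatives(self, term: str) -> List[str]:
        """Terms to display for a query term; hidden terms never appear."""
        if not self.matches(term):
            return []
        t = term.casefold()
        return [x for x in self.shown if x.casefold() != t]
```

With ThesaurusEntry(["Inktomi", "Ultraseek"], ["Inkotmi"]), a query
for the misspelling "Inkotmi" matches the entry and offers "Inktomi"
and "Ultraseek", while "Inkotmi" is never offered for any query.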
Import Sitemap
--------------
Import Sitemap in the Topics section now automatically creates
suggested starred URLs.
XML Mappings for Attributes
---------------------------
Values of XML attributes can now be searched. To map an
attribute to a field name for search, enter an ampersand
(&) in the last column (Attr. Value) of the mapping table.
For example, to map the title attribute of this element to
the "title:" field:
<article title="Content in Attributes">
use this mapping on the XML mappings page in the
server section of the admin UI:
Fieldname   Element   Attr. Name   Attr. Value
---------   -------   ----------   -----------
title       article   title        &
The string "Content in Attributes" will be mapped to the
title of the document.
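The effect of a mapping row whose Attr. Value column is "&" can be
sketched with Python's standard XML parser. This only models the
behavior described above; the function name is hypothetical and
this is not how Inktomi Search parses documents.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def map_attribute(xml_text: str, fieldname: str, element: str,
                  attr_name: str) -> Dict[str, List[str]]:
    """Collect attr_name values from every <element> into fieldname.

    Models a mapping row whose Attr. Value column is '&'.
    """
    root = ET.fromstring(xml_text)
    # iter() includes the root itself, so a top-level <article> is found too.
    values = [el.get(attr_name) for el in root.iter(element)
              if el.get(attr_name) is not None]
    return {fieldname: values}
```

For the example row (title, article, title, &),
map_attribute('<article title="Content in Attributes">Body</article>',
"title", "article", "title") returns
{"title": ["Content in Attributes"]}.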
Wireless Markup Language
------------------------
Special support has been added for Wireless Markup Language
(WML) documents. WML documents do not have a reliable title.
Heuristics are used to build a title from various elements
of the document.
The MIME media types (content-types) for WML have been added
to the allowed MIME types.
Upgrading Customizations
------------------------
Patches.py - If you are currently running Ultraseek Server 3.X or
older and have made customizations to your patches.py file, you
will not be able to use your old customized patches.py file
verbatim. Changes have been made to some parameters and return values
of the procedures patched in the patches.py file. If you have
customized your patches.py file, you should re-do your customizations
in the patches.py file included in this release. The new patches.py
file is located in the lib/python2.0 directory under the install
directory. An existing patches.py file located in lib/python1.6 will
not be read.
If you are currently running Inktomi Search 4.0.X, you can keep
your existing customizations to patches.py. To do so, copy your
patches.py file from the lib/python1.6 directory to the
lib/python2.0 directory.
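A minimal sketch of the copy step for 4.0.X installations, where
install_dir stands for the Inktomi Search install directory. Saving
the patches.py shipped with this release as patches.py.dist is a
hypothetical precaution, not a requirement of these notes.

```python
import shutil
from pathlib import Path


def migrate_patches(install_dir) -> Path:
    """Copy a 4.0.X patches.py from lib/python1.6 into lib/python2.0."""
    lib = Path(install_dir) / "lib"
    src = lib / "python1.6" / "patches.py"
    dst = lib / "python2.0" / "patches.py"
    if not src.is_file():
        raise FileNotFoundError(src)
    if dst.exists():
        # Hypothetical safeguard: keep the patches.py shipped with 4.1.
        shutil.copy2(dst, dst.with_name("patches.py.dist"))
    shutil.copy2(src, dst)
    return dst
```

This applies only to 4.0.X customizations; a 3.X patches.py must be
re-done by hand in the new file, as described above.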
Changes
-------
New Python Version - Inktomi Search 4.1 is based on Python 2.0. If your
customizations have made use of code or features from an earlier version
of Python, we recommend updating your customizations.
When scheduling a new period of allowed operation, you will receive a warning
if you enter a time period with frequency 'once' that has already passed.
The initial startup screens will now prompt the user for email
configuration information.
Handle an ENOBUFS error in the HTTP server.
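The scheduling warning for one-time periods amounts to a simple
comparison. This sketch assumes a period is described by a
frequency string and an end time; the function name and message
text are hypothetical.

```python
from datetime import datetime
from typing import Optional


def once_period_warning(frequency: str, end: datetime,
                        now: Optional[datetime] = None) -> Optional[str]:
    """Return a warning if a one-time period has already ended, else None."""
    now = now if now is not None else datetime.now()
    if frequency == "once" and end <= now:
        return "Warning: this one-time period has already passed."
    return None
```

Recurring periods are never flagged, since a past start date still
recurs in the future.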
Known Bugs
----------
Adobe Acrobat files with multibyte fonts (composite or CMap fonts for
Chinese, Japanese, or Korean) are not supported on any Unix platforms.
The Unix versions of the Adobe PDF Library cannot extract text from
those files. If those files are indexed, the error log may contain
the message "The encoding (CMap) specified for a font is missing
or corrupted."
Adobe Acrobat files with single-byte encodings other than the
Adobe standard encodings may have some misinterpreted
characters. For example, Central European fonts use Latin-2,
which is not an Adobe standard encoding, and the Adobe PDF
Library does not handle these encodings properly.
RELEASE NOTES FOR THE LINUX RELEASE
SUPPORTED OS VERSION
--------------------
Inktomi Search has been tested on RedHat Linux 6.0, with
a kernel version of 2.2.5 and glibc 2.1.1.
C++ LIBRARY REQUIRED
--------------------
The C++ runtime library libstdc++-libc6.1 is required
to run Inktomi Search.
If you are using RedHat Linux, this file is part of:
Version RPM Package
------- -----------
6.X libstdc++-2.9.0
7.X compat-libstdc++-6.2-2.9.0
REPORTING A BUG
---------------
When reporting a problem, be sure to include your
Linux distribution, the version of your kernel (uname -r),
and the version of your glibc.
Send problem reports to software-bugs@ultraseek.com.
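The details for a bug report can be gathered by hand (uname -r for
the kernel, ldd --version for glibc) or with a small helper like the
one below. The helper is a hypothetical convenience, not a tool
shipped with Inktomi Search; reading the distribution from
/etc/redhat-release or /etc/os-release is an assumption about your
system.

```python
import platform
from pathlib import Path
from typing import Dict


def bug_report_info() -> Dict[str, str]:
    """Collect the details the notes ask for in a Linux bug report."""
    distro = "unknown"
    # Red Hat systems ship /etc/redhat-release; many newer
    # distributions provide /etc/os-release instead.
    for name in ("/etc/redhat-release", "/etc/os-release"):
        path = Path(name)
        if path.is_file():
            lines = path.read_text(errors="replace").splitlines()
            distro = lines[0] if lines else "unknown"
            break
    libc, libc_version = platform.libc_ver()
    return {
        "distribution": distro,
        "kernel": platform.release(),  # same value as `uname -r` on Linux
        "libc": f"{libc} {libc_version}".strip(),
    }
```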
SUPPORT FOR POSTSCRIPT
----------------------
Inktomi Search will index PostScript files if Ghostscript is
installed and on your PATH.
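To confirm the Ghostscript requirement before indexing PostScript,
check that gs (the usual name of the Ghostscript executable on
Unix) resolves on PATH. This mirrors what `which gs` reports in a
shell; it is not necessarily how Inktomi Search itself locates
Ghostscript.

```python
import shutil
from typing import Optional


def find_ghostscript(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the Ghostscript `gs` executable, or None.

    With search_path=None, shutil.which searches the PATH environment variable.
    """
    return shutil.which("gs", path=search_path)
```

Run it as the same user that starts Inktomi Search, since PATH can
differ between accounts.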
Patchlevel 1
Add Merant copyright notice to copyrights page.
Configure e-mail before the initial collection.
Don't crash when deleting a collection that has already been deleted.
When upgrading to 4.X, shut down Ultraseek 3.X properly.
Define a magic hostname, thisserver, that will be replaced with
the actual hostname:port of the built-in web server when
serving database search results.
Keep track of our progress when scanning a database so we don't
always restart from the beginning when pyseekd restarts.
Index database views in addition to tables.
Restart will now make new oracle_config take effect.
Print the contents of the oracle_config file in the OS status.
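Two of the Patchlevel 1 changes can be sketched in a few lines: the
"thisserver" host substitution and a resumable database scan. Both
functions are hypothetical illustrations; the notes do not describe
how the substitution is applied or how scan progress is stored, and
the scan assumes rows arrive sorted by a JSON-serializable key.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable, Tuple
from urllib.parse import urlsplit, urlunsplit

MAGIC_HOST = "thisserver"


def expand_thisserver(url: str, host_port: str) -> str:
    """Replace the magic host 'thisserver' with the built-in server's host:port."""
    parts = urlsplit(url)
    if parts.hostname == MAGIC_HOST:
        return urlunsplit(parts._replace(netloc=host_port))
    return url


def resumable_scan(rows: Iterable[Tuple[Any, Any]], checkpoint: Path,
                   index_record: Callable[[Any], None]) -> int:
    """Index (key, record) rows sorted by key, skipping keys done before a restart.

    The last indexed key is saved after each record, so a restarted
    process resumes after it instead of starting from the beginning.
    Returns the number of records indexed in this run.
    """
    last = json.loads(checkpoint.read_text())["last_key"] if checkpoint.exists() else None
    count = 0
    for key, record in rows:
        if last is not None and key <= last:
            continue  # already indexed before the restart
        index_record(record)
        checkpoint.write_text(json.dumps({"last_key": key}))
        count += 1
    return count
```

Writing the checkpoint after every record keeps the sketch simple;
saving it every N records would trade a little re-indexing after a
crash for fewer writes.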
Patchlevel 2
Always use Korean language support to process Korean queries.
Japanese and Korean linguistic support upgraded to version 3.3.
These produce different stems, so Japanese and Korean content
must be re-indexed.
Kanji compound breaking enabled for Japanese.
Pass when database driver cannot return an encoding.
Distribute dbrec.html in the docs/db directory.
Ignore spaces in Japanese text.
Obey Content-Language in HTTP header and HTTP-EQUIV meta tag.
Decode ASCII-compatible encodings in URL params, for example,
ISO-2022.
In EUC-JP, use ASCII instead of JIS-Roman for code set 0. Character
0x5C is now backslash instead of Yen sign, and 0x7E is tilde
instead of overbar.
Update translations for Japanese and Norwegian localizations.
Enable term highlighting for Japanese and Korean.
Do not highlight stopwords in Japanese or Korean.
Do not highlight terms or show word scores for find similar queries.
CCE related topics are now calculated from documents that match
rules and individual URLs marked with stars. Previously, only
rule matches were used for related topics.
In CCE, add more options for showing number of related topics
and subtopics (10, 20, 50, all).
Join database fields with ';' to create body text.
Upgrade to OutsideIn 7.0 filters.
Support double-byte PDF 1.2 files.
Minor HTML updates for accessibility: alt text for spacer
images, and a content-language header in header.html.
Accept cookies whose values contain "=".
In CCE, when importing topics, handle URLs missing final slash.
Fix rare crash when handling documents in the HZ character encoding.
Update to Rosette 3.0.5.
Clean up summaries for MS Word documents.
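Several Patchlevel 2 changes involve small text-handling rules that
can be illustrated with the Python standard library. These are
hypothetical sketches of the described behavior, not the product's
code; treating "+" as a space in URL parameters and None fields as
empty strings are assumptions.

```python
from typing import Any, Dict, Iterable
from urllib.parse import unquote_to_bytes


def decode_url_param(raw: str, encoding: str) -> str:
    """Percent-decode to bytes first, then decode with the page's encoding.

    Decoding bytes (rather than assuming UTF-8) lets 7-bit encodings such
    as ISO-2022-JP, whose escape sequences are percent-encoded, round-trip.
    """
    return unquote_to_bytes(raw.replace("+", " ")).decode(encoding)


def record_body(fields: Iterable[Any]) -> str:
    """Join a database record's field values with ';' to form body text."""
    return ";".join("" if value is None else str(value) for value in fields)


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Split each cookie on its first '=' only, so values may contain '='."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies
```

The EUC-JP change matches Python's euc_jp codec, which treats code
set 0 as ASCII: decoding the bytes 0x5C 0x7E yields a backslash and
a tilde, not a Yen sign and an overbar.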
Patchlevel 3
Fix handling of database records after rescan.
Patchlevel 4
Fix relevance bug in double-pipe queries.
Add option to set indexing weights for remote anchor text in spider
collections. This was inadvertently removed in 4.0.
Allow mirroring of localization files from server.
Remove obsolete "quality weight" parameter.
Reset quality weight to 1 when upgrading from a pre-4.0 installation.
Add spaces between the contents of adjacent XML elements, even if
the document has no whitespace between them: adjacent elements
containing "A" and "B" will be indexed as "A B".
Handle illegal language specs in Content-Language HTTP headers
and HTTP-EQUIV meta tags.
Ignore illegal Unicode characters in UTF-8 encoding.
Check values in URL parameters, remove and log illegal ones.
Correct HTML markup in "special searches" help pages.
Add merge button to merge collection status page.
Catch OSErrors, and handle ENAMETOOLONG specially, in the
built-in web server.
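The Patchlevel 4 input-handling fixes can be sketched together.
Everything below is hypothetical: the language-tag pattern only
approximates RFC 3066, the "page" parameter rule is illustrative,
and mapping ENAMETOOLONG to 414 (Request-URI Too Long) is an
assumed choice, not documented product behavior.

```python
import errno
import logging
import re
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Optional, Tuple
from urllib.parse import parse_qsl

log = logging.getLogger("search")

# Roughly the RFC 3066 language-tag shape: "en", "en-US", "zh-Hant-TW".
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Hypothetical per-parameter rules; "page" is an illustrative name only.
PARAM_RULES = {"la": LANGUAGE_TAG, "page": re.compile(r"[0-9]{1,4}")}


def xml_index_text(xml_text: str) -> str:
    """Join the text chunks of XML elements with single spaces."""
    root = ET.fromstring(xml_text)
    return " ".join(chunk.strip() for chunk in root.itertext() if chunk.strip())


def content_languages(header: str) -> List[str]:
    """Keep well-formed tags from a Content-Language value; drop illegal ones."""
    tags = (tag.strip() for tag in header.split(","))
    return [tag.lower() for tag in tags if LANGUAGE_TAG.fullmatch(tag)]


def utf8_text(data: bytes) -> str:
    """Decode UTF-8, silently skipping illegal byte sequences."""
    return data.decode("utf-8", errors="ignore")


def clean_params(query: str) -> Dict[str, str]:
    """Drop (and log) URL parameters whose values fail their rule."""
    kept = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        rule = PARAM_RULES.get(name)
        if rule is not None and not rule.fullmatch(value):
            log.warning("dropping illegal value for %s: %r", name, value)
            continue
        kept[name] = value
    return kept


def open_for_serving(path: str) -> Tuple[int, Optional[BinaryIO]]:
    """Map OSErrors to HTTP status codes instead of letting them escape."""
    try:
        return 200, open(path, "rb")
    except OSError as exc:
        if exc.errno == errno.ENAMETOOLONG:
            return 414, None  # hypothetical choice: Request-URI Too Long
        if exc.errno == errno.ENOENT:
            return 404, None
        return 500, None
```

For example, xml_index_text("<r><a>A</a><b>B</b></r>") returns
"A B", matching the XML spacing change above.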