Release Notes for Verity Ultraseek 5.4
--------------------------------------
May 2005

Metadata from a Database
------------------------
A spider collection can be configured to automatically query a database for additional metadata for a document. This can be configured on the Spider Collection > Tuning tab.

The following databases are supported:

    Oracle 8i, 9i, 10g (with or without client)
    DB2 Universal Database (UDB) 6.1, v7.x, and v8.x
    Sybase Adaptive Server (ASE) 11.5, 11.9, 12.0, 12.5, 12.5.1
    Microsoft SQL Server 7, SQL Server 2000

ODBC drivers must be supplied by the database vendor for Windows support.

Report data in TSV format
-------------------------
Reports generated on the Activity > Reports tab can now be downloaded in TSV (tab separated values) format. These can easily be imported into any database or spreadsheet application, such as Microsoft Excel. To get the TSV format, check the box labeled "download as tab separated values" before generating the report.

Automatic check for newer version
---------------------------------
Ultraseek can periodically connect to the Internet to check the Verity support site for newer versions of the software. A notification will appear at the bottom of the administration interface when a newer version is available. This feature can be enabled/disabled on the Server > Parameters > Main tab in the administrative interface.

Topics Export as Verity TAX file
--------------------------------
CCE topics can be exported as a Verity TAX file, allowing for import into other compatible Verity products, such as Verity Collaborative Classifier. To export topics as a Verity TAX file, click the "Export topics as Verity TAX file" link on the Topics > Edit tab.

Highlight/View PDF as HTML
--------------------------
PDF results can now be converted to HTML before highlighting. If you want to highlight PDF results as HTML, set the application/pdf document type to "Parse as" Adobe Acrobat (Key View) on the Server > Doc types pane.
If you want to continue to highlight PDF results within the Acrobat Reader, set it to "Parse as" Adobe Acrobat.

Also, for searches with no terms to highlight (for instance, fielded searches or "Find Similar" searches), a "View as HTML" link now appears for results with certain document types. You can use this link to have Ultraseek convert the result document to HTML and display it in your browser. This feature is also available for PDFs if you set them to "Parse as" Adobe Acrobat (Key View) as described above.

Improved Wildcard Queries
-------------------------
Wildcard queries will now match terms that exist in a single document. Previously, wildcards would only match terms that existed in at least 3 documents. If a wildcard pattern matches too many terms, then the terms existing in fewer than 3 documents are removed. If there are still too many terms, then the list of matches is truncated.

Wildcard now applies separately to the "field:" and "term" portions of a query. This means "titl*" will match "title" or "titles", but not "title:Document".

Wildcard now applies to +required terms. The query "white +hou* cat" is automatically transformed into: "hour OR house OR houses | white cat".

Ultraseek no longer attempts to expand url:, link:, and imagelink: phrase queries (example: url:www.site.com/path?query=value) where "*" and "?" must be interpreted as URL characters, instead of wildcards. Single term queries using these fields are expanded (example: url:htm*).

Override robots.txt and robots meta tags
----------------------------------------
The Filters page now allows URL patterns to override the robots.txt file or robots meta tag. If either of those would prevent a page from being visited or indexed, this override will allow the page.

In previous versions of Ultraseek, override of a robots.txt file was possible through a configuration file setting. Those settings will be transferred to this override on the Filters page.
The Filters page should be used for any changes to robots.txt overrides.

In previous versions of Ultraseek, overriding a robots meta tag required a customization in patches.py. Those customizations should be removed and replaced with settings on the Filters page.

Implementation Changes
----------------------
Topic reports have moved from Topics > Reports to Activity > Reports.

Lists of Collection or Style names are now sorted in the Admin UI.

Related topics now display counts for the number of documents under each topic that match the search term. Also, browsing of related topics is now restricted to the original search term, instead of a generic topic browse. Topic counts and browse behavior can be modified on the Interface > Topics tab under the CCE Search Pages section.

Interface Changes
-----------------
Highlighting of result titles and summaries can now be enabled and disabled in the Administrative Interface. The setting is specific to each style and can be controlled via the Interface > Query pane.

Default collection encoding for database collections has been added to the Database Collection > Root tab, and removed from the New Table wizard.

Patches.py Changes
------------------
Added patch to metabase, which allows customization of the metadata from a database feature.

Updated Components
------------------
Datadirect Connect for ODBC drivers upgraded to 5.0, adding support for Oracle 10g to database collections.

Keyview document filters upgraded to 8.3. PDF Filter now supports PDF 1.5.

Supported Platforms
-------------------
    Microsoft Windows NT, Windows 2000, Windows 2003 (Intel compatible)
    SunOS 5.8, 5.9
    Suse Linux 9.0
    Red Hat Enterprise Linux 3.0

Deprecation Notification
------------------------
Microsoft Windows NT support.

Obsolete Platforms
------------------
    Redhat Enterprise Linux 2.1
    Suse Linux 8.0
    Suse Linux 8.1
    SunOS 5.7

Bug Fixes
---------
[BZ0105] Date range reports did not include data from "to" date.
[BZ0108] Indexing BLOBs from Oracle generated error: "mxODBC.InterfaceError: ('HY000', -1, '[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]LobFileExists Failed.', 3262)"
[BZ0123] Indexing CLOBs from Oracle generated error: "mxODBC.InterfaceError: ('HY000', 0, '[DataDirect][ODBC Oracle Wire Protocol driver]CLOBs are not supported on unicode servers.', 4616)"
[BZ0233] Usage summary report was formatted poorly
[BZ0252] Replacing a CCE license key with a non-CCE license key generated the following error: "exceptions.AttributeError: 'NoneType' object has no attribute 'orphanize_topics'"
[BZ0398] Users could not modify collections for which they had administrative rights.
[BZ0440] Admin UI is now more robust in detecting invalid HTTP parameters (example: deleted collection name)
[BZ0453] Dynamic documents were always revisited at the default interval.
[BZ0495] Connections to Oracle using the Oracle client failed with the following error: "Connection failed. Either the credentials you provided are invalid, or the datasource name does not exist. Please correct your input."
[BZ0637] Word 95 documents were truncated when indexed.
[BZ0698] Users could not modify collections for which they had administrative rights.
[BZ0731] The strings in signin.html were not translated.
[BZ0757] Server failed to restart while CCE had queued rules, or during an Add URL.
[BZ0769] Referring URL was not HTML quoted.
[BZ0835] Trailing single quote affected spell suggestion
[BZ0853] Do not attempt to match wildcards in url: phrases (url:path?query=value)
[BZ0866] Allow rare terms to be matched in wildcard queries
[BZ0872] Increased the size limit for receiving an HTTP Request to match default limits for Apache 2.0
[BZ0882] access.log timestamp is now the time the HTTP response completed.
[BZ0884] HTTP server threads now stop processing when remote client has terminated.
[BZ0902] HTTP server chose the wrong character set when serving pages based on the browser's accept-language header.
[BZ0907] Group by topic was inconsistent when ht was set to 0.
[BZ0910] "seek.error: IOError in termstring_find" problem during collection Merge or wildcard query.
[BZ0912] Poor recovery from some types of out-of-memory errors on Windows
[BZ0915] International content in NNTP subject fields was not being decoded correctly.
[BZ0918] Documents ending in "/" were being duplicated on revisit and Add URL.
[BZ0921] The error log reported a document was a duplicate of itself.
[BZ0925] The values of the checkboxes for Document Title Replacement on the Collections > Tuning tab were ignored.
[BZ0940] Topic browses did not work correctly with ALL terms mode.
[BZ0950] Retry file delete and file rename on NT to allow for virus scanners.
[BZ0952] Topics tab was unresponsive if the server had indexed a large number of sites.
[BZ0982] System information on Windows now correctly identifies Windows 2000, Windows XP, and Windows 2003.
[BZ0991] The spider did not share cookies between HTTP and HTTPS connections to the same host.
[BZ0994] Error in help pages when only the SSL server was configured.
[BZ1014] Creating a new collection with the same name as a deleted collection immediately after restart could cause an initialization error.
[BZ1054] Some binary documents with internally specified date before 1 Jan 1970, or after 2038 would cause the keyview document parser to crash.
[BZ1056] On Solaris, date values from the keyview document parser were corrupt.
[BZ1057] On NT, startup after a collection was cleared would sometimes fail due to open files.

Correction to Documentation
---------------------------

Known Issues
------------

RELEASE NOTES FOR THE LINUX RELEASE

WARNING FOR INKTOMI SEARCH 4.1 and 4.2 LINUX USERS
--------------------------------------------------
Due to a change in the URL database format, users of Inktomi Search 4.1 and 4.2 on Linux will notice an error message after upgrading the server. Users upgrading from versions 4.0 or previous will not notice this.
If 4.1 or 4.2 URL databases are detected, the following log message will appear:

    Version 4.1 or 4.2 URL database file detected. This is incompatible with this version. The URL database is being backed up, then it will be erased. The site will be revisited immediately.

Be aware that this may cause extraneous documents to appear in the index that have actually been removed from the site. If this condition is detected, clear the collection to remedy it.

SUPPORTED OS VERSION
--------------------
Ultraseek has been tested on RedHat Linux 7.1 and later, with a kernel version of 2.2.5 and glibc 2.2.4.

C++ LIBRARY REQUIRED
--------------------
The C++ runtime library libstdc++-libc6.1 is required to run Ultraseek. If you are using RedHat Linux, this file is part of:

    Version   RPM Package
    -------   -----------
    7.X       compat-libstdc++-6.2-2.9.0

REPORTING A BUG
---------------
When reporting a problem, be sure to include your Linux distribution, the version of your kernel (uname -r), and the version of your glibc. Send problem reports to software-bugs@ultraseek.com.

SUPPORT FOR POSTSCRIPT
----------------------
Ultraseek will index postscript files if you have installed ghostscript and it is on your path.

RELEASE NOTES FOR THE SOLARIS RELEASE

Ultraseek requires Solaris 8 or above. On Solaris 8, Update 7 with the T2 threading library is recommended, particularly for SMP hardware.

From Sun's support library:
http://developers.sun.com/solaris/articles/alt_thread_lib.html#question6

Solaris 8 Update 7 (2/2002) release is recommended for use of the T2 library since it contains all the performance enhancements and patches.
If your release level is lower than Update 7, then you can apply the maintenance update 7 patch cluster or the following patches related to T2:

    * 108528-13: SunOS 5.8: kernel update patch
    * 108827-17: SunOS 5.8: /usr/lib/libthread.so.1 patch

If Ultraseek is installed as a suid(root) program, you must install /usr/lib/lwp/ as a secured directory (see Question 11 in the above document).

OS FILE HANDLE REQUIREMENT
--------------------------
Ultraseek requires at least 1024 file handles be available. If your system's hard file-descriptors limit is set to less than 1024, the server will report an error message and refuse to start up. You can find out your hard file-descriptors limit as follows:

    sh
    ulimit -Hn
    exit

If ulimit returns less than 1024, you will need to manually increase the hard limits in your system configuration. In Solaris 2.4+, this can be accomplished by adding the following lines to /etc/system:

    *Set hard limit on file descriptors
    set rlim_fd_max = 4096

Save the changes and reboot Solaris. See http://access1.sun.com/technotes/01406.html for more information on file descriptor limits.

Patchnotes for 5.4
------------------
May 16, 2005

Initial release

Patchnotes for 5.4.1
--------------------
August 2, 2005

Enhancements:
[BZ0846] The license limit no longer counts deleted documents.
[BZ1081] "Junk Titles" are removed, even if no URL title is available.

Bug Fixes:
[BZ0323] Quoted single terms not matched in CCE rules
[BZ0498] For documents from a DatabaseCollection, guess the document type from the URL in order to allow document highlighting.
[BZ0529] Do not generate stack dump for ssl.timeout error
[BZ0655] Remote anchortext is now truncated on a word boundary.
[BZ0733] Double click on collection name now works when "search the internet" active
[BZ0778] Security fix: protection for CRLF injection
[BZ0834] When grouping by location, the displayed "number of groups" was incorrect
[BZ0860] HTML lang attribute was ignored
[BZ0869] Content Assistant debug messages generated stack dump on incorrect input
[BZ0880] < | > and similar characters are now % encoded in URLs
[BZ0889] QueryUI: Values of st beyond 500 generated inaccurate messages
[BZ0910] Discard a corrupt TMS file
[BZ0916] Better error message for indexing XML documents with unknown encodings.
[BZ0922] Spider would fail to go idle after suspend and resume, staying in a "urldb_reading" state.
[BZ0950] On NT, retry file operations that might fail due to a virus checker.
[BZ0996] RFC 2047 MIME text in USENET headers incorrectly decoded
[BZ1000] Corrupt RFC 2047 data caused binascii.Error: incorrect padding
[BZ1008] "Highlight" and "View as HTML" were not translated in the search UI
[BZ1033] When importing collections from another Ultraseek server, ensure the proposed names do not conflict with other proposed names
[BZ1026] AdminUI: handle empty query parameters
[BZ1042] Mirrored Quick Links would become active for "all_collections".
[BZ1070] Handle malformed topic rules
[BZ1076] Characters allowed in a fieldname: now allows _ - and .
[BZ1080] Stack dump is no longer logged when HTTP Client resets connection
[BZ1082] AdminUI: Query parameter error during edit of URL Quality factors.
[BZ1093] NameError in saquery.xml when st > 1 and there are no hits.
[BZ1096] Indexing could take 10 minutes to start when the DNS server could not resolve address for search-service.ultraseek.com
[BZ1101] Reduced the number of terms indexed for an RFC822 document (Mail and News)

Patchnotes for 5.4.2
--------------------
October 14, 2005

KeyView has been updated to version 8.3.1.9

Enhancements:
[BZ1214] The Admin UI now issues a warning when the browser has JavaScript off.
Bug Fixes:
[BZ0416] Help screens poorly formatted under IE, Firefox, and Opera.
[BZ0960] AdminUI: Interface "Test" button now works with Firefox browser.
[BZ1035] AdminUI: "Load Logo" now works with Firefox browser.
[BZ1147] Files failed to download from the administrative interface with Internet Explorer 6.0.
[BZ1177] AdminUI: "topics" security level no longer displayed when topics unlicensed.
[BZ1187] Orphan child processes are now killed when Ultraseek shuts down.
[BZ1194] Incorrect
&