Oracle Text and Search Summaries

Oracle provides for full-text indexing of both text and binary document formats via the Oracle Text feature. Oracle Text allows for document filtering, search term highlighting, themed searches and all sorts of other things; this blog concerns getting a small section of the doc out into the search results, similar to google and other search engines, in an efficient way.

OK, the docs are all in BLOBS in a table, you've set up your index and you can search them with CONTAINS.

SELECT SCORE(1) AS SCORE, DOCUMENTID FROM DOCUMENTS WHERE CONTAINS(filecontent, 'some search text', 1) > 0

You can even display the highlighted search terms in an HTML version of the chosen document:

ctx_doc.markup('MYTEXTINDEX',SELECTEDDOCIDHERE,'some search text',tmpVar...);

which performs well enough for one doc. But filtering each doc in the search results to get a snippet for display takes forever...

The solution is to create a new CLOB column with the plaintext already extracted via ctx_doc.filter (perform the filter when the doc is inserted). Create a context index on this and specify NULL as the filter type. Now you can generate the highlights (for the search text) on the plaintext version in far less time (we are talking a factor of ten less). Then use the highlights info to get a few characters to the left and right of the first one or two highlights from that plaintext column using DBMS_LOB.SUBSTR(). Do this for each set of n search records the user gets in a page (10, 25, whatever) when they request the results.

Comments
BlogCFC was created by Raymond Camden. This blog is running version 5.5.1.