Oracle method for indexing binary files

  • 2020-05-15 02:26:12
  • OfStack

This Oracle tutorial covers how Oracle indexes binary files. Oracle Text is Oracle's full-text retrieval technology and is included in both the Standard and Enterprise Editions of Oracle9i. Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, or on the web. It can perform linguistic analysis on documents and find them in a variety of ways, including keyword searches, context queries, Boolean operations, pattern matching, mixed theme queries, and HTML/XML section searches. Oracle Text is particularly strong at mixed queries that combine text with structured relational attributes. Here is an example.

The existing document table ZYCONTENT_TABLE stores uploaded files. The binary files are stored in its BLOB_CONTENT column, which has type BLOB. Oracle Text provides a way to index the binary documents held in a BLOB column.

1. Preparation

The document table ZYCONTENT_TABLE belongs to the schema ZYFILEUP, and the uploaded files are stored in its BLOB_CONTENT column.
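As a minimal sketch, a table of this kind could be defined as follows, run as ZYFILEUP. Only the table name ZYCONTENT_TABLE and the BLOB_CONTENT column come from the text; the ID, FILE_NAME, and MIME_TYPE columns are hypothetical and added only for illustration.

```sql
-- Hypothetical definition: only ZYCONTENT_TABLE and BLOB_CONTENT come from the article.
-- ID, FILE_NAME and MIME_TYPE are illustrative assumptions.
CREATE TABLE zycontent_table (
  id            NUMBER         PRIMARY KEY,  -- key identifying each uploaded document
  file_name     VARCHAR2(255)  NOT NULL,     -- original name of the uploaded file
  mime_type     VARCHAR2(128),               -- e.g. 'application/pdf'
  blob_content  BLOB                         -- the uploaded binary file itself
);
```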


2. Grant Oracle Text privileges to the schema that owns the document table

Connect to the database as the SYSTEM user and grant the required privileges to ZYFILEUP.
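A minimal sketch of the grants, assuming the usual Oracle Text setup: the CTXAPP role plus EXECUTE on the CTX_DDL package. Your site's security policy may call for a different set of privileges.

```sql
-- Run as SYSTEM (or another DBA account). If SYSTEM is not allowed to grant
-- on CTXSYS objects in your installation, run the second GRANT as CTXSYS.

-- CTXAPP is the standard Oracle Text application-developer role.
GRANT CTXAPP TO zyfileup;

-- Direct grant so ZYFILEUP can also call CTX_DDL from its own stored PL/SQL
-- (privileges received through a role are not visible in definer's-rights code).
GRANT EXECUTE ON ctxsys.ctx_ddl TO zyfileup;
```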


3. Index the document table on the BLOB_CONTENT column

Connect to the database as the ZYFILEUP user and create the preferences (indexing parameters) for the text index.
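A sketch of the preference step, assuming INSO_FILTER for the binary documents and CHINESE_LEXER for Chinese text (the lexer the author mentions later). The preference names zy_filter and zy_lexer are hypothetical.

```sql
-- Run as ZYFILEUP. Preference names are hypothetical.
BEGIN
  -- Filter preference: INSO_FILTER converts binary formats (Word, PDF, ...) to text.
  -- Later Oracle releases replace INSO_FILTER with AUTO_FILTER.
  ctx_ddl.create_preference('zy_filter', 'INSO_FILTER');

  -- Lexer preference: CHINESE_LEXER segments Chinese text into words.
  ctx_ddl.create_preference('zy_lexer', 'CHINESE_LEXER');
END;
/
```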


Then create the index.
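The index itself could then be created as below. This sketch assumes filter and lexer preferences named zy_filter and zy_lexer (hypothetical names) already exist; the index name zycontent_idx is also an assumption. To rely on the system defaults instead, the PARAMETERS clause could simply be `'FILTER ctxsys.inso_filter'`.

```sql
-- Run as ZYFILEUP. Creates a CONTEXT (full-text) index on the BLOB column.
CREATE INDEX zycontent_idx
  ON zycontent_table (blob_content)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('FILTER zy_filter LEXER zy_lexer');
```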


4. Index synchronization and deletion

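A CONTEXT index is not updated automatically when rows are inserted or changed, so it must be synchronized. There are two ways to do this.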
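A sketch of two common synchronization methods, assuming a CONTEXT index named zycontent_idx (hypothetical). CTX_DDL.SYNC_INDEX is the package procedure; the ALTER INDEX form is the older Oracle8i/9i syntax, and CTX_DDL.SYNC_INDEX is generally preferred in later releases.

```sql
-- Method 1: process pending inserts/updates through the CTX_DDL package.
BEGIN
  ctx_ddl.sync_index('zycontent_idx');
END;
/

-- Method 2: older synchronization syntax through ALTER INDEX.
ALTER INDEX zycontent_idx REBUILD ONLINE PARAMETERS ('sync');
```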




An index that is no longer needed can be dropped.
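Dropping a text index is ordinary DDL. A sketch, again assuming the hypothetical index name zycontent_idx:

```sql
-- Drop the text index; this also removes its internal DR$ tables.
DROP INDEX zycontent_idx;

-- If the index is in a failed state and a normal drop does not work:
-- DROP INDEX zycontent_idx FORCE;

-- Preferences are separate objects; remove any you created with, e.g.:
-- BEGIN ctx_ddl.drop_preference('zy_filter'); END;  -- hypothetical name
```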


5. Overview of index features

1. Indexable file types

Oracle Text can filter and extract content from documents in many different formats; it supports more than 150 of them. The most common ones, such as MS Office and PDF documents, can all be searched with Oracle Text.

2. Introduction to filters

For plain-text formats such as TXT, HTML, and XML, use the empty filter NULL_FILTER; for binary files, use INSO_FILTER. If the BLOB column of the document table holds both binary and plain-text files, INSO_FILTER can be used for all of them, but it is better to store and index plain-text and binary files separately. Oracle Text also provides packages for extracting the text of binary files as plain text.
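As a sketch of both points, assuming plain-text files are kept in a separate hypothetical table ZYTEXT_TABLE with a CLOB column TEXT_CONTENT, and that the binary files are indexed by a CONTEXT index named zycontent_idx (also hypothetical): the first statement indexes the plain text with the system-defined CTXSYS.NULL_FILTER preference, and the PL/SQL block uses CTX_DOC.FILTER to extract plain text from one binary document.

```sql
-- Plain-text documents in a separate, hypothetical table: no conversion needed.
CREATE INDEX zytext_idx
  ON zytext_table (text_content)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('FILTER ctxsys.null_filter');

-- Extract the text of one binary document with the in-memory form of CTX_DOC.FILTER.
DECLARE
  plain_text CLOB;  -- left NULL so CTX_DOC allocates a temporary CLOB
BEGIN
  ctx_doc.set_key_type('PRIMARY_KEY');       -- identify documents by primary key
  ctx_doc.filter(index_name => 'zycontent_idx',
                 textkey    => '1',          -- hypothetical ID value
                 restab     => plain_text,
                 plaintext  => TRUE);        -- plain text instead of HTML markup
  dbms_output.put_line(dbms_lob.substr(plain_text, 200, 1));
  IF dbms_lob.istemporary(plain_text) = 1 THEN
    dbms_lob.freetemporary(plain_text);      -- release the temporary LOB
  END IF;
END;
/
```

In SQL*Plus, run `SET SERVEROUTPUT ON` first to see the first 200 characters printed.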

3. The dictionary

You can build custom dictionaries in different languages containing synonyms and hierarchical relationships between words. Oracle Text offers strong multilingual support, making it possible to search documents written in Western languages, Japanese, Korean, and Traditional and Simplified Chinese.
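A sketch of a custom dictionary, which Oracle Text calls a thesaurus, built with the CTX_THES package. The thesaurus name zy_thes, the terms, and the CHINESE translation label are illustrative assumptions; the queries assume the hypothetical zycontent_idx index on BLOB_CONTENT.

```sql
BEGIN
  ctx_thes.create_thesaurus('zy_thes');
  -- SYN: 'computer' and 'PC' are synonyms (missing phrases are created automatically).
  ctx_thes.create_relation('zy_thes', 'computer', 'SYN', 'PC');
  -- NT: 'laptop' is a narrower term of 'computer'.
  ctx_thes.create_relation('zy_thes', 'computer', 'NT', 'laptop');
  -- Translation of an existing phrase, for cross-language searches.
  ctx_thes.create_translation('zy_thes', 'computer', 'CHINESE', '计算机');
END;
/

-- Documents containing 'computer' or any of its synonyms.
SELECT id FROM zycontent_table
 WHERE CONTAINS(blob_content, 'SYN(computer, zy_thes)') > 0;

-- Documents containing 'computer' or its CHINESE translation.
SELECT id FROM zycontent_table
 WHERE CONTAINS(blob_content, 'TR(computer, CHINESE, zy_thes)') > 0;
```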

6. Text query statement format


The CONTAINS function offers powerful query capabilities, including AND and OR relationships, proximity (NEAR, written ;), and exclusion (NOT, written ~). Even more convenient, it can search documents in different languages for a given keyword, although the dictionary has to be set up in advance for this.
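A few example queries in the CONTAINS format, assuming the hypothetical zycontent_idx index on BLOB_CONTENT and the hypothetical ID, FILE_NAME, and MIME_TYPE columns. The label 1 links the CONTAINS call to SCORE(1), which returns a relevance score; the last query shows the kind of mixed text-plus-relational search mentioned in the introduction.

```sql
-- Ranked keyword search: both words must appear.
SELECT id, file_name, SCORE(1) AS relevance
  FROM zycontent_table
 WHERE CONTAINS(blob_content, 'oracle AND index', 1) > 0
 ORDER BY SCORE(1) DESC;

-- Other operators inside the query string ('&', '|', ';', '~' are the
-- shorthand forms of AND, OR, NEAR and NOT):
--   'oracle OR sql'     either word
--   'oracle ; index'    both words, close to each other
--   'oracle ~ java'     'oracle' but not 'java'

-- Mixed query: a text condition plus an ordinary relational predicate.
SELECT id, file_name
  FROM zycontent_table
 WHERE CONTAINS(blob_content, 'contract') > 0
   AND mime_type = 'application/pdf';
```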

7. Problems in practice

In practice, my most common use of Oracle Text has been indexing Chinese documents in Word, Excel, PowerPoint, HTML, PDF, and other formats. However, I found that no text in RTF documents could be retrieved on Windows 2000, whether I used INSO_FILTER or NULL_FILTER and whether I used BASIC_LEXER or CHINESE_LEXER. Testing on XP did not succeed either, and I do not know why. Overall, Oracle Text's retrieval capability is quite good. It does not even require the files to be stored in the database: even when they are kept in a directory on the operating system's file system, Oracle Text can still index them from the database.
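For files kept outside the database, Oracle Text's FILE_DATASTORE lets the indexed column hold file names that are resolved against a directory path. A sketch, assuming a hypothetical table ZYFILE_TABLE and a hypothetical directory D:\zydocs. The final query reads the CTX_USER_INDEX_ERRORS view, which lists documents that failed during indexing and may help when files such as the RTF documents above do not show up in search results.

```sql
-- Hypothetical table holding only file names; the files stay on disk.
CREATE TABLE zyfile_table (
  id        NUMBER PRIMARY KEY,
  file_name VARCHAR2(255)   -- e.g. 'report.doc', resolved against PATH below
);

BEGIN
  ctx_ddl.create_preference('zy_file_ds', 'FILE_DATASTORE');
  ctx_ddl.set_attribute('zy_file_ds', 'PATH', 'D:\zydocs');  -- hypothetical path
END;
/

-- The index reads each file from disk and filters it with INSO_FILTER.
CREATE INDEX zyfile_idx
  ON zyfile_table (file_name)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('DATASTORE zy_file_ds FILTER ctxsys.inso_filter');

-- Documents that could not be filtered or indexed, newest first.
SELECT err_index_name, err_textkey, err_text
  FROM ctx_user_index_errors
 ORDER BY err_timestamp DESC;
```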

