Detailed Explanation of Informatica Lookup

  • 2021-07-03 01:00:55
  • OfStack

Lookup is a common operation in ETL: converting a product key to a surrogate key, converting an ID to a name, and so on can all be done with a lookup. The Lookup transformation in Informatica handles these common conversions and can also be used to update slowly changing dimensions, which makes it a powerful component. Based on the Informatica 8.1 online documentation, this article briefly introduces the Informatica Lookup transformation.
Terminology: this article uses the Informatica terms transformation (also rendered as "conversion"), connected, unconnected, and cache.

1. Functions of lookup
• Get a related value: for example, look up a name by its ID.
• Perform a calculation: for example, retrieve a value that a calculation needs (such as a rate) and then compute the result.
• Update slowly changing dimensions: decide whether to insert or update a record depending on whether the lookup condition finds a match.
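The third use can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Informatica code; the `dimension` table, the `classify` helper, and the column names are all made up for the example.

```python
# Hypothetical dimension table keyed by the natural key: id -> (surrogate_key, name)
dimension = {"C001": (1, "Alice"), "C002": (2, "Bob")}


def classify(row, dimension):
    """Return 'insert', 'update', or 'unchanged' for one source row."""
    match = dimension.get(row["id"])  # the lookup condition: source id = dimension id
    if match is None:
        return "insert"  # no match: the row is new to the dimension
    _, current_name = match
    # match found: update only when the tracked attribute differs
    return "update" if current_name != row["name"] else "unchanged"


source_rows = [
    {"id": "C001", "name": "Alice"},
    {"id": "C002", "name": "Robert"},
    {"id": "C003", "name": "Carol"},
]
decisions = [classify(r, dimension) for r in source_rows]
print(decisions)  # ['unchanged', 'update', 'insert']
```

In a real mapping the routing would be done by downstream transformations that act on the lookup result; the sketch only shows how the lookup result drives the choice.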

2. relational lookups vs flat file lookups
The source of a lookup can be a table in a relational database or a flat file. For a relational table, you can pick an existing source or target definition, or import the table definition with the import wizard.

3. connected lookups vs unconnected lookups
Informatica transformations can be divided into connected and unconnected types.
A connected transformation sits in the data flow of the ETL mapping, and its input ports are linked directly to another transformation. An unconnected transformation sits outside the main data flow and receives its input through an expression in another transformation.
The connected lookup transformation processes every row in the data flow. For rows that do not match the lookup condition it outputs the pre-specified default values, and with a dynamic cache it can also update the cache. Output values are passed through the lookup/output ports, so it can return several values per row. It can use a static or a dynamic cache.
The unconnected lookup transformation processes only the rows for which it is called and returns only one value, through its single return port. If the lookup condition finds no match, it returns NULL. An unconnected lookup transformation can be called multiple times. It can use only a static cache.
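The difference can be pictured with a small Python analogy. This is a sketch under stated assumptions, not how Informatica is implemented: `LOOKUP_TABLE`, `DEFAULTS`, and the three functions are hypothetical names. In Informatica itself, an unconnected lookup is called from an expression with the `:LKP.lookup_name(argument)` syntax.

```python
# Hypothetical lookup source: product_id -> output values
LOOKUP_TABLE = {
    1: {"name": "Widget", "price": 9.5},
    2: {"name": "Gadget", "price": 20.0},
}
# Hypothetical user-defined defaults for the output ports
DEFAULTS = {"name": "UNKNOWN", "price": 0.0}


def connected_lookup(rows):
    """Sits in the data flow: every row passes through and gets several output values."""
    for row in rows:
        found = LOOKUP_TABLE.get(row["product_id"], DEFAULTS)
        yield {**row, "name": found["name"], "price": found["price"]}


def unconnected_lookup(product_id):
    """Called like a function from an expression: one return value, or None (NULL)."""
    found = LOOKUP_TABLE.get(product_id)
    return found["name"] if found else None


def expression(row):
    """An expression that calls the lookup only for rows that lack a name."""
    name = row["name"] if row.get("name") else unconnected_lookup(row["product_id"])
    return {**row, "name": name}


connected_out = list(connected_lookup([{"product_id": 1}, {"product_id": 3}]))
called_out = expression({"product_id": 2, "name": None})
skipped_out = expression({"product_id": 1, "name": "Custom"})
```

Here `connected_out` contains one enriched row per input row, with the defaults filled in for product 3, while `expression` invokes the lookup only when the incoming row has no name.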

4. cache
Informatica uses a cache for lookups. The server handles the cache roughly as follows:
When it processes the first row, the server creates the cache in memory; the cache size is set by properties of the lookup transformation. The columns used in the lookup condition go into an index cache, and the output values go into a data cache.
If the configured memory is not enough, the overflow is written to cache files. After the session ends, the cache is cleared unless the lookup cache is set to be persistent.
With a static cache, the lookup transformation is not allowed to update the cache. With a dynamic cache, when a row is not found in the cache (or is found with different values), the row can be inserted into the cache or the cached row can be updated.
Of course, you can also choose not to use any cache.
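The static/dynamic distinction can be illustrated with a toy cache in Python. This is a simplified sketch, not the server's actual data structures: the `LookupCache` class and its parameters are hypothetical, and overflow to cache files and persistence between sessions are left out.

```python
class LookupCache:
    """Toy lookup cache: an index cache for the condition columns, a data cache for outputs."""

    def __init__(self, load_rows, key_cols, dynamic=False):
        self.load_rows = load_rows  # callable returning the rows of the lookup source
        self.key_cols = key_cols    # columns used in the lookup condition
        self.dynamic = dynamic
        self.index = None           # condition values -> position in the data cache
        self.data = None            # list of output-value dicts

    def _key(self, row):
        return tuple(row[c] for c in self.key_cols)

    def _store(self, row):
        key = self._key(row)
        values = {c: v for c, v in row.items() if c not in self.key_cols}
        if key in self.index:
            self.data[self.index[key]] = values  # update the cached row
        else:
            self.index[key] = len(self.data)     # insert a new cached row
            self.data.append(values)

    def lookup(self, row):
        if self.index is None:  # build the cache when the first row arrives
            self.index, self.data = {}, []
            for source_row in self.load_rows():
                self._store(source_row)
        pos = self.index.get(self._key(row))
        if pos is not None:
            return self.data[pos]
        if self.dynamic:        # only a dynamic cache may change during the run
            self._store(row)
        return None


table = [{"id": 1, "name": "Alice"}]
static = LookupCache(lambda: table, ["id"])
dynamic = LookupCache(lambda: table, ["id"], dynamic=True)

new_row = {"id": 2, "name": "Bob"}
static_results = [static.lookup(new_row), static.lookup(new_row)]
dynamic_results = [dynamic.lookup(new_row), dynamic.lookup(new_row)]
```

With the static cache both calls miss. With the dynamic cache the first call misses and inserts the row, so the second call finds it.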

5. lookup transformation components
A lookup has 5 components, namely the 5 tabs you see after right-clicking the lookup transformation and choosing Edit. In fact, almost all Informatica transformations have roughly these five components.
The first tab (Transformation), the second tab (Ports) and the fifth tab (Metadata Extensions) are basically the same as in other transformations. The only difference is that a lookup port can be marked L (lookup) and R (return) in addition to the usual I (input) and O (output). There can be only one return port. It is used by an unconnected lookup, which cannot be linked directly to other transformations; its value can only be obtained through a :LKP expression.
The fourth tab (Condition) specifies the lookup conditions, in effect the join conditions between the two tables.
The third tab (Properties) is the most important. Here you can override the lookup SQL, customize the lookup, choose what to do when several records match the condition, set whether to use a dynamic cache, set the cache size, and so on.

6. lookup tips
• Create an index on the columns used in the lookup condition.
• Try to use = conditions. If you have more than one condition, put the = conditions first.
• For small tables, use the cache and set the cache size so that the whole table can be held in memory.
• Use a join instead of a lookup if the lookup table and the source table are in the same database and the cache is not large enough.
• For static lookups, try to use a persistent cache so that it can be reused by multiple sessions.
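The first and fourth tips can be tried out with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: the `orders` and `customers` tables are invented for the example, and in Informatica the join would be written in the source query rather than in Python. The sketch compares a lookup executed once per source row with a single join that returns the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    -- index on the lookup condition column
    CREATE INDEX idx_customers_id ON customers (customer_id);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
""")

# Uncached lookup: one query against the lookup table per source row.
per_row = []
source = conn.execute(
    "SELECT order_id, customer_id FROM orders ORDER BY order_id"
).fetchall()
for order_id, customer_id in source:
    hit = conn.execute(
        "SELECT name FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    per_row.append((order_id, hit[0] if hit else None))

# Join: the database matches both tables in a single statement.
joined = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(per_row == joined)  # True
```

A LEFT JOIN is used so that order 12, which has no matching customer, still appears with a NULL name, mirroring a lookup that finds no match.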

