Nesting ODI substitution method calls. Part 1.

Posted: March 29th, 2010 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI), Oracle Warehouse Builder | Tags: , , , | 1 Comment »

I am not getting around doing too much blogging on ODI these days. I have recently started a new project on OBIEE and ODI and it must be two years or so since I have last used OBIEE. I can tell you that there is a lot of catching up to do. Anyway, today’s post is about why I love ODI and one of the features that makes it the best ETL tool on the planet.

First we will have a look at nesting ODI methods at the same parse level. This is fairly straightforward:

What we do here is create a procedure with just one step. For this procedure we create an Option SESSION_PARAM. We use this option to make a call to the getSession method. As value for SESSION_PARAM we could pass in SESS_NO or SESS_NAME.

--<%=odiRef.getSession("" + odiRef.getOption("SESSION_PARAM") + "")%>

odi_nesting1

We just use double quotes and the plus sign to append the second substitution method call. The first double quote is used to as part of getSession and the second pair is used to allow us make a nested call to the substitution API.

Note: The double minus — is used to comment out the output of our API calls in the Oracle technology as otherwise we would throw an error.

Output of the above step is as follows ( I passed SESS_NO as the value of the option).

odi_nesting2

Next we look at some more interesting stuff and I’ll show you how you can nest calls to the substitution API at different levels of the ODI multi pass parser.

Recently I’ve had a requirement to create indexes on the fly for temporary interfaces. I was able to solve this by using the UD5 marker and nesting of ODI substitution calls.

We will first modify the IKM SQL Control Append and insert a step at the end of the knowledge module.

odi_nesting3

<? i=0; ?>

BEGIN

<%=odiRef.getColList("\t", "\tEXECUTE IMMEDIATE 'CREATE INDEX IDX_" + odiRef.getInfo("TARG_NAME") + "_" +

"<? i=i+1;out.print(i);?>" +

" ON " + odiRef.getInfo("TARG_NAME") + "([COL_NAME])" +

"'", ";\n", ";","UD5")%>

END;

In the above step we are nesting calls to the API at two different levels. Calls at the <% %> level are always executed before calls to the API at level <? ?>. As you can see we are using the UD5 marker to filter out columns where the UD5 checkbox has been selected.

Note: Inside the delimiters we are using the Java BeanShell scripting language.

When we use the knowledge module in a temp interface, the result of the first parse pass at runtime is as follows:

<? i=0; ?>
BEGIN
 EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_<? i=i+1;out.print(i);?> ON SALES_TARGET(PROD_ID)';
 EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_<? i=i+1;out.print(i);?> ON SALES_TARGET(CUST_ID)';
 EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_<? i=i+1;out.print(i);?> ON SALES_TARGET(TIME_ID)';
 EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_<? i=i+1;out.print(i);?> ON SALES_TARGET(CHANNEL_ID)';
 EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_<? i=i+1;out.print(i);?> ON SALES_TARGET(PROMO_ID)';
END;

This is then passed on to the next level where at runtime this results in:

BEGIN
	EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_1 ON SALES_TARGET(PROD_ID)';
	EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_2 ON SALES_TARGET(CUST_ID)';
	EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_3 ON SALES_TARGET(TIME_ID)';
	EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_4 ON SALES_TARGET(CHANNEL_ID)';
	EXECUTE IMMEDIATE 'CREATE INDEX IDX_SALES_TARGET_5 ON SALES_TARGET(PROMO_ID)';
END;

Eventually this is executed by the Oracle technology as dynamic SQL. You need to be careful when you nest different parse levels with escaping quotes. I will deal with this in the next part of this series.

Once you get your head around this stuff you can literally meet any requirement that is thrown at you. What a great tool.

Don’t you feel the love yourself?

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)


Book Review – Oracle Warehouse Builder 11G Getting Started – OWB for beginners

Posted: October 17th, 2009 | Author: Uli Bethke | Filed under: Oracle Warehouse Builder | Tags: , , , , , | No Comments »

Bob Griesemer’s Oracle Warehouse Builder 11G – Getting Started is the first and so far (as of Oct 2009) only book published on Oracle Warehouse Builder.

It aims to introduce data warehousing in general and Oracle Warehouse Builder in particular, to absolute beginners in the field.

In chapter 1 the author guides us through the installation process for OWB 11G. A typical installation of Warehouse Builder in a client/server environment is explained. More complex architectures with the Control Center service on a different server are not explained.

In chapter 2 we are introduced to the data model of the source system for the data warehouse. We are shown how to import or create the metadata for our source system. What I found useful here is that the author explains how we can use a non-Oracle RDBMS (in this case SQL Server) as a source system via heterogenous services. This can be a pain to set up so it is helpful to have a step by step walkthrough for this.

Chapter 3 gives an introduction to the basics of data warehouse design. It then explains how we can implement such a design as a target structure in Oracle Warehouse Builder.

In chapter 5 the author gives an overview on the basics of ETL processes and introduces us to some of the more important OWB operators that will allow us to implement ETL process in Warehouse Builder. I found that more space should have been given to explain the operators in detail. Some more detailed examples for each would have been useful also.

Chapters 6 and 7 then show us how we can bring the source metadata from chapter 2, the target metadata from chapter 3, and the operators from the previous chapter together in a Warehouse Builder mapping to extract, transform, and load data from source to target. What is completely missing here though is an introduction to process flows. Process flows are fundamental to glue ETL mappings in Warehouse Builder together and should have been part of an introduction to the subject.

Chapter 8 gives an overview on the deployment and execution of objects via the OWB Control Center. It also includes a good troubleshooting section with regards to deployment. As expected from a beginner’s book, more advanced topics such as deployment via OMB+ and TCL scripts is not explained. What is also missing is an explanation on how to schedule objects once they have been deployed.

In chapter 9 the author introduces us to various OWB features. The book gives a good overview on version management via snapshots and export/import of metadata.

The book does pretty much what it says on the tin and gives a good introduction to novices in the area of data warehousing and Oracle Warehouse Builder. So if you have never used Oracle Warehouse Builder this book is for you. If you have used OWB before you will not learn anything new here. Also this book can only be a starting point for your OWB career. A lot more OWB features than are outlined in this book need to be learned to become a master in the area (if you think about it the OWB user manual in PDF format has about 1000 pages and in some areas only scratches at the surface). Also the timing of the publication of the book is a bit unfortunate as only recently OWB 11GR2 was released with a lot of important new features and a redesigned User Interface. Hopefully there will be a 2nd edition soon that addresses this shortcoming.


Oracle Data Integrator vs. Oracle Warehouse Builder: What to do, what to do?

Posted: July 13th, 2009 | Author: Uli Bethke | Filed under: ETL, Oracle, Oracle Data Integrator (ODI), Oracle Warehouse Builder | Tags: , , , | 12 Comments »

One of the questions that regularly comes up in the forums, or during presentations on Oracle data integration products, is on the future of Oracle Warehouse Builder and Oracle Data Integrator. People wonder which tool to use for new projects? What should be done with existing implementations? In this article we will first compare the features and functionalities of both products and then I will answer these questions.

Oracle Data Integrator Enterprise Edition

In January 2009, Oracle released a statement of direction for ODI and OWB: “Today Oracle includes Oracle Data Integrator (ODI) and Oracle Warehouse Builder (OWB) as the two components of ODI-EE and will merge them into a single unified data integration technology platform. This strategy fully preserves any existing development investments of all Oracle data integration customers and will provide a seamless, easy upgrade path from the current components to the unified platform and beyond.”

Both data integration tools will be licensed as one product, i.e. you don’t have to buy a separate license for Oracle Data Integrator and the OWB Enterprise ETL features. The Oracle Warehouse Builder core ETL features are installed with the Oracle RDBMS and remain free to use (as long as you are licensed for the RDBMS). From what I hear from Oracle they will also offer a free data integration component going forward even when the two products have merged into a unified platform.

Google Trends

Apparently Google Trends can predict the outbreak of a flu virus more accurately than traditional methods. Why not let it have a go at predicting the future of ODI and OWB. As you can see from the Google Trends chart below, we are dealing with a slow but steady increase in searches for Oracle Data Integrator. On the other hand we are seeing a slow and steady decline in queries for Oracle Warehouse Builder. Not really unexpected.

google_trends_odi_owb

Declarative Design vs. Traditional ETL

ODI follows a declarative design approach. This is the key differentiator between the two products and the one that gives Oracle Data Integrator a clear advantage over Oracle Warehouse Builder.  Declarative design expresses the logic of a computation without describing its control. What exactly does this mean? In data integration projects loading and integration strategies are very similar, e.g. in a data warehouse project 80%-90% of fact tables follow the same loading strategy: first we disable foreign key constraints and indexes, then we look up surrogate keys and finally we load the data into the fact table. In a traditional ETL approach we would create a mapping module for each individual fact table. This is very time consuming. In ODI we simply define the loading strategy in a template (this is called a knowledge module in ODI) and with the click of a mouse apply it to every fact table load. Knowledge modules are created using a mix of programming languages, such as SQL, PL/SQL, Jython, the ODI substition API, OS commands, or Java. This data integration approach is very quick to implement and extremely flexible. It essentially gives you the flexibilty of manual scripting in combination with the advantages of an ETL tool (process flows, data lineage, scheduling, impact analysis, metadata etc.).

In a project where most data flows follow a similar pattern, the declarative design approach of ODI will be able to cut down the development time for your transformations by a factor of 30-50%. In a project where each transformation is unique you will not benefit too much from the declarative design approach in terms of productivity. However, from my experience this situation is extremely rare.
data_flow_owb

Data flow in Oracle Warehouse Builder

data_flow_odi

Data flow in Oracle Data Integrator

Data servers

One of the reasons why Oracle Warehouse Builder never made it into the top spot in the Gartner Magic Quadrant for ETL tools was its limitation on Oracle as the target server. You don’t have this limitation with ODI. In ODI you can use any technology that can be accessed via JDBC as either your data source or your target.

The ELT approach

Both products are ELT tools. They don’t need a separate transformation server but use the power of the target RDBMS server for transformations. This is a key differentiator to traditional products such as Business Objects Data Integrator.

Change data capture (CDC)

Oracle Data Integrator ships with change data capture knowledge modules. It supports near real time data warehousing out of the box.

Even though ODI allows you to quickly set up CDC, you will not get far with your project without a sound understanding of the underlying change data capture technology. In an Oracle world this is Oracle streams. Oracle Warehouse Builder has no interfaces to CDC. Of course, you can use it on top of Oracle CDC as this article by Mark Rittman shows . However, most of the work needs to be done at database level.

Data Modeler

Both products ship with a basic data modeller. The ODI Common Format Designer (CDF) is ideal for prototyping as you can quickly click together your target model from your source model. This then can be forward engineered to your target data server, e.g. a data warehouse. This saves you from creating a separate source to target map document as all the relationships are stored in the repository. Once you have established the relationship between the source and target attributes, ODI automatically generates data flow interfaces for you. Another brilliant feature. The functionality of the CDF for data modelling purposes, however, is rather limited. While you can define basic functionality and logical models (attributes, constraints, table relationships etc.) ODI won’t allow you to define the physical attributes of your tables, e.g. tablespaces, partitions etc. All of this functionality is available in OWB.

OLAP

In OWB you can create multidimensional OLAP cubes directly from a relational star schema. This is done through analytic workspaces and Oracle OLAP. You can’t load directly into Essbase cubes. ODI on the other hand can directly load your data into Essbase cubes. Various knowledge modules are provided out of the box for Essbase. ODI also allows you load into Oracle OLAP cubes.

Scripting Objects

Oracle Warehouse Builder relies on OMB+ and TCL for scripting objects. This is useful for updating a lot of objects in one go, e.g. if you want to set the tablespace for a group of tables. One drawback here is that you can’t create TCL scripts from your objects. It would be a nice feature to create a script from the mappings you create in OWB. ODI relies on XML to import and export objects. In theory you can script your objects via XML and then import into ODI. Unfortunately, no documentation exists on this so this will not be straight forward. One of the new features in Oracle Data Integrator 11 will be a proper API for scripting objects. Looking forward to this.

Misc.

ODI allows you to encrypt your objects. This could be useful for third party vendors of Knowledge Modules. With ODI you also have proper version and source control built into the product. OWB allows you to take snapshots of your objects. However, this feature does not allow for version control in the traditional sense. Personally, I have found the snapshot feature awkward to use.

Available resources

When I started out on ODI in January 2008 there was very little documentation available. Available documents I found poor. This has changed completely. In my opinion the best starting point to learn about ODI is the ODI Best Practices in a Data Warehouse guide. A quick search on LinkedIn revealed that there are currently 1,469 people registered with Oracle Warehouse Builder skills. On the other hand we just have 334 users with ODI skills. On Metalink there are now three times more documents on ODI than on OWB. There is also a lot of ODI related activity on the Oracle forums. While there is more OWB knowledge and skills out there, ODI is catching up fast.

What to do then?

Over the next two years Oracle Data Integrator and Oracle Warehouse Builder will merge into one unified product. In calendar year 2011 we will have a unified platform. The comparison above has shown that out of the two products ODI has more to offer. You can expect that a lot of the functionality for the unified platform will come from ODI.

In my opinion, only tactical data integration projects that solely rely on the Oracle Warehouse Builder core ETL functionality (the stuff you get for free with the RDBMS) should be implemented using Oracle Warehouse Builder. If you are licensed for Oracle Data Integrator EE, you should use Oracle Data Integrator for any new data integration projects you start. While Oracle will offer upgrade paths for both products you can use third party tools to start your migration from OWB to ODI today. Don’t expect that this tool will completely automate a migration. It also seems that migration of process flows is not supported currently.

Any Oracle data integration consultant will need to focus and upskill on ODI functionality going forward. Why not start this exercise with my ODI Knowledge Module article on OTN.

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)


OWB Best Practice Part 1: Logically organise your DW project with Collections

Posted: November 12th, 2008 | Author: Uli Bethke | Filed under: Best Practice, Oracle Warehouse Builder | No Comments »

Once you have gone beyond the implementation of more than one data mart for your Enterprise Data Warehouse project in Oracle Warehouse Builder, it becomes harder and harder to find a particular object for editing amongst the hundreds or thousands of objects.

OWB Collections come to the rescue. They allow you to logically organise your project into virtual folders by creating shortcuts to the actual objects.

In my projects, I typically create three collections per data mart. One that contains all the objects (mappings, transformations, tables etc.) that relate to extract phase of the ETL for the data mart. One for the lookup phase, e.g. lookup tables and mappings that load these. And finally one for the load phase. This last phase contains the fact and dimension tables and any transformations, mappings etc. that load these.

For a Sales data mart I would create three collections:

sales_extract
sales_lookup
sales_load

Unfortunately, you can’t nest Collections. It would be a nice feature to have one subcollection per object type. So within the sales_extract collection I would like to have a sales_extract_tables, sales_extract_mappings etc. subcollection. If I get around I will log this as an enhancement request, or maybe someone from OWB product management reads this.


Run Procees Flow from Command Line or SQL+

Posted: July 25th, 2008 | Author: Uli Bethke | Filed under: Oracle Warehouse Builder | Tags: , | No Comments »

You can run a process flow from the command line. This is useful if you want to call a process flow from another application, e.g. if you want to use a 3rd party scheduler like AT or Cron etc. or if you want to script the execution of your process flows.

sqlplus user/pwd@connect_identifier C:\oracle\products\10.2.0\owb_1\owb\rtp\sql\sqlplus_exec_template.sql <rep_owner>
<location process> <flow name> <process flow> <sys params> <custom params>

You need to call the sqlplus_exec_template.sql from sqlplus and pass five parameters to the script as in the example above:

rep_owner: repository owner
location: for a process flow this is the location for the Oracle Workflow Manager as defined in the OWB Design Center
process flow: Just pass in PROCESSFLOW as arg
name process flow: The name of your process flow
sys params: “,” if you don’t have any
custom params: P_ENABLE_TRACE=0 as an example


Automate OWB Metadata export with OMB+

Posted: July 21st, 2008 | Author: Uli Bethke | Filed under: Oracle Warehouse Builder | Tags: , , , | No Comments »

I regularly make exports of Warehouse Builder Metadata to files. I do this at least once a day. When there are a lot of changes to OWB objects I make the export multiple times a day. I prefer this over snapshots as these exports to file do not fill up the OWB repository. Regular exports are useful to re-import changed objects and to keep track of changes to your objects.

I have written a script to automatically export Warehouse Builder Metadata. The script can be scheduled via Windows AT command.

Adapt the following to your environment and save as OWB_export.tcl

OMBCONNECT owb rep_owner/pwd@host:port:service
set OMBLOG c:\\temp\\log.txt
set systemTime [clock seconds]
set systemTimeFormat [clock format $systemTime -format %Y%M%d-%H%M%S]
OMBEXPORT MDL_FILE 'c:\\temp\\my_project_$systemTimeFormat.mdl' \
FROM PROJECT 'MY_PROJECT' \
OUTPUT LOG 'c:\\temp\\my_project_$systemTimeFormat.log'
OMBDISC

Now open a command prompt and type:

AT 23:30 /EVERY:m,t,w,th,f C:\oracle\products\10.2.0\owb_1\owb\bin\win32\OMBPlus.bat c:\OWB_Export.tcl

This will schedule to run the script every weekday at 23:30.


Create and Maintain Source to Target Map in OWB

Posted: July 11th, 2008 | Author: Uli Bethke | Filed under: ETL, Oracle Warehouse Builder | Tags: , , | No Comments »

One of the tasks during the ETL design process is to create a Source to Target Map. This maps fields from the source system to fields in the target data warehouse. Very often this is done in Excel.

This approach has two disadvantages. First of all the definitions in the source to target map need to be transfered to your target data warehouse. At best this is a script based effort that takes the definitions in your map and automatically creates target tables, columns, indexes, comments/transformation rules etc. in your data warehouse. The other disadvantage is that all the source to target meta data is stored externally to the warehouse in Excel. Typically the source to target map is not updated once the target tables have been created in the DW.

User Defined Properties (UDP) in Warehouse Builder come to the rescue. UDPs allow you to extend the OWB repository, i.e. you can define additional properties on the base objects in Warehouse Builder.

owb udp

As per documentation: “To define custom objects and properties in Warehouse Builder, you must use the scripting utility Oracle Metabase (OMB) Plus. After you specify the UDOs and UDPs through scripting, you can then create instances of those objects either through scripting or using the graphical user interface (GUI).”

For the source to target map you can create custom properties on the column object, e.g. the source of the column, the source data type, transformation rules etc..

This is done via OMB+, e.g. to define a new property on the column object named Source Table you issue the following command

OMBSWITCHMODE SINGLE_USER_MODE
to switch to single user mode


OMBREDEFINE CLASS_DEFINITION 'COLUMN' \
ADD PROPERTY_DEFINITION 'UDP_b_SRC_TBL' SET PROPERTIES \
(TYPE, BUSINESS_NAME) VALUES \
('STRING', 'Source Table')

OMBSAVE
to persist your changes in the OWB repository

OMBSWITCHMODE MULTIPLE_USER_MODE
to go back to multi user mode.

You can now enter values for your source to target map through the Design Center: Right click on a table > Properties > User Defined tab > Expand Columns > Click on Column… et voila, enter your values. The values you enter are stored in the OWB repos and can also be retrieved from there.

In order to retrieve the Source to Target Map from the OWB repository I have written some SQL that you can adapt to your needs.

SELECT
   entity_name,
   column_name,
   position   ,
   MIN(CASE WHEN property_name = 'UDP_a_AREA' THEN property_value END) AS AREA,
   MIN(CASE WHEN property_name = 'UDP_b_SRC_TBL' THEN property_value END) AS SRC_TBL,
   MIN(CASE WHEN property_name = 'UDP_c_SRC_TBL_COMMENT' THEN property_value END) AS SRC_TBL_COMMENT,
   MIN(CASE WHEN property_name = 'UDP_d_SRC_COL' THEN property_value END) AS SRC_COL,
   MIN(CASE WHEN property_name = 'UDP_e_SRC_COL_COMMENT' THEN property_value END) AS SRC_COL_COMMENT,
   MIN(CASE WHEN property_name = 'UDP_f_JOINS' THEN property_value END) AS JOINS,
   MIN(CASE WHEN property_name = 'UDP_g_SRC_COL_DATATYPE' THEN property_value END) AS SRC_COL_DATATYPE,
   MIN(CASE WHEN property_name = 'UDP_h_SRC_COL_LENGTH' THEN property_value END) AS SRC_COL_LENGTH,
   MIN(CASE WHEN property_name = 'UDP_i_TRANSFORMATION_RULE' THEN property_value END) AS TRANSFORMATION_RULE,
   MIN(CASE WHEN property_name = 'UDP_j_NULL_REPLACEMENT' THEN property_value END) AS NULL_REPLACEMENT,
   MIN(CASE WHEN property_name = 'UDP_k_TRG_TBL' THEN property_value END) AS TRG_TBL,
   MIN(CASE WHEN property_name = 'UDP_l_TRG_COL' THEN property_value END) AS TRG_COL,
   MIN(CASE WHEN property_name = 'UDP_m_TRG_COL_DATATYPE' THEN property_value END) AS TRG_COL_DATATYPE,
   MIN(CASE WHEN property_name = 'UDP_n_TRG_COL_LENGTH' THEN property_value END) AS TRG_COL_LENGTH,
   MIN(CASE WHEN property_name = 'UDP_o_ISSUES' THEN property_value END) AS ISSUES
FROM
   ALL_IV_COLUMNS a
   JOIN ALL_IV_OBJECT_PROPERTIES b
   ON (a.column_id = b.object_id)
WHERE
   a.entity_name LIKE 'D_%' OR
   a.entity_name LIKE 'F_%'
GROUP BY
   entity_name,
   column_name,
   position
ORDER BY
   entity_name,
   position

You could use this to create a derived table in a Business Objects Universe to create your source to target map via Web Intelligence.


Complex Schedule OWB

Posted: June 29th, 2008 | Author: Uli Bethke | Filed under: Oracle Warehouse Builder | Tags: | No Comments »

OWB has the By Clause to create complex schedules. Imagine you have a near real time ETL process that needs to execute every 15 minutes between 7 am and 0 am. You don’t want this schedule to execute during the night because other maintenance operations such as stats gathering etc. need to run.

Step 1:

Create a schedule with your start date and a start time of 7:00:00 am. As repeat expression for this job specify minutely and set the frequency to every 15 minutes.

Step 2:

Specify the By Clause to create the complex schedule: click on By Hour and enter 7 am – 0 am.

owb audit_id


Global Variables OWB

Posted: June 21st, 2008 | Author: Uli Bethke | Filed under: Oracle Warehouse Builder | Tags: , | 1 Comment »

Several global variables can be accessed in a process flow. How this is done is not documented well. In the example below I have created two variables and assigned them values of the global variables Audit_ID and Number_of_Errors.

Step 1:

Create Process Flow and add two variables: l_audit_id and l_number_of_errors. For both variables select Integer as the data type.
owb audit_id

Step 2:

Deploy process flow and execute. After execution the Output Parameters will have the values for the audit_id and the number_of_errors global variables. The variables could be assigned to output parameters of the process flow, input parameters to transformations, mappings or other process flows etc.

owb audit_id