Posted: March 8th, 2010 | Author: Brendan Tierney | Filed under: Amazon EC2, Training, analytic functions, data mining | Tags: installing Oracle Data Miner, ODM, Oracle Data Miner, Oracle Data Mining | No Comments »
As the awareness of Oracle Data Miner increases so does the interest in trying out and playing with Oracle Data Miner (ODM). The following set of instructions steps you through what is required to download and install the database, configure a data mining user in the database, populate the data mining user with some data. The instructions finish with details of how to download and configure the Oracle Data Mining tool. The following instructions cover 11g R1 of the database and tool.
Download and install the 11g R1 Database
- The first step is to download the Oracle 11g Enterprise Edition database software. The data mining functionality is only available in the enterprise edition. You need to download Oracle 11g (Release1 – 11.1.0.6.0) Enterprise Database software
- Create a working directory and save download to your PC
- UnZip the software into the working directory
- Before you run the install on your PC you need to log in as Administrator of the machine
- Double click on the Setup.exe file
- Follow the on screen instructions
- Install the full Enterprise edition
- Click here for a tutorial on installing Oracle 11g
- When you get to step 3 in the installation tutorial, make sure you write down the Database Password. This is very important as you will need this later.
NB. Make sure that you complete all the installation steps correctly and that you did not receive any errors. If you did you may need to start again.
HINT : You may need to do the installation as Administrator on your PC.
Set up the Sample Data for the Data Mining User
- You will need to download the Examples files as these will contain the sample data sets need by ODM
- Download the Oracle 11g Examples file to you working directory
- UnZip the Examples into you working directory
- See the installation guide for installing the Examples
Create and Configure the Oracle Data Mining User
To create the a database user for data mining, log into the database as the
SYSTEM user (and the password you gave during the installation process) and type a command like the following:
ALTER USER SH IDENTIFIED BY SH;
Download the files DMSHGRANTS.SQL and DMSH.SQL
Logon into the database as SYSTEM and run the DMSHGRANTS.SQL file
@<File Location>dmshgrants sh dmuser
Logon into the database as your DMUSER (password = DMUSER) and run the DMSH.SQL file
@<File Location>dmsh.sql
If everything has worked (no errors) then you should have Oracle Data Mining setup in the Database
Download, Setup and Login to Oracle Data Miner
- Download Oracle Data Miner software
- UnZip the software into C:\Oracle\ODM
- To start ODM run C:\Oracle\ODM\odminerw.exe
- Click on the New button to create a new connect.
- Enter a connection name, the username and password of the ODM user created above plus the name of the pc/machine/server, port and SID/Service Name
- When a successful connect is made the ODM tool will be opened.
- To view the sample data created above, expand the Data Sources branch in the Navigator panel, select your ODM user. You can now see the Views and Tables with the sample data.
Two useful books in the documentation are, ODM Administrators Guide 11g and ODM Concepts Guide 11g
If you don’t want to go to the trouble of downloading and installing the enterprise edition of the database then you can use it on the Amazon cloud. The ODM team have provided a set of instructions for you to follow on how to setup yourself up on the Amazo cloud and how you can connect the Oracle Data Mining tool to the database on th Amazon cloud. Link to instructions.
Posted: February 14th, 2010 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI) | Tags: hsqldb, odi, oracle data integrator, Sunopsis Memory Engine | No Comments »
This is the final part in our series on the HSQLDB in memory engine. In the first part of this series we’ve had a look at the Sunopsis Memory Engine (SME) from a high level. In the second part we then looked at two use cases namely using Java methods directly in SQL transformations and using the in memory database for simple, low volume transformations in memory. Craig Stewart commented on this last post and mentioned it would be useful to set a JDBC property that would clean up memory after a disconnect, similar to the drop_on_disconnect JDBC property of the ODI XML driver. I had a look in the HSQLDB documentation and came across the SHUTDOWN property. This property shuts down the database after the last session has been disconnected. The problem is that it’s not available yet in version 1.7.3 (the one that ODI uses). As a result we need to clean up objects ourselves or upgrade the HSQLDB to a higher version. If I find some spare time I will blog on how to do this. Also the other day I came across a corrupted HSQLDB. Another subject I may blog about some time soon.
Ok. Let’s have a look at today’s topic. The SME can be very useful to act as a temporary data store, e.g. you can store a Jython variable in the SME, from there assign it to an ODI variable and then pass it on to another scenario. In today’s post I will show you how to achieve this.
We will first create a procedure using Jython technology. In this procedure we will take a Jython variable and store it in a table in the Sunopsis Memory Engine. In this example we grab the OS username of the logged on user and store it in a table named TTEMP in the in memory database. Thanks to Maciej Kocon for the code.
#matshyeq:20100207
import os, re, time
import java.sql as sql
import java.lang as lang
import os
currentuser = os.environ['USERNAME']
lang.Class.forName("org.hsqldb.jdbcDriver")
url = "jdbc:hsqldb:."
myCon = sql.DriverManager.getConnection(url,"sa","")
mySQL="create table TTEMP (osuser VARCHAR(30), val TINYINT, ts VARCHAR(30))"
try:
prepStmt = myCon.prepareStatement(mySQL)
prepStmt.execute()
except:
pass
mySQL="INSERT INTO TTEMP VALUES('%s',1,NOW())" % currentuser
prepStmt.close()
prepStmt = myCon.createStatement()
prepStmt.executeQuery(mySQL)
prepStmt.close()
myCon.close()
Next we create an ODI variable V_OSUSER that gets its value from TTEMP.OSUSER.
SELECT osuser FROM ttemp

We take this variable and pass it into another scenario.

In this scenario we print the variable to the Operator log to verify that the value of the variable was passed on.

From the screenshot we can see that the OS user uli was printed out, which means that we successfully passed on our Jython variable value to another scenario.
Posted: February 4th, 2010 | Author: Brendan Tierney | Filed under: Best Practice, Business Intelligence, Data Warehouse, analytic functions, data mining | Tags: data mining, Data Mining Projects, Predictive Analytics | No Comments »
SPSS have recently posted and article called “10 Reasons you really need predictive analytics“. I thought it would be interesting to post the main points from this article to illustrate that not all predictive analytic projects involve Data Mining, but involve a number of different techniques and looking the the business data in a different way. Yes data mining can be a very important element in some of the following
1. Get a higher return on your data investment
Your organization has a significant investment in data – data that contains critical information about every aspect of your business. Today more than ever, you need to get the best return on the data you have collected–and predictive analytics is the most effective way to do this. Predictive analytics combines information on what has happened in the past, what is happening now, and what’s likely to happen in the future to give you a complete picture of your business.
2. Find hidden meaning in your data
Predictive analytics helps you maximize the understanding gained from your data. It enables you to uncover hidden patterns, trends, and relationships and transform this information into action.
3. Look forward, not backward
Unlike reporting and business intelligence solutions that are only valuable for understanding past and current conditions, predictive analytics helps organizations look forward. By leveraging sophisticated statistical and modeling techniques, you can use the data you already have to help you anticipate future events and be proactive, rather than reactive.
4. Deliver intelligence in real time
Your business is dynamic. With predictive analytics, you can automatically deploy analytical results to both individuals and operational systems as changes occur, helping to guide customer interactions and strategic nd tactical decision making.
5. See your assumptions in action
Advanced analytical methods give you the tools to develop hypotheses about your organization’s toughest challenges and test them by creating predictive models. You can then choose the scenario that is likely to result in the best outcome for your organization.
6. Empower data-driven decision making
Better processes help people throughout your organization make better decisions every day. Predictive analytics enables your organization to automate the flow of information to match your business practices and deliver the insights gained through this technology to people who can apply them in their daily work.
7. Build customer intimacy
When you know each of your customers or constituents intimately—including what they think, say, and do—you can build stronger relationships with them. Predictive analytics gives you a complete view of your customers, and enables you to capture and maximize the value of each and every interaction.
8. Mitigate risk and fraud
Predictive analytics helps you evaluate risk using a combination of business rules, predictive models, and information gathered from customer interactions. You can then take the appropriate actions to minimize your organization’s exposure to fraudulent activities or highrisk customers or transactions.
9. Discover unexpected opportunities
Your organization can use predictive analytics to respond with greater speed and certainty to emerging challenges and opportunities, helping you to keep pace in a constantly changing business environment.
10. Guarantee your organization’s competitive advantage
Predictive analytics can drive improved performance in every operational area, including customer relations, supply chain, financial performance and cost management, research and product development, and strategic planning. When your organization runs more efficiently and profitably, you have what it takes to out-think and out-perform your competitors
So what is Predictive Analytics. Check out the description on Wikipedia
Let me know you views and comments on the above.
Brendan Tierney
Posted: February 1st, 2010 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI) | Tags: hsqldb, Oracle Data Integrator and hsqldb, Sunopsis Memory Engine | 3 Comments »
In the first part of this series we’ve had a high level look at the ODI HSQLDB in memory databasase aka the Sunopsis Memory Engine (SME).
As promised we’ll today have a look at one of the use cases for the SME, in-memory low data volume transformations. This could be particularly useful for temp (yellow) interfaces where you don’t want to physically set down your data.
Before we do this, however, I wanted to introduce another use case for the SME that came to my mind the other day. As HSQLDB allows you to use Java methods as function calls in SQL transformations you could make use of this in ODI interfaces to extend functionality not readily available in your target RDBMS. Data encryption and other security related use cases, e.g. generating MD5 hashes come to my mind.
Below is a screenshot of how this can look like in ODI. You need to make sure to either execute the call to the Java method on the Staging Area, or if you don’t reference a source column on the Target.

For more information on how to achieve this refer to the HSQLDB documentation. I believe it is possible to use your own Java methods this way. You need to compile them in class/jar files and reference them in the odiparams.bat classpath. I haven’t tried this out myself but I believe that’s the way it should work.
Let’s move on to our other use case now though. Below I have created a simple temp interface that uses the Oracle sh.sales table to load data into a table in the Sunopsis Memory Engine.

We then execute this interface a couple of times.
On first execution the memory allocated to the ODI agent JVM jumps to 81.6 MB.

The next execution brings this up to 123.7 MB

Yet another execution brings memory usage to 178.6 MB. From the Private Bytes Memory graph you can see that the memory allocation grows linearly as we insert into the s_sales table in the SME.

We execute the interface a few more times until we finally hit the memory limit specified in odiparams.bat.

Subsequent executions of the interface will throw an error along the lines of the figure below.

As you can see we need to clean up after ourselves when using the in memory engine. A simple DROP TABLE in a procedure will drop the in memory table and release memory.

As you can see from the figure below memory is allocated when the table is loaded and then de-allocated when it is dropped

Posted: January 27th, 2010 | Author: Brendan Tierney | Filed under: Best Practice, Business Intelligence, Data Warehouse, Training, data mining | Tags: Brendan Tierney, Data Mining Resource, examples of data mining, ODM, Oracle Data Miner | No Comments »
Over the past couple of weeks a couple of new web resources have appeared on Oracle Data Miner
The first one is that Charlie Berger, the director of Oracle Data Mining Product Management, has started a blog specifically for Oracle Data Miner. Check it out,
http://blogs.oracle.com/datamining/
If you are already using Oracle Data Miner or are interested in following its developments why not join the Oracle Data Miner Facebook group
http://www.facebook.com/pages/Oracle-Data-Mining/287065104533?ref=mf
Posted: January 23rd, 2010 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI) | Tags: hsqldb, odi, Oracle Data Integrator and hsqldb, Sunopsis Memory Engine | 2 Comments »
Oracle Data Integrator not only uses hsqldb for its demo environment. It is also used as the in memory database for the so called Sunopsis Memory Engine.
“It is possible to run HSQLDB in a way that the database is not persistent and exists entirely in random access memory. As no information is written to disk, this mode should be used only for internal processing of application data, in applets or certain special applications. This mode is specified by the mem: protocol”. (hsqldb documentation)
ODI uses version 1.7.3 of hsqldb. You can download this version together with the documentation from here.
When you use the Sunopsis Memory Engine in an ODI interface or procedure an hsqldb in-memory database is created inside the JVM of the agent. The memory that is allocated to the hsqldb is taken from the JVM. As a result the maximum memory size that can be allocated to the in memory database is determined by the heap size parameter in the odiparams.bat file.
The JDBC Url syntax that is used in ODI is actually the old syntax as can be seen from the figure below.

The correct syntax for the above JDBC Url should be jdbc:hsqldb:mem:. As you can see you can specify the name of the database. This allows you create multiple hsqldb databases inside the same JVM.
“Version 1.7.2 introduces a uniform method of distinguishing between different types of connection, alongside new capabilities to provide access to multiple databases. The common driver identifier is jdbc:hsqldb: followed by a protocol identifier (mem: file: res: hsql: http: hsqls: https:) then followed by host and port identifiers in the case of servers, then followed by database identifier” (hsqldb docs).
When you create an hsqldb in-memory database it persists over the lifetime of the JVM, until the agent is stopped, or if you use the local agent until you close down Designer. Alternatively you can issue a Shutdown command against the hsqldb database. Any of the above will destroy objects that were created and will release the allocated memory. It is important to understand that memory is not released upon execution of an interface or when an ODI session ends. Tables or other objetcs that were created in an ODI package persist across the lifetime of this package. This is quite different to the lifecycle of say a Jython/Java BeanShell variable. You need to make sure that you clean up after yourself by e.g. dropping any objects you create.
As far as I know you can’t connect from an hsqldb mem database inside a JVM to one in another JVM.
“In-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients” (hsqldb docs).
This means that you can’t use a client such as Squirrel SQL to connect to tables that you create in an interface. What you can do however, to debug is to manually create datastores in a Sunopsis Memory Engine model and use the local agent to execute interfaces that create and populate this table. You can then use the View Data functionality to view the content of the datastore. This is also documented in note 424597.1.

When does it actually make sense to use the hsqldb in mem database? I can see two scenarios where this feature may be useful.
First, it can be used for simple low volume transformations (less than 100K records; exact number really depends on the parameters of your own environment such as JVM heap size, concurrency etc.). This can be particularly useful for temporary interfaces if you don’t want to physically materialize the data on disk. Remember, however, that it is not the purpose of the ODI agent that runs the in mem database to do a lot of heavy lifting. This would not make any sense in an ELT tool where all the heavy duty stuff is done by the target RDBMS.
Note: You need to be careful not to run out of memory when you use the in mem database for this purpose. I will show you in the second part of this series what best practices you can use to achieve this. I am not sure how efficient hsqldb is in executing complex multi-table joins with subqueries. Currently it also does not support analytic functions, which in my opinion is a big disadvantage.
Secondly, the Sunopsis Memory Engine can be used to persist variables across sessions and scenarios or to populate ODI variables from Jython or the Java BeanShell. This is handy as all your logic is executed inside the JVM of the agent and you don’t have to switch back and forth between different technologies. As a result you will keep overhead to a minimum. We will have a look at how this can be achieved in the third part of this series.
Posted: January 14th, 2010 | Author: Uli Bethke | Filed under: BI Magazines | Tags: BI Magazines, business intelligence magazines, data warehouse magazines, data warehousing magazines, database magazines | No Comments »
I did some research around business intelligence and data warehousing magazines. If you think I have missed one let me know.
Magazines – print version
Business Intelligence Journal (TDWI) “The Business Intelligence Journal is an in-depth, unbiased information resource for business intelligence and data warehousing professionals. This quarterly publication provides fresh ideas and perspectives to help organizations operate more intelligently” (quote TDWI website).
In my opinion the best BI publication out there. As a member of the TDWI you will receive this publication on a quarterly basis. Membership is required. It is available in an online (PDF) and offline version.
What Works: Best Practices in Business Intelligence and Data Warehousing (TDWI) “is a comprehensive collection of industry case studies, Q&A, and lessons from the experts. Each edition provides a snapshot of some of the most innovative business intelligence and data warehousing teams and implementations in the world today” (quote TDWI website). Another TDWI publication. You need to register to view content. The magazine is published twice a year.
The Ten Mistakes to Avoid (TDWI) : This “series, published quarterly, addresses the 10 most common mistakes managers and teams make—from data modeling to building an operational data store—and gives you inside knowledge on how to avoid these common pitfalls”. This is only for TDWI members.
Best of Business Intelligence (TDWI): “Each year we select a few of TDWI’s best, most well-received, impact-full articles, research, and information, and present them to you in this publication”. (quote TDWI website)
Information Management. Formerly known as DM Review this one is published eight times a year.
Oracle Magazine: As an Oracle practitioner you have to subscribe to Oracle Magazine. As it’s free you don’t have any reasons not to. Good mix of technical and general Oracle product information. In my opinion, the highlight is Tom Kyte’s column.
Another greate Oracle magazine is Oracle Scene. It is published three times a year. You need membership of the UKOUG to receive it.
SQL Server Magazine Unlike Oracle Magazine, SQL Server Magazine is not for free. There is both a print and online edition.
IBM Data Management magazine Formerly known as DB2 magazine this quarterly magazine comes in a print and online edition. Good mix of IBM data management and general data management content. You can suscribe to it for free. In my opinion another must read.
Business Intelligence Magazine This one is in German. As far as I know it is published quarterly from Switzerland. Individual magazines cost 9.70 Euro and you can subscribe for a trial. Focus is on management and business processes with a lot of BI case studies. You won’t find any technical content here.
BI-Spektrum: This magazine is edited by TDWI Germany. Its content is not related to the TDWI Business Intelligence Journal. Typically it is published five times a year and you can subscribe for a trial https://www.ips-d.de/order-Sigs-datacom-abo_de/ . Focus is on business processes in the various industries. Expect a lot of case studies.
Database Pro Great German magazine on anything database. I’m a subscriber since issue one. Has excellent features on DW/BI. The magazine has improved a lot since its first edition. If you speak German you have to subscribe to this one.
Online Magazines
Teradata magazine This one is for the Teradata gurus. It’s not all Teradata here, but you will also find general DW/BI content such as case studies etc. It’s a pity that you can’t download the magazine in PDF format as a whole. I’d prefer to put this on my e-book reader rather than read from screen.
Open Source Database Magazine Technically orientated magazine about open source databases such as PostgreSQL, MySQL and others.
What Works This magazine is published by TDWI Germany. It’s similar in nature to the original. Different content though and not just a translation.
Database magazine This one is in Dutch. Unfortunately, I don’t speak the language because the content looks interesting.
Online portals with newsletters
The best online portals in my opinion are:
BeyeNetwork
The Data Administration Newsletter
Intelligent Enterprise
Kimball Forum
Posted: January 7th, 2010 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI) | Tags: odi subqueries, oracle data integrator subqueries, temporary interface | No Comments »
Oracle Data Integrator does not have any built in functionality for subqueries in interfaces. This is one of the reasons why you will find people claiming that ODI has shortcomings when dealing with complex transformations.
Shortcomings of existing workarounds
In the ODI community various workarounds have been suggested to address this. The following discussion thread on OTN summarises these efforts.
1) Use of WHERE EXISTS in the predicate
2) Use of a View that contains the complex subquery transformation logic as the source datastore.
3) Use of a series of temporary (yellow) interfaces.
While all of these are valid workarounds each of the above approaches has drawbacks of its own.
The use of WHERE EXISTS only addresses a small subset of requirements. Using views has one big disadvantage. You will lose data lineage. By using views you will not be able to trace mappings from source to target and as a result lose one of the advantages that an ETL tool offers over scripting. In theory performance should not suffer as Oracle uses predicate pushing and view merging. Tom Kyte has written an excellent article about this for Oracle Magazine a while back if you require more information (look for the section on Views and Merging/Pushing at the bottom of the article). The third option is to use a series of temporary interfaces. However, this has the disadvantage of physically setting down and materialising the resultset for each subquery, rather than processing everything in memory. Depending on the circumstances, you may experience degradation in performance.
Proposed workaround
I propose another workaround here that addresses all of the above shortcomings. We will use a combination of a temp (yellow) interface and a custom knowledge module to address the issues. Each subquery in our complex transformation requirement will be assigned to a temporary interface. I have written a custom integration knowledge module that will store the SQL of the subquery for each temporary interface in a table. We then stick all of our temp interfaces with the subqueries into a package and then let the knowledge module combine the individual subqueries into one complex query before loading the data into our target table. Take the following as an example. We have a requirement to rank the Top 10 purchasing customers and insert them into a target table. In SQL this can be done by using the rank() analytic function in a subquery and then filtering on this subquery. In our solution this will translate into two interfaces. One temp interface for the subquery with the analytic function. Our custom KM will store the SQL for this subquery in a table. We will then need a second (non-temp) interface that filters on the top 10 purchasing customers, combines the two queries, executes the resulting complex query, and inserts the resultset into the target table.
You can download the custom subquery knowledge module and the other scripts from here.
Note: This is is just a prototype at the moment to demonstrate that it is possible to execute complex transformations in ODI. At the moment the solution is specific to Oracle. I intend to rewrite parts of the knowledge module using Jython arrays to store the subqueries over the next couple of weeks to make the solution technology agnostic.
The subquery knowledge module
Before I give a working example I want to give a brief overview on the custom knowledge module that is our work horse.
Step 1. Create table that stores the SQL for the subqueries.
Nothing spectacular here. We create a table that will hold the SQL for our subqueries. The data type for the sql query column is a CLOB (this will become relevant in a later step). We also have a session number column to store the ODI session number for each subquery. This will allow us to concurrently use the knowledge module.

CREATE TABLE SQL_QUERY
(
QUERY CLOB,
IDENTIFIER VARCHAR2(30 BYTE),
SESS_NO NUMBER,
CREATE_DATE TIMESTAMP(6)
)
Step 2: Store SQL pieces for subquery in Jython variables
In step two we store the SQL pieces for the subquery in Jython variables (note Jython as the technology). We retrieve the individual SQL components via the ODI substitution API.

v_collist = """<%=snpRef.getColList("", "\t[EXPRESSION]", ",\n", "", "")%>"""
v_query = """select <%=snpRef.getPop("DISTINCT_ROWS")%>
<%=snpRef.getColList("", "\t[EXPRESSION] AS [CX_COL_NAME]", ",\n", "", "")%>
from <%=odiRef.getSrcTablesList("", "([TABLE_NAME])", ", ", "")%>
where (1=1)
<%=snpRef.getJoin()%>
<%=snpRef.getFilter()%>
<%=snpRef.getJrnFilter() %>%s"""
if v_collist.find('OVER') == -1:
v_group_by = """<%=snpRef.getGrpBy()%><%=snpRef.getHaving()%>"""
else:
v_group_by =""
v_query = v_query % v_group_by
Note: Make sure that you use the tab key for indentation. This seems to be a peculiarity of Jython.
Note also that I have built in a simple check for the presence of analytic functions into the code. If we find an analytic function in the source to target column mappings we do not generate the GROUP BY SQL piece as otherwise an error will be thrown. This simple check is missing from all of the out of the box knowledge modules. As a result analytic functions such as SUM() OVER, MAX() OVER etc. can not be used in interfaces. I consider this a bug in the getGrpBy() API substitution method in ODI.
Note: There is a clever workaround for the above described in Metalink note 807527.1
Step 3: Debug subquery
In this step we build a debugging mechanism into the knowledge module. We use the Jython raise function to throw an exception and print out the value of the v_query variable to the Operator. This will allow us to quickly review the value of the generated SQL in the Operator module and if necessary allow us to debug it.

Note that you have to select the Ignore Errors checkbox. Otherwise execution of the interface will terminate at this step (after all we are forcefully raising an error).
Step 4: Store SQL for subquery in table
In this step we will use the Jython technology to store the SQL for the subquery together with the ODI session number in a table. We use a Jython stored procedure call to accomplish this.

import java.sql as sql
import java.lang as lang
import java.sql.Types as types
myCon = snpRef.getJDBCConnection("SRC")
try:
inVar1 = v_query
inVar2 = '<%=odiRef.getTargetTable("TABLE_NAME")%>'
inVar3 = <%=odiRef.getSession("SESS_NO")%>
myStmt = myCon.prepareCall('begin LOAD_SQL_QUERY (?,?,?); end;');
## Bind parameters (IN parameters)
myStmt.setString(1, inVar1);
myStmt.setString(2, inVar2);
myStmt.setInt(3, inVar3);
## Execute the callable statement
myStmt.execute();
finally:
inVar1 = 'myVar1';
Note that you have to use indentation using your tab key exactly as in the screenshot as otherwise an error will be thrown. Have a look at 424207.1 for more information on calling stored procedures from Jython.
Code for the stored procedure load_sql_query is as follows:
CREATE OR REPLACE PROCEDURE load_sql_query (p_query_value IN CLOB,p_identifier IN VARCHAR2,p_sess_no IN NUMBER)
IS
BEGIN
INSERT INTO sql_query VALUES (REPLACE(p_query_value,CHR(10)||CHR(10),''),p_identifier,p_sess_no,SYSTIMESTAMP);
COMMIT;
END;
/
Step 5: Create complex query from individual subqueries
We only execute this step once we have run all of our subqueries. This step takes all of the subqueries and creates our final complex query. Condition for executing this step is that we are dealing with a non-temporary table that is a normal (non-temp) interface.
Command on Source

SELECT NVL('<%=odiRef.getTargetTable("TABLE_TYPE")%>','Yellow') AS table_type FROM dual
The getTargetTable method returns ‘T’ for a non temp target table and NULL for a temp target table.
Command on Target

BEGIN
-- INSERT INTO target_table VALUES ('#table_type', <%=odiRef.getSession("SESS_NO")%>);
--COMMIT;
-- Only execute if target table is not of type temporary
IF '#table_type' = 'T' THEN
sql_substitute (<%=odiRef.getSession("SESS_NO")%>);
END IF;
END;
Note: We only execute this step if the variable table_type is of type non-temp, that is equals T.
If we are dealing with a non-temp table and this is our final step in our complex transformation we generate the final complex query from the underlying subqueries by executing a stored procedure.
CREATE OR REPLACE PROCEDURE sql_substitute (p_sess_no IN NUMBER)
-- exec target_sql;
IS
final_raw_query CLOB;
CURSOR c_iterator_outer IS
SELECT
query,
identifier,
create_date
FROM
sql_query
WHERE
sess_no = p_sess_no
ORDER BY
create_date DESC;
CURSOR c_iterator_inner IS
SELECT
query,
identifier,
create_date
FROM
sql_query
WHERE
sess_no = p_sess_no
ORDER BY
create_date;
BEGIN
SELECT query INTO final_raw_query from sql_query where create_date = ( select MAX(create_date) FROM sql_query);
FOR r_iterator_outer IN c_iterator_outer
LOOP
dbms_output.put_line ('outer query:' || r_iterator_outer.query || ' identifier:' || r_iterator_outer.identifier);
FOR r_iterator_inner IN c_iterator_inner
LOOP
final_raw_query := REPLACE(r_iterator_inner.query,'('||r_iterator_outer.identifier||')','('||r_iterator_outer.query||')' || ' ' || r_iterator_outer.identifier);
UPDATE sql_query SET query = final_raw_query WHERE create_date = r_iterator_inner.create_date;
dbms_output.put_line ('inner query' || final_raw_query);
COMMIT;
END LOOP;
END LOOP;
END;
/
There may be a more elegant way of doing this but this seems to work for the moment. Originally I had thought of using the model clause for this but then noticed that this does not support CLOBs.
Step 6: Execute complex query and insert transformed data into target table
In this step we grab the transformed and substituted complex query from our table and execute it.
Command on Source

SELECT
NVL('<%=odiRef.getTargetTable("TABLE_TYPE")%>','Yellow') AS table_type,
-- Metalink 423909.1 Edit odiparams or use jdbc driver 5 or 6
query AS query
FROM
sql_query
WHERE
sess_no = <%=odiRef.getSession("SESS_NO")%> AND
create_date = ( SELECT MAX(create_date) FROM sql_query WHERE sess_no = <%=odiRef.getSession("SESS_NO")%>)
In the Command on Source we retrieve the complex SQL query from our database table.
As we are dealing with a CLOB here we first need to make some changes to the odiparams.bat file. Alternatively you can use version 5 or 6 of the JDBC driver (Metalink 423909.1). I have outlined in a previous post how to achieve this.
If we don’t want to install a new JDBC driver we need to add the following line to the odiparams.bat
set ODI_ADDITIONAL_JAVA_OPTIONS= “-Doracledatabasemetadata.get_lob_precision=false”
You will need to restart the Designer module for this to take effect.
Command on Target

BEGIN
IF '#table_type' = 'T' THEN
INSERT INTO <%=odiRef.getTable("L","TARG_NAME","A")%>
(<%=odiRef.getColList("", "[COL_NAME]", ",\n\t", "", "((INS and !TRG) and REW)")%>
<%=odiRef.getColList(",", "[COL_NAME]", ",\n\t", "", "((INS and TRG) and REW)")%> )
#query;
COMMIT;
END IF;
END;
Note that we need to select the Ignore Errors checkbox. Otherwise the KM will fail at this step for the subqueries stage. A better approach for this may be to use Jython and pass the #query variable into a stored procedure that executes it as dynamic SQL.
Currently I use an INSERT statement for loading the target table. In a future version of this KM I also intend to add a MERGE statement as an option.
Step 7: Write complex query to Operator module
In a final step we write the complex query to the Operator. We can then easily debug any issues with it.
Command on Source

SELECT
NVL('<%=odiRef.getTargetTable("TABLE_TYPE")%>','Yellow') AS table_type,
-- Metalink 423909. Edit odiparams or use jdbc driver 5 or 6
TO_CHAR(query) AS query
FROM
sql_query
WHERE
sess_no = <%=odiRef.getSession("SESS_NO")%> AND
create_date = ( SELECT MAX(create_date) FROM sql_query WHERE sess_no = <%=odiRef.getSession("SESS_NO")%>)
Command on Target

raise("""INSERT INTO <%=odiRef.getTable("L","TARG_NAME","A")%>(<%=odiRef.getColList("", "[COL_NAME]", ",\n\t", "", "((INS and !TRG) and REW)")%> <%=odiRef.getColList(",", "[COL_NAME]", ",\n\t", "", "((INS and TRG) and REW)")%> ) """ + """#query""")
The subquery knowledge module in action – Two examples
Let’s now have a look at our new Knowledge Module in action. For this purpose we will use tables from the SH sample schema.
Example 1: Subquery with analytic function
In this example we want to retrieve the date on which the amount sold for a customer was the greatest. One way of achieving this is to use the ROW_NUMBER() analytic function in a subquery to rank the amount sold and then filter the Top 1 record from this subquery. So the query would look similar to the following
SELECT
CUST_ID AS CUST_ID,
TIME_ID AS TIME_ID,
AMOUNT_SOLD AS AMOUNT_SOLD,
row_number() OVER (PARTITION BY CUST_ID ORDER BY AMOUNT_SOLD DESC, TIME_ID DESC) AS RN
FROM
sales
) SQ1
WHERE SQ1.rn = 1
We will now generate this query with our new knowledge module.
First of all we will generate a temporary interface for the subquery with the analytic function in it.

Select the Staging Area Different From Target checkbox and from the dropdown select the schema you want to execute this in.
In the Target Datastore click on Untitled. Type SQ1 as the Name for the target datastore and select the Data schema radiobutton.

Drag the sales table from the SH model to the Sources area (you first need to reverse engineer this from the SH schema).

Drag and drop the columns CUST_ID, TIME_ID and AMOUNT_SOLD from the source datastore to the target datastore.

Right click inside the Target Datastore area and select Add a column

Name the new column rn and type in the value for the analytic function:
row_number() OVER (PARTITION BY SALES.CUST_ID ORDER BY SALES.AMOUNT_SOLD DESC, SALES.TIME_ID DESC)

Go to the Flow tab and select the custom subquery knowledge module.

That’s it, we have created the yellow interface for our subquery.

In a next step we now need to create the interface that takes the sales_amount yellow interface as its source and loads the target table.
Create an interface, name it sales_amount_filter, drag and drop the sales_amount interface to the Sources area, and create a filter as per screenshot below.

We need to create a table for our target datastore. Use the script below to create this table, reverse engineer it and drag and drop it to the target datastore area.
CREATE TABLE SALES_AMOUNT
(
CUST_ID NUMBER,
TIME_ID DATE,
AMOUNT_SOLD NUMBER
);
Next move on to the Flow tab and select our custom subquery knowledge module.
We now need to bring the two interfaces together in a package and then we are ready to load our target.

Execute the package and then switch to the Operator module. You should get something similar to the screenshot below.

Let’s have a look at the execution of the individual steps. We’re particularly interested in the subqueries and the final complex query. So let’s have a look at the debug steps where these will show up.
Below we see the query for the temp interface that contains the analytic function.

Next we’ll have a look at the query that is generated for the second (non-temp) interface.

As you can see from the figure above, the query selects from the SQ1 target table of the temporary interface.
We substitute SQ1 in step 12 of the package with the SQL query of the analytic function and then use this complex query to insert into the target table.

Example 2: Joining two subqueries
In the next example we take this a step further and join another subquery to the package from example 1.
We first create another subquery using a temp interface. As you can see from the figure below we aggregate the amount_sold by cust_id.

We then take this temp interface and the temp interface from example 1 and join these together on the cust_id in yet another interface.

The script for the target table sum_sales_amount is as follows
CREATE TABLE SUM_SALES_AMOUNT
(
CUST_ID NUMBER,
TIME_ID DATE,
AMOUNT_SOLD NUMBER,
SUM_AMOUNT_SOLD NUMBER
)
Next we stick the three interfaces into a package.

… and execute it.
From the Operator module we can see the generated SQL.

Conclusion
In this article I have shown you how you can execute complex queries and subqueries in ODI without loss of data lineage and without suffering performance degradation. So far I have not used the knowledge module in a production environment. So I am sure there is still room for improvement. However, from now on I will use and further develop this approach to meet complex transformation requirements in ODI.
Any input highly welcome. Also let me know of bugs that you come across or if you find that any of the above is unclear.
Posted: December 7th, 2009 | Author: Uli Bethke | Filed under: Data Warehousing Books | Tags: business intelligence books, data warehosuing books, oracle books, oracle business intelligence books, oracle data warehosuing books | No Comments »
Back from holidays. Finally wired up to the world again. Today I continue in my series on Data Warehousing books. We will focus on Oracle this time. If you know of any other good Oralce business intelligence books let me know and I will add them here. Also let me know if you diasgree with any of the evaluations of the books.
Must have
In my opinion there aren’t really any brilliant Oracle data warehousing books out there. So what I recommend first of all is to read the Oracle Database Data Warehousing Guide.
I also recommend all of Tom Kyte’s books. Even though they are not specific to Oracle data warehousing they are a must read for anyone working on the Oracle RDBMS. What I love about all of Tom’s books and articles is the scientific no nonsense approach.
For similar reasons I recommend to get your hands on anything by Cary Millsap. I really recommend Optimizing Oracle Performance if you want to understand extended trace in Oracle. It’s a bit older but brilliant. And you can get your hands on a used copy for a couple of quid.
Two other books I have come across recently and found to be quite useful are (1) Troubleshooting Oracle Performance by Christian Antognini from Swiss consulting firm Trivadis. While this does not offer many new insights into the subject it is a valuable overview and reference book. (2) Oracle Performance Survival Guide: A Systematic Approach to Database Optimization. The chapters I found particularly useful here are on minimizing contention, optimizing memory, and optimizing IO. The last one has very useful information on SSDs (in my opinion one of the emerging trends in Data Warehousing for the next decade) and Exadata.
Should Have
If you are implementing a data warehouse in Oracle Oracle 10g Data Warehousing will give you a good technical overview. It touches on all the relevant areas you need to take into consideration when building your data warehouse on Oracle. It will also serve you well as a refresher or lookup for individual areas.
However, don’t expect too much detail. You will need to consult other books, the Oracle documentation, and the blogosphere for advanced topics and more detail.
As a technical introduction and reference to Oracle data warehousing this book has done me a good service over the years.
The Oracle DBA Guide to Data Warehousing and Star Schemas is a bit older, but contains some useful formulas for hardware sizing of an Oracle data warehouse. Does what it says on the tin, but fairly limited in scope.

Oracle Warehouse Builder 11g: Getting Started does pretty much what it says on the tin and gives a good introduction to novices in the area of data warehousing and Oracle Warehouse Builder. So if you have never used Oracle Warehouse Builder this book is for you. If you have used OWB before you will not learn anything new here. Also this book can only be a starting point for your OWB career. A lot more OWB features than are outlined in this book need to be learned to become a master in the area (if you think about it the OWB user manual in PDF format has about 1000 pages and in some areas only scratches at the surface). Also the timing of the publication of the book is a bit unfortunate as only recently OWB 11GR2 was released with a lot of important new features and a redesigned User Interface. Hopefully there will be a 2nd edition soon that addresses this shortcoming. Mark Rittman has also a review of this book on his blog.

Oracle BI Enterprise Edition Dashboard & Report Best Practices is the only book currently out there on OBIEE. Mainly around dashboard design. I haven’t read this myself so if anyone of my readers has I would be grateful for a comment.
I am not too familiar with Essbase myself and have not read the following two books. But from what I hear and looking at the credentials of the authors, both books should be a good read. The first one is Oracle OLAP and Essbase and has only been published recently. The other one is The Multidimensional Modelling Toolkit. Mark Rittman has a review of this on his blog.
Could Have
I had a very disjointed reading experience with Oracle Data Warehousing and Business Intelligence Solutions. There isn’t really a central a central theme in this book. This book is a patchwork of data warehouse related stuff at a very high level. Oracle Data Warehouse Tuning for 10g really is a rehash of the Oracle Performance Tuning for 10g by the same author. The book covers mostly very basic stuff here. I found that everything you find here is better explained in the documentation.
kn
Posted: November 20th, 2009 | Author: Brendan Tierney | Filed under: Business Intelligence, Data Warehouse, Oracle, data mining | Tags: Bill Inmon, data mining, data mining discussion, Data Mining Project, data mining training | No Comments »
Bill Inmon has recently posted an article on “Why has Data Mining struggled so much?”
The article discusses 7 diferent reasons why data mining has struggled, as it has been around for a very long time.
The main points are
1. We have been waiting a long time for it to become available in a usable way
2. Data mining is considered an academic focused with very few practitioners. But this is become less so
3. Data mining requires a different set of skills. Yes you need data management skills but you also need some data mining skills. I will be making a posting focusing on the skill sets required for data mining in the coming weeks.
4. Some industries and application areas are more suited to data mining than others. The difficult is in identifying suitable projects.
5. Data for Data Mining is unclean. Not if you use a data warehouse. Idealy an organisation who has a matur-ish BI infrastrucure will benefit must from a Data Mining project
6. Data is incomplete. Yes you may need to enrich the data from various sources. But again if you have a Data Warehouse you will have most of these
7. Approaches to data mining inadequate. Alot of the approches to data mining projects as based on its statistical history. New problem areas are evolving all the time and we can use data mining in lots of different way.
To view Bill Inmon’s article – click here.
To view our 2 training courses on data mining – click here
Brendan Tierney