Data Warehousing Books: Design and architecture

Posted: October 31st, 2009 | Author: Uli Bethke | Filed under: Books, Business Intelligence, Data Warehouse, Data Warehousing Books | Tags: , , , , | 2 Comments »

In another post I have covered data warehousing books in the world of Oracle. We’ve also had a look at data warehousing and business intelligence books for project management and business analysis. Today we will look at data warehousing and business intelligence books that look at the technical design and architecture of a data warehouse solution.

Must Have

DW 2.0: The Architecture for the Next Generation of Data Warehousing: Bill Inmon revisits his data warehouse architecture. Addresses the following issues: Real-time BI, unstructured data, the enterprise data warehouse and change, the data life cycle, time variance of data. Very useful from a conceptual point of view, but not enough detail.

The Data Warehouse Toolkit- The Complete Guide to Dimensional Modelling. My first book on data warehousing. Still valuable today. Great for dimensional modelling data marts or small non-realtime Enterprise Data Warehouses based on Kimball’s conformed dimensions. It also has a good overview on industry specific data model patterns in a dimensional context. A must have.

The Data Model Resource Books Vol 1-3: The books describe fundamental data modeling patterns that can be applied and reused across the enterprise. If you are assigned the task of modelling an Enterprise Data Warehouse, these books give you great insight into best practices in data modelling. Volume 2 offers industry specific data model patterns and provides invaluable information to better understand the issues at hand in a particular industry. Personally I find it that you should actually start with volume 3 as this is the most generic of the three books. Also if you only get one of the books get volume 3.

If you have a requirement around near-real time data warehousing and operational business intelligence I recommend to look into Dan Linstedt’s data vault modelling techniques. The Business of Data Vault Modeling will get you started.

Some more recent additions to the data warehouse architecture league of books includes Building and Maintaining a Data Warehouse and Advanced Data Warehouse Design. The first of these walks us through all the technical areas of a data warehouse project: source system analysis, database design, bi reporting, data quality, metadata. In my opinion, the best chapter is on data integration and ETL. There are very few dedicated ETL books out there and this is one of the few that touches on the subject, albeit from a high level. In Advanced Data Warehouse Design the authors discuss the shortcomings of existing data warehouse implementations focusing mainly on spatial and temporal data, e.g. the shortcomings of slowly changing dimensions when capturing changes over time. They propose a truly temporal and spatial data warehouse. Examples are given in MS SQL Analysis Service (temporal) and Oracle OLAP (temporal and spatial).

To my knowledge the only book out there dedicated to the physical design of databases is Physical Database Design: the database professional’s guide to exploiting indexes, views, storage, and more. Most of the stuff covered here is for advanced users. It covers Oracle, DB2, SQL Server, and for some of the MPP stuff Teradata. Personally I found the chapter on physical design for a shared nothing architecture, and the chapter on hardware (CPU architecture, disks, server sizing etc.) the most useful.

               
               
               

Dr. Ronnie Abrahiem, Software Engineer at CIBER has recently published a book on combining SOA and data warehousing in a near-real time environment. This looks quite interesting but I haven’t read the book myself. It has the rather long title Data Warehousing with Service-oriented Architecture: Designing and Implementing Prototype Models For an Integration of Near-Real-Time Data Warehousing Architecture with Service-oriented Architecture. I am currently working on a project where we want to integrate a SOA based MDM solution with the data warehouse. The book may offer some interesting insights around this.

Should Have

If you have a lot of aggregate tables in your warehouse I recommend to have a look at Mastering Data Warehouse Aggregates for a formalised methodology and some really useful tips and tricks around an aggregate navigator.

Another recent addition to data warehouse design books is Data Warehouse Design: Modern Principles and Methodologies. Very useful chapter on ETL and quite affordable.

Could Have

Data Warehouse Design Solutions. This is useful as a second reference for industry specific dimensional models. However, it can not replace Kimball’s original book on the subject.

Clickstream Data Warehousing. If you are implementing a data warehouse for web analytics you should have a look here. However, in light of the explosion of data volumes and with Hadoop and MapReduce at hand this one is slightly obsolete.


TIME dimension script Oracle

Posted: October 30th, 2009 | Author: Uli Bethke | Filed under: Data Warehouse, ETL, SQL for Analysis | Tags: | No Comments »
SELECT
   n AS time_id,
   TO_CHAR(to_date(n,'SSSSS'),'HH24') AS hour,
   TO_CHAR(to_date(n,'SSSSS'),'MI') AS minute,
   TO_CHAR(to_date(n,'SSSSS'),'SS') AS second
FROM (
   SELECT
      level-1 n
   FROM
      DUAL
   CONNECT BY LEVEL <= 86400
)

What is Data Mining ?

Posted: October 28th, 2009 | Author: Brendan Tierney | Filed under: Business Intelligence, Training, Uncategorized, data mining | Tags: , , , , , , , , , | No Comments »

In this weeks topic I will explore what is Data Mining, the different meanings, how the term is used, etc. I will give you my interpretation of what it is and how other descriptions of data mining can be categorised.

Every article you read, every presentation you hear, etc. you get a slightly different description, or should it be that they hint to a description of how they use data mining in the products or their applications. By giving this hint at what data mining is they try to claim that they are using it, as it gives their products, applications and services a higher degree of sophistication compared to others. There is also the idea that it is a one of those trendy terms that is thrown out without them really knowing what it is really about.

Data Mining Definition

One of the most commonly cited definitions of what data mining is, “..it is the non-trivial extraction of previously unknown and potentially useful information from data” by Usama Fayyad et al (Chief Data Officer, Yahoo Inc) in their landmark paper back in 1996.

Based on this definition data mining is does not involve some basic analytics, decision making based on some defined rules, being able to identify events based on current data, etc. But these type of scenarios are typically talked about as being data mining. If we go back to the definition by Fayadd above, by say the “non-trivial” it means that we cannot write some code/queries to pull data out of our data that answers some simple questions. Another important part of the definition is “potentially useful information”, tells use that some times and may in a lot of cases, data mining does not give use anything useful. But it can give us useful information only if we have a good understanding of the data, the business rules of the data, the meta-data, how the rules and the data relate to each other, etc. All of this requires extensive experience of working with the data. Who is best at doing this, but database designers and developers. People with a statistics background (typical what you see in data mining roles) have to go and learn all about the data, the business rules, the meta-data etc. This can be a huge waste of time and resources as the database people are generally ignored.

Some examples

I was at an IT conference last week (I was co-author of a paper on Opinion Data Mining). One of the key note talks was given by a technical lead in IBM (one of two thousand in the company). He gave some good examples of how Business Intelligence (BI) could be used to manage the energy needs of a new city being build out in the middle east. He also gave another example of how BI is being used in and around Galway city and coast line. There were several mentions of data mining during his talk, but I don’t think any of his examples reflected what data mining is. Yes he did give examples of how you can intelligently use your data. For example, if an object is spotted out in Galway bay then you can predict where this object will come to shore. But data mining is not the technique that is used in this case. Instead it is a rules based type system, that takes into account a number of factors, link the size of the object, the current position, currents, wind direction, etc. Using these rules (and not data mining) they can identify the landing position and let all the necessary bodies know this (like the coast guard, Galway county council, environmental control, etc).

Generally data mining can be used when you have a mature BI environment in your organisation that includes not just transactional and business reporting, but also data warehousing, data analytics, prediction systems (based on rules), etc. Data mining allows you to explore for and identify patterns in your data (and you need lots of data really). Going back to the definition of data mining a lot of the results from a data mining project may not be of any value. What you are looking for are the nuggets of gold that exists in the data and you may take some time to fine these, if they exist at all.

One of the aims of this weeks posting was to explore what data mining really is. At this point I haven’t really talked much about what it is, but what I hope you have gotten so far is that the term data mining is overly used in the IT world and can be seen as one of those trendy words that organisation like use (and use incorrectly). Data mining is used as an umbrella term that covers any processing of your data that involves a bit a processing, applying some rules and some analytics.

Over the coming weeks we will explore what Data Mining really is and what are the different stages of a Data Mining project.

The next posting will be about CRISP-DM, which is a industry neutral, product neutral data mining life cycle.


Data warehousing for free! Terabyte sized data warehouse and business intelligence without license costs

Posted: October 26th, 2009 | Author: Uli Bethke | Filed under: Business Intelligence, Data Warehouse | Tags: , , , | No Comments »

Greenplum

This is no joke. Greenplum on 19 October announced a free single node edition of its analytical database.

For those of you who haven’t heard about Greenplum, they are a provider of an MPP database software that runs on commodity hardware (unlike some its competitors such as Teradata, Netezza, or recently Oracle with Exadata). The database is based on open source database software PostgreSQL, however, is closed source itself.

Features of the database include Massively Parallel Processing, redundancy, compression, row-level or column oriented data storage, compression, partitioning, SQL standard including SQL 2003 OLAP (analytic functions etc.), MapReduce support, ODBC & JDBC support.

So what restrictions are there for the single node edition. Obviously you are only allowed to run it on a single node. Below is an extract from the Greenplum datasheet:

  • Unlimited production usage on a single commodity x86 server using up to 2 CPU sockets (and unlimited CPU cores), or in a single virtual machine using up to 8 virtual CPU cores.
  • Fully parallel SQL and MapReduce processing leverages multi-core parallel-processing engine for every query.
  • No storage capacity cap: from GBs to 10s of TBs.
  • Hybrid row and column-oriented processing.
  • Free community support as well as a low-cost, paid support option.

Of course, the full power of Greenplum’s shared nothing architecture only materialises with multiple nodes. But the company says that you can expand seamlessly from a single-node to multi-node architecture.

Documentation is installed when you install the single-node edition. Couple of thousand pages long but tiny compared to the beast you get with the Oracle database.

Use cases

I can see two immediate use cases for this:

(1) Greenplum themselves promote this offering as part of their Enterprise Data Cloud. They have a vision of self service data marts. Based on this, data analysts can go to the Enterprise Data Warehouse and via interfaces create their own data marts for in depth analysis outside the EDW. Have a look at Curt Monash’s excellent article on the future of data marts.

(2) I can see another use case for departmental solutions. You could set up your first couple of subject areas or data marts on a single node machine and if you reach limits on this single node, add more nodes to scale out. Or if you don’t reach this limit just stay on this setup forever.

So why are they giving away data warehouses for free? In another article, Curt Monash gives the following reasons:

  • Adding value to its Enterprise Data Cloud story
  • Seeding the market for future enterprise sales
  • Depriving competitors of revenue, perhaps at enterprises too small to ever be paying Greenplum customers

Microstrategy

Combine the Greenplum offering with Microstrategy’s free Reporting Suite, and you have a best of breed departmental solution for zilch.

The following restrictions apply to the Microstrategy BI tool:

- 100 named users for the frontend of the BI tool and the BI server
- Two named users for the semantic layer module
- Limited to one CPU. I presume it is limited to one CPU core, but this is not clear from the website
- Two named users for the other modules in their BI suite, e.g. OLAP reporting etc.

Have a look at their website for a full set of features and conditions.

For the right set of requirements the above is an attractive and very cost-effective combination. On top of that it is scalable. So if you grow out of it just scale out and add on.


Getting Started with Oracle Data Mining

Posted: October 20th, 2009 | Author: Brendan Tierney | Filed under: Business Intelligence | Tags: , , | No Comments »

This blog posting is my first in the area of Data Mining, in particular performing Data Mining using Oracle.  Over the coming months I will be making regular postings in the areas of data mining and associated activities.  The approach that I take to data mining is slightly different to what would be considered the main stream approach.  Last year I gave a presentation at the UKOUG annual conference in Birmingham titled “You don’t need a PhD to do Data Mining”. What I have discovered over the past 12 years from working in this area is that it is dominated by people with a background in statistics (and many of them have PhDs). But having worked on many data mining projects, most of the work that needs to be done, can be done by and is generally done by Database People.

If you are interested in Data Mining then hopefully my postings over the coming months will help you gain a better understanding of what it is about, what you can do, what kind of outputs you get and how the outputs can be used. 

The following list of books and links will give you a good start in gaining some idea of what it entails

Oracle Data Mining Concepts Book – Parts A & B

Getting Started with Oracle 11g Data Mining

Why & What is Data Mining

As I add each posting I will add some additional links and resources, so over the next few months we will have a comprehensive list of resources for Data Mining in Oracle.  We will put together a separate web page to contain all these resources.  This will complement the list of resources that Uli Bethke is putting together on ODI.

We also offer two training courses on data mining. The first one of these will be aimed at managers and novices in the area. It will give a high level overview on key data mining concepts and issues. The second course will specifically deal with data mining on Oracle using PL/SQL and the Oracle data mining tool. Have a look at our training pages for details.

Let me know if you have any suggestions on topics that you would like me to cover.


ODI Resources for Beginners: Getting Started with Oracle Data Integrator

Posted: October 19th, 2009 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI) | Tags: , , , , , , | No Comments »

I have put together a link list of all the ODI resources I know off on the web. If you know any other relevant resources contact me and I will add it to the list below.

In my opinion the best place to get started with is the Data Warehouse Best Practices using Oracle Data Integrator Enterprise Edition guide. This explains the concepts very well and also has some detailed examples on Knowledge Modules, the Substitution API, setup, deployment etc.

ODI OTN Discussion Forums

I monitor the ODI forum on OTN at the end of each day to see if any relevant items have been discussed.

Oracle By Example: Oracle Data Integrator

The Oracle By Example (OBE) series should be one of the starting points for learning any new Oracle technology. The OBE for ODI series is a must.

Documentation

Download all relevant PDFs on Knowledge Modules, the Substituion API, Jython, ODI Tools etc. from the documentation page.

ODI homepage on the Oracle website

Blogs

Rittman/Mead

ODI Experts

John Goodwin

My own blog Uli Bethke

The official Oracle Data Integration Blog

ODI webcasts

No new webcasts for a while now, but still useful stuff.

Oracle Data Integration (ODI) Resource Kit

LinkedIn Groups

Oracle Data Integration: Warehouse Builder (OWB), Data Integrator (ODI), Data Profiling and Quality

Oracle Data Integrator (ODI)

Metalink

Metalink has very useful information on Oracle Data Integrator. To get an overview what is available, log on to Metalink and search for ODI FAQ.
This will return a list of valuable documents

Oracle Data Integrator, Change Data Capture (CDC), and (near-)realtime data integration requirements

Mark Rittman – An Introduction to Real-Time Data Integration

Gartner: ODI and CDC for Activity Monitoring

Improve Data Integration with Changed Data Capture (PDF)

Installation

Complete installation guide including Lightweight Designer, Axis 2 etc.

Knowledge Modules

Have a look at my own article on OTN for an introduction to knowledge modules

Uli Bethke – Developing a Knowledge Module in Oracle Data Integrator

Oracle Data Integrator documentation library: XML guide, LDAP guide, Java APIs Reference etc.

No link here. This is installed when you install ODI. Access via \ oracledi\doc

ODI Data Quality

Data Quality Tutorial

Misc.

Import Export Best Practices PDF: Metalink document 549318.1

ODI 10.1.3 Work Repository Documentation: Note 579751.1 on Metalink

Craig Stewart has put together a superb set of ODI video tutorials.

How to define a PostgreSQL in ODI – then reverse engineer in the Designer to access the data
Defining a PostgreSQLserver in ODI

ODI has some automatic features which will generate the Group By Statement for you when you use any of the aggregation functions, this is a short demo of how to use it
How to use Aggregation Functions in ODI

What is the Common Format Designer (CDF), and what can we do with it? Short demo of the features, generating schemas and generating interfaces automatically.
ODI’s Common Format Designer

Short demo on the Metadata Navigator of ODI
ODI’s Metadata Navigator

OdiZip is a useful tool and this short screencam illustrates its use
OdiZip How to use

Sybase ASE to IQ knowledge module
Sybase ASE to IQ KM demo

How to use custom Java classes in your ODI procedures
Using custom Java code in ODI

Using Excel in ODI, including getting round the problem with the limitation of fixed named ranges
Using Excel in ODI

How to define and use Flexfields to extend ODI’s metadata
Using FlexFields in ODI

Defining XML file in Topology and reverse engineering
XML 1 Defining

What does the XML structure look like in ODI?
XML 2 The Rendered Data Structure

When the definition doesn’t work, how do you find out what has gone wrong?
XML 3 Topology

How to use ODI to populate a simple XML structure
XML 4 Populating Simple XML

Populating a more complex XML structuire – requiring the use of multiple interfaces. Writes to a file with a dynamic name – in this case the session_id.xml
XML 5 Populating Complex XML

How to configure external database storage for the JDBC Driver for XML- useful when dealing with large XML files
XML 6 External Database Storage

How to set up the file name for the XML file so that it accommodates the use of multiple different file names
XML 7 Reading a Dynamically named XML file

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)


Book Review – Oracle Warehouse Builder 11G Getting Started – OWB for beginners

Posted: October 17th, 2009 | Author: Uli Bethke | Filed under: Oracle Warehouse Builder | Tags: , , , , , | No Comments »

Bob Griesemer’s Oracle Warehouse Builder 11G – Getting Started is the first and so far (as of Oct 2009) only book published on Oracle Warehouse Builder.

It aims to introduce data warehousing in general and Oracle Warehouse Builder in particular, to absolute beginners in the field.

In chapter 1 the author guides us through the installation process for OWB 11G. A typical installation of Warehouse Builder in a client/server environment is explained. More complex architectures with the Control Center service on a different server are not explained.

In chapter 2 we are introduced to the data model of the source system for the data warehouse. We are shown how to import or create the metadata for our source system. What I found useful here is that the author explains how we can use a non-Oracle RDBMS (in this case SQL Server) as a source system via heterogenous services. This can be a pain to set up so it is helpful to have a step by step walkthrough for this.

Chapter 3 gives an introduction to the basics of data warehouse design. It then explains how we can implement such a design as a target structure in Oracle Warehouse Builder.

In chapter 5 the author gives an overview on the basics of ETL processes and introduces us to some of the more important OWB operators that will allow us to implement ETL process in Warehouse Builder. I found that more space should have been given to explain the operators in detail. Some more detailed examples for each would have been useful also.

Chapters 6 and 7 then show us how we can bring the source metadata from chapter 2, the target metadata from chapter 3, and the operators from the previous chapter together in a Warehouse Builder mapping to extract, transform, and load data from source to target. What is completely missing here though is an introduction to process flows. Process flows are fundamental to glue ETL mappings in Warehouse Builder together and should have been part of an introduction to the subject.

Chapter 8 gives an overview on the deployment and execution of objects via the OWB Control Center. It also includes a good troubleshooting section with regards to deployment. As expected from a beginner’s book, more advanced topics such as deployment via OMB+ and TCL scripts is not explained. What is also missing is an explanation on how to schedule objects once they have been deployed.

In chapter 9 the author introduces us to various OWB features. The book gives a good overview on version management via snapshots and export/import of metadata.

The book does pretty much what it says on the tin and gives a good introduction to novices in the area of data warehousing and Oracle Warehouse Builder. So if you have never used Oracle Warehouse Builder this book is for you. If you have used OWB before you will not learn anything new here. Also this book can only be a starting point for your OWB career. A lot more OWB features than are outlined in this book need to be learned to become a master in the area (if you think about it the OWB user manual in PDF format has about 1000 pages and in some areas only scratches at the surface). Also the timing of the publication of the book is a bit unfortunate as only recently OWB 11GR2 was released with a lot of important new features and a redesigned User Interface. Hopefully there will be a 2nd edition soon that addresses this shortcoming.


Now offering data mining training courses and a dimensional modelling training module

Posted: October 14th, 2009 | Author: Brendan Tierney | Filed under: Oracle, Training | Tags: , , , , | No Comments »

I am proud to announce a couple of new training courses on data mining, master data management, and dimensional modelling at competitive rates.

The data modelling training will cover the core concepts of Data Warehousing and Dimensional Modelling. We will also offer two training courses on data mining. The first one of these will be aimed at managers and novices in the area. It will give a high level overview on key data mining concepts and issues. The second course will specifically deal with data mining on Oracle using PL/SQL and the Oracle data mining tool. We have also put together a Master Data Management training course. Have a look at our training pages for details.

The courses can be delivered either on-site at our client’s premises or off-site in a classroom environment at a location of choice.

Contact us about details and competitive rates.

All courses will be delivered by Brendan Tierney . Brendan has over 16 years experience working in data design and architecture, with the last 13 years working in Business Intelligence, Data Warehouses, training, and lecturing. He was the first consultant at fraud detection software company Norkom (link to website) where he was heavily involved in projects in Ireland, Belgium and USA. Before Norkom he worked as a consultant with Deloitte Management Consultants and Oracle.


ODI: Automating deployment of scenarios to production in Oracle Data Integrator

Posted: October 12th, 2009 | Author: Uli Bethke | Filed under: Best Practice, Oracle Data Integrator (ODI) | Tags: , , , , | No Comments »

In this post I will show you how you can automatically deploy scenarios in ODI.

It is rather cumbersome to manually deploy scenarios from a test/development to a production environment.

Typically it involves the following steps:

- Manually (re-)generate all scenarios that need to be deployed and any child scenarios referenced.
- Manually export the (re-)generated scenarios
- Log in to the Operator module for the production environment and manually import the scenarios.

You do the above once or twice manually before getting extremely fed up. Actually I was rather patient (unlike my normal self) and did this about ten times before I got very, very annoyed.

In this post I will show you how you can automate the deployment process. The proposed solution will allow you to logically group scenarios together via marker groups for automatic deployment, thus giving you more flexibility and allowing you to just deploy a subset of scenarios in your project.

To achieve our goal we will make use of the following Oracle Data Integrator features:

- Marker groups
- ODIGenerateAllScen tool
- ODIExportScen tool
- OdiImportScen tool
- Meta-information in the ODI work repository
- Execution of scenarios from a Windows batch file

In a first step we create a new marker group. We will use the marker group to logically group together scenarios we want to deploy. We will subsequently use the marker group in the ODIGenerateAllScen, ODIExportScen, and OdiImportScen tools.

Log on to Designer > Expand Markers > Right click Markers > Select Insert Marker Group

odi_marker_group

As you can see from the figure above I have named the marker group Scenario and added three markers. One of the markers is named XLS (short for scenarios that load data from Excel sheets). Flagging these scenarios (or rather their packages) via a marker will allow us to deploy them separately.

Next we will add the XLS marker to those packages that we wish to logically group together for deployment. In our particular case, all of the Excel related packages.

Right click package > Add Marker > Scenario XLS

odi_marker_xls

In the next step we will create a package that will (re-)generate all packages marked with XLS.

Create a new package, name it GENERATE_SCENARIOS_XLS, add the ODIGenerateAllScen tool to it, and use the following parameters for the tool:

Project: The name of your project
Mode: Replace. This will overwrite the latest version of your scenario.
Marker Group: Scenario
Marker: XLS

Leave the default values for the other parameters.
gen_scen_pack

In a next step we need to export the (re-)generated scenarios to XML. Unfortunately, the OdiExportScen tool does not allow us to make use of marker groups to logically group together scenarios for export. To achieve our goal we need to make use of a workaround.

As ODI is a meta-driven ELT tool we can retrieve information about the marker groups from the ODI work repository.

The query below will exactly do this. It returns alls packages that are marked as XLS.

SELECT
   scen_name AS pack_name
FROM (
   SELECT
     LAST_VALUE(d.scen_name) OVER
 (PARTITION BY c.pack_name ORDER BY scen_version) AS scen_name,
     MAX(scen_version) OVER
 (PARTITION BY c.pack_name ) max_scen_version,
     scen_version,
     c.pack_name
   FROM
      snp_obj_state a
      join snp_state2 b on (a.i_state = b.i_state)
      JOIN snp_package c ON (a.i_instance = c.i_package)
      JOIN snp_scen d ON (c.i_package = d.i_package)
   WHERE
      state_code = 'XLS'
)
WHERE
  scen_version = max_scen_version

We will employ this query as an implicit cursor in an ODI procedure at the Command on Source. You will first need to create a data server to the ODI work repository in Topology Manager for this to work. As per figure below, set the Technology to the technology of your work repository (in my case Oracle) and set the schema to the logical schema of your work repository (in my case ORCL_ODIWORK_SRC).

odi_export_scenario

We will then use the resultset together with the OdiExportScen tool to create XMLs for our scenario and write them to disk.

OdiExportScen "-SCEN_NAME=#pack_name" "-SCEN_VERSION=-1"
"-FILE_NAME=D:\ODI\SCEN_#pack_name Version 001.xml"
"-FORCE_OVERWRITE=YES" "-RECURSIVE_EXPORT=YES"
 "-XML_VERSION=1.0" "-XML_CHARSET=ISO-8859-1"
 "-JAVA_CHARSET=ISO8859_1"

odi_export_scenario_target

Important parameters here are

SCEN_NAME: #pack_name. This is the bind variable from our Command on Source.
SCEN_VERSION: -1. This means that we will export the scenario that was generated last.
FILE_NAME: This is the path on your file system where the XMLs will be generated.

Make sure that you have set the Technology to Sunopsis API.

In a next step add the ODI procedure we just created to our package.

Finally, manually create a scenario for this procedure via ODI Designer.

Now we are in a position to import the exported scenarios from their XML files.

Once again we will use a procedure to achieve this

For command on source use the following query:

SELECT
   scen_name AS pack_name
FROM (
   SELECT
     LAST_VALUE(d.scen_name) OVER
 (PARTITION BY c.pack_name ORDER BY scen_version) AS scen_name,
     MAX(scen_version) OVER
 (PARTITION BY c.pack_name ) max_scen_version,
     scen_version,
     c.pack_name
   FROM
      snp_obj_state a
      join snp_state2 b on (a.i_state = b.i_state)
      JOIN snp_package c ON (a.i_instance = c.i_package)
      JOIN snp_scen d ON (c.i_package = d.i_package)
   WHERE
      state_code = 'XLS'
)
WHERE
  scen_version = max_scen_version

odi_import_scen_source

For Command on Target use the following command

OdiImportScen "-FILE_NAME=D:\ODI\SCEN_#pack_name Version 001.xml"
"-IMPORT_MODE=SYNONYM_INSERT_UPDATE"

Name the above procedure IMPORT_SCEN_XLS and generate a scenario for it.

We now have all the components that we need for automated scenario deployment. We will just have to glue them together. We will do this via a batch file. In my particular case this will be a Windows batch. Of course, you can achieve the same in a Linux etc. environment.

ODI allows you to execute scenarios via the startscen.bat. You can find this batch file in the oracledi\bin folder in your ODI home. Three parameters are mandatory for executing this batch:

%1: The name of the scenario. In our case these are the GENERATE_SCENARIOS_XLS and the IMPORT_SCEN_XLS scenarios.
%2: The version number of the scenario. In our case -1, as we want to export the last version of the marked scenarios.
%3: The execution context. In our case we deploy the scenarios from UAT to PRD. So for the export of our scenarios we will use the UAT context, as this is where we (re-)generate and export the marked scenarios. For the import of the marked scenarios we will use the PRD context as we want to import the exported XMLs into the production environment.

You will need to either have agents installed on the same server (one for each environment). I explain how you can install multiple agents on one server in a separate post. Alternatively, you should be able to use a shared folder that both agents can access (I haven’t tried this out).

@echo off
cls
d:
echo generate xls scenarios

cd D:\app\oracle\product\odi_home\oracledi\bin\
call startscen.bat GENERATE_SCENARIOS_XLS -1 UAT

echo import xls scenarios

cd D:\app\oracle\product\odi_home\oracledi\agent_prd\
call startscen.bat IMPORT_SCEN_XLS -1 PRD

I would like to hear from you how you deploy your scenarios.

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)


ODI Snippets: Oracle Data Integrator, Excel data types, and limitations of the MS Excel ODBC driver.

Posted: October 10th, 2009 | Author: Uli Bethke | Filed under: Oracle Data Integrator (ODI) | Tags: , , , , , | 1 Comment »

I have come across two serious issues when loading data from Excel into Oracle Data Integrator. These are not related to ODI, but are shortcomings of the Microsoft Excel ODBC driver. Both issues are explained on Metalink:

(1) Excel Returns NULL Values When Numeric And Alphanumeric Datatypes Are Intermixed Within The Same Column (424419.1)

Extract from Metalink

“Having defined a Datastore based on a Microsoft Excel spreadsheet. One of the columns in the Datastore contains both numeric and NULL values.
In the Datastore definition, the column datatype has been set to NUMERIC, which is correct.
However, when consulting the Datastore content from Oracle Data Integrator Designer, all numeric values are returned as Null, while the underlying value is not a Null (for example 11,325).
If changing the column datatype in the Datastore definition to VARCHAR, all alphanumeric values will be now returned as Null, while the underlying value is not a Null (for example, AZERTY).”

(2) Issues loading numeric data types (424471.1)

Extract from Metalink

“After successfully setting up an ODI Source Datastore on an Excel file named zone, it appears that only a subset of a numeric column is loaded to the Target Datastore. When consulting the original Excel file, certain cells of the numeric field are indicated with a small triangle in the upper left corner.”

For the above issues I propose to use Excel macros that are triggerd when the user saves the Excel sheet as an automated solution.

For issue 1 we use a macro that prefixes the values in the alphanumeric columns with an apostrophe

Sub apost()
  For r = 2 To 15556
    Cells(r, 11) = "'" & Cells(r, 11)
  Next r
End Sub

The above function prefixes cell 11 in your spreadsheet with an apostrophe for rows 1 to 15556

For issue 2 we use a macro that sets the formatting of the cell to General and subsequently refreshes it.

Sub Format_to_General()
  For Each c In Range("J2:J1614")
    c.NumberFormat = "General"
    v = c.Value
    c.Value = v
  Next c
End Sub

The above function sets the format of cell J to General for rows 2 to 1614.

To insert the above functions into your spreadsheet do the following:

In MS Excel Tools > Macro > Visual Basic Editor

In Visual Basic Editor > Insert > Module > Copy & Paste the following code:

Sub Format_Changes()
  apost
  Format_to_General
End Sub

Sub apost()
  For r = 2 To 15556
    Cells(r, 11) = "'" & Cells(r, 11)
  Next r
End Sub

Sub Format_to_General()
  For Each c In Range("J2:J1614")
    c.NumberFormat = "General"
    v = c.Value
    c.Value = v
  Next c
End Sub

excel_macros

Save the changes and close the editor.

To execute the Macro go to Tools > Macro > Format_Changes > Run

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)