Posted: June 18th, 2009 | Author: admin | Filed under: Business Intelligence | Tags: Amazon EC2, aster, ebay, facebook, google, greenplum, mapreduce | 2 Comments »
If your job involves processing massive amounts of data you should familiarize yourself with Greenplum, MapReduce, and Hadoop.
With 6.5 Petabytes of data eBay runs the world’s largest data warehouse on Greenplum. Facebook runs a 2 PB warehouse on Hadoop. Impressive.
Both Greenplum and Hadoop make use of the MapReduce framework pioneered by Google.
You can run Hadoop on Amazon Elastic MapReduce to play around with the technology.
There have also been two Hadoop books published recently. I have ordered both of them and can’t wait to hold them in my hands.
Hadoop: The Definitive Guide
Pro Hadoop
No books on Greenplum, but they have some good whitepapers on their website.
Posted: April 29th, 2009 | Author: admin | Filed under: Oracle Data Integrator (ODI) | Tags: execute in parallel, interface, odi, oracle data integrator, package, scenario | No Comments »
I have been working on Oracle Data Integrator (ODI) over the last three weeks. It is an extremely flexible and innovative tool. Claims by Oracle that it is up to ten times faster to develop ETL routines in ODI as compared to traditional ETL tools are actually no exaggeration.
In this article I will show you how to execute Procedures and Interfaces in parallel in a Package. I spent about half a day in the forums etc. before I actually figured out how to set this up. The way you execute Mappings or Transformations in OWB sent me in the wrong direction. While in OWB you use a Fork and AND Operator to accomplish this, in ODI you create a Scenario from your Interfaces and/or Procedures, and connect these inside a Package. Then you just set the Snychronous/Asynchronous parameter in those Scenarios that you want to execute in parallel to Asynchronous Mode.
We’ll now go through each of the steps. We will first create three tables in the SH sample schema. We will then create three Procedures that will load these tables. Finally we will create a Package from the three Procedures that will load two of them in parallel.
Note: Instead of the Procedures we could have used Interfaces. It is the same principle. However, it is easier and quicker to set up and demonstrate the parallel load with Procedures.
Let’s create the three tables first. Log in to the SH schema via SQL+ and execute the following SQL.
SQL> create table a (a number);
Table created.
SQL> create table b (b number);
Table created.
SQL> create table c (c number);
Table created.
Next we create the Procedures.
Note: I have already created all of the Topologies, Models, Contexts etc. that are pre-requisites for creating the Procedures. Have a look at John Goodwin’s excellent series of articles on ODI for some tutorials.
Log on to Designer. Expand your Project and Folder, right click Procedures and select Insert Procedure.
Populate fields at the Definition tab as per screenshot below.
Name: insert a
Target Technology: Oracle

Change to the Details tab and click on the Add button
Populate the fields as per screenshot below
Name: insert a
Log Counter: Insert
Schema: ORCL_SH. This could be different in your environment. Depending on the name given to the schema in the Topology Manager.
Command: insert into a values (7).

In this Procedure we populate the table a in the SH schema with the value 7.
Next we create the Procedure that populates table b. Go through the same steps as for table a. As the Command type in: insert into b select prod_id from sales.
Next we create the Procedure that populates table c. Go through the same steps as for table a and b. Populate the Command with: insert into c values (8).
Next we need to create a Scenario for each of our Procedures.
Right click the insert a Procedure and from the menu select Generate Scenario…
Leave the default values and click OK.

Repeat the same steps for the other two Procedures.
Next we create the Package. Right click Packages and select Insert Package from the menu.
Populate the Name field as per screenshot

Change to the Diagram tab and drag the following items onto the diagram.
Scenario_A Version 001
Scenario_B Version 001
Scenario_C Version 001
ODIWaitForChildSession tool from the Event Detection toolbox
Next right click the Execution of the Scenario Insert_A item and select First Step from the menu
Next connect the items as per screenshot below

We want to execute Scenario Insert_B and Insert_A in parallel, so we need to set the Synchronous/Asynchronous parameter to Asynchronous Mode for these Scenarios.
Click on the Execution of the Scenario Insert_B. In the Properties area set the Synchronous/Asynchronous parameter to Asynchronous Mode

Do the same for Execution of the Scenario Insert_C.
That’s it. You can now execute the Package and then verify in the Operator that Scenario Insert_B and Insert_C actually executed in parallel.
Posted: April 22nd, 2009 | Author: admin | Filed under: Oracle | 1 Comment »
At the OUG Ireland 2009 we launched the OUG Ireland Business Intelligence Special Interest Group. All OUG Ireland members are welcome to join. Tony Cassidy from Bearing Point and me will co-chair the SIG.
We will hold our first meeting in June or July 09. I will give a presentation on Oracle Data Integrator and how it favourably compares to Oracle Warehouse Builder. This will be followed by a hands on tutorial on how to develop Knowledge Modules in ODI Designer. We will also try and organise a session on a more business orientated topic.
Posted: April 5th, 2009 | Author: admin | Filed under: OBIEE | Tags: OBIEE, query hints | 2 Comments »
I have recently come across a post on the OTN forums on how to use hints in OBIEE. OBIEE lets you specify hints in the physical layer on a table by table basis. The problem of the user in the forum was how it is possible to hint the alias of a table in the SQL generated by Answers. This is relatively straightforward and I will show you how this can be done in a minute. However, the bigger issue around usage of hints in OBIEE is that it is extremely inflexible. In Business Objects you can use a workaround to use hints on a query by query basis. This same workaround does not work in OBIEE and use of variables did not work either. But let’s first have a look at how we can use hints with aliased tables before we have a look at why OBIEE is very inflexible when it comes to query hints.
We have the following scenario: We want to use dynamic sampling on a table either because we have stale statistics or we want to filter multiple columsn on the table. If you want to find out more about dynamic sampling have a look at Tom Kyte’s great article on dynamic sampling in Oracle magazine.
In order to find the alias that OBIEE will use for the physical table in BI Administrator go to Tools > Query Repository. For name type in the name of your table and Physical Table as type (as per screenshot below)

This will return the internal ID of this table. In the screenshot this is 3001:210. We are interested in the part after the colon. This piece is the alias that OBIEE will give this table in Answers. It will prefix it with a T.
So if we want to add a dynamic sampling hint to the products table we need to add the following to the product table’s hint field : dynamic_sampling(T210,3)

This will then generate SQL as follows:

As you can see, our hint was applied to the query.
However, the whole thing is terribly inflexible. It will add this hint to each query that you generate via answers that involves the products table.
I have tried various workarounds to make this more flexible, unfortunately without any succcess. To get this to work in Business Objects you create the equivalent of a logical column and populate it with /*+ dynamic_sampling(T210,3) */ ”. This will throw a parse error but it will still allow you to add it as a column to the equivalent of the presentation layer. When you create an ad hoc query you have to add this column as the first field to your ad hoc report. This will then generate SQL similar to this:
SELECT
/*+ dynamic_sampling(T210,3) */ '' as col1,
'whatever' as col2
FROM
products T210
When you try to do the same in OBIEE it will throw a parse error and won’t allow you to proceed. I have also tried to create the hint as a variable and then reference this in the hint field of the physical table. Again without success.

When Answers generated the SQL it just took the literal value of VALUEOF(hinter) without evaluating the hinter variable and put the phrase in as the hint for the query.
If anyone has a neat solution for more flexible hinting in OBIEE please let me know.
You may also be interested in the OBIEE book Oracle BI Enterprise Edition Dashboard & Report Best Practices. This is currently the only published OBIEE book on the market.
Posted: January 20th, 2009 | Author: admin | Filed under: Oracle | Tags: compound key, query performance, snowflake schema, star transformation | No Comments »
This article builds on Mark Rittman’s recent post on explain plans for star transformation.
Every now and again I have come across claims that the CBO only uses star transformation with single part foreign keys on the fact table:
“In order for a RDBMS query optimizer to execute a query using a Star Transformation, a single part foreign key with a bitmap index is required.”
Generally, no explanation or proof is given for this claim.
Inspired by Mark’s excellent post I wanted to get to the bottom of this. And while I am at this I’ll also have a look at star transformation in snowflaked dimensional models. We will be using Oracle 11.1.0.7 on Windows XP.
Let’s start by setting up our snowflaked star schema. We will be using the SH sample schema as a basis for this.
SQL> create table sales_star
2 as
3 select * from sh.sales;
Table created.
SQL> create table customers_star
2 as
3 select * from sh.customers;
Table created.
SQL> create table products_star
2 as
3 select * from sh.products;
Table created.
SQL> create table countries_star
2 as
3 select * from sh.countries;
Table created.
SQL> alter table customers_star add constraint cust_star_pk primary key (cust_id);
Table altered.
SQL> alter table products_star add constraint prod_star_pk primary key (prod_id);
Table altered.
SQL> alter table countries_star add constraint countries_star_pk primary key (country_id);
Table altered.
SQL> create bitmap index sales_star_cust_bix on sales_star(cust_id);
Index created.
SQL> create bitmap index sales_star_prod_bix on sales_star(prod_id);
Index created.
SQL> alter table countries_star add constraint countries_star_pk primary key (country_id);
Table altered.
SQL> create bitmap index customers_star_gender_bix on customers_star(cust_gender);
Index created.
SQL> create bitmap index customers_star_city_bix on customers_star(cust_city);
Index created.
SQL> create bitmap index products_star_subcategory_bix on products_star(prod_subcategory_desc);
Index created.
SQL> create bitmap index customers_star_country_bix on customers_star(country_id);
Index created.
SQL> analyze table sales_star compute statistics for table for all indexes for all indexed columns;
Table analyzed.
SQL> analyze table customers_star compute statistics for table for all indexes for all indexed columns;
Table analyzed.
SQL> analyze table products_star compute statistics for table for all indexes for all indexed columns;
Table analyzed.
SQL> analyze table countries_star compute statistics for table for all indexes for all indexed columns;
Table analyzed.
This will give us the following (very simple) snowflaked model:

So let’s actually run a query against our snowflake
SQL> select
2 sum(quantity_sold),
3 p.prod_subcategory_desc,
4 c.cust_gender
5 from
6 sales_star s
7 join products_star p ON (s.prod_id = p.prod_id)
8 join customers_star c ON (s.cust_id = c.cust_id)
9 join countries_star d ON (c.country_id = d.country_id)
10 where
11 p.prod_subcategory_desc = 'Memory' and
12 c.cust_city = 'Oxford' and
13 c.cust_gender = 'F'
14 group by
15 p.prod_subcategory_desc, c.cust_gender;
Execution Plan
----------------------------------------------------------
Plan hash value: 1638875787
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 28 (4)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6609_53663F | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 29 | 783 | 9 (0)| 00:00:01 |
| 4 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 5 | BITMAP AND | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_CITY_BIX | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_GENDER_BIX | | | | |
| 8 | HASH GROUP BY | | 1 | 58 | 19 (6)| 00:00:01 |
|* 9 | HASH JOIN | | 1 | 58 | 19 (6)| 00:00:01 |
|* 10 | HASH JOIN | | 1 | 54 | 18 (6)| 00:00:01 |
|* 11 | HASH JOIN | | 1 | 36 | 15 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 |
| 13 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 14 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | |
| 15 | TABLE ACCESS BY INDEX ROWID | SALES_STAR | 13 | 260 | 13 (0)| 00:00:01 |
| 16 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 17 | BITMAP AND | | | | | |
| 18 | BITMAP MERGE | | | | | |
| 19 | BITMAP KEY ITERATION | | | | | |
| 20 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6609_53663F | 1 | 13 | 2 (0)| 00:00:01 |
|* 21 | BITMAP INDEX RANGE SCAN | SALES_STAR_CUST_BIX | | | | |
| 22 | BITMAP MERGE | | | | | |
| 23 | BITMAP KEY ITERATION | | | | | |
| 24 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 |
| 25 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 26 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | |
|* 27 | BITMAP INDEX RANGE SCAN | SALES_STAR_PROD_BIX | | | | |
| 28 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6609_53663F | 29 | 522 | 2 (0)| 00:00:01 |
| 29 | INDEX FULL SCAN | COUNTRIES_STAR_PK | 23 | 92 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Note
-----
- star transformation used for this statement
So what conclusions can we draw from the above explain plan. Well, first of all we see that Oracle has used star transformation for this query. This demonstrates that star transformation is used by the CBO in a snowflaked dimensional model. The next question then is, how exactly did this happen
As a first step, on lines 2-7 (Id 2-7), Oracle loads a global temporary table (GTT). It expects to load 29 rows into this table from the Bitmap ANDed predicates on the customer table. It uses that GTT in the star transformation itself (lines 11-27). So rather than joining directly to the customer dimension Oracle uses the GTT as part of the star transformation. On lines 10 and 28 our GTT is hash joined to the results of the star transformation. On lines 9 and 29 our snowflaked countries_star dimension is joined to our result set and this is then finally aggregated in line 10 and returned in line 0. Interestingly, the customers_star dimension does not directly take part in a join at all.
Let’s move on to the next item in our list: Does the CBO use star transformation when it finds a compound key in both fact and dimension table?
In order to demonstrate this we will first create a compound key in our products_star dimension and also set this up as a foreign key in the sales_star fact table. We will use the prod_name in products_star as the second item in the compound key. We will also create the prod_name column in the sales_star table.
Let’s first drop the products_star.prod_id primary key:
SQL> alter table products_star drop constraint prod_star_pk;
Table altered.
Now, we create the compound primary key
SQL> alter table products_star add constraint prod_star_pk primary key (prod_id,prod_name);
Table altered.
Next we add the prod_name column to the sales_star fact table and populate this column with the prod_name from the products_star.prod_name column:
SQL> ALTER TABLE sales_star ADD prod_name VARCHAR2(50);
Table altered.
SQL> MERGE INTO sales_star a USING (
2 SELECT
3 prod_id,
4 prod_name
5 FROM
6 products_star
7 ) b ON (a.prod_id = b.prod_id)
8 WHEN MATCHED THEN UPDATE SET
9 a.prod_name = b.prod_name;
904924 rows merged.
SQL> COMMIT;
Commit complete.
Next we drop the Bitmap index on sales_star.prod_id and recreate it as a compund Bitmap index
SQL> drop index sales_star_prod_bix;
Index dropped.
SQL> create bitmap index sales_star_prod_bix on sales_star(prod_id,prod_name);
Index created.
We gather stats on the two tables
SQL> exec dbms_stats.gather_table_stats ( ownname => USER, tabname => 'products_star', degree => DBMS_STATS.AUTO_DEGREE, estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade => TRUE ) ;
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats ( ownname => USER, tabname => 'sales_star', degree => DBMS_STATS.AUTO_DEGREE, estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade => TRUE ) ;
PL/SQL procedure successfully completed.
And rerun our query.
SQL> select
2 sum(quantity_sold),
3 p.prod_subcategory_desc,
4 c.cust_gender
5 from
6 sales_star s
7 join products_star p ON (s.prod_id = p.prod_id)
8 join customers_star c ON (s.cust_id = c.cust_id)
9 join countries_star d ON (c.country_id = d.country_id)
10 where
11 p.prod_subcategory_desc = 'Memory' and
12 c.cust_city = 'Oxford' and
13 c.cust_gender = 'F'
14 group by
15 p.prod_subcategory_desc, c.cust_gender;
Execution Plan
----------------------------------------------------------
Plan hash value: 252248325
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 43 | 37 (3)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D660E_54B1A1 | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 45 | 810 | 12 (0)| 00:00:01 |
| 4 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 5 | BITMAP AND | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_CITY_BIX | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_GENDER_BIX | | | | |
| 8 | HASH GROUP BY | | 1 | 43 | 25 (8)| 00:00:01 |
|* 9 | HASH JOIN | | 1 | 43 | 25 (8)| 00:00:01 |
|* 10 | HASH JOIN | | 1 | 39 | 23 (5)| 00:00:01 |
|* 11 | HASH JOIN | | 1 | 30 | 21 (5)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 36 | 2 (0)| 00:00:01 |
| 13 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 14 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | |
| 15 | TABLE ACCESS BY INDEX ROWID | SALES_STAR | 20 | 240 | 19 (0)| 00:00:01 |
| 16 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 17 | BITMAP AND | | | | | |
| 18 | BITMAP MERGE | | | | | |
| 19 | BITMAP KEY ITERATION | | | | | |
| 20 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660E_54B1A1 | 1 | 13 | 2 (0)| 00:00:01 |
|* 21 | BITMAP INDEX RANGE SCAN | SALES_STAR_CUST_BIX | | | | |
| 22 | BITMAP MERGE | | | | | |
| 23 | BITMAP KEY ITERATION | | | | | |
| 24 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 36 | 2 (0)| 00:00:01 |
| 25 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 26 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | |
|* 27 | BITMAP INDEX RANGE SCAN | SALES_STAR_PROD_BIX | | | | |
| 28 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660E_54B1A1 | 45 | 405 | 2 (0)| 00:00:01 |
| 29 | INDEX FULL SCAN | COUNTRIES_STAR_PK | 23 | 92 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("C"."CUST_CITY"='Oxford')
7 - access("C"."CUST_GENDER"='F')
9 - access("C1"="D"."COUNTRY_ID")
10 - access("S"."CUST_ID"="C0")
11 - access("S"."PROD_ID"="P"."PROD_ID")
14 - access("P"."PROD_SUBCATEGORY_DESC"='Memory')
21 - access("S"."CUST_ID"="C0")
26 - access("P"."PROD_SUBCATEGORY_DESC"='Memory')
27 - access("S"."PROD_ID"="P"."PROD_ID")
Note
-----
- star transformation used for this statement
Statistics
----------------------------------------------------------
2 recursive calls
8 db block gets
211 consistent gets
1 physical reads
600 redo size
562 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Et voilá, q.e.d.: star transformation used in dimensional model with compound key.
So, what does this mean now? First of all it means that the CBO can use star transformation with compound keys. Claims to the contrary are simply false. This also means that surrogate keys are not a pre-requisite for star transformation to be used in a dimensional model. So another reason to get rid of them (in most situations).