Comparing Exadata and Netezza TwinFin

Posted: August 13th, 2010 | Author: Uli Bethke | Filed under: Data Warehouse, Oracle

A comparison between Exadata and Netezza TwinFin. OK, it comes from Netezza and as such is biased, but it is still an interesting read.

It is worth remembering, though, that Exadata is designed for mixed workloads (OLTP and analytics), which is a key differentiator from the other DW appliance vendors.

Interesting posts by Curt Monash on this:

http://www.dbms2.com/2009/09/29/integration-oltp-data-warehousing-exadata-2/
http://www.dbms2.com/2010/01/22/oracle-database-hardware-strategy/

Where is the response from Oracle?


BigQuery: Data Warehousing with Google?

Posted: May 23rd, 2010 | Author: Uli Bethke | Filed under: Data Warehouse, google

Google has added two new products to their Labs. The first one is BigQuery, which according to Google allows users to query trillions of records in an SQL dialect via a RESTful web service. If they get their pricing right on this, then I can see Google becoming a top player in the data-warehousing-as-a-service space. This one could be quite interesting because, unlike BigQuery, Hadoop and the other players in the NoSQL space do not support SQL. The only problem is that there are currently no query tools that support BigQuery.

The other product they have added is a Prediction API. This one is a machine learning algorithm implemented via a RESTful web service.


10 Reasons you really need predictive analytics

Posted: February 4th, 2010 | Author: Brendan Tierney | Filed under: Best Practice, Business Intelligence, Data Warehouse, analytic functions, data mining

SPSS have recently posted an article called “10 Reasons you really need predictive analytics”. I thought it would be interesting to post the main points from this article to illustrate that not all predictive analytics projects involve data mining; they involve a number of different techniques and looking at the business data in a different way. Yes, data mining can be a very important element in some of the following.

1. Get a higher return on your data investment
Your organization has a significant investment in data – data that contains critical information about every aspect of your business. Today more than ever, you need to get the best return on the data you have collected–and predictive analytics is the most effective way to do this. Predictive analytics combines information on what has happened in the past, what is happening now, and what’s likely to happen in the future to give you a complete picture of your business.

2. Find hidden meaning in your data
Predictive analytics helps you maximize the understanding gained from your data. It enables you to uncover hidden patterns, trends, and relationships and transform this information into action.

3. Look forward, not backward
Unlike reporting and business intelligence solutions that are only valuable for understanding past and current conditions, predictive analytics helps organizations look forward. By leveraging sophisticated statistical and modeling techniques, you can use the data you already have to help you anticipate future events and be proactive, rather than reactive.

4. Deliver intelligence in real time
Your business is dynamic. With predictive analytics, you can automatically deploy analytical results to both individuals and operational systems as changes occur, helping to guide customer interactions and strategic and tactical decision making.

5. See your assumptions in action
Advanced analytical methods give you the tools to develop hypotheses about your organization’s toughest challenges and test them by creating predictive models. You can then choose the scenario that is likely to result in the best outcome for your organization.

6. Empower data-driven decision making
Better processes help people throughout your organization make better decisions every day. Predictive analytics enables your organization to automate the flow of information to match your business practices and deliver the insights gained through this technology to people who can apply them in their daily work.

7. Build customer intimacy
When you know each of your customers or constituents intimately—including what they think, say, and do—you can build stronger relationships with them. Predictive analytics gives you a complete view of your customers, and enables you to capture and maximize the value of each and every interaction.

8. Mitigate risk and fraud
Predictive analytics helps you evaluate risk using a combination of business rules, predictive models, and information gathered from customer interactions. You can then take the appropriate actions to minimize your organization’s exposure to fraudulent activities or high-risk customers or transactions.

9. Discover unexpected opportunities
Your organization can use predictive analytics to respond with greater speed and certainty to emerging challenges and opportunities, helping you to keep pace in a constantly changing business environment.

10. Guarantee your organization’s competitive advantage
Predictive analytics can drive improved performance in every operational area, including customer relations, supply chain, financial performance and cost management, research and product development, and strategic planning. When your organization runs more efficiently and profitably, you have what it takes to out-think and out-perform your competitors.

So what is predictive analytics? Check out the description on Wikipedia.

Let me know your views and comments on the above.

Brendan Tierney


Oracle Data Miner – New Resources

Posted: January 27th, 2010 | Author: Brendan Tierney | Filed under: Best Practice, Business Intelligence, Data Warehouse, Training, data mining

Over the past couple of weeks, a couple of new web resources on Oracle Data Miner have appeared.

The first one is that Charlie Berger, the director of Oracle Data Mining Product Management, has started a blog specifically for Oracle Data Miner. Check it out:
http://blogs.oracle.com/datamining/

If you are already using Oracle Data Miner or are interested in following its developments, why not join the Oracle Data Miner Facebook group:
http://www.facebook.com/pages/Oracle-Data-Mining/287065104533?ref=mf


Why Has Data Mining Struggled So Much?

Posted: November 20th, 2009 | Author: Brendan Tierney | Filed under: Business Intelligence, Data Warehouse, Oracle, data mining

Bill Inmon has recently posted an article on “Why has Data Mining struggled so much?”

The article discusses 7 different reasons why data mining has struggled, even though it has been around for a very long time.

The main points are:
1. We have been waiting a long time for it to become available in a usable way.
2. Data mining is considered an academic discipline with very few practitioners, but this is becoming less so.
3. Data mining requires a different set of skills. Yes, you need data management skills, but you also need some data mining skills. I will be making a posting focusing on the skill sets required for data mining in the coming weeks.
4. Some industries and application areas are more suited to data mining than others. The difficulty is in identifying suitable projects.
5. Data for data mining is unclean. Not if you use a data warehouse. Ideally, an organisation that has a mature-ish BI infrastructure will benefit most from a data mining project.
6. Data is incomplete. Yes, you may need to enrich the data from various sources. But again, if you have a data warehouse you will have most of these.
7. Approaches to data mining are inadequate. A lot of the approaches to data mining projects are based on its statistical history. New problem areas are evolving all the time and we can use data mining in lots of different ways.

To view Bill Inmon’s article – click here.

To view our 2 training courses on data mining – click here

Brendan Tierney


Data Warehousing Books: Design and architecture

Posted: October 31st, 2009 | Author: Uli Bethke | Filed under: Books, Business Intelligence, Data Warehouse, Data Warehousing Books

In another post I have covered data warehousing books in the world of Oracle. We’ve also had a look at data warehousing and business intelligence books for project management and business analysis. Today we will look at data warehousing and business intelligence books that look at the technical design and architecture of a data warehouse solution.

Must Have

DW 2.0: The Architecture for the Next Generation of Data Warehousing: Bill Inmon revisits his data warehouse architecture. Addresses the following issues: Real-time BI, unstructured data, the enterprise data warehouse and change, the data life cycle, time variance of data. Very useful from a conceptual point of view, but not enough detail.

The Data Warehouse Toolkit - The Complete Guide to Dimensional Modeling. My first book on data warehousing. Still valuable today. Great for dimensionally modelling data marts or small non-realtime Enterprise Data Warehouses based on Kimball’s conformed dimensions. It also has a good overview of industry-specific data model patterns in a dimensional context. A must have.

The Data Model Resource Books Vol 1-3: The books describe fundamental data modeling patterns that can be applied and reused across the enterprise. If you are assigned the task of modelling an Enterprise Data Warehouse, these books give you great insight into best practices in data modelling. Volume 2 offers industry-specific data model patterns and provides invaluable information to better understand the issues at hand in a particular industry. Personally, I find that you should actually start with volume 3, as this is the most generic of the three books. Also, if you only get one of the books, get volume 3.

If you have a requirement around near-real-time data warehousing and operational business intelligence, I recommend looking into Dan Linstedt’s data vault modelling techniques. The Business of Data Vault Modeling will get you started.

Some more recent additions to the data warehouse architecture league of books include Building and Maintaining a Data Warehouse and Advanced Data Warehouse Design. The first of these walks us through all the technical areas of a data warehouse project: source system analysis, database design, BI reporting, data quality, metadata. In my opinion, the best chapter is on data integration and ETL. There are very few dedicated ETL books out there and this is one of the few that touches on the subject, albeit from a high level. In Advanced Data Warehouse Design the authors discuss the shortcomings of existing data warehouse implementations, focusing mainly on spatial and temporal data, e.g. the shortcomings of slowly changing dimensions when capturing changes over time. They propose a truly temporal and spatial data warehouse. Examples are given in MS SQL Server Analysis Services (temporal) and Oracle OLAP (temporal and spatial).

To my knowledge the only book out there dedicated to the physical design of databases is Physical Database Design: the database professional’s guide to exploiting indexes, views, storage, and more. Most of the stuff covered here is for advanced users. It covers Oracle, DB2, SQL Server, and for some of the MPP stuff Teradata. Personally I found the chapter on physical design for a shared nothing architecture, and the chapter on hardware (CPU architecture, disks, server sizing etc.) the most useful.

Dr. Ronnie Abrahiem, Software Engineer at CIBER, has recently published a book on combining SOA and data warehousing in a near-real-time environment. This looks quite interesting, but I haven’t read the book myself. It has the rather long title Data Warehousing with Service-oriented Architecture: Designing and Implementing Prototype Models For an Integration of Near-Real-Time Data Warehousing Architecture with Service-oriented Architecture. I am currently working on a project where we want to integrate a SOA-based MDM solution with the data warehouse. The book may offer some interesting insights around this.

Should Have

If you have a lot of aggregate tables in your warehouse, I recommend having a look at Mastering Data Warehouse Aggregates for a formalised methodology and some really useful tips and tricks around an aggregate navigator.

Another recent addition to data warehouse design books is Data Warehouse Design: Modern Principles and Methodologies. Very useful chapter on ETL and quite affordable.

Could Have

Data Warehouse Design Solutions. This is useful as a second reference for industry-specific dimensional models. However, it cannot replace Kimball’s original book on the subject.

Clickstream Data Warehousing. If you are implementing a data warehouse for web analytics you should have a look here. However, in light of the explosion of data volumes and with Hadoop and MapReduce at hand this one is slightly obsolete.


TIME dimension script Oracle

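Posted: October 30th, 2009 | Author: Uli Bethke | Filed under: Data Warehouse, ETL, SQL for Analysis

-- One row per second of the day: time_id runs from 0 (00:00:00) to 86399 (23:59:59).
SELECT
   n AS time_id,
   TO_CHAR(TO_DATE(TO_CHAR(n), 'SSSSS'), 'HH24') AS hour,
   TO_CHAR(TO_DATE(TO_CHAR(n), 'SSSSS'), 'MI') AS minute,
   TO_CHAR(TO_DATE(TO_CHAR(n), 'SSSSS'), 'SS') AS second
FROM (
   -- Row generator: LEVEL counts from 1 to 86400, the number of seconds in a day.
   SELECT
      LEVEL - 1 AS n
   FROM
      dual
   CONNECT BY LEVEL <= 86400
)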
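
A short sketch of how the result might be used once it is stored as a table. The names dim_time, orders_fact and order_ts are hypothetical and not part of the script above; the only assumption about the data is that the fact table carries a DATE (or TIMESTAMP) column from which the seconds past midnight can be derived.

```sql
-- Assumption: the SELECT above has been materialised as a table named dim_time
-- (hypothetical name), e.g. CREATE TABLE dim_time AS <the SELECT above>.

-- Sanity check: one row per second of the day, numbered 0 to 86399.
SELECT COUNT(*) AS row_count, MIN(time_id) AS first_id, MAX(time_id) AS last_id
FROM dim_time;

-- Look up the time_id for the current time of day.
-- The 'SSSSS' format element returns the seconds past midnight.
SELECT TO_NUMBER(TO_CHAR(SYSDATE, 'SSSSS')) AS time_id
FROM dual;

-- Hypothetical fact table orders_fact with a DATE column order_ts:
-- count orders per hour of day by joining on the derived time_id.
SELECT t.hour, COUNT(*) AS order_count
FROM orders_fact f
JOIN dim_time t
  ON t.time_id = TO_NUMBER(TO_CHAR(f.order_ts, 'SSSSS'))
GROUP BY t.hour
ORDER BY t.hour;
```

In a real load you would more likely compute time_id once in the ETL and store it on the fact table, rather than deriving it in every query as the join above does.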

Data warehousing for free! Terabyte sized data warehouse and business intelligence without license costs

Posted: October 26th, 2009 | Author: Uli Bethke | Filed under: Business Intelligence, Data Warehouse

Greenplum

This is no joke. Greenplum on 19 October announced a free single node edition of its analytical database.

For those of you who haven’t heard about Greenplum, they are a provider of MPP database software that runs on commodity hardware (unlike some of its competitors such as Teradata, Netezza, or recently Oracle with Exadata). The database is based on the open source database PostgreSQL but is closed source itself.

Features of the database include massively parallel processing, redundancy, compression, row-oriented or column-oriented data storage, partitioning, standard SQL including the SQL 2003 OLAP extensions (analytic functions etc.), MapReduce support, and ODBC & JDBC support.
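
To give a flavour of what the SQL 2003 OLAP support means in practice, here is a minimal sketch of two analytic (window) functions. The table sales and its columns are hypothetical, and the query uses only standard window-function syntax rather than anything Greenplum-specific.

```sql
-- Hypothetical table: sales(region, sale_date, amount)
SELECT
   region,
   sale_date,
   amount,
   -- Running total per region in date order
   -- (rows sharing the same sale_date get the same total).
   SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total,
   -- Rank of each sale within its region, largest amount first.
   RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, sale_date;
```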

So what restrictions are there for the single-node edition? Obviously you are only allowed to run it on a single node. Below is an extract from the Greenplum datasheet:

  • Unlimited production usage on a single commodity x86 server using up to 2 CPU sockets (and unlimited CPU cores), or in a single virtual machine using up to 8 virtual CPU cores.
  • Fully parallel SQL and MapReduce processing leverages multi-core parallel-processing engine for every query.
  • No storage capacity cap: from GBs to 10s of TBs.
  • Hybrid row and column-oriented processing.
  • Free community support as well as a low-cost, paid support option.

Of course, the full power of Greenplum’s shared nothing architecture only materialises with multiple nodes. But the company says that you can expand seamlessly from a single-node to multi-node architecture.

Documentation is installed when you install the single-node edition. It is a couple of thousand pages long, but tiny compared to the beast you get with the Oracle database.

Use cases

I can see two immediate use cases for this:

(1) Greenplum themselves promote this offering as part of their Enterprise Data Cloud. They have a vision of self-service data marts. Based on this, data analysts can go to the Enterprise Data Warehouse and, via interfaces, create their own data marts for in-depth analysis outside the EDW. Have a look at Curt Monash’s excellent article on the future of data marts.

(2) I can see another use case for departmental solutions. You could set up your first couple of subject areas or data marts on a single node machine and if you reach limits on this single node, add more nodes to scale out. Or if you don’t reach this limit just stay on this setup forever.

So why are they giving away data warehouses for free? In another article, Curt Monash gives the following reasons:

  • Adding value to its Enterprise Data Cloud story
  • Seeding the market for future enterprise sales
  • Depriving competitors of revenue, perhaps at enterprises too small to ever be paying Greenplum customers

Microstrategy

Combine the Greenplum offering with Microstrategy’s free Reporting Suite, and you have a best of breed departmental solution for zilch.

The following restrictions apply to the Microstrategy BI tool:

- 100 named users for the frontend of the BI tool and the BI server
- Two named users for the semantic layer module
- Limited to one CPU. I presume it is limited to one CPU core, but this is not clear from the website
- Two named users for the other modules in their BI suite, e.g. OLAP reporting etc.

Have a look at their website for a full set of features and conditions.

For the right set of requirements the above is an attractive and very cost-effective combination. On top of that it is scalable. So if you grow out of it just scale out and add on.