The static schema-based approach may not work in the following scenarios:
- Source data keeps changing; new attributes keep getting added or removed
- The data needs to be viewed from different systems (analytic cubes, services, etc.)
- Time to market is short
- The model should adapt to change with minimum effort
Traditional modeling presents the following challenges:
- The model needs to be modified every time the data changes
  - This involves significant data architect effort, DBA effort, and system downtime
- Changes to the ETL flow
- Testing effort
The approach is explained with a common business scenario.
A traditional bank has acquired a new payment gateway and needs to incorporate the system's user activity into its existing data warehouse.
The user activity is stored in MongoDB and arrives at the traditional ETL as a file with varying fields.
The data fields vary by geography, which traditional data modeling may not be able to accommodate.
NoSQL-based systems could be used to store the user activity data, but the tech team wants to leverage the existing warehouse for it and use SQL to query the data and join it with the existing legacy DW system.
There are modeling techniques such as the Entity-Attribute-Value (EAV) method, also known as open schema, that address some of the issues in such scenarios.
The architecture specified below is a hybrid approach that brings out the best of both worlds.
Input data arrives as CSV: a stream of comma-separated strings with header information.
The data model captures the session ID, the attribute names (column names), and the attribute values in a row-per-attribute fashion (a DDL sketch is given after the list below).
- The model can accommodate a varying list of columns
- The model maintains the order of columns in the input data
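A minimal DDL sketch of such a base table is given below. The table name and the first three columns come from the view definition later in this post; the ATTRIBUTE_ORDER column is an assumption added here to capture the column ordering mentioned above, and the VARCHAR sizes are illustrative:

-- EAV-style base table (sketch; column sizes are illustrative)
-- A header line such as  loginame,location,Time_logged_in,Total_time,Browser_id
-- is unpivoted so that each attribute of a session lands in its own row
CREATE TABLE ashok.FACT_USER_ACTIVITY (
    SESSION_ID      VARCHAR(64)   NOT NULL,  -- session key from the source feed
    ATTRIBUTE_NAME  VARCHAR(128)  NOT NULL,  -- column name taken from the CSV header
    ATTRIBUTE_VALUE VARCHAR(1024),           -- the value, stored as text
    ATTRIBUTE_ORDER INTEGER                  -- assumed column: preserves the input column order
);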
Data presentation for traditional and legacy systems
The fact table can be exposed as a view to the traditional dimensional models, for legacy systems such as cubes and other reporting tools.
Views can be created with pivot SQL, and this can be done in multiple ways: SQL Server has a PIVOT function, while Oracle, DB2, and Teradata offer recursive SQL and table functions to generate the column list dynamically.
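As a sketch of the SQL Server route (assuming the same base table exists in a SQL Server database under the ashok schema), the PIVOT operator version would look roughly like this:

-- SQL Server PIVOT sketch; table and column names are taken from the model above
SELECT SESSION_ID, [loginame], [location], [Time_logged_in], [Total_time], [Browser_id]
FROM (
    SELECT SESSION_ID, ATTRIBUTE_NAME, ATTRIBUTE_VALUE
    FROM   ashok.FACT_USER_ACTIVITY
) AS src
PIVOT (
    MAX(ATTRIBUTE_VALUE)
    FOR ATTRIBUTE_NAME IN ([loginame], [location], [Time_logged_in], [Total_time], [Browser_id])
) AS pvt;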
A generic ANSI SQL way to achieve the same functionality, using MAX(CASE ...) expressions, is given below:
db2 " Create view ashok.view_FACT_USER_ACTIVITY as SELECT SESSION_ID,
MAX(DECODE(ATTRIBUTE_NAME, 'loginame' , ATTRIBUTE_VALUE)) AS loginame,
MAX(DECODE(ATTRIBUTE_NAME, 'location' , ATTRIBUTE_VALUE)) AS location,
MAX(DECODE(ATTRIBUTE_NAME, 'Time_logged_in' , ATTRIBUTE_VALUE)) AS Time_logged_in,
MAX(DECODE(ATTRIBUTE_NAME, 'Total_time' , ATTRIBUTE_VALUE)) AS Total_time,
MAX(DECODE(ATTRIBUTE_NAME, 'Browser_id' , ATTRIBUTE_VALUE)) AS Browser_id
FROM ashok.FACT_USER_ACTIVITY
GROUP BY SESSION_ID"
Base table and the output of the view
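For illustration, here is what a few hypothetical rows in the base table could look like; the session IDs and values below are made up for the example:

-- Hypothetical sample rows; the values are illustrative only
INSERT INTO ashok.FACT_USER_ACTIVITY
    (SESSION_ID, ATTRIBUTE_NAME, ATTRIBUTE_VALUE, ATTRIBUTE_ORDER)
VALUES
    ('S1', 'loginame',       'jdoe',       1),
    ('S1', 'location',       'London',     2),
    ('S1', 'Time_logged_in', '09:15',      3),
    ('S1', 'Browser_id',     'chrome-101', 5),
    ('S2', 'loginame',       'asmith',     1),
    ('S2', 'Total_time',     '42',         4);

Selecting from the view (SELECT * FROM ashok.view_FACT_USER_ACTIVITY) pivots these rows back into one row per session. Session S1 has no Total_time and S2 has no location, so those cells simply come back as NULL, which is how the model tolerates a varying column list.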
Integration with existing dimensional schema
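Because the pivoted view exposes ordinary columns, it can be joined to the existing dimension tables like any other fact. The DIM_USER table and its columns in the sketch below are hypothetical, used only to illustrate the join; only view_FACT_USER_ACTIVITY comes from the model above:

-- Hypothetical integration query; DIM_USER, USER_KEY, USER_SEGMENT and
-- LOGIN_NAME are assumed names for illustration
SELECT u.USER_KEY,
       u.USER_SEGMENT,
       a.SESSION_ID,
       a.location,
       a.Total_time
FROM   ashok.view_FACT_USER_ACTIVITY a
JOIN   ashok.DIM_USER u
  ON   u.LOGIN_NAME = a.loginame;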
Key Benefits
- Easy to load and maintain; less snowflaking and better performance
- Can use RDBMS indexes, compression, and other engine features
- Integrates with the existing dimensional model
- Integrates well with the append/insert-oriented write pattern of big data systems like Spark
- Easy to version the data