
Achieving a dimensional model with a schema-less approach and the Entity-Attribute-Value model




The static schema-based approach may not work in the following scenarios:
  1. Source data keeps changing; new attributes keep getting added or removed
  2. The data needs to be viewed from different systems (analytic cubes, services, etc.)
  3. Time to market is short
  4. The model should be adaptable to change with minimum effort



Traditional modeling presents the following challenges:
·      The model needs to be modified every time the data changes
o   This involves significant data architect effort, DBA effort, and system downtime
·      Changes to the ETL flow
·      Additional testing effort


The approach is explained with a common business scenario

A traditional bank has acquired a new payment gateway and needs to incorporate the system's user activity into the existing data warehouse.

The user activity is stored in MongoDB and arrives at the traditional ETL as a file with varying fields. The organization wants to incorporate this user activity into the traditional warehouse.
The data fields vary by geography, which is difficult to accommodate with traditional data modeling.

NoSQL-based systems could be used to store the user activity data, but the tech team wants to leverage the existing warehouse to store it and use SQL to query and join the data with the existing legacy DW system.


There are modeling techniques like the Entity-Attribute-Value (EAV) method / open schema that address some of the issues in this kind of scenario.

The architecture specified below is a hybrid approach which brings out the best of both worlds.


The input data comes in the form of CSV: a stream of comma-separated strings with header information.
The session id uniquely identifies the record.

Input data
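For illustration, the incoming file for one geography might look like the snippet below. The values are made up, the attribute names are taken from the view definition later in the post, and a file from another geography may carry a different set of columns.

session_id,loginame,location,Time_logged_in,Total_time,Browser_id
S1001,jdoe,London,09:15,35,Chrome
S1002,asmith,Singapore,09:20,12,Firefox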


The Data model

The data model captures the session id, the attribute names (column names), and the attribute values, one attribute per row; a sketch of such a table is given after the list below.





·      The model can accommodate a varying list of columns
·      The model maintains the order of columns in the input data
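A minimal sketch of such a table, assuming the column names used in the view definition below; the ATTRIBUTE_SEQ column (to preserve the input column order) and the data types are assumptions, not taken from the original design.

CREATE TABLE ashok.FACT_USER_ACTIVITY (
    SESSION_ID      VARCHAR(50)  NOT NULL,  -- uniquely identifies the input record
    ATTRIBUTE_NAME  VARCHAR(100) NOT NULL,  -- column name from the incoming header
    ATTRIBUTE_VALUE VARCHAR(500),           -- value kept as a string
    ATTRIBUTE_SEQ   INTEGER                 -- assumed: position of the column in the input file
);

Each incoming record becomes one row per attribute, so a new column in the source simply becomes a new ATTRIBUTE_NAME value and requires no DDL change.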

Data presentation for traditional and legacy systems

The fact table can be exposed as a view to other traditional dimensional models, for legacy systems such as cubes or other reporting tools.

Views can be created with pivot SQL. This can be done in multiple ways: SQL Server has a PIVOT function, and Oracle, DB2, and Teradata have recursive SQL and table functions to generate the column list dynamically.

A generic ANSI SQL way to achieve this is given below.

db2 " Create view ashok.view_FACT_USER_ACTIVITY as SELECT SESSION_ID,
       MAX(DECODE(ATTRIBUTE_NAME, 'loginame' , ATTRIBUTE_VALUE)) AS loginame,
       MAX(DECODE(ATTRIBUTE_NAME, 'location' , ATTRIBUTE_VALUE)) AS location,
       MAX(DECODE(ATTRIBUTE_NAME, 'Time_logged_in' , ATTRIBUTE_VALUE)) AS Time_logged_in,
       MAX(DECODE(ATTRIBUTE_NAME, 'Total_time' , ATTRIBUTE_VALUE)) AS Total_time,
       MAX(DECODE(ATTRIBUTE_NAME, 'Browser_id' , ATTRIBUTE_VALUE)) AS Browser_id
  FROM ashok.FACT_USER_ACTIVITY
  GROUP BY SESSION_ID"

Base table
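With the sample input shown earlier, the base table holds one row per attribute for each session (illustrative values):

SESSION_ID   ATTRIBUTE_NAME    ATTRIBUTE_VALUE
S1001        loginame          jdoe
S1001        location          London
S1001        Time_logged_in    09:15
S1001        Total_time        35
S1001        Browser_id        Chrome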


The output of the view 
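For the same session, the view pivots those rows back into a single record (illustrative values):

SESSION_ID   loginame   location   Time_logged_in   Total_time   Browser_id
S1001        jdoe       London     09:15            35           Chrome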


Integration with existing dimensional schema 
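A sketch of such an integration, joining the view with a conformed dimension; the DIM_LOCATION table and its columns are hypothetical and stand in for whatever dimensions the legacy warehouse already has.

SELECT ua.SESSION_ID,
       ua.loginame,
       ua.Total_time,
       loc.REGION_NAME                        -- attribute from the existing dimension
  FROM ashok.view_FACT_USER_ACTIVITY ua
  JOIN ashok.DIM_LOCATION loc                 -- hypothetical dimension in the legacy DW
    ON ua.location = loc.LOCATION_NAME;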





Key Benefits
  • Easy to load and maintain; less snowflaking and better performance (see the insert sketch after this list)
  • Can use RDBMS indexes, compression, and other features
  • Integrates with the existing dimensional model
  • Integrates well with the insert-only approach of big data systems like Spark
  • Easy to version the data
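As a sketch of the first benefit, a brand-new attribute from the source lands as ordinary inserts, with no ALTER TABLE and no ETL redesign; the Device_type attribute and the values are hypothetical.

-- A new source column simply becomes more rows in the EAV fact
INSERT INTO ashok.FACT_USER_ACTIVITY (SESSION_ID, ATTRIBUTE_NAME, ATTRIBUTE_VALUE)
VALUES ('S1002', 'Device_type', 'Mobile');

When the new attribute needs to be exposed to legacy consumers, the view is extended with one more MAX(CASE ...) line.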

