How to convert Relational Model Schema into Dimensional Model
Introduction
Recently, there has been a growing preference among companies for dimensional data modeling in the implementation of data warehouses and data lakes. Many organizations with existing data warehouses based on relational modeling are now transitioning to dimensional modeling. It’s essential to understand the process of converting from relational to dimensional modeling.
This article serves as a practical guide for those looking to practice dimensional modeling, providing insights and strategies for transforming a robust relational data model from an operational Online Transaction Processing (OLTP) system into a dimensional model suitable for data warehousing.
Need of Dimensional Modeling
The need for dimensional modeling in data warehouses arises for two main reasons. First, according to Ralph Kimball, is the complexity and performance issues with relational models. He argues that relational models are complex and hard to comprehend for business professionals, and their granularity hampers quick responses to ad-hoc queries. In contrast, dimensional modeling simplifies data for better understanding by business managers.
Secondly, Kimball points out performance concerns. He observes that relational models are overly detailed, with many interrelated tables, making them cumbersome to join. These models often represent business processes that don’t occur simultaneously, such as sales and delivery. While OLTP systems aim to minimize redundancy and inconsistency, data warehouses focus on providing historical data for reporting and decision-making, where immediate consistency is not the primary concern. This approach does not imply that data warehouse data is inconsistent; rather, it highlights a different emphasis. Additionally, the sheer volume of historical data stored in data warehouses, often spanning several years, further impacts performance.
Conversation steps
To convert a relational data model into a dimensional model, you begin by identifying the primary business process for analysis, in our example, it’s the sales process. The next step is to determine the grain or the most detailed level of the fact table, which in this case is each individual sales transaction. The fact table, fact_Sales, is then created to store quantitative metrics like quantity sold and total price.
The next phase involves identifying the dimension tables, which describe various aspects of the facts. In our schema, these include dim_Customer, dim_Product, and dim_Time. These tables are denormalized, meaning they combine data from various related tables in the original relational model to reduce the number of joins required during querying.
Additionally, it’s important to simplify any complex many-to-many relationships from the relational model, as these are less efficient in a dimensional context. Surrogate keys are added to each dimension table to ensure uniqueness and to provide a consistent linking structure to the fact table.
Hierarchies within these dimensions are defined next, such as the time hierarchy (Date, Month, Quarter, Year) in the dim_Time table. This facilitates more efficient and meaningful data analysis.
Another crucial aspect is managing slowly changing dimensions, deciding how to handle historical data when attributes in the dimension tables change over time.
Finally, the completed dimensional model should be validated with business users to ensure it aligns with their reporting and analytical needs. This transformation process results in a denormalized, subject-oriented schema optimized for data warehousing and analytical querying, offering improved performance and easier comprehension for business users.