Module 1: Data Warehouse Design and Implementaion
Lecture Summary
- Data Warehouse and Design
The data warehouse and design lecture begins by discussing 2 major types of organizational information assets (1) Operational Data and (2) Data Warehouses. These two types of information assets are stored in Online Transaction Processing System (OLTP) or Online Analytical Processing System (OLAP). The lecture clearly defines the difference and purpose of the OLTP and OLAP databases. OLTP is oriented to support day-to-day business operations while OLAP focuses on the management or aggregation of data.
As the lecture continues, 2 additional concepts are introduced (1) data warehouses (2) data marts. Data warehouse being an enterprise wide collection of data from various organizational databases. (2) Data marts being a data warehouse focusing on the function for a specific department. The concepts of Data Warehouses and Data Marts are used in a variety of industries (e.g. banking / retail). To implement a data warehouse there's a structured design process that uses 5 key processes Data Sources, Extract, Transform and Load Processes (3) data staging areas (4) data warehousing repositories (BI) tools.
- Introduction to Balance Scorecard
The introduction to the balance scorecard lecture begins by explaining the importance of performance measurement and management in Business Intelligence. A common industry framework that is used to link performance measurement and management is a balanced scorecard. The balance scorecard offers a framework to connect financial, customer, internal business and learning and growth goals to an organization's strategy. The balanced scorecard focuses on more than financial metrics and KPIs.
As the lecture continues, we walk through a case study of Southwest Airlines using the balanced scorecard approach to address 4 challenges within the airline industry that all domestic US based carriers (e.g. high fuel, security concerns). We are given context of the airline industry by being presented with industry facts and metrics. The remainder of the lecture applies the balanced scorecard approach to the Southwest Airlines case study. This walks through how the balanced scorecard should tie to the strategy of the company and illustrates objectives, measures, targets and initiatives that Southwest put in place to achieve through organizational goal.
- Dimensional Modeling and Star Schema
The introduction of the dimensional modeling star schema design lecture begins by outlining the 4 key step process of developing a star schema (e.g. Step 1 - select a business process to model). The steps in the star schema are then applied to a retail case study of a Southwest USA grocery store. This illustrates how to use the 4 step process for the case study to develop a preliminary start schema outlining two key components (1) facts (2) dimensions.
As the lecture continues, it delineates the differences between facts and dimensions in their purpose, characteristics, and which types of business events are categorized as a "fact" or "dimension". Once this difference is outlined, the lecture discusses how to bring facts and dimensions together using the star schema or dimensional model. The lecture concludes with the introduction of data cubes and its relationship to multidimensional data and the star schema. This focuses on how we can use a data cube to view our data through different operations such as (1) slice (2) dice (3) roll up (4) drill down (5) pivot.
- Advanced Star Schema Design
Building on the concepts introduced in the last lecture, the advanced start schema design lecture focuses on advanced dimensional concepts. These are additional dimension concepts that are introduced in this lecture to help us better model the data from the types of information we are receiving from the OLTP. For example, (1) Surrogate keys focusing on maintain a sequence integer helps use maintain uniqueness in our data model and join dimensions (2) Degenerative Dimensions to use as a group key for all row within a fact table (3) Junk Dimension - when you have an indicator that has a Boolean value like "0" or "1" (4) Slow Changing Dimensions - which gives the data modeler a variety of different strategies to handle different scenarios when dimensions start to change.
As the lecture continues, we focus on the definitions and differences of different types of facts. This includes for primary concepts (1) additive facts (3) semi-additive facts (3) non-additive facts (4) fact less fact tables. This is important because storing facts will being the baseline for quantitative analysis that will be used by the business intelligence solution to support management decision making and performance management.
- Data Quality Analysis
The data quality analysis lecture begins by focusing on the role of data quality analysis within an organization, specifically, data profiling. Data profiling is used to understand the quality, accuracy and challenges that you have when trying to create your data model. The data profiling process is a multiple step procedure that includes different types of analysis (e.g. invalid values and data inconsistencies). There is so much complexity within the data profiling process that software tools are using to automate the transactional work. A critical component to successful data profiling is having a "profiling plan". The data profiling plan should address integrity checks (e.g. no null values or blanks allowed) and business checks (e.g. employee can only have 1 roll at a time) off the underlying data.
As the lecture continues, we are able to review the magic quadrant for augmented data quality solutions in the market and begin the discussion on the Master Data Management lifecycle process.
- Dashboard Design and Its Use for Analytics
The dashboard design and its use for analytics begins by focusing on the purpose of a dashboard. A dashboard being an easily interpreted real time graphical user interface to monitor current status and trends of an organization's performance against its goals. The lecture discusses the dashboard is not an enterprise information system but delivers quantitative measures on performance. Dashboards can use multiple mechanisms or "widgets" to visualize and contextualize the information that is being measured.
As the lecture continues, the lecture focuses on the (1) dashboard characteristics (provide the big picture (2) design guidelines (minimal distractions) (3) common pitfalls to avoid (exceeding the boundary of a single screen). The lecture concludes with the definition of different types of dashboards (Geographic, Real Time, Performance Management, Analytical, Custom), their purpose and their audience. Finally, there is a Ted Talk by an economist where I learn the following 4 facts to make "stats interesting" to include (1) Accessibility (2) Exciting (3) Simple (4) Contextualization.
- Increasing the Urgency and importance of the Task Type (how?)
- Automate administrative non-value added as part of master data maintenance (what?)
- Involve Functional Business team members on only critical updates to master data (where?)
Hi Michael, thanks for posting your blog I enjoyed it! I definitely agree with you where people like to underestimate the importance of KPIs and how complex they can get. As a former consultant we strained on making sure our data was accurate and that our KPIs were good to present to upper management. Not everything does need a KPI but we should find that correct balance of reporting accurately and not obsessing over little points.
ReplyDeleteThanks again for the read!
Hello Michael,
ReplyDeleteThanks for the interesting post. I've actually just recently been tasked with cleaning up years and years worth of data that has been collected for research in a REDCap database. IT should make for quite the puzzle as the initial design of the project was...interesting, to say the least. On top of this the project changed hands a couple of times over the years, people have tacked on new fields, and data entry elements of the project have changed over the years without necessarily documenting these changes. Thus far, it has involved meeting with the few individuals still in the department who can help fill the gaps. We've also been trying to determine if we can focus on data that is most relevant to our needs and ditching everything else. For instance, there was a lot of self reported family history involving free form fields but there were also was also a count of total family members and the number of them that were diagnosed with particular diseases that could be extracted. This process also really been reinforcing the need for good design at the start of a project!
Hi Graham, are you planning to use any tools for your data quality analysis? How will you figure out which fields are relevant to your business needs?
DeleteHi Michael,
ReplyDeleteGreat post indeed. This post resonates strongly with my experience in designing a star schema for a data warehouse project at Kaiser Health Care. Like the lecture's discussion, I focused on supporting metrics such as patient waiting times, procedure costs, and clinic operations trends.
For example, when selecting the business process for modeling (Step 1 in the star schema design), I prioritized patient scheduling and visits. I categorized facts like the number of visits, revenue generated, and average waiting times, while dimensions included patient demographics, clinic locations, and appointment types.
Additionally, while working with advanced star schema design, I encountered the concept of slow-changing dimensions. To handle scenarios where clinic locations or patient details changed over time, I implemented a strategy to maintain historical data while ensuring the most recent updates were accessible. This approach aligns closely with the post's mention of using strategies to handle evolving dimensions.
Lastly, the section on dashboard design reflects my efforts to create an interactive dashboard for Kaiser’s operations team. Following design principles discussed in the lecture, I minimized distractions and ensured the dashboard provided clear visualizations for KPIs such as patient flow and resource utilization. This real-time tool allowed stakeholders to quickly assess performance and make data-driven decisions, much like the post’s emphasis on a dashboard’s purpose and usability.
Hi Michael,
ReplyDeleteI think your approach to applying the balanced scorecard for a smaller, non-enterprise-wide goal is insightful. It’s a great reminder that tools like this can be effective at all levels, not just at the top. It’s easy to encounter a method in one context and overlook its broader potential. Reflecting on your approach, I realized I was considering an even more granular application—using it for personal goal-setting—but you’ve highlighted how well it bridges mid-range goals and actionable strategies. Thanks for drawing my attention to this in your comment on my post.
Your comments about the challenges in making a business case for data quality and business intelligence expenditures also struck a chord. It reminded me of an intelligence analysis course I took while in uniform. The course seemed geared toward officers, and I never quite understood why I was there. Still, I recall the retired colonels leading it noting a curious pattern: intelligence is often used not for planning, but as retrospective justification.
To me, intelligence should be about gathering information to create an accurate view of the operating environment, identifying opportunities and risks, and using that analysis to shape strategy. Too often, though, the process is reversed—organizations decide what they want to do, draw up plans, and then ask analysts to "sprinkle a little intel on it." Or worse, if something goes wrong, intelligence is used to justify poor decisions after the fact. If that’s how intelligence is perceived and used, it’s no wonder initiatives in its support struggle to gain traction.
Addressing this perception could be key. In an ideal scenario, decisions would stem from sound, rational, evidence-based analysis. But when individuals or organizations prioritize their pre-existing agendas over data-driven decision-making, intelligence becomes a secondary consideration. Shifting this dynamic—perhaps by emphasizing how intelligence can proactively inform and enhance decision-making—might be a way to strengthen the case for these initiatives. Alternatively, demonstrating a clear and undeniable success from using intelligence properly—where you insisted on its proper application and it paid off—could also help build support.
This is very interesting. Do you have any sites or articles that talk about the conversion of intelligence into strategy?
Delete