Module 1: Data Warehouse Design and Implementation

Lecture Summary

  • Data Warehouse and Design

    The data warehouse and design lecture begins by discussing two major types of organizational information assets: (1) operational data and (2) data warehouses. These two types of information assets are stored in Online Transaction Processing (OLTP) systems or Online Analytical Processing (OLAP) systems. The lecture clearly defines the difference between, and purpose of, OLTP and OLAP databases: OLTP is oriented toward supporting day-to-day business operations, while OLAP focuses on the aggregation and analysis of data for management.
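
    To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, columns, and values are hypothetical; the point is only the shape of the two query styles.

```python
import sqlite3

# Hypothetical schema: a single "sales" table standing in for an OLTP system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales (store, amount, sale_date) VALUES (?, ?, ?)",
    [("Phoenix", 19.99, "2024-11-01"), ("Tucson", 5.49, "2024-11-01"),
     ("Phoenix", 12.00, "2024-11-02")],
)

# OLTP-style query: fetch one transaction to support a day-to-day operation.
print(conn.execute("SELECT * FROM sales WHERE id = ?", (2,)).fetchone())

# OLAP-style query: aggregate across many transactions for management analysis.
print(conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store").fetchall())
```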

    As the lecture continues, two additional concepts are introduced: (1) data warehouses and (2) data marts. A data warehouse is an enterprise-wide collection of data drawn from various organizational databases; a data mart is a smaller data warehouse focused on the function of a specific department. Data warehouses and data marts are used in a variety of industries (e.g., banking, retail). To implement a data warehouse, there is a structured design process built around five key components: (1) data sources, (2) extract, transform, and load (ETL) processes, (3) data staging areas, (4) the data warehouse repository, and (5) business intelligence (BI) tools. A rough sketch of how these components fit together appears below.
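
    Here is a hedged pandas sketch of how those five components could fit together in miniature. The file names, column names, and cleaning rules are all assumptions for illustration, not a prescribed implementation.

```python
import sqlite3

import pandas as pd

# (1) Data source: extract a raw order feed from a hypothetical OLTP export.
raw_orders = pd.read_csv("oltp_orders_export.csv")  # hypothetical file name

# (2) ETL transform step: standardize types and drop obviously bad rows.
raw_orders["order_date"] = pd.to_datetime(raw_orders["order_date"], errors="coerce")
staged = raw_orders.dropna(subset=["order_id", "order_date"]).drop_duplicates("order_id")

# (3) Data staging area: persist the cleaned intermediate before loading.
staged.to_csv("staging_orders.csv", index=False)

# (4) Load into the warehouse repository (SQLite standing in for the warehouse).
warehouse = sqlite3.connect("warehouse.db")
staged.to_sql("fact_orders", warehouse, if_exists="append", index=False)

# (5) BI tools would then query fact_orders to build reports and dashboards.
```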

  • Introduction to the Balanced Scorecard

    The introduction to the balanced scorecard lecture begins by explaining the importance of performance measurement and management in business intelligence. A common industry framework used to link performance measurement and management is the balanced scorecard. The balanced scorecard offers a framework to connect financial, customer, internal business process, and learning and growth goals to an organization's strategy, which means it focuses on more than financial metrics and KPIs.

    As the lecture continues, we walk through a case study of Southwest Airlines using the balanced scorecard approach to address four challenges that all domestic US-based carriers face (e.g., high fuel costs, security concerns). We are given context on the airline industry through industry facts and metrics. The remainder of the lecture applies the balanced scorecard approach to the Southwest Airlines case study. It walks through how the balanced scorecard should tie to the strategy of the company and illustrates the objectives, measures, targets, and initiatives that Southwest put in place to achieve its organizational goals.
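
    The scorecard's structure (objectives, measures, targets, and initiatives across four perspectives) maps naturally onto a simple data structure, which is a handy way to see all four perspectives at once. The entries below are illustrative placeholders, not the actual case-study figures.

```python
# One illustrative entry per perspective; values are placeholders,
# not the real Southwest Airlines case-study numbers.
scorecard = {
    "financial": {
        "objective": "Lower operating costs",
        "measure": "Cost per available seat mile",
        "target": "Reduce 5% year over year",
        "initiative": "Simplify fleet maintenance",
    },
    "customer": {
        "objective": "On-time service",
        "measure": "On-time arrival rating",
        "target": "Rank first among domestic carriers",
        "initiative": "Quality improvement program",
    },
    "internal_business": {
        "objective": "Fast ground turnaround",
        "measure": "Average time at gate",
        "target": "30 minutes or less",
        "initiative": "Ground-crew cycle-time optimization",
    },
    "learning_and_growth": {
        "objective": "Align crews with strategy",
        "measure": "Percent of crews trained",
        "target": "100% over five years",
        "initiative": "Training and stock ownership plans",
    },
}

for perspective, entry in scorecard.items():
    print(perspective, "->", entry["measure"], "| target:", entry["target"])
```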

  • Dimensional Modeling and Star Schema

    The dimensional modeling and star schema design lecture begins by outlining the four-step process of developing a star schema (e.g., Step 1: select a business process to model). The steps are then applied to a retail case study of a Southwest USA grocery store. This illustrates how to use the four-step process to develop a preliminary star schema outlining two key components: (1) facts and (2) dimensions.
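
    For illustration, here is a minimal sketch of what a preliminary star schema for the grocery-store case could look like, expressed as SQLite DDL run from Python. The table and column names are my assumptions, not the lecture's exact design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for each sale.
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT, description TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT);

-- Fact table: one row per product scanned per transaction (the grain),
-- holding numeric measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    quantity     INTEGER,
    dollar_sales REAL
);
""")
```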

    As the lecture continues, it delineates the differences between facts and dimensions in their purpose, their characteristics, and which parts of a business event are categorized as a "fact" or a "dimension." Once this difference is outlined, the lecture discusses how to bring facts and dimensions together using the star schema, or dimensional model. The lecture concludes with the introduction of data cubes and their relationship to multidimensional data and the star schema. This focuses on how we can use a data cube to view our data through different operations: (1) slice, (2) dice, (3) roll up, (4) drill down, and (5) pivot.
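
    The five cube operations map neatly onto pandas idioms. Here is a minimal sketch with made-up sales data:

```python
import pandas as pd

# Hypothetical pre-aggregated sales data with three dimensions and one measure.
sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024, 2024],
    "region":  ["West", "East", "West", "East", "West"],
    "product": ["Milk", "Milk", "Bread", "Milk", "Bread"],
    "amount":  [100.0, 80.0, 60.0, 90.0, 70.0],
})

slice_2024 = sales[sales["year"] == 2024]                    # slice: fix one dimension
dice = sales[(sales["year"] == 2024) & (sales["region"] == "West")]  # dice: fix several
roll_up = sales.groupby("year")["amount"].sum()              # roll up: coarser level
drill_down = sales.groupby(["year", "region"])["amount"].sum()  # drill down: finer level
pivot = sales.pivot_table(index="region", columns="year",
                          values="amount", aggfunc="sum")    # pivot: rotate the view
```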

  • Advanced Star Schema Design

    Building on the concepts introduced in the last lecture, the advanced star schema design lecture focuses on advanced dimensional concepts. These additional dimension concepts help us better model the data we receive from the OLTP systems. For example: (1) surrogate keys, sequential integers that maintain uniqueness in our data model and simplify joins to dimensions; (2) degenerate dimensions, used as a grouping key for all rows within a fact table; (3) junk dimensions, which consolidate low-cardinality indicators such as Boolean flags ("0" or "1"); and (4) slowly changing dimensions, which give the data modeler a variety of strategies for handling the different scenarios that arise when dimension attributes change.
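
    For the slowly changing dimension strategies, here is a minimal pandas sketch of one common approach (often called Type 2), where a change expires the current row and inserts a new one under a fresh surrogate key. The names, attributes, and dates are hypothetical.

```python
import pandas as pd

# Hypothetical customer dimension with a surrogate key and a validity window.
dim_customer = pd.DataFrame({
    "customer_key": [1],            # surrogate key: sequential integer
    "customer_id":  ["C100"],       # natural key from the OLTP system
    "city":         ["Phoenix"],
    "valid_from":   ["2023-01-01"],
    "valid_to":     [None],         # None marks the current row
})

def scd_type2_update(dim, customer_id, new_city, change_date):
    """Expire the current row and add a new one with the changed attribute."""
    current = (dim["customer_id"] == customer_id) & dim["valid_to"].isna()
    dim.loc[current, "valid_to"] = change_date
    new_row = {
        "customer_key": dim["customer_key"].max() + 1,  # next surrogate key
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
    }
    return pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)

dim_customer = scd_type2_update(dim_customer, "C100", "Tucson", "2024-06-01")
```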

    As the lecture continues, we focus on the definitions of and differences between types of facts. This includes four primary concepts: (1) additive facts, (2) semi-additive facts, (3) non-additive facts, and (4) factless fact tables. This is important because stored facts become the baseline for the quantitative analysis that the business intelligence solution uses to support management decision making and performance management.
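
    A small made-up example helps show the difference between additive and semi-additive facts: deposits can be summed across every dimension, while balances sum across accounts but not across time.

```python
import pandas as pd

balances = pd.DataFrame({
    "month":    ["2024-01", "2024-01", "2024-02", "2024-02"],
    "account":  ["A", "B", "A", "B"],
    "balance":  [100.0, 50.0, 120.0, 40.0],  # semi-additive measure
    "deposits": [10.0, 5.0, 20.0, 0.0],      # additive measure
})

# Additive: deposits can be summed across every dimension, including time.
total_deposits = balances["deposits"].sum()

# Semi-additive: balances sum across accounts within a single month...
by_month = balances.groupby("month")["balance"].sum()

# ...but summing across months is meaningless; take the latest month instead.
ending_balance = balances[balances["month"] == balances["month"].max()]["balance"].sum()
```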

  • Data Quality Analysis 

    The data quality analysis lecture begins by focusing on the role of data quality analysis within an organization, specifically data profiling. Data profiling is used to understand the quality, accuracy, and challenges of your data before you try to create your data model. The data profiling process is a multi-step procedure that includes different types of analysis (e.g., finding invalid values and data inconsistencies). There is so much complexity within the data profiling process that software tools are used to automate the transactional work. A critical component of successful data profiling is having a "profiling plan." The data profiling plan should address integrity checks (e.g., no null values or blanks allowed) and business checks (e.g., an employee can only have one role at a time) on the underlying data.
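
    As a sketch of what integrity and business checks from a profiling plan might look like in code, here is a small pandas example; the HR-style columns and rules are assumptions for illustration.

```python
import pandas as pd

# Hypothetical HR extract to profile; columns and values are made up.
employees = pd.DataFrame({
    "employee_id": [1, 2, 2, 3],
    "role":        ["Buyer", "Planner", "Buyer", None],
    "active":      [True, True, True, True],
})

# Integrity check: no null or blank values allowed in key columns.
null_violations = employees[employees["role"].isna()]

# Business check: an active employee may only hold one role at a time.
role_counts = employees[employees["active"]].groupby("employee_id").size()
multi_role_violations = role_counts[role_counts > 1]

print(f"{len(null_violations)} null-role rows, "
      f"{len(multi_role_violations)} employees with multiple active roles")
```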

    As the lecture continues, we review the Magic Quadrant for augmented data quality solutions in the market and begin the discussion of the Master Data Management lifecycle process.

  • Dashboard Design and Its Use for Analytics 

    The dashboard design and its use for analytics lecture begins by focusing on the purpose of a dashboard. A dashboard is an easily interpreted, real-time graphical user interface for monitoring the current status and trends of an organization's performance against its goals. The lecture notes that a dashboard is not an enterprise information system in itself but delivers quantitative measures of performance. Dashboards can use multiple mechanisms, or "widgets," to visualize and contextualize the information being measured.

    As the lecture continues, it focuses on (1) dashboard characteristics (e.g., provide the big picture), (2) design guidelines (e.g., minimize distractions), and (3) common pitfalls to avoid (e.g., exceeding the boundary of a single screen). The lecture concludes by defining different types of dashboards (geographic, real-time, performance management, analytical, custom), their purpose, and their audience. Finally, there is a TED Talk by an economist from which I learned four ways to make "stats interesting": (1) make them accessible, (2) make them exciting, (3) keep them simple, and (4) contextualize them.

Analysis of Materials / Reading

    When I read through the presentations in this module, I kept thinking about the data warehouse and design section, specifically the process of extracting data from the OLTP through ETL to the data staging area. In theory, this process looks fairly straightforward. The reality is that getting data out of OLTP systems and into a data staging area is incredibly difficult. Depending on the size of the business, the number of locations, the number of functional business units, and the areas of operation (North America, Europe, Asia), a business can run tens to hundreds of OLTP systems across the enterprise. There is a cost associated with the ETL process, and many organizations are not willing to invest in the resources it takes to get data out of the OLTP systems and into a data warehouse to build BI reports. Management teams will ask, "Why can't we use the reporting that is available in the OLTP? We are spending hundreds of thousands of dollars on our OLTP systems, and we still cannot get the key KPIs we need to manage our business." This is a challenging question to answer, and making the business case to spend money on BI tools takes a significant commitment. For example, in the automotive industry, we run very lean budgets. It has been very difficult to develop a business case to justify the cost of developing BI tools, because the business is running without them and the efficiency we would really gain is hard to quantify. Here is an article I found that tries to justify the cost of BI tools (Colling, 2023).

I would be interested to hear from others in the class, working in similar or other industries: have you had similar challenges, and how did you address them?

    I really love the simplicity of the balanced scorecard. I think businesses are getting farther and farther away from managing their operations with pragmatic, simple approaches to executing their strategy, if they even have a strategy defined and are not just executing against a series of goals. This section was a good reminder of the simple management tools that exist. For example, shortly after reviewing this lecture, I tried to see if we could apply a balanced scorecard approach to a procurement master data project that I have to implement in 2025. Below is a simple outline I was able to generate using ChatGPT.

[Balanced scorecard outline for the procurement master data project, generated with ChatGPT]

What I liked about this is that, generally, the only KPI our business will focus on is "automation level of data processes," which is not a strategy but a goal, and it just pushes all the work from the business to IT, which is not the owner of the data. I can easily see how many other aspects of the strategy we are missing to effectively implement a procurement master data cleanup project. I find master data cleanup projects to be incredibly difficult because of the nature of the task. There is a very simple matrix called the Eisenhower Matrix (Eisenhower Matrix - Overview, History, and Categories, n.d.) that measures task importance vs. urgency. I have been arguing that the reason master data projects are difficult is that master data is only addressed when it is a problem. Simply put, if an employee has the choice between a strategic negotiation with a supplier and dedicating time to cleaning up master data, I doubt they will spend time fixing or improving master data. There is always something else that is more important.


I have argued that when it comes to improving data quality, we need to look at multiple dimensions:
  1. Increase the urgency and importance of the task type (how?)
  2. Automate administrative, non-value-added work as part of master data maintenance (what?)
  3. Involve functional business team members only on critical updates to master data (where?)

I would be interested to hear about some data quality issues that other members of the class have seen. How were you able to address them?

Studying the star schema and advanced star schema lectures, it is interesting to see that there is a systematic approach to developing the facts and dimensions that make up a dimensional model.

Has anyone used the star schema in any of the BI projects deployed in your organization? How useful was this framework for organizing your thoughts? If you found it useful, why?

You can find more information about the star schema in Microsoft's Power BI documentation (Understand Star Schema and the Importance for Power BI, 2024).

Finally, KPIs in management dashboards seem like a very simple concept and make sense on the surface, but in practice they get really complicated. I found this article interesting as a starting point for illustrating some of the nuances and possible downsides of KPIs.

Don't Let Metrics Undermine Your Business (Don't Let Metrics Undermine Your Business, n.d.)

What are your feelings about KPIs, and what impact have they had on your organizations?

References

Colling, B. (2023, December 12). How to Build a Business Case for a BI Dashboard Solution — Prime 8 Consulting. Prime 8 Consulting. Retrieved November 16, 2024, from https://www.prime8consulting.com/articles/how-to-build-a-business-case-for-a-bi-dashboard-solution

Eisenhower Matrix - Overview, History, and Categories. (n.d.). Corporate Finance Institute. Retrieved November 16, 2024, from https://corporatefinanceinstitute.com/resources/management/eisenhower-matrix/

Understand star schema and the importance for Power BI - Power BI. (2024, October 29). Microsoft Learn. Retrieved November 16, 2024, from https://learn.microsoft.com/en-us/power-bi/guidance/star-schema

Don't Let Metrics Undermine Your Business. (n.d.). Harvard Business Review. Retrieved November 16, 2024, from https://hbr.org/2019/09/dont-let-metrics-undermine-your-business

Comments

  1. Hi Michael, thanks for posting your blog; I enjoyed it! I definitely agree with you that people tend to underestimate the importance of KPIs and how complex they can get. As a former consultant, I strained to make sure our data was accurate and that our KPIs were good enough to present to upper management. Not everything needs a KPI, but we should find the correct balance between reporting accurately and not obsessing over little points.

    Thanks again for the read!

  2. Hello Michael,

    Thanks for the interesting post. I've actually just recently been tasked with cleaning up years and years' worth of data that has been collected for research in a REDCap database. It should make for quite the puzzle, as the initial design of the project was...interesting, to say the least. On top of this, the project changed hands a couple of times over the years, people have tacked on new fields, and data entry elements of the project have changed without these changes necessarily being documented. Thus far, it has involved meeting with the few individuals still in the department who can help fill the gaps. We've also been trying to determine if we can focus on the data that is most relevant to our needs and ditch everything else. For instance, there was a lot of self-reported family history in free-form fields, but there was also a count of total family members and the number of them diagnosed with particular diseases that could be extracted. This process has also really reinforced the need for good design at the start of a project!

    Replies
    1. Hi Graham, are you planning to use any tools for your data quality analysis? How will you figure out which fields are relevant to your business needs?

  3. Hi Michael,

    Great post indeed. This post resonates strongly with my experience in designing a star schema for a data warehouse project at Kaiser Health Care. Like the lecture's discussion, I focused on supporting metrics such as patient waiting times, procedure costs, and clinic operations trends.

    For example, when selecting the business process for modeling (Step 1 in the star schema design), I prioritized patient scheduling and visits. I categorized facts like the number of visits, revenue generated, and average waiting times, while dimensions included patient demographics, clinic locations, and appointment types.

    Additionally, while working with advanced star schema design, I encountered the concept of slowly changing dimensions. To handle scenarios where clinic locations or patient details changed over time, I implemented a strategy to maintain historical data while ensuring the most recent updates were accessible. This approach aligns closely with the post's mention of using strategies to handle evolving dimensions.

    Lastly, the section on dashboard design reflects my efforts to create an interactive dashboard for Kaiser’s operations team. Following design principles discussed in the lecture, I minimized distractions and ensured the dashboard provided clear visualizations for KPIs such as patient flow and resource utilization. This real-time tool allowed stakeholders to quickly assess performance and make data-driven decisions, much like the post’s emphasis on a dashboard’s purpose and usability.

  4. Hi Michael,

    I think your approach to applying the balanced scorecard for a smaller, non-enterprise-wide goal is insightful. It’s a great reminder that tools like this can be effective at all levels, not just at the top. It’s easy to encounter a method in one context and overlook its broader potential. Reflecting on your approach, I realized I was considering an even more granular application—using it for personal goal-setting—but you’ve highlighted how well it bridges mid-range goals and actionable strategies. Thanks for drawing my attention to this in your comment on my post.

    Your comments about the challenges in making a business case for data quality and business intelligence expenditures also struck a chord. It reminded me of an intelligence analysis course I took while in uniform. The course seemed geared toward officers, and I never quite understood why I was there. Still, I recall the retired colonels leading it noting a curious pattern: intelligence is often used not for planning, but as retrospective justification.

    To me, intelligence should be about gathering information to create an accurate view of the operating environment, identifying opportunities and risks, and using that analysis to shape strategy. Too often, though, the process is reversed—organizations decide what they want to do, draw up plans, and then ask analysts to "sprinkle a little intel on it." Or worse, if something goes wrong, intelligence is used to justify poor decisions after the fact. If that’s how intelligence is perceived and used, it’s no wonder initiatives in its support struggle to gain traction.

    Addressing this perception could be key. In an ideal scenario, decisions would stem from sound, rational, evidence-based analysis. But when individuals or organizations prioritize their pre-existing agendas over data-driven decision-making, intelligence becomes a secondary consideration. Shifting this dynamic—perhaps by emphasizing how intelligence can proactively inform and enhance decision-making—might be a way to strengthen the case for these initiatives. Alternatively, demonstrating a clear and undeniable success from using intelligence properly—where you insisted on its proper application and it paid off—could also help build support.

    Replies
    1. This is very interesting. Do you have any sites or articles that talk about the conversion of intelligence into strategy?

