Master of Science in Management & Systems
Data Warehousing and Data Mining


Professor:Sam Sultan   []
Class website: [] (or) []
Course Days: Fridays 1/28 - 5/6 (no class 3/18 Spring Week)
Course Hours: 2:00pm - 4:35pm
Modality: Onsite - Woolworth Buildling, Room 322


+ syllabus
+ outline
+ books
+ grades
+ final project
+ student list
+ examples & demos
+ homework submission
+ student feedback
+ student evaluation & comments

Session - 1   2   3   4   5   6   7   8   9a   9b   10   11   12   13   14   14b  
                1-sql   3-design   5-join   8-aggr   9-DDL   9-DML   X-func  

Search -
Data Warehousing - Data Mining - SQL - SQL*Tester© - SQL*Chart© - Create DB Insert© - DataMining Tools©
Site Helpful?


The course addresses the concepts, skills, methodologies, and models of data warehousing. The course addresses proper techniques for designing data warehouses for various business domains, and covers concepts for potential uses of the data warehouse and other data repositories in analytics and data mining opportunities.


1. Course Objectives:

In today's organization, the data warehouse is the center of the information systems' knowledge repository. Data warehousing supports informational processing by providing a solid platform of integrated, historical data from which to perform enterprise-wide data analysis. This helps improve profit and guide strategic decision making

Data mining is a recent advancement in data analysis. Data mining exploits the knowledge that is held in enterprise data warehouses and other data stores by examining the data to reveal untapped patterns that suggest better ways to improve quality of product, customer satisfaction and retention, and profit potentials

This course will cover the concepts and methodologies of both data warehousing and data mining.

       The focus of the course will be on the following topics:

2. Student Learning Outcomes:


Required Reading & Materials -

Recommended Reading & Materials -


Contributing factors for determining your course grade include:

Details of Assignment and Evaluation. Grades are FINAL
Please do not negotiate for a better grade. Professor will not provide any "make-up" or "extra credit assignment" to make up for a low grade. If you are expecting to receive a grade of an "A" at the end of the semester, then I expect you to study hard, to attend all sessions (unless you previously notify me), to participate in all classes, to turn in your homework on time, and to keep up with the class reading material. If you see yourself falling behind do not hesitate to ask for help. This will ensure that you stay current with the class, and will ensure that you get a good grade on your work.

NYU SPS Academic Policies and Grading Scale


[Week 1] 1
  • Introduction to Data Warehousing
  • Relationship of Data Mining and Data Warehousing
  • What is a Data Warehouse?
  • Data Warehousing ROI
  • DSS - Decision Support Systems
  • Operational vs. Analytical Systems
  • Evolution of DSS and Data Warehousing
  • OLTP - Online Transaction Processing
  • Characteristics of a Data Warehouse
  • What is a Data Mart? Creating a Data Mart
  • Data Comparison Chart
  • OLAP - Online Analytical Processing
  • Reading: Chapter 1 (both DW Toolkit, and Building the DW),
    Skim thru Glossary (DW Lifecycle Toolkit)
    [Week 2] 2
  • Self Study
  • Planning & Building the Data Warehouse
  • Sponsorship and Cost Justification
  • Project Prerequisites
  • Barriers, Challenges and Risks
  • Preparing for Implementation
  • Developing the Data Warehouse
  • SDLC Methodologies - Waterfall vs. RUP Approach
  • Planning & Project Management
  • Analysis
  • Logical & Physical Design
  • Implementation and Deployment
  • Operations
  • Reading: Chapter 1, 2 (The Data Warehouse Lifecycle Toolkit)
    [Week 3] 3
  • Data Warehouse Design
  • Drivers for Multi-Demensional Analysis
  • Limitations of Relational Models
  • The Data Cube
  • What is dimensional modeling?
  • Advantages of Dimensional Models
  • Logical and Physical Design
  • Data Normalization
  • Benefits and Drawbacks of Data Normalization
  • De-Normalizing of Data
  • Characteristics of a Data Warehouse
  • Subject Oriented, Integrated, Time Variant, Non-Volatile
  • The Star Schema
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
    [Week 4] 4
  • Data Warehouse Schemas
  • Dimensions and Dimension Tables
  • Facts and Fact Tables
  • The Star Schema
  • The Snowflake Schema
  • The Data Warehouse Bus Architecture
  • Conformed Dimensions and Standard Facts
  • Data Granularity
  • Changing Dimensions
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
    [Week 5] 5
  • Components of a Data Warehouse
  • Source Systems, Staging Area, Presentation, Access Tools
  • Building the Data Matrix
  • The Four Steps Process
  • Multiple Fact Tables in a single Data Mart
  • Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts
  • Degenerate and Junk Dimensions
  • Fact and Dimension Table Detail
  • Identifying Source for each Fact & Dimension
  • Mapping from Source to Target
  • Reading: Chapter 7, 4 (The Data Warehouse Lifecycle Toolkit)
    [Week 6] 6
  • The ETL Process
  • Extracting the Data into the Staging Area
  • The Challenge of Extracting from Disparate Platforms
  • Full vs. Incremental Extracts
  • Detecting Changes to Data
  • Transforming the Data
  • Complexity of Data Integration
  • Dealing with Missing & Dirty Data
  • Data Transformation Tasks
  • Loading the Data
  • Timing and Job Control of Data Loads
  • Reading: Chapter 9 (The Data Warehouse Lifecycle Toolkit)
    [Week 7] 7
  • Midterm Exam
  • Covers sessions 1 through 6
    [Week 8] 8
  • Aggregating Data
  • Goals and Risks of Data Aggregation
  • Deciding What to Aggregate
  • Data Sparsity
  • Design Requirement for Aggregates
  • The problem with Aggregates
  • Aggregate Navigators
  • Reading: Chapter 8 p353-357(The Data Warehouse Lifecycle Toolkit)
    [Week 9] 9a
  • Self Study
  • Selecting the Business Subject
  • Declaring the Grain
  • Choosing the Dimension
  • Identify the Fact
  • Avoiding Null Keys
  • Retail Market Basket Analysis
  • Additive and Semi-Additive Facts
  • The Value Chain Integrated Inventory Model
  • Order Management Data Marts
  • Date and Other Dimension Role Playing
  • Allocation to Lower Level Facts
  • Profit and Loss Data Marts
  • Reading: Chapter 2, 3, 5 (The Data Warehouse Toolkit)
  • Self Study
  • CRM Overview
  • Customer Dimension
  • Demographic Dimension Outriggers
  • Date Dimension Outriggers
  • Large Changing Customer Dimension
  • Mini-Dimensions
  • Commercial Customer Hierarchies
  • Fixed vs. Variable Level Hierarchies
  • General Ledger Accounting
  • OLAP role in G/L and Chart of Accounts
  • Time Stamped Employee Dimensions
  • Reading: Chapter 6, 7, 8 (The Data Warehouse Toolkit)
    [Week 10] 10
  • Clickstream/Web Data Warehouses & Analytics
  • Overview of Web Based Interaction
  • Challenges of Tracking Data
  • Creating Persistent State on the Web
  • Techniques for Tracking States
  • Working with Cookies
  • User Registration
  • Web Server Log Files
  • Online Advertising and Analytics
  • Online Page Tracking and Analytics
  • User Dimension and Page Hits Facts
  • Reading: Chapter 15 (The Data Warehouse Toolkit)
    [Week 11] 11
  • Introduction to Data Mining
  • What is Data Mining Good For?
  • Data Mining Examples and Tools
  • Connection between Data Mining and Data Warehousing
  • Retrospective Reporting vs. Predictive
  • Data Mining Business Applications
  • Data Mining vs. Statistics vs. OLAP
  • Sampling, Probabilities, Bayes Theory, Regression & Decision Trees
  • Probabilities with Laplace Smoothing
  • Clustering, Segmentation and Association Techniques
  • Reading: Online
    [Week 12] 12
  • Data Mining Techniques I
  • Terminology
  • Bayesian Theorem and Probabilities
  • Naive Bayes Classifier and Predictions
  • Naive Bayes with multiple variables and multiple targets
  • Linear Regression
  • Linear Regression with multiple independent variables
  • Other Data Mining techniques
  • Reading: Online
    [Week 13] 13
  • Data Mining Techniques II
  • Supervised vs. Unsupervised
  • Classification and Segmentation
  • Tree Inductions
  • Entropy and Information Gain
  • The ID3 and C4.5 classifier process
  • Clustering
  • Association
  • Reading: Online
    [Week 14] 14
  • Data Mining using Weka
  • Practice excersises
  • Reading: Online
  • Final Exam
  • Covers sessions 8 through 13
  • Final Project Due

  • All contents © Sam Sultan.
    NYU SPS Master's Degree Program web site
    For more information, send e-mail to: