Master of Science in Management & Systems
Data Warehousing and Data Mining

MASY1-GC3510

Professor: Sam Sultan [sam.sultan@nyu.edu]
Class website: [workshop.sps.nyu.edu/~sultans/dwdm] (or) [samsultan.com/dwdm]
Course Days: Fridays, January 26 - May 3 (no class March 22)
Course Hours: 2:00pm - 4:35pm
Modality/Location: Onsite. Midtown, room 1013


COURSE DESCRIPTION:

The course addresses the concepts, skills, methodologies, and models of data warehousing. It covers proper techniques for designing data warehouses for various business domains, as well as concepts for using the data warehouse and other data repositories for data mining.


COURSE LEARNING GOALS:

1. Course Objectives:

In today's organizations, the data warehouse is the center of the information systems' knowledge repository. Data warehousing supports informational processing by providing a solid platform of integrated, historical data from which to perform enterprise-wide data analysis. This helps improve profit and guide strategic decision making.

Data mining is a recent advancement in data analysis. It exploits the knowledge held in enterprise data warehouses and other data stores by examining the data to reveal untapped patterns that suggest ways to improve product quality, customer satisfaction and retention, and profit potential.

This course will cover the concepts and methodologies of both data warehousing and data mining.

       The focus of the course will be on the following topics:

2. Student Learning Outcomes:


BOOKS:

Required Reading & Materials -

Recommended Reading & Materials -

GRADE ASSIGNMENT AND EVALUATION

Contributing factors for determining your course grade include:

Details of Assignment and Evaluation. NYU SPS Grading Scale

Grades are FINAL
Please do not negotiate for a better grade. The professor will not provide any "re-do", "make-up", or "extra credit" assignment to make up for a low grade. If you expect to receive an "A" at the end of the semester, then I expect you to study hard, attend all sessions (unless you notify me in advance), participate in all classes, turn in your homework on time, and keep up with the class reading material. If you see yourself falling behind, do not hesitate to ask for help. This will help you stay current with the class and earn a good grade on your work.

Please note: The professor will not entertain any request for an assignment "redo" or an extra credit assignment to improve a grade.


NYU SPS Academic Policies and Grading Scale
https://www.sps.nyu.edu/homepage/student-experience/policies-and-procedures.html#Graduate1



Statement on Academic Integrity:

New York University is a top-level academic institution that takes academic integrity very seriously. All students suspected of violating this policy, including by cheating, plagiarism, or copying from others or from published materials on assignments or exams, will be severely penalized for their actions.

Statement on Usage of Mobile Devices:

Smartphones may not be used during class. If you use a tablet or a laptop to support class learning, the device must be used strictly for class purposes. Social media, web surfing, and any other use unrelated to the class are not allowed.


COURSE OUTLINE:

DATE SESSION TOPIC[s] COVERED
 
[Week 1] 1
  • Introduction to Data Warehousing
  • Relationship of Data Mining and Data Warehousing
  • What is a Data Warehouse?
  • Data Warehousing ROI
  • DSS - Decision Support Systems
  • Operational vs. Analytical Systems
  • Evolution of DSS and Data Warehousing
  • OLTP - Online Transaction Processing
  • Characteristics of a Data Warehouse
  • What is a Data Mart? Creating a Data Mart
  • Data Comparison Chart
  • OLAP - Online Analytical Processing
  • Reading: Chapter 1 (both The Data Warehouse Toolkit and Building the Data Warehouse);
    skim through the Glossary (The Data Warehouse Lifecycle Toolkit)
     
    [Week 2] 2
  • Planning & Building the Data Warehouse
  • Sponsorship and Cost Justification
  • Project Prerequisites
  • Barriers, Challenges and Risks
  • Preparing for Implementation
  • Developing the Data Warehouse
  • SDLC Methodologies - Waterfall vs. RUP Approach
  • Planning & Project Management
  • Analysis
  • Logical & Physical Design
  • Implementation and Deployment
  • Operations
  • Reading: Chapters 1, 2 (The Data Warehouse Lifecycle Toolkit)
     
    [Week 3] 3
  • Data Warehouse Design
  • Drivers for Multi-Dimensional Analysis
  • Limitations of Relational Models
  • The Data Cube
  • What is dimensional modeling?
  • Advantages of Dimensional Models
  • Logical and Physical Design
  • Data Normalization
  • Benefits and Drawbacks of Data Normalization
  • De-Normalizing of Data
  • Characteristics of a Data Warehouse
  • Subject Oriented, Integrated, Time Variant, Non-Volatile
  • The Star Schema
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
     
    [Week 4] 4
  • Data Warehouse Schemas
  • Dimensions and Dimension Tables
  • Facts and Fact Tables
  • The Star Schema
  • The Snowflake Schema
  • Degenerate and Junk Dimensions
  • The Data Warehouse Bus Architecture
  • Conformed Dimensions and Standard Facts
  • Data Granularity
  • Changing Dimensions
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
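As an illustration of the schema topics above (dimension tables, fact tables, and the star schema), the following is a minimal sketch in Python using the standard sqlite3 module. The table names (dim_date, dim_product, fact_sales), their columns, and the sample rows are hypothetical and are not taken from the course materials.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        -- The fact table holds measures plus a foreign key to each dimension.
        -- Grain (assumed): one row per product per date.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2024-01"), (2, "2024-02")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(10, "Books"), (20, "Games")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0)])
    conn.commit()
    return conn


def sales_by_category(conn):
    """Join the fact table to a dimension and sum an additive fact."""
    return conn.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

Calling sales_by_category(build_star_schema()) returns one row per product category. The same join-and-group pattern applies to any other dimension attached to the fact table.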
     
    [Week 5] 5
  • Components of a Data Warehouse
  • Source Systems, Staging Area, Presentation, Access Tools
  • Building the Data Matrix
  • The Four-Step Process
  • Multiple Fact Tables in a single Data Mart
  • Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts
  • Fact and Dimension Table Detail
  • Identifying Source for each Fact & Dimension
  • Mapping from Source to Target
  • Reading: Chapters 7, 4 (The Data Warehouse Lifecycle Toolkit)
     
    [Week 6] 6
  • The ETL Process
  • Extracting the Data into the Staging Area
  • The Challenge of Extracting from Disparate Platforms
  • Full vs. Incremental Extracts
  • Detecting Changes to Data
  • Transforming the Data
  • Complexity of Data Integration
  • Dealing with Missing & Dirty Data
  • Data Transformation Tasks
  • Loading the Data
  • Timing and Job Control of Data Loads
  • Reading: Chapter 9 (The Data Warehouse Lifecycle Toolkit)
     
    [Week 7] 7
  • Midterm Exam
  • Covers sessions 1 through 6
     
    [Week 8] 8
  • Aggregating Data
  • Goals and Risks of Data Aggregation
  • Deciding What to Aggregate
  • Data Sparsity
  • Design Requirements for Aggregates
  • The Problem with Aggregates
  • Aggregate Navigators
  • Reading: Chapter 8, pp. 353-357 (The Data Warehouse Lifecycle Toolkit)
     
    [Week 9] 9a
  • Self-Study
  • Selecting the Business Subject
  • Declaring the Grain
  • Choosing the Dimensions
  • Identifying the Facts
  • Avoiding Null Keys
  • Retail Market Basket Analysis
  • Additive and Semi-Additive Facts
  • The Value Chain Integrated Inventory Model
  • Order Management Data Marts
  • Date and Other Dimension Role Playing
  • Allocation to Lower Level Facts
  • Profit and Loss Data Marts
  • Reading: Chapters 2, 3, 5 (The Data Warehouse Toolkit)
     
      9b
  • Self-Study
  • CRM Overview
  • Customer Dimension
  • Demographic Dimension Outriggers
  • Date Dimension Outriggers
  • Large Changing Customer Dimension
  • Mini-Dimensions
  • Commercial Customer Hierarchies
  • Fixed vs. Variable Level Hierarchies
  • General Ledger Accounting
  • OLAP role in G/L and Chart of Accounts
  • Time Stamped Employee Dimensions
  • Reading: Chapters 6, 7, 8 (The Data Warehouse Toolkit)
     
    [Week 10] 10
  • Clickstream/Web Data Warehouses & Analytics
  • Overview of Web Based Interaction
  • Challenges of Tracking Data
  • Creating Persistent State on the Web
  • Techniques for Tracking States
  • Working with Cookies
  • User Registration
  • Web Server Log Files
  • Online Advertising
  • Online Page Tracking and Analytics
  • User Dimension and Page Hits Facts
  • Reading: Chapter 15 (The Data Warehouse Toolkit)
     
    [Week 11] 11
  • Data Mining
  • What is Data Mining Good For?
  • Statistics, Artificial Intelligence & Machine Learning
  • Data Mining Examples and Tools
  • Connection between Data Mining and Data Warehousing
  • Retrospective Reporting vs. Predictive
  • Data Mining Applications
  • Data Mining vs. Statistics vs. OLAP
  • Data Mining Statistical Techniques (Sampling, Regression & Decision Trees)
  • Clustering, Segmentation and Nearest Neighbor Techniques
  • Keys to Commercial Success of Data Mining
  • Reading: Online
     
    [Week 12] 12
  • Data Mining Techniques - Part I
  • Hands-on Presentation and Lab
  • Classification, Regression, Similarity Matching, Co-occurrence Grouping
  • Predictive Modeling
  • Clustering/Segmentation
  • Data Mining and Statistics Terminologies
  • Supervised vs. Unsupervised
  • Tree Induction
  • Entropy and Information Gain
  • The ID3 and C4.5 classifier process
  • Clustering
  • Association and Co-occurrence grouping
  • Reading: Online
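As an illustration of the "Entropy and Information Gain" topic listed above, the following is a minimal sketch in Python of the standard Shannon entropy and information gain calculations that tree induction methods such as ID3 rely on. The function names, the dictionary-based row format, and the sample data are assumptions made for this example. They are not taken from the course materials or from any particular tool.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute.

    rows is a list of dicts; attribute and target are dictionary keys.
    """
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the splitting attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical sample data: "outlook" separates the classes perfectly.
sample = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
]
```

With this sample, the target column is split evenly between two classes, so its entropy is 1 bit, and splitting on "outlook" removes all of that uncertainty. A tree inducer in the ID3 style would pick the attribute with the highest gain at each node.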
     
    [Week 13] 13
  • Data Mining Techniques - Part II
  • Hands-on Presentation and Lab
  • Classification, Regression, Similarity Matching, Co-occurrence Grouping
  • Predictive Modeling
  • Clustering/Segmentation
  • Data Mining and Statistics Terminologies
  • Supervised vs. Unsupervised
  • Tree Induction
  • Entropy and Information Gain
  • The ID3 and C4.5 classifier process
  • Clustering
  • Association and Co-occurrence grouping
  • Reading: Online
     
    [Week 14] 14
  • Data Mining using Weka
  • Practice exercises
  • Reading: Online
     
    [Week F] F
  • Final Exam
  • Covers sessions 8 through 13
  • Final Project Due


  • All contents © Sam Sultan.
    NYU SPS Master's Degree Program web site
    For more information, send e-mail to: sam.sultan@nyu.edu