Μ201. Data Warehousing and Data Mining

1. Course Identity

Course title: Data Warehousing and Data Mining
Semester: 2nd
Hours per week: 3
ECTS Units: 6

2. Learning Goals

The course examines current methods and techniques for organizing and processing data so that useful information can be extracted from large datasets and databases. Data Warehousing and Data Mining technologies are addressed in both theory and practice, using open source as well as commercial software.

3. Course content

Data Warehousing

  • Data warehouse architectures
  • Data marts
  • Data preparation stage: extraction and cleansing, transformation and loading (ETL)
  • Data exploitation stage: Online Analytical Processing (OLAP)
  • OLAP operations: roll-up, drill-down, slice-and-dice
  • OLAP implementations: Relational (ROLAP), Multidimensional (MOLAP), Hybrid (HOLAP)
  • ROLAP models: star, snowflake, constellation
  • The MDX language for managing multidimensional data
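
As an illustration of the OLAP operations listed above, the following is a minimal sketch in plain Python rather than in any of the OLAP servers used in the course. The fact table, its dimension names, and the helper functions `aggregate`, `slice_cube` and `dice_cube` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical fact table rows: (year, quarter, region, product, sales)
DIMENSIONS = ("year", "quarter", "region", "product")
FACTS = [
    (2023, "Q1", "North", "Laptop", 120.0),
    (2023, "Q1", "South", "Laptop", 80.0),
    (2023, "Q2", "North", "Phone", 60.0),
    (2023, "Q2", "South", "Phone", 40.0),
    (2024, "Q1", "North", "Laptop", 150.0),
    (2024, "Q1", "South", "Phone", 50.0),
]


def aggregate(facts, group_by):
    """Sum the sales measure per combination of the given dimensions."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Slice: fix a single dimension to one value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice_cube(facts, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    idx = {DIMENSIONS.index(d): set(v) for d, v in allowed.items()}
    return [row for row in facts
            if all(row[i] in vals for i, vals in idx.items())]


# Detailed view: sales per (year, quarter).
by_quarter = aggregate(FACTS, ["year", "quarter"])
# Roll-up: move up the time hierarchy from quarter to year.
by_year = aggregate(FACTS, ["year"])
# Slice on region = North, then summarize per product.
north_by_product = aggregate(slice_cube(FACTS, "region", "North"), ["product"])
# Dice on region = North and year = 2023.
diced = dice_cube(FACTS, {"region": {"North"}, "year": {2023}})

print(by_quarter)
print(by_year)
print(north_by_product)
print(diced)
```

Drill-down is the reverse of the roll-up shown here: going from `by_year` back to `by_quarter` adds the finer-grained dimension to the grouping.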

Knowledge Discovery from Databases (KDD) / Data mining (DM)

  • Information vs. data
  • Shallow, multidimensional, hidden, and deep information
  • SQL, OLAP, and DM
  • The discrete stages of DM: data preparation, model building, model validation, deployment
  • Basic DM techniques: decision trees, association rules, regression, nearest neighbour, clustering
  • DM integration into database applications facilitating decision support: recommender systems
  • Special topics: time series analysis, web mining, the PageRank algorithm
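
To give a flavour of the special topics, the following is a minimal sketch of the PageRank algorithm computed by power iteration. The four-page link graph, the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages without out-links are assumptions made for this example, not course material.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all
        # pages (one common convention, assumed here).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical web graph: page D links out but receives no links itself.
WEB = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(WEB)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this graph page C receives links from three pages and ends up with the highest score, while page D, which no page links to, ends up with the lowest.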

4. Teaching

The course is taught through weekly 4-hour lectures, complemented by representative exercises and model answers presented and discussed in class. Students work individually on class projects and submit them on the following three (3) topics: (a) OLAP and Data Warehousing, (b) Data Mining, and (c) Recommender System or Web Mining.

5. Student evaluation

Class projects: 40% of the course grade

Final (written) exam: 60% of the course grade

6. Software and hardware requirements

A range of open source and commercial software will be used, such as Microsoft SQL Server Analysis Services, IBM DWE Intelligent Miner for Data, WEKA, R, Pentaho Mondrian, and Palo. Students will benefit from the department's existing “educational use only” licensing policy for commercial software.

The course will also make use of the relevant DBTechNet educational and training content (http://dbtech.uom.gr), including a Debian Linux (VirtualBox®) virtual machine with pre-installed database and business intelligence software. The virtual machine will be made freely available to all course participants and runs under all popular operating systems. A minimum of 2 GB of main memory is recommended.

This class is supported by DataCamp, an online learning platform for data science. It offers more than 100 courses in R, Python and SQL on topics such as importing data, data visualization and machine learning, combining short expert videos with hands-on exercises and immediate, personalised feedback.

7. Bibliography

  • Berry M.J.A., Linoff G., Data Mining Techniques: For Marketing, Sales, and Customer Support, Wiley, 1997: Chapters 7 and 10
  • Connolly T.M., Begg C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison Wesley, 2009: Chapters 32-35
  • (in Greek) Dunham M.H., Data Mining: Introductory and Advanced Topics, Εκδόσεις Νέων Τεχνολογιών, Athens 2004
  • (in Greek) Elmasri R., Navathe S.B., Fundamentals of Database Systems, Volumes A and B, 5th edition (revised), Εκδόσεις Δίαυλος, Athens 2007
  • Hand D.J., Mannila H., Smyth P., Principles of Data Mining, MIT Press, 2000
  • Hair J.F., Black B., Babin B., Anderson R.E., Tatham R.L., Multivariate Data Analysis, Prentice Hall, 2005
  • IBM Easy Mining: Administration and Programming Guide, IBM Publication Number SH12-6837-01
  • IBM DB2 Data Warehouse Edition, Using the Intelligent Miner Visualizers, Version 9.1, IBM Publication Number SH12-6840-00
  • IBM Data Management Software RedBook, Enhance Your Business Applications: Simple Integration of Advanced Data Mining Functions, IBM Publication Number SG24-6879-00
  • Kimball R., Ross M., The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Ed., Wiley, 2002
  • (in Greek) Ramakrishnan R., Gehrke J., Database Management Systems, 3rd edition, Εκδόσεις Τζιόλα, Thessaloniki 2012
  • (in Greek) Roiger R.J., Geatz M.W., Data Mining: A Tutorial-Based Primer, Εκδόσεις Κλειδάριθμος, Athens 2008
  • Witten I.H., Frank E., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005