1. Course Identity:
Course Title: Data Mining and Data Warehouses
Semester: 2nd
Weekly Hours: 3
ECTS Credits: 6
2. Course Objectives:
The course aims to familiarize students with new techniques and methods for discovering patterns and knowledge from large datasets through data mining. It also covers the design and construction of data warehouses for effective data storage and retrieval.
Specifically, the course covers the following topics:
- Data Warehouses: models, design, and construction.
- Organizing data into multidimensional structures and the stages of OLAP (OnLine Analytical Processing).
- Data preprocessing and data cleaning techniques.
- Supervised and unsupervised data mining techniques.
- Key data mining algorithms: decision trees, association rules, regression, nearest neighbor search, clustering.
- Evaluation and validation of data mining models.
- Applying data mining algorithms to real-world problems.
- Practical application: data preparation, knowledge discovery, interpretation of results, and generating or revising recommendations for decision making.
- Specialized applications: time series and sequences, multimedia data on the Internet, personalized web interfaces, mining data from the Internet.
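As an illustration of the association rules topic listed above, the following minimal sketch computes the two standard rule measures, support and confidence, for a rule of the form "diapers => beer". The transactions and the function names `support` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    # A rule whose antecedent never occurs has no meaningful confidence;
    # this sketch reports 0.0 in that case.
    return joint / base if base > 0 else 0.0


# Rule under test: {diapers} => {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-defined thresholds; the sketch only evaluates a single, hand-picked rule.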
3. Course Topics:
The course will cover the fundamentals of modern database technologies used in decision support environments:
- Data Warehouses, specifically: (a) differences between a classical database environment that handles transactions in real time (OLTP – OnLine Transaction Processing) and a data warehouse, (b) the architectural structures of a data warehouse, (c) the individual stages of data flow in a data warehouse, (d) managing metadata in a data warehouse, and (e) the concept of data marts and their application in practice.
- Design and construction of a data warehouse (star schemas, snowflakes, constellations).
- Organizing data for analytical processing.
- Basics of analytical processing with OLAP: designing and implementing concept hierarchies and multidimensional data cubes, and processing cubes through operations such as slicing, dicing, roll-up, aggregation, drill-down, and rotation (pivoting).
- Alternative implementations of OLAP systems: MOLAP (Multidimensional OLAP) and ROLAP (Relational OLAP).
- MDX query language for managing multidimensional data structures in OLAP applications.
- Data preprocessing techniques: data cleaning, transformation, integration, and reduction.
- Supervised data mining techniques: decision trees for classification and regression, and regression for predicting numeric values.
- Unsupervised data mining techniques: clustering for grouping similar data, association rules for discovering relationships between items, and nearest neighbor search for recommendation systems.
- Evaluation methods for assessing the quality and performance of data mining models.
- Real-world applications of data mining in various domains.
- Specialized applications of data mining methods to time series and sequences, as well as to multimedia and Internet data.
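The cube operations named in the list above can be sketched on a tiny in-memory fact table. The example below is only an illustration in plain Python, not how an OLAP server or MDX implements these operations; the data, the dimension names, and the helper functions `aggregate` and `select` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: dimensions (quarter > month, region, product)
# and one measure (sales).
facts = [
    {"quarter": "Q1", "month": "Jan", "region": "North", "product": "Laptop", "sales": 100},
    {"quarter": "Q1", "month": "Feb", "region": "North", "product": "Phone", "sales": 80},
    {"quarter": "Q1", "month": "Feb", "region": "South", "product": "Laptop", "sales": 50},
    {"quarter": "Q2", "month": "Apr", "region": "North", "product": "Laptop", "sales": 70},
    {"quarter": "Q2", "month": "May", "region": "South", "product": "Phone", "sales": 120},
]


def aggregate(rows, dims, measure="sales"):
    """Group rows by the given dimensions and sum the measure."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def select(rows, **conditions):
    """Keep rows whose value for each named dimension is in the allowed set."""
    return [
        r for r in rows
        if all(r[dim] in allowed for dim, allowed in conditions.items())
    ]


# Roll-up: summarize at the coarser quarter level of the time hierarchy.
by_quarter = aggregate(facts, ["quarter"])

# Drill-down: return to the finer month level within each quarter.
by_month = aggregate(facts, ["quarter", "month"])

# Slice: fix one dimension to a single value (region = North).
north = select(facts, region={"North"})

# Dice: restrict several dimensions at once.
diced = select(facts, quarter={"Q1"}, product={"Laptop"})

# Rotation (pivot): the same cells viewed with the two axes swapped.
region_by_product = aggregate(facts, ["region", "product"])
product_by_region = {(p, r): v for (r, p), v in region_by_product.items()}

print(by_quarter)
print(by_month)
```

In a MOLAP or ROLAP system the same operations are executed against precomputed cubes or relational star schemas rather than Python lists.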
4. Teaching Method:
Lectures (4 hours per week)
Three (3) individual assignments: (a) Data Warehouse & OLAP, (b) Data Mining, (c) Real-world Data Mining Application
5. Student Assessment Method:
The course grade is calculated with the following formula: (40% × the average grade of the three assignments) + (60% × the grade of the final written examination).
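As a worked example of the weighting above, the following sketch computes a course grade from three assignment grades and an exam grade. The function name `final_grade`, the sample grades, and the 0-10 grading scale are assumptions made for illustration; the syllabus does not specify the scale or any rounding rule.

```python
def final_grade(assignment_grades, exam_grade):
    """Weighted course grade: 40% assignment average + 60% written exam."""
    if len(assignment_grades) != 3:
        raise ValueError("expected exactly three assignment grades")
    assignment_avg = sum(assignment_grades) / len(assignment_grades)
    return 0.4 * assignment_avg + 0.6 * exam_grade


# Hypothetical grades on an assumed 0-10 scale:
# assignment average = 8.0, so the grade is 0.4 * 8.0 + 0.6 * 6.5.
grade = final_grade([8, 9, 7], 6.5)
print(f"{grade:.2f}")
```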
6. Equipment and Software Requirements:
Familiarity with the use of OLAP software systems, both commercial and open-source (MS SQL Server Analysis Services, Pentaho Mondrian, Palo).
Familiarity with the use of data mining software systems, both commercial and open-source, e.g., WEKA, IBM DB2 DWE Intelligent Miner for Data, and MS SQL Server Analysis Services.
The above software is available at no cost at the department (as open-source software or through the department’s participation in academic programs for the promotion of commercial software with licenses for educational and non-commercial use).
An essential component of the course is the educational content of the virtual laboratory DBTech EXT: Business Intelligence and Knowledge Discovery from Databases (BI & KDD: http://dbtech.uom.gr/course/view.php?id=6)
The course Data Mining and Data Warehouses is supported by DataCamp, a learning platform for data science. Through DataCamp, you can learn R, Python, and SQL through a combination of short expert videos and hands-on exercises. More than 100 courses by expert instructors are available on topics such as data importing, data visualization, and machine learning, with immediate and personalized feedback on each exercise.
7. Suggested Bibliography:
- Berry M.J.A., Linoff G., Data Mining Techniques: For Marketing, Sales, and Customer Support, Wiley, 1997: Chapters 7 and 10
- Connolly T.M., Begg C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison Wesley, 2009: Chapters 32-35
- Dunham M.H., Data Mining: Introductory and Advanced Topics (Greek edition), New Technologies Publications, Athens, 2004
- Elmasri R., Navathe S.B., Fundamentals of Database Systems (Greek edition), Kleidarithmos Publications, 2011: Chapters 25 and 26
- Han J., Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000: Chapters 2, 3, and 8
- Inmon W.H., Building the Data Warehouse, Wiley, 2005: Chapters 1-4
- Linoff G., Berry M.J.A., Data Mining Techniques: For Marketing, Sales, and Customer Support, Wiley, 2003
- Dunham M.H., Mukherjee S., Data Mining: Introductory and Advanced Topics, Prentice Hall, 2005
- Ross M., Introduction to Data Mining, Pearson, 2005: Chapters 5 and 8
Further material is provided through the learning environment of the course.