Data Mining
EECS-4412
Fall 2016
York University


Semester: Fall 2016
Course/Sect#: EECS-4412
Time: Tue 1:00pm-2:30pm
Thu 1:00pm-2:30pm
Location: CB 122
Instructor: Aijun An
Office: LAS 2048
Office Hours: Tuesdays and Thursdays 2:40-3:30pm
Phone #: 416-736-2100 x44298
e-mail: aan@cse.yorku.ca


Welcome to the Data Mining course, EECS-4412, for Fall 2016. Materials, instructions, and notices for the course will accumulate here over the semester.


Message Board

January 2, 2017
Grades are posted. You can check yours on ePost.
December 6, 2016
Please be reminded that the final exam will take place at 2:00pm-4:30pm on Wednesday December 7. The location is ACE 005.
December 1, 2016
The due date for the project is extended to Sunday December 4 at 8pm.
November 18, 2016
The size of the data used in the project is increased. The training data now contain 2000 emails (1362 ham emails and 638 spam emails), and the test data contain 1000 emails (688 ham emails and 312 spam emails). Please go to the project page to download the new data. You should use the new data in the project.
November 16, 2016
Project is posted. Please see the link below in the Assignments and Project section.
November 15, 2016
The due time for Assignment 3 is extended to 11:00pm tonight.
November 15, 2016
Midterm solutions are posted. Click here to download.
November 10, 2016
Midterm marks are posted. You can check yours by using ePost.
November 10, 2016
An FAQ page for Assignment #3 is created. Please see here.
November 4, 2016
Assignment #3 is posted. Please see the link below in the Assignments and Project section.
October 30, 2016
Solutions to Assignment #2 are posted. Please see here.
October 27, 2016
Please be reminded that the midterm test will be held on Tuesday November 1 at the class time in CB 115. Note that the location is different from our regular classroom. For sample test questions, click here. The username and password are the same as the ones used for accessing the lecture notes.
October 23, 2016
A correction is made to the standard deviation value of the example on Slide 20 of "Data Preparation (Part 1)". The value should be 29735 instead of 22000. The z-score values in the example of that slide are changed accordingly.
October 20, 2016
Solutions to Assignment #1 are posted. Please see here.
October 15, 2016
Assignment #2 is posted. Please see the link below in the Assignments and Project section.
September 19, 2016
Assignment #1 is posted. Please see the link below in the Assignments and Project section. Access to the assignment is password-protected. The username and password have been emailed to your eecs account (or yorku account if you do not have an eecs account).
September 7, 2016
This web page is set up. Welcome to the course!
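
As a quick sanity check on the project data sizes given in the November 18 notice, the short Python sketch below computes the accuracy of a trivial classifier that always predicts the majority class of the training data. It is only an illustration and not part of the project instructions; the function name `majority_baseline` is hypothetical, and only the four email counts come from the notice.

```python
# Email counts copied from the November 18 notice.
TRAIN = {"ham": 1362, "spam": 638}   # 2000 training emails
TEST = {"ham": 688, "spam": 312}     # 1000 test emails


def majority_baseline(train, test):
    """Return the majority class in `train` and the test accuracy
    obtained by always predicting that class."""
    majority = max(train, key=train.get)
    accuracy = test[majority] / sum(test.values())
    return majority, accuracy


label, acc = majority_baseline(TRAIN, TEST)
print(label, acc)
```

A spam filter built for the project would presumably be expected to do better than this baseline on the test data.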


Description

Data mining or knowledge discovery from databases (KDD) is one of the most active areas of research in databases. It is at the intersection of database systems, statistics, AI/machine learning, and data visualization. In this course, we will introduce the concepts of data mining and present data mining algorithms and applications. Topics include association rule mining, sequential pattern mining, classification models, clustering, and text mining.


Prerequisites

  • Required: a course on data structures and an introductory course on database systems.
  • Preferred: basic concepts in probability and statistics.


Materials

  • Textbook
      Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques, Morgan Kaufmann, Third Edition, 2011.
  • Reference Books and Materials
    • Charu C. Aggarwal, Data Mining, The Textbook, Springer, 2015.
    • Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2006.
    • Ian H. Witten and Eibe Frank, Data Mining -- Practical Machine Learning Tools and Techniques (Second Edition), Morgan Kaufmann, 2005.
    • S.M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998.
    • Margaret H. Dunham, Data Mining -- Introductory and Advanced Topics, Prentice Hall, 2003.
    • Some conference/journal papers
    • More books can be found here


Grading Scheme

  • Assignments (25%)
  • Midterm (20%) (Tuesday November 1 at the class time in CB 115)
  • Project (20%)
  • Final exam (35%)
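
The weights above combine as a simple weighted average. The following Python sketch shows the arithmetic, assuming each component is scored as a percentage out of 100; the function name `course_grade` and the example scores are hypothetical, and the sketch does not cover any rounding or letter-grade conversion, which this page does not specify.

```python
# Component weights (in percent) copied from the grading scheme above.
WEIGHTS = {"assignments": 25, "midterm": 20, "project": 20, "final_exam": 35}


def course_grade(scores):
    """Return the weighted course grade (0-100), given a percentage
    score (0-100) for each component named in WEIGHTS."""
    assert sum(WEIGHTS.values()) == 100  # the weights must cover the whole grade
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"assignments": 80, "midterm": 70, "project": 90, "final_exam": 60}
print(course_grade(example))  # 73.0
```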


Lectures


Assignments and Project

  • Assignment 1 (Weight: 7%) (Due Tuesday October 4 in class)
  • Assignment 2 (Weight: 5%) (Due Thursday October 27 by 5pm. Please submit a PDF file online here.)
  • Assignment 3 (Weight: 13%) (Due Tuesday November 15 by 5pm)
  • Project (Weight: 20%) (Due Friday December 2 at 5pm)

Useful On-line Information

Academic Honesty