Data analytics and visualization is an emerging discipline of immense importance to any data-driven organization. This is a project-focused course that gives students knowledge of tools for data mining and visualization, and practical experience applying data mining and machine learning algorithms to the analysis of very large amounts of data. It also covers methods and models for communicating analysis results effectively through data visualization.
Lectures: Mon 16:00-19:00 at VH 2005 (Vari Hall)
Office Hours: Drop by my office or by appointment (LAS3050, Lassonde building)
Contact: firstname.lastname@example.org, email@example.com
Data-driven organizations, DDO solutions reference model.
Data ingestion, ETL, data quality, data quality reference model, record linkage, entity resolution, string similarity, data quality scaling issues.
Computing platforms, single-node computing, parallel computing, cluster computing, grid computing, data storage, data warehouse model, data lakes, data storage systems, relational DBMS, columnar DBMS, NoSQL, HDFS, Key-Value stores, object storage, software defined storage, CAP theorem, moving large data, data definition, schema-on-read, schema-on-write, big data analytics architectures, lambda architecture, kappa architecture.
Batch processing (Hadoop/MapReduce), interactive query processing (Dremel/BigQuery), data stream processing (Storm/Heron), unified processing engines (Spark).
Association rules, Market-Basket model, frequent itemsets, A-Priori algorithm, PCY algorithm, SON algorithm.
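As an illustration of the market-basket topics above, here is a minimal sketch of the first two passes of an A-Priori-style computation: count single items, then count only those pairs whose members are both frequent. The function name `frequent_pairs`, the `support` threshold, and the toy baskets are assumptions made for this example, not course material.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, support):
    """Return (frequent items, frequent pairs with counts) for a list of baskets."""
    # Pass 1: count how many baskets contain each item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= support}

    # Pass 2: count only candidate pairs made of frequent items.
    # A pair cannot be frequent unless both of its items are (monotonicity).
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    pairs = {pair: n for pair, n in pair_counts.items() if n >= support}
    return frequent_items, pairs


# Hypothetical toy data, for illustration only.
baskets = [
    ["milk", "bread", "beer"],
    ["milk", "bread"],
    ["bread", "beer"],
    ["milk", "bread", "eggs"],
]
items, pairs = frequent_pairs(baskets, support=2)
```

Pairs are stored as alphabetically sorted tuples so that the same pair seen in different baskets maps to one key. The sketch keeps all counts in memory, which is exactly the limitation that the PCY and SON algorithms are meant to address.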
Finding Similar Items, Shingling, Min-Hashing, Locality-Sensitive Hashing (LSH).
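To make the similar-items topics concrete, the following sketch builds character shingles, computes exact Jaccard similarity, and estimates it from min-hash signatures. The helper names, the shingle length `k=3`, the use of `zlib.crc32` to map shingles to integers, and the random linear hash family are all assumptions chosen for illustration; the LSH banding step is not shown.

```python
import random
import zlib


def shingles(text, k=3):
    """Return the set of character k-shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=100, seed=0):
    """Min-hash signature of a non-empty shingle set.

    Each hash function has the form (a * x + b) mod p, with p a Mersenne prime.
    """
    prime = (1 << 61) - 1
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    # Map each shingle to a stable integer id (crc32 is deterministic across runs).
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % prime for x in ids) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Both signatures must be built with the same `seed` and `num_hashes`, otherwise their positions are not comparable. The estimate gets closer to the exact Jaccard value as `num_hashes` grows.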
High dimensionality clustering, hierarchical clustering (dendrogram, Euclidean vs non-Euclidean cases), the k-means family of algorithms (initialization, picking k), the BFR algorithm, the CURE algorithm.
Anscombe's quartet, Bertin's visual variables, cognition and perception, colors, pre-attentive vs attentive processing, Gestalt principles, visual metaphors, Tufte's principles of graphical excellence, data sculpture.
Taxonomy of visualization, visualizing qualitative and quantitative data (comparisons, proportions, relationships, hierarchies, maps, part-to-whole, distributions, patterns).
Introduction, administrivia, introduction to main problems about networks, basic mathematical concepts, bow-tie structure of the Web.
Degree distributions, shortest paths, clustering coefficient, measuring power-laws.
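Two of the network measures listed above can be sketched in a few lines for an undirected graph stored as a dictionary of neighbor sets. The representation, the function names, and the small example graph are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention used here: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def degree_distribution(graph):
    """Map each degree to the number of nodes that have it."""
    return Counter(len(neighbors) for neighbors in graph.values())


# Hypothetical undirected graph: a triangle a-b-c plus a pendant node d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

In this graph, node `a` has three neighbors of which only one pair (`b`, `c`) is linked, so its coefficient is 1/3, while `b` and `c` each score 1.0.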
Project-focused course; no assignments.
Online resources of data.
Online resources of network data.
Online data visualization resources.
Graph/network exploration and visualization.
A list of useful online tutorials relating to the course material.
Similar courses about information networks and network analysis.