Welcome to
EECS6414 Data Analytics and Visualization



About

Course Description

Data analytics and visualization is an emerging discipline of immense importance to any data-driven organization. This is a project-focused course that provides students with knowledge on tools for data mining and visualization and practical experience working with data mining and machine learning algorithms for analysis of very large amounts of data. It also focuses on methods and models for efficient communication of data results through data visualization.

Topics
  • basic graph theory
  • network measurements
  • network models
  • community detection
  • link analysis & prediction
  • frequent itemsets
  • finding similar items
  • clustering
  • dimensionality reduction
  • mining data streams
  • value of visualization
  • exploratory data analysis
  • visualization of multidimensional data
  • visualization of networks
  • tools/systems for data analytics and visualization
Lectures & Office Hours

Lectures: Mon 16:00pm-19:00pm at ACE 013 (Accolade Building East)

Office Hours: Drop by my office or by appointment (LAS3050, Lassonde building)

Instructor

Manos Papagelis

Contact: papaggel@gmail.com, papaggel@eecs.yorku.ca

Textbooks

The course will rely mainly on the following textbooks.

Syllabus

Download the syllabus (v1.0)

Handouts

Lecture 1. Introduction [Slides]

Introduction, administrivia.

Readings:

Lecture 2. Information Networks [Slides]

Introduction to networks, introduction to main problems about network analysis, basic mathematical concepts, bow-tie structure of the Web.

Readings:

Optional readings:

  • A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J. Wiener. Graph structure in the Web. Computer Networks, 33, 2000.
Lecture 3. Network Measurements [Slides]

Degree distributions, shortest paths, clustering coefficient, measuring power-laws.

Readings:

Optional readings:

  • M. E. J. Newman, Power laws, Pareto distributions and Zipf's law, Contemporary Physics.
Lecture 4. Network Models [Slides]

Erdos-Renyi random graph model, small-world model, configuration model, power-law distributions, scale-free networks, the anatomy of the long-tail, preferrential attachment model.

Readings:

Optional readings:

  • P. Erdos, A. Renyi. On Random Graphs I. Publ. Math. Debrecen, 1959.
  • P. Erdos, A. Renyi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Koezl., 1960.
  • B. Bollobas. Random Graphs. Cambridge University Press.
  • M.E.J. Newman, S. H. Strogatz and D.J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118, 2001.
  • R. Milo, N. Kashtan, S. Itzkovitz, M.E.J. Newman, U. Alon. On the uniform generation of random graphs with prescribed degree sequences. Arxiv, 2004.
  • D. Ellis. The expansion of random regular graphs. Lecture notes from Algebraic methods in combinatorics, Cambridge University, 2011.
  • S. Arora, S. Rao and U. Vazirani. Expander Flows, Geometric Embeddings and Graph Partitioning. In proc. STOC '04, 2004.
  • S. Milgram. The small world problem. Psychology Today 1(1967).
  • J. Travers and S. Milgram. An experimental study of the small world problem. Sociometry 32, 1969.
  • P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301(2003), 827.
  • J. Leskovec, E. Horvitz. Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network. Proc. International WWW Conference, 2008.
  • P. Killworth and H. Bernard, Reverse small world experiment. Social Networks 1, 1978.
  • J. Kleinfeld. Could it be a Big World After All? The `Six Degrees of Separation' Myth. Society, 2002.
  • P. Killworth, C. McCarty, H.R. Bernard, M. House. The accuracy of small-world chains in social networks. Social Networks 28, 85-96, 2006.
  • F. Menczer. Growing and Navigating the Small World Web by Local Content. Proc. Natl. Acad. Sci., 99(22): 14014-14019, 2002.
  • L. Backstrom, P. Boldi, M. Rosa, J. Ugander, S. Vigna. Four Degrees of Separation. ACM Web Science Conference. 2012.
  • J. Ugander, B. Karrer, L. Backstrom, C. Marlow. The Anatomy of the Facebook Social Graph. Arxiv 2012.
  • M. Papagelis, F. Bonchi, A. Gionis. Suggesting Ghost Edges for a Smaller World. In Proc. CIKM, 2011.
  • M. Papagelis. Refining Social Graph Connectivity via Shortcut Edge Addition. TKDD, 10(2), 2015.
  • M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics, vol 1, No. 2, pp. 226-251, 2004.
  • A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review 51(4), 661-703, 2009.
  • M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46(5), 323-351, 2005.
  • M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. In Proc. SIGCOMM, 1999.
  • A.L Barabasi, R. Albert. Emergence of scaling in random networks. Science, 286, 1999.
  • B.A. Huberman, L. A. Adamic. Growth dynamics of the World-Wide Web. Nature, 399, 1999.
  • H.A. Simon. On a class of skew distribution functions. Biometrika 42, 425-440, 1955.
  • D.J. de S. Price. A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci. 27: 292-306, 1976.
  • A.L. Barabasi, R. Albert, H. Jeong. Mean-field theory for scale-free random networks. Physica A 272 173-187, 1999.
  • R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. Stochastic models for the Web graph. In Proc. FOCS 2000.
  • W. Aiello, F. Chung, L. Lu. Random evolution of massive graphs. Handbook of Massive Data Sets, Kluwer, pages 97-122, 2002.
  • B. Bollobas, C. Borgs, J. Chayes, O. Riordan. Directed scale-free graphs. In Proc. SODA 2003.
  • R. Kleinberg, J. Kleinberg. Isomorphism and Embedding Problems for Infinite Limits of Scale-Free Graphs. In Proc. SODA, 2005.
  • A. Fabrikant, E. Koutsoupias, C. Papadimitriou. Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet. In Proc. ICALP, 2002.
  • N. Berger, C. Borgs, J. Chayes, R. D'Souza, R. Kleinberg. Competition-Induced Preferential Attachment. In Proc. ICALP 2004.
  • M. Molloy and B. Reed. A Critical Point for Random Graphs with a Given Degree Sequence. Random Structures and Algorithms 6, 161-180, 1995.
  • J. Carlson and J. Doyle. Highly Optimized Tolerance: A Mechanism for Power Laws in Designed Systems. Physical Review E 60:2, 1999.
  • M.E.J. Newman. The first-mover advantage in scientific publication. European Physics Letters 86, 68001, 2009.
  • S. Redner. Citation statistics from 110 years of Physical Review. Physics Today 58, 49-54, 2005.
  • S. Goel, A. Broder, E. Gabrilovich, B. Pang. Anatomy of the Long Tail: Ordinary People with Extraordinary Tastes. In Proc. WSDM, 2010.
  • D. Pennock, G. Flake, S. Lawrence, E. Glover, C. Lee Giles. Winners don't take all: Characterizing the competition for links on the web. PNAS 99(8), 2002.
  • J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins. Microscopic Evolution of Social Networks. In Proc. KDD 2008
  • J. Leskovec, J. Kleinberg, C. Faloutsos. Graph Evolution: Densification and Shrinking Diameters. ACM TKDD, 2007.
  • R. Kumar, J. Novak, A. Tomkins. Structure and evolution of online social networks. In Proc. KDD, 2006.
  • A.L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, T. Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical, 2002.
  • G. Kossinets, D.J. Watts. Empirical Analysis of an Evolving Social Network. Science, 2006.
  • D. Fetterly, M. Manasse, M. Najork, J.L. Wiener. A large-scale study of the evolution of web pages. In Proc. WWW, 2003.
  • A. Ntoulas, J. Cho, C. Olston. What's new on the web? The evolution of the web from a search engine perspective. In Proc. WWW, 2004.
  • A.L. Barabasi. The origin of bursts and heavy tails in human dynamics. Nature 435, 207-211, 2005.
  • J. Kleinberg. Bursty and Hierarchical Structure in Streams. In Proc. KDD, 2002.
  • R. Kumar, J. Novak, P. Raghavan, A. Tomkins. On the bursty evolution of Blogspace. In Proc. WWW, 2003.
  • M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, A. Tomkins. Visualizing Tags over Time. In Proc. WWW, 2006.
  • S. Lattanzi, D. Sivakumar. Affiliation Networks. In Proc. STOC, 2009.
Lecture 5. Link Analysis and Prediction [Slides]

Web search, Hubs and Authorities (HITS), PageRank, topic-sensitive PageRank, personalised PageRank, Link prediction, neighborhood-based prediction methods, node proximity based prediction methods, supervised learning models, Facebook's "PYMK" algorithm, Twitter's "WtF" algorithm.

Readings:

  • Networks, Crowds, and Markets (chapter 14)
  • D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks. Proc. CIKM, 2003.
  • L. Backstrom, J. Leskovec. Supervised Random Walks: Predicting and Recommending Links in Social Networks. In Proc. WSDM, 2011.
  • P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, R.Zadeh. WTF: The Who to Follow Service at Twitter, WWW 2013.

Optional readings:

  • S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proc. 7th International World Wide Web Conference, 1998.
  • J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
  • P. Berkhin. A Survey of PageRank Computing. Internet Mathematics, 2005.
  • S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Mining the link structure of the World Wide Web. IEEE Computer, August 1999.
  • A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan. Searching the Web. ACM Transactions on Internet Technology 1(1): 2-43, 2001.
  • A. Borodin, J. S. Rosenthal, G. O. Roberts, P. Tsaparas,Finding Authorities and Hubs From Link Structures on the World Wide Web. 10th International World Wide Web Conference, May 2001.
  • D. Achlioptas, A. Fiat, A. Karlin, F. McSherry. Web Search via Hub Synthesis. 42nd IEEE Symposium on Foundations of Computer Science, p.611-618, 2001.
  • D. Rafiei, A. Mendelzon. What is this Page Known for? Computing Web Page Reputations. Proc. WWW Conference, 2000.
  • P. Domingos, M. Richardson. The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank. In Proc. NIPS, 2002.
  • T. H. Haveliwala. Topic-Sensitive PageRank. 11th International World Wide Web Conference, 2002.
  • A. Altman, M. Tennenholtz. Ranking Systems: The PageRank Axioms. In Proc. of ACM EC, 2005.
  • Z. Gyongyi, H. Garcia-Molina, J. Pedersen. Combating Web Spam with TrustRank. In Proc. of VLDB, 2004.
  • Z. Gyongyi, P. Berkhin, H. Garcia-Molina, J. Pedersen. Link Spam Detection Based on Mass Estimation. In Proc. of VLDB, 2006.
  • A. Borodin, G. O. Roberts, J. S. Rosenthal, P Tsaparas. Link Analysis Ranking: Algorithms, Theory, and Experiments. ACM TOIT, 2005.
  • A. Ntoulas, J. Cho, C. Olston. What’s New on the Web? The Evolution of the Web from a Search Engine Perspective. In Proc. WWW, 2004.
  • M. A. Najork. Comparing the effectiveness of HITS and SALSA. In Proc. CIKM, 2007.
  • B. Bahmani, A. Chowdhury, A. Goel. Fast Incremental and Personalized PageRank. In Proc. of VLDB, 2010.
  • D. Liben-Nowell, and J. Kleinberg, The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7) 1019–1031 (2007)
  • R. Lichtenwalter, J. T. Lussier, N. V. Chawla: New perspectives and methods in link prediction. KDD 2010: 243-252
  • S. A. Myers, J. Leskovec. On the Convexity of Latent Social Network Inference. Neural Information Processing Systems (NIPS), 2010.
  • M. Gomez-Rodriguez, D. Balduzzi, B. Schoelkopf. Uncovering the Temporal Dynamics of Diffusion Networks. In Proc. ICML 2011.
  • A. Clauset, C. Moore, M.E.J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 2008.
  • M. Kim, J. Leskovec. The Network Completion Problem: Inferring Missing Nodes and Edges in Networks. In Proc. SDM, 2010.
  • F. Chierichetti, J. Kleinberg, D. Liben-Nowell. Reconstructing Patterns of Information Diffusion from Incomplete Observations. Neural Information Processing Systems (NIPS), 2011.
  • B. Taskar, M.F. Wong, P. Abbeel, D. Koller. Link prediction in relational data. Neural Information Processing Systems (NIPS), 2006.
  • G. Jeh, J. Widom: SimRank: a measure of structural-context similarity. KDD 2002: 538-543
  • R. Lempel, S. Moran: SALSA: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst. 19(2): 131-160 (2001)
  • P. D’haeseleer, S. Liang, R. Somogyi. Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics, vol. 16, 2000.
Lecture 6. Community Structure in Networks [Slides]

Network communities, Strength of weak ties, community detection, Girvan-Newman algorithm, modularity, modularity optimization, graph partitioning, graph cuts, conductance, spectral graph theory, spectral graph clustering.

Readings:

Optional readings:

  • M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360-1380, 1973.
  • J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, A.L. Barabasi. Structure and tie strengths in mobile communication networks. PNAS, 2007.
  • M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99, 8271-8276, 2002.
  • M.E.J. Newman. Modularity and community structure in networks., Proc. Natl. Acad. Sci., 2002.
  • C. Marlow, L. Byron, T. Lento, I. Rosenn. Maintained relationships on Facebook. 2009.
  • B.A. Huberman, D.M. Romero, F. Wu. Social networks that matter: Twitter under the microscope. First Monday, 14(1), 2009.
  • L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proc. KDD, 2006.
  • P.S. Bearman, J. Moody. Suicide and Friendships Among American Adolescents. Am J Public Health, 94(1): 89-95, 2004.
  • R. Burt. Structural Holes versus Network Closure as Social Capital. Chapeter in Social Capital: Theory and Research, 2001.
  • R. Burt. Structural Holes and Good Ideas. American Journal of Sociology, Vol. 110, No. 2 2004.
  • G. Flake, S. Lawrence, C.L. Giles, F. Coetzee. Self-Organization and Identification of Web Communities. IEEE Computer, 35:3, 2002.
  • G. Flake, K. Tsioutsiouliklis, R.E. Tarjan. Graph Clustering Techniques based on Minimum Cut Trees. Technical Report 2002-06, NEC, Princeton, NJ, 2002.
  • S. Fortunato Community detection in graphs, Arxiv 2009.
  • A. Clauset, M.E.J. Newman, C. Moore. Finding community structure in very large networks. Phys. Rev. E 70, 066111, 2004.
  • M.E.J. Newman, M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113, 2004.
  • U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 2001.
  • J. Reichardt, S. Bornholdt. Statistical Mechanics of Community Detection., Phys. Rev. E 74 016110, 2006.
  • S. Fortunato, S. Barthelemy. Resolution limit in community detection. Proc. Natl. Acad. Sci., 2007.
  • U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Z. Nikoloski, D. Wagner. On Modularity Clustering. IEEE TKDE, 2007.
  • J. Shi, J. Malik. Normalized Cuts and Image Segmentation. IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 22, no. 8, 2000.
  • R. Kannan, S. Vempala, A. Vetta. On clusterings: Good, bad and spectral. Journal of the ACM, 51(3):497-515, 2004.
  • M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 1973.
  • A. Pothen, H.D. Simon, K.P. Liou. Partitioning sparse matrices with egenvectors of graph. SIAM Journal of Matrix Anal. Appl., 11:430--452, 1990.
  • L. Hagen, A.B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1992.
  • A. Ng, M. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. NIPS, 2001.
  • U. von Luxburg. Tutorial on spectral clustering. Arxiv 2009.
  • S. Dill, R. Kumar, K. McCurley, S. Rajagopalan, D. Sivakumar, A. Tomkins. Self-similarity in the Web. In Proc. VLDB, 2001.

Assignments

Project-focused course; no assignments.

Project

Resources

Software Tools and Libraries

Data cleansing/wrangling

Graph/network analysis

  • SNAP Libary for working with massive network datsets (C++, Python)
  • NetworkX Library for studying graphs and networks (Python)
  • JUNG Library for modeling, analysis, and visualization of graphs (Java)
  • Metis Family of programs for partitioning graphs

Graph/network exploration and visualization

  • Pajek Program for large network analysis and visualization
  • Gephi Program for graph visualization and exploration

Data Visualization

  • The data visualisation catalogue Reference library for different types of data visualisations
  • Highcharts Libraries for aesthetically pleasing standard charts
  • Google Charts Rich gallery of interactive charts and data tools by Google.
  • Tableau Software for exploratory data analysis
  • D3 Software for interactive visualizations
Tutorials

A list of useful online tutorials relating to the course material

Similar Courses

Similar courses about information networks and network analysis