Tutorial: Big Data Algorithms and Applications Under Hadoop
Dr. Kunpeng Zhang
Department of Information and Decision Sciences, College of Business Administration
University of Illinois at Chicago
601 S Morgan Street, Room UH2407
Chicago, IL 60607
The development of Web 2.0 and related technologies has led to an exponential increase in various types of user-generated content, including textual and networked information. Finding meaningful nuggets of knowledge in such large and diverse data has attracted a lot of attention. Hadoop and MapReduce, a well-known distributed environment and computing framework, have been widely and successfully deployed in many domains, particularly finance and marketing. Many scalable machine-learning algorithms, such as K-means clustering, association rule mining, collaborative filtering, topic modeling, and network analysis, have been proposed and implemented in open-source packages (e.g., Apache Mahout). In this tutorial, we plan to discuss basic concepts, widely used algorithms, and some real-world applications in big data.
Kunpeng Zhang (KZ) is a researcher in the area of large-scale data analysis, with a particular focus on mining social media data through machine learning, network analysis, and natural language processing techniques. KZ's projects include understanding people's voices by mining online product and patient reviews, optimizing online advertising strategy by analyzing users' historical social behaviors, assessing social brand reputation, and developing distributed network measurement algorithms, among others. Because of the 3V characteristics (volume, variety, velocity) of such data, he proposes and implements scalable mining algorithms. KZ received his Ph.D. in Computer Science from Northwestern University in 2013, working with Alok Choudhary. His thesis is titled "Big Social Media Data Mining for Marketing Intelligence". His research interests include large-scale social computing, text/web mining, scalable machine learning (probabilistic graphical models, optimization), NLP (semantic web, information extraction), social network analysis (community detection, information diffusion), and health informatics using social media data.