Learning Outcomes
At the end of this course, students should be able to:
explain the principles and best practices of managing data efficiently and effectively;
demonstrate knowledge of SQL and NoSQL;
explain data warehouse concepts, methodologies and tools; and
explain data mining architecture and applications.
Course Contents
Relational Databases: Mapping conceptual schema to relational schema; Database Query Languages (SQL) and NoSQL; Concepts of functional dependencies and multi-valued dependencies; Transaction processing; Distributed databases; XML and the Semantic Web; Data warehousing; Introduction to data science.
Introduction to Data Warehouse; OLTP Systems; Differences between OLTP Systems and Data Warehouse; Characteristics of Data Warehouse; Functionality of Data Warehouse; Advantages and Applications of Data Warehouse; Top-Down and Bottom-Up Development Methodology; Tools for Data Warehouse Development; Data Warehouse Types.
Introduction to Data Mining; Scope of Data Mining; What is Data Mining; How Data Mining Works; Predictive Modelling; Data Mining and Data Warehousing; Architecture for Data Mining; Profitable Applications; Data Mining Tools.
Lab work: Practical exercises on basic R commands and data structures for manipulating data; reading and writing data in multiple formats with R; and using loops, conditional statements, and functions to automate common data management tasks. Exercises on cleaning and managing multiple complex datasets, manipulating textual data, and basic web scraping techniques for both standard web pages and the Twitter API. Work on the techniques and hardware needed to manage large datasets efficiently. Practical exercises on managing multiple datasets by example; working with text data; converting between long- and wide-format data; and dealing with messy data. R programming fundamentals: data I/O and packages, looping and conditional statements, and functions.