Course Objectives |
This module addresses key concepts in the traditionally important area of data management and the increasingly important area of data analytics. It compares traditional relational databases with an alternative model (a NoSQL database), enabling students to select an appropriate means of storing and managing data according to the size and structure of a particular dataset and the use to which that data will be put.
Students will be introduced to preliminary techniques in data analysis, starting from the position that data is used to answer a question, and to a range of data visualisation and visual analysis techniques that will instil an understanding of how to start exploring a new dataset. To ensure that students are comfortable handling datasets, they will explore a range of openly licensed real-world datasets (either downloaded from their host websites or provided as snapshots) that illustrate the key concepts in the course. Sources such as data.gov.uk, the World Bank, and a range of other national and international agencies will provide appropriate data.
The module will be divided approximately equally between issues in data management (technical and socio-legal issues in storing and maintaining datasets) and issues in data analytics (using data to answer questions). Students are not expected to have a background in statistics, but should be comfortable working with mathematical concepts and will need to be competent programmers.
The module will be framed around a narrative that looks at how to manage and extract value and insight from a range of increasingly large data collections. At each stage, a comparison will be drawn between different ways of representing the data (for example, using different sorts of charts or geographical mapping techniques), and the limitations of the mechanisms presented will be discussed.
To give students a feel for the use of data, each stage will also include an overview of some data analysis techniques, including summary reporting and exploratory data visualisation. The module will be driven by Richard Hamming's famous quote: "The purpose of computing is insight, not numbers."
Some of the key ideas are:
- Introducing data analysis. Starting with a text-based data file such as a comma-separated values (CSV) file, this unit will provide a brief introduction to some basic operations on simple data files. This will give an opportunity to outline the key ideas in the module, to ensure that students have installed the module software correctly, and to let them begin to familiarise themselves with that software.
- Concepts in data management. The module will look at three key areas in data management: data architectures and data access (CRUD), data integrity, and transaction management (ACID). Each of these will be illustrated using a relational database and one non-relational alternative, and the advantages and limitations of each model will be discussed.
- Legal and ethical issues. The module will consider the legal and ethical issues involved in managing data collections. Students will be required to obtain and read (parts of) the Data Protection Act and the Freedom of Information Act, and demonstrate how these apply to issues in data management. They will also consider privacy, ownership, intellectual property and licensing issues in data collection, management, retrieval and reuse.
- Concepts in data analytics. These sections will focus on using data to answer a real question; the emphasis will be on exploratory techniques (such as visualisation) and on formulating a question in a form that can realistically be answered using the data that is available. Issues in processing large and real-time streamed data collections will also be addressed, along with techniques and technologies (such as MapReduce) for handling them. This part will use a statistical package such as the Python scientific libraries and/or ggplot to visualise the data and carry out appropriate analyses. It is not anticipated that students will need to understand statistical methods in depth.
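As a rough illustration of how several of the key ideas above fit together, the following minimal sketch reads a small CSV file, stores it in a relational database, inserts rows inside a transaction, and runs a query that answers a simple question. It is an assumption-laden sketch rather than the module's actual software: the choice of Python's standard `csv` and `sqlite3` modules, the table and column names, the helper function names, and the sample figures (for the fictional regions "Northland" and "Southland") are all invented for illustration.

```python
import csv
import io
import sqlite3

# Invented sample data standing in for a downloaded open dataset (not real figures).
CSV_TEXT = """region,year,population_millions
Northland,2010,24.8
Northland,2020,31.1
Southland,2010,42.0
Southland,2020,53.8
"""


def load_rows(text):
    """Parse CSV text into a list of (region, year, population) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (r["region"], int(r["year"]), float(r["population_millions"]))
        for r in reader
    ]


def build_database(rows):
    """Create an in-memory SQLite database and insert all rows in one transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE population ("
        " region TEXT NOT NULL,"
        " year INTEGER NOT NULL,"
        " millions REAL NOT NULL CHECK (millions > 0),"  # simple integrity rule
        " PRIMARY KEY (region, year))"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO population VALUES (?, ?, ?)", rows)
    return conn


def growth_by_region(conn, start, end):
    """Answer a question: how much did each region grow between two years?"""
    query = """
        SELECT a.region, ROUND(b.millions - a.millions, 1)
        FROM population AS a
        JOIN population AS b ON a.region = b.region
        WHERE a.year = ? AND b.year = ?
        ORDER BY a.region
    """
    return conn.execute(query, (start, end)).fetchall()


def try_bad_insert(conn):
    """Insert a batch with one invalid row; return True only if the batch commits."""
    try:
        with conn:
            conn.execute("INSERT INTO population VALUES ('Eastland', 2020, 33.0)")
            # The next row violates the CHECK constraint, so the whole batch is undone.
            conn.execute("INSERT INTO population VALUES ('Eastland', 2010, -1.0)")
    except sqlite3.IntegrityError:
        return False
    return True


if __name__ == "__main__":
    conn = build_database(load_rows(CSV_TEXT))
    for region, growth in growth_by_region(conn, 2010, 2020):
        print(region, growth)
    print("bad batch committed:", try_bad_insert(conn))
```

The `try_bad_insert` helper is intended to show atomicity in miniature: because the second row fails the integrity check, the first row of the same batch is rolled back as well, leaving the table unchanged.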
|
Course Outcomes |
A. Knowledge and understanding
Upon completing this course, students will be able to: - Discuss and describe the similarities and differences between at least two different database models, and how they are used to manage data collections.
- Identify and explain the legal issues surrounding data collection, usage and retention.
- Explain the stages and process of database design.
B. Cognitive skills
Upon completing this course, students will be able to: - Select an appropriate database model for a data collection.
- Use data to answer a practical question.
- Analyse a simple scenario to produce a conceptual model.
C. Practical and professional skills
Upon completing this course, students will be able to: - Use a query language to extract information from a database.
- Use a statistical package to explore a dataset.
- Present an analysis of a dataset to a variety of audiences.
D. Key transferable skills
Upon completing this course, students will be able to: - Write a report detailing a systematic approach to analysing a data set.
- Listen actively to stakeholders regarding their data analysis needs.
- Communicate the results of data analysis to stakeholders at an appropriate level. |