# 2023/24 Taught Postgraduate Module Catalogue

## OMAT5102M Exploratory Data Analysis

### 15 credits

Class Size: 150

Module manager: Dr Robert Aykroyd
Email: R.G.Aykroyd@leeds.ac.uk

Taught: 1 Jan to 28 Feb, 1 Jul to 31 Aug

Year running 2023/24

### Pre-requisite qualifications

Students are required to meet the programme entry requirements prior to studying the module.

Module replaces

N/A

This module is not approved as an Elective

### Module summary

This course introduces students to basic techniques for the preliminary investigation of data sets. Exploring data involves visualising variables and the relationships between them, which helps to detect outliers, identify trends, suggest suitable statistical models and inform future data gathering.

### Objectives

This module teaches students how to explore and analyse data sets using methods appropriate to different data types. It gives students opportunities to develop skills in visualising and summarising data sets and to practise applying statistical analysis. As well as standard methods, students will explore more advanced approaches, such as kernel density estimation and principal component analysis.

Learning outcomes
On completion of this module students will be able to:

1. Use software to visualise data in different ways.
2. Calculate numerical data summaries.
3. Identify probability models for data.
4. Understand clustering in data and measures of distance.
5. Effectively visualise high-dimensional data.
6. Understand, through examples, how visualisation of data can inform statistical model selection.

Skills outcomes
Skills developed in this module:

- Effective communication through visualisation of data.
- Interpreting data and making modelling decisions based on that interpretation.

### Syllabus

1. Data types: categorical, discrete, continuous. Data cleaning.
2. Graphical summaries: boxplots, histograms, kernel density estimation (KDE).
3. Numerical summaries: location, variability, quantiles. Data manipulation.
4. Discrete distributions: binomial, geometric, Poisson.
5. Continuous distributions: normal, exponential, uniform.
6. Bivariate data: scatterplots, correlation. Linear regression.
7. Logistic regression and classification. Principal component analysis (PCA) and dimension reduction.
8. Using statistical software to import data and perform simple visualisation, exploration and summary.

### Teaching methods

| Delivery type | Number | Length (hours) | Student hours |
|---|---|---|---|
| Online learning | 1 | 1.50 | 1.50 |
| Online learning | 5 | 1.00 | 5.00 |
| Discussion forum | 6 | 2.00 | 12.00 |
| Independent online learning hours | | | 42.00 |
| Private study hours | | | 89.50 |
| Total contact hours | | | 18.50 |
| Total hours (100 hours per 10 credits) | | | 150.00 |

### Opportunities for Formative Feedback

Students will have weekly formative assignments (e.g. quizzes, problem sheets or practical tasks) for each taught unit of the module and will be given model solutions with comments.

### Methods of assessment

Coursework
| Assessment type | Notes | % of formal assessment |
|---|---|---|
| In-course assessment | Students will be assessed predominantly using e-assessment methods or multiple-choice questions (MCQs). | 20.00 |
| Assignment | The assignment requires students to complete a written report that may include R code, R output, calculations and critical analysis of results. The assignment is expected to be completed within one week. | 80.00 |
| Total percentage (coursework) | | 100.00 |

Resit assessment will be available via the Assignment when the module next runs. The Assignment covers all learning outcomes for the module.

### Reading list

There is no reading list for this module

Last updated: 11/08/2023


Errors, omissions, failed links, etc. should be reported to the Catalogue Team.

© Copyright Leeds 2019