2024/25 Taught Postgraduate Module Catalogue
OMAT5102M Exploratory Data Analysis
15 credits
Class Size: 150
Module manager: Dr Robert Aykroyd
Email: R.G.Aykroyd@leeds.ac.uk
Taught: 1 Jan to 28 Feb, 1 Jan to 28 Feb (adv year), 1 Jul to 31 Aug
Year running 2024/25
Pre-requisite qualifications
Students are required to meet the programme entry requirements prior to studying the module.
Module replaces
N/A
This module is not approved as an Elective
Module summary
This course will introduce students to basic techniques that can be used to perform a preliminary investigation of data sets. Exploring data involves visualising the variables and their relationships to help detect outliers, identify trends, suggest suitable statistical models and inform future data gathering.
Objectives
This module gives students knowledge of how to explore and analyse data sets using methods appropriate for differing data types. The module will provide students with opportunities to develop the skills for visualising and summarising data sets, and to practise applying statistical analysis. As well as being introduced to archetypal methods, students will explore more novel approaches, such as kernel density estimation and principal component analysis.
Learning outcomes
On completion of this module students will be able to:
1. Use software to visualise data in different ways.
2. Calculate numerical data summaries.
3. Identify probability models for data.
4. Understand clustering in data and measures of distance.
5. Effectively visualise high-dimensional data.
6. Understand through examples how visualisation of data can inform statistical model selection.
Skills outcomes
Skills developed in this module:
- Effective communication through visualisation of data.
- Interpreting data and making modelling decisions based on that interpretation.
Syllabus
1. Data types: Categorical, discrete, continuous. Data cleaning.
2. Graphical summary: Boxplots, histograms, kernel density estimation (KDE).
3. Numerical summary: Location, variability, quantiles. Data manipulation.
4. Discrete distributions: Binomial, geometric, Poisson.
5. Continuous distributions: Normal, exponential, uniform.
6. Bivariate data: Scatterplots, correlation. Linear regression.
7. Logistic regression and classification. Principal component analysis (PCA) and dimension reduction.
8. Use of statistical software to import data and perform simple visualisation, exploration and summary.
Teaching methods
Delivery type | Number | Length hours | Student hours |
On-line Learning | 1 | 1.50 | 1.50 |
On-line Learning | 5 | 1.00 | 5.00 |
Discussion forum | 6 | 2.00 | 12.00 |
Independent online learning hours | | | 42.00 |
Private study hours | | | 89.50 |
Total Contact hours | | | 18.50 |
Total hours (100hr per 10 credits) | | | 150.00 |
Opportunities for Formative Feedback
Students will have weekly formative assignments (e.g. quizzes, problem sheets or practical tasks) for each taught unit of the module and will be given model solutions with comments.
Methods of assessment
Coursework
Assessment type | Notes | % of formal assessment |
In-course Assessment | Students will be tested predominantly using e-assessment methods or MCQs. | 20.00 |
Assignment | The assignment will require students to complete a written report which may feature components of R code, R outputs, calculations and critical analysis of results. It is expected that the assignment will be completed in one week. | 80.00 |
Total percentage (Assessment Coursework) | | 100.00 |
Resit assessment will be available via the Assignment when the module next runs. The Assignment covers all learning outcomes for the module.
Reading list
There is no reading list for this module.
Last updated: 24/05/2024 17:06:29