Module and Programme Catalogue

2024/25 Undergraduate Module Catalogue

MATH1604 Modelling for Big Data

20 credits
Class Size: 60

Module manager: Francesco Cosentino; Pierre-Philippe Dechant
Email: F.Cosentino@leeds.ac.uk; P.P.Dechant@leeds.ac.uk

Taught: Semester 2 (Jan to Jun)

Year running 2024/25

Pre-requisite qualifications

Acceptance onto the BSc Data Science programme

This module is not approved as a discovery module

Module summary

This module lays the foundations for computer programming for big data, with basics of procedural and functional programming, data structures and good programming practice such as reproducibility, collaborative editing and version control. Learners encounter and practise techniques for handling big data such as parallelisation, vectorisation and high-performance computing, and for the design of efficient algorithms. These are applied to exploratory data analysis, computational modelling, problem solving, and data visualisation.

Objectives

In this module, the main aims are
- To adopt the computer science mindset and the good programming practices needed to tackle big data sets.
- To lay the foundations for more advanced computer science content around object-oriented programming, software engineering and databases, as well as for data science content around machine learning, artificial intelligence and network analysis.
- To investigate computational implementations of mathematical modelling paradigms from algorithmic and statistical approaches in an integrated way, using a variety of examples, and comparing models with real-world complexity such as problems in sustainability.
- To explore the relationships between deterministic & stochastic simulations, mock data & real data, and predictive modelling & inference.
- To build on ideas from algebra to transform, analyse and visualise vector data, laying the foundations of data science ideas around data visualisation, clustering, optimisation and best fit (e.g. least squares).
- To start building automated processing pipelines that take raw data and then apply reproducible transformation, analysis and visualisation.
The teaching, learning and assessment focus will therefore be on hands-on, collaborative learning sessions and project-based assessment for learning.

Learning outcomes
On successful completion of the module students will have demonstrated the following learning outcomes relevant to the subject:
1. Apply basic principles from functional and procedural programming to create robust code.
2. Use data types, data structures, libraries and algorithms appropriate for big data.
3. Demonstrate good programming practice using appropriate collaborative tools, conventions and behaviours.
4. Explain the mathematical modelling cycle and build computational implementations of basic models of real-world questions.
5. Use deterministic and stochastic simulations, draw conclusions within the model, and critically evaluate their applicability back in the real-world domain.
6. Investigate the role of hyperparameters of models and use mathematical and computational reasoning to make general statements about the behaviour of classes of models.
7. Demonstrate how models with microscopic objects and rules can lead to the emergence of new objects and rules at a higher level.
8. Aggregate and visualise predictions of simulations using appropriate visualisation, and apply basic statistical measures.
9. Generate mock data from simulations, and appreciate the relationship to real data and inference.
10. Perform basic automation, data curation, transformation and exploratory data analysis.
11. Articulate the fundamental role of algebra to data science topics and implement algebraic operations computationally.
12. Algorithmically implement cycles of enhancement for optimisation, use a design approach, and articulate the role of selection criteria, incentives and metrics in the search for optimality.
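
As an illustration of outcomes 9 and 11, the following is a minimal sketch, not part of the official module materials, of generating mock data from a simple linear model and recovering its parameters by least squares. The function names `make_mock_data` and `least_squares_line`, and the chosen slope, intercept and noise level, are hypothetical and used only for this example.

```python
import random


def make_mock_data(n, slope, intercept, noise_sd, seed=0):
    """Generate n mock (x, y) points from y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the mock data are reproducible
    xs = [i / (n - 1) for i in range(n)]  # evenly spaced x values in [0, 1]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def least_squares_line(xs, ys):
    """Fit y = a * x + b by ordinary least squares using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumed "true" parameters for the mock data (illustrative values only).
xs, ys = make_mock_data(n=200, slope=2.0, intercept=1.0, noise_sd=0.1)
slope_hat, intercept_hat = least_squares_line(xs, ys)
print(f"estimated slope = {slope_hat:.3f}, estimated intercept = {intercept_hat:.3f}")
```

Comparing the estimates with the parameters used to generate the data gives a first sense of the relationship between simulation, mock data and inference.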

Skills Learning Outcomes
On successful completion of the module students will have demonstrated the following skills learning outcomes:
SLO1. Apply principles of programming and algorithm design to creative problem solving.
SLO2. Work in mathematical and computational representations of the real world and critically evaluate applicability and limitations.
SLO3. Create appropriate visualisations in a range of formats.
SLO4. Critically evaluate statements concerning numeracy and data literacy.
SLO5. Follow good practices in documentation, transparency and reproducibility of work undertaken.
SLO6. Appreciate how selection criteria, incentives and metrics drive behaviours and affect evaluations.
SLO7. Critically evaluate the importance of different features, and prioritise time and resource accordingly.
SLO8. Follow a design approach for the iterative solution of problems.


Syllabus

1. Functional and procedural programming in Python: control structures, data types and structures, development platform, scripting; data acquisition: importing and curating data from external sources e.g. web services, open data, government data, news and social media.
2. Good coding practices: reproducibility, collaborative editing and version control e.g. Git.
3. Paradigms, libraries and efficient algorithms for big data such as vectorisation, pandas, dataframes, dictionaries, lists, stacks, queues, trees; automation, parallel processing, high-performance computing, flattening, one-hot encoding etc.
4. Exploratory data analysis, data visualisation, problem solving, and critical thinking.
5. Implementation of algebraic ideas in data science tools, such as data visualisation, transformation, clustering, eigentheory, dimensionality reduction, mock data and simulations; least squares, best fit and optimisation; neighbours and clusters such as k-means and nearest neighbours; common metrics/distances such as Euclidean, Hamming, Levenshtein, cosine.
6. Computational modelling and investigation: stochastic and deterministic models such as random numbers, random walks, Monte Carlo simulations, stochastic reactions, Gillespie, SIR models; Monty Hall; matrix models; game of life, forest fires, sustainability, evolution, diversity and selection, evolutionary optimisation and its role in AI such as for self-driving cars.
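
To give a flavour of item 6, the following is a minimal sketch, not taken from the module materials, of a Monte Carlo simulation of the Monty Hall problem using only the Python standard library. The function names `play_monty_hall` and `estimate_win_rate`, the number of trials and the seed are hypothetical choices for illustration.

```python
import random


def play_monty_hall(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)   # door hiding the car
    pick = rng.choice(doors)  # contestant's initial choice
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the single remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def estimate_win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability by averaging over many simulated games."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials


print(f"stay:   {estimate_win_rate(switch=False):.3f}")
print(f"switch: {estimate_win_rate(switch=True):.3f}")
```

The stochastic estimates can be compared with the exact probabilities of 1/3 for staying and 2/3 for switching, which is one simple way of checking a simulation against a known result.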

Methods of Assessment

We are currently refreshing our modules to make sure students have the best possible experience. Full assessment details for this module are not available before the start of the academic year, at which time details of the assessment(s) will be provided.

Assessment for this module will consist of:

3 x Coursework

Teaching methods

Delivery type                        Number   Length hours   Student hours
Practical                            10       2.00           20.00
Practical                            10       3.00           30.00
Independent online learning hours                            40.00
Private study hours                                          110.00
Total Contact hours                                          50.00
Total hours (100hr per 10 credits)                           200.00

Opportunities for Formative Feedback

Learners will regularly produce work in the hands-on teaching sessions and receive peer and formative feedback, including in preparation for the group and individual projects. Some information searching, written work and presentation tasks will be set with feedback opportunities. Students can then act on this feedback in their projects or reflective portfolio.

Reading list

The reading list is available from the Library website

Last updated: 22/10/2024


© Copyright Leeds 2019