2024/25 Undergraduate Module Catalogue
LING2065 Data Science for Linguists
20 credits
Class Size: 18
Module manager: Cecile De Cat
Email: c.decat@leeds.ac.uk
Taught: Semester 1 (Sep to Jan)
Year running 2024/25
Pre-requisites
MODL1060 | Language: Structure and Sound |
This module is not approved as a discovery module
Module summary
This module aims to equip students with the knowledge and skills required to address the following questions: What kinds of data inform linguistic research? How are these generated? How can we transform, visualise, and analyse these data to identify interesting patterns, and thereby gain a deeper understanding of the information contained in the data? Through demonstrations and hands-on activities using linguistic data, students will learn the basics of data manipulation and analysis.
Objectives
In this module students will learn:
- To navigate the IT environment required to work with electronic data. This will include organising data, setting up folder structures, and being able to use R, a versatile tool which is freely downloadable, very widely used, and very well supported by a large community of data scientists across the world.
- To understand the different types of data one can encounter in linguistic research. This will include understanding the different types of variables that are used in statistics.
- To prepare data to make it suitable for analysis. This includes data cleaning, reorganisation, and transformation.
- To describe a dataset so it is properly documented.
- To visualise data and understand how it is distributed.
- To understand how variables can relate to each other; to be able to visualise and interpret different types of relationship between variables.
- To estimate whether a difference observed visually is statistically meaningful, on the basis of statistical model summaries.
In classes, students will be guided through demonstrations and practical exercises based on linguistic datasets (e.g., experimental data, corpus data, questionnaire data). The demonstrations and exercises will be done in R.
Learning outcomes
On successful completion of the module students will be able to:
LO.1. describe linguistic datasets: explain how they are structured, and what types of information they contain;
LO.2. explain how data is distributed in a given dataset;
LO.3. explain and show how variables relate to each other in a given dataset;
LO.4. identify how the available data can be used to answer a specific research question, and provide visualisations to inform a preliminary analysis;
LO.5. understand the basics of regression analysis. This includes being able to explain in simple terms how it works and when it is useful, and to interpret the results of a given model in relation to a research question in linguistics.
Skills Learning Outcomes
On successful completion of the module students will be able to:
SO6. describe well-documented datasets in general terms;
SO7. visualise and describe how data is distributed in a particular dataset;
SO8. visualise and interpret the relationship between variables in a dataset. This includes creating different types of plots and being able to explain what they show;
SO9. do some basic data cleaning and transformation, so that data can be explored or analysed to answer a specific research question. This includes, for instance, dealing with missing data, recoding some data if required, and creating new variables if required;
SO10. interpret the results of a simple regression analysis.
Syllabus
Details of the syllabus will be provided on the Minerva organisation (or equivalent) for the module.
Teaching methods
Delivery type | Number | Length hours | Student hours |
Lecture | 10 | 1.00 | 10.00 |
Practical | 10 | 1.00 | 10.00 |
Independent online learning hours | | | 20.00 |
Private study hours | | | 160.00 |
Total Contact hours | | | 20.00 |
Total hours (100hr per 10 credits) | | | 200.00 |
Opportunities for Formative Feedback
As this module aims to develop practical skills, it will be heavily based on demonstrations and hands-on activities. Students will learn by doing, and will be expected to be proactive in the monitoring of their understanding. Independent self-study will be essential, as knowledge and skills will be acquired incrementally. The lectures and practicals will be interactive, and will aim to provide ample opportunities for feedback and support.
- Interactive tools will be used during lectures to monitor student understanding.
- Exercises will be set weekly in preparation for the practicals. Feedback and advice will be provided during the practicals.
- There will be two pieces of formative assessment (one before each summative assessment).
Methods of assessment
Coursework
Assessment type | Notes | % of formal assessment |
Presentation | 10-minute pre-recorded presentation | 50.00 |
Total percentage (Assessment Coursework) | | 50.00 |
Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated
Exams
Exam type | Exam duration | % of formal assessment |
Online Time-Limited assessment | 24 hr | 50.00 |
Total percentage (Assessment Exams) | | 50.00 |
Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated
Reading list
The reading list is available from the Library website.
Last updated: 10/09/2024