Module and Programme Catalogue

2024/25 Undergraduate Module Catalogue

LING2065 Data Science for Linguists

20 credits
Class Size: 18

Module manager: Cecile De Cat
Email: c.decat@leeds.ac.uk

Taught: Semester 1 (Sep to Jan)

Year running 2024/25

Pre-requisites

MODL1060 Language: Structure and Sound

This module is not approved as a discovery module

Module summary

This module aims to equip students with the knowledge and skills required to address the following questions: What kinds of data inform linguistic research? How are these generated? How can we transform, visualise, and analyse these data to identify interesting patterns, and thereby gain a deeper understanding of the information contained in the data? Through demonstrations and hands-on activities using linguistic data, students will learn the basics of data manipulation and analysis.

Objectives

In this module students will learn:
- To navigate the IT environment required to work with electronic data. This will include organising data, setting up folder structures, and being able to use R, a versatile tool which is freely downloadable, very widely used, and very well supported by a large community of data scientists across the world.
- To understand the different types of data one can encounter in linguistic research. This will include understanding the different types of variables that are used in statistics.
- To prepare data so that it is suitable for analysis. This includes data cleaning, reorganisation, and transformation.
- To describe a dataset so that it is properly documented.
- To visualise data and understand how it is distributed.
- To understand how variables can relate to each other, and to visualise and interpret different types of relationship between variables.
- To estimate whether a difference observed visually is statistically meaningful, on the basis of statistical model summaries.
In classes, students will be guided through demonstrations and practical exercises based on linguistic datasets (e.g., experimental data, corpus data, questionnaire data). The demonstrations and exercises will be done in R.

Learning outcomes
On successful completion of the module students will be able to:
LO.1. describe linguistic datasets: explain how they are structured, and what types of information they contain;
LO.2. explain how data is distributed in a given dataset;
LO.3. explain and show how variables relate to each other in a given dataset;
LO.4. identify how the available data can be used to answer a specific research question, and provide visualisations to inform a preliminary analysis;
LO.5. understand the basics of regression analysis. This includes being able to explain in simple terms how it works and when it is useful, and being able to interpret the results of a given model in relation to a research question in linguistics.

Skills Learning Outcomes

On successful completion of the module students will be able to:
SO6. describe well-documented datasets in general terms;
SO7. visualise and describe how data is distributed in a particular dataset;
SO8. visualise and interpret the relationship between variables in a dataset, which includes creating different types of plots and being able to explain what they show;
SO9. do some basic data cleaning and transformation, so that data can be explored or analysed to answer a specific research question, which includes, for instance, dealing with missing data, recoding data, and creating new variables where required;
SO10. interpret the results of a simple regression analysis.


Syllabus

Details of the syllabus will be provided on the Minerva organisation (or equivalent) for the module

Teaching methods

Delivery type                        Number   Length hours   Student hours
Lecture                              10       1.00           10.00
Practical                            10       1.00           10.00
Independent online learning hours                            20.00
Private study hours                                          160.00
Total Contact hours                                          20.00
Total hours (100hr per 10 credits)                           200.00

Opportunities for Formative Feedback

As this module aims to develop practical skills, it will be heavily based on demonstrations and hands-on activities. Students will learn by doing, and will be expected to be proactive in the monitoring of their understanding. Independent self-study will be essential, as knowledge and skills will be acquired incrementally. The lectures and practicals will be interactive, and will aim to provide ample opportunities for feedback and support.
- Interactive tools will be used during lectures to monitor student understanding.
- Exercises will be set weekly in preparation for the practicals. Feedback and advice will be provided during the practicals.
- There will be two pieces of formative assessment (one before each summative assessment).

Methods of assessment


Coursework
Assessment type                            Notes                                 % of formal assessment
Presentation                               10-minute pre-recorded presentation   50.00
Total percentage (Assessment Coursework)                                         50.00

Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated


Exams
Exam type                             Exam duration   % of formal assessment
Online Time-Limited assessment        24 hr           50.00
Total percentage (Assessment Exams)                   50.00

Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated

Reading list

The reading list is available from the Library website

Last updated: 10/09/2024
