Module and Programme Catalogue

2020/21 Taught Postgraduate Module Catalogue

OCOM5204M Data Mining and Text Analytics

15 credits
Class Size: 100

Module manager: Professor Eric Atwell
Email: e.s.atwell@leeds.ac.uk

Taught: 1 Jan to 28 Feb (adv year)

Year running 2020/21

Pre-requisites

OCOM5100M Programming for Data Science

This module is not approved as an Elective

Module summary

The module will provide an introduction to linguistic theory and terminology. Students will develop an understanding of, and the ability to use, algorithms and resources for implementing and evaluating text mining and analytics systems. Students will be supported to develop solutions using open-source and commercial toolkits, and will be encouraged to consider the applications of data mining and text analytics through case studies in information retrieval and extraction.

Learning outcomes
On completion of this module students should be able to:

(1) understand theory and terminology of empirical modelling of natural language;
(2) understand and use algorithms, resources and techniques for implementing and evaluating text mining and analytics systems;
(3) demonstrate familiarity with some of the main text mining and analytics application areas;
(4) appreciate why unrestricted natural language processing is still a major research task.


Syllabus

Indicative content for this module includes:

Theory and terminology in Data Mining and Computational Linguistics

Data and text mining tools and resources for practical applications

Data sources and data warehouses

Tools and techniques for data preparation

Supervised machine learning

Text classification

Unsupervised machine learning

Clustering

Association, collocation and co-occurrence discovery

Evaluation methods and metrics

Open-source and commercial text mining and text analytics tools

Web-based text analytics

Case studies of research and commercial applications

Teaching methods

Delivery type                       Number  Length hours  Student hours
On-line Learning                    6       1.00          6.00
Group learning                      6       2.00          12.00
Independent online learning hours                         28.00
Private study hours                                       104.00
Total Contact hours                                       18.00
Total hours (100hr per 10 credits)                        150.00

Private study

Private study will include directed reading and exercises and self-directed research in support of learning activities, as well as in preparation for assessments.

Independent online learning involves non-facilitated directed learning. Students will work through bespoke interactive learning resources and activities in the VLE.

Opportunities for Formative Feedback

Online learning materials will provide regular opportunities for students to check their understanding (for example, through formative MCQs with automated feedback). Regular group activity embedded into learning will allow self- and peer-assessment, providing opportunities for formative feedback from peers and tutors.

Students will complete a formative group assessment in the same format as the final individual summative assessment, providing an opportunity for formative feedback.

Methods of assessment


Coursework

Assessment type                           Notes                                     % of formal assessment
Report                                    Report - Analysis Case Study              60.00
Group Project                             Report - Analysis Case Study (in groups)  0.00
In-course MCQ                             4 x 20 questions                          40.00
Total percentage (Assessment Coursework)                                            100.00

Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated.

Reading list

The reading list is available from the Library website

Last updated: 22/10/2020 11:29:07
