This page outlines PhD topics proposed by AIM academics and industry partners. You are welcome to apply for one or more of the topics below, or to propose your own PhD topic, according to the application guidelines specified here. We encourage you to contact your chosen supervisor for an informal chat; this will also help you to put together your research proposal, which is an integral part of your application. We also strongly encourage you to list more than one PhD topic in your application, preferably three.
Supervisor: Dr Dan Stowell
Deep learning is now a widespread tool for audio analysis tasks. Most commonly, these tasks are addressed with “convolutional” and/or “recurrent” neural networks. But what will be the next generation of deep learning methods? There are many exciting ideas to build on, such as: deep Gaussian processes, capsule networks, gauge equivariant networks, neural ordinary differential equations, normalising flows, and point process networks. In this project you will explore such possibilities and their usefulness for analysing/generating audio data, and then refine selected methods to create a new generation of audio deep learning that is not only powerful but also insightful. The applications range from automatic music transcription and audio event detection to understanding the fine details of vocal expression. For this project a strong mathematical background is particularly desirable (e.g. calculus, statistics).
Suggested supervisor: Dr Emmanouil Benetos
in collaboration with Channel 4’s Rights department
Recent music copyright disputes rely primarily on expert opinion. This subjective approach suggests that value is to be found in establishing objective and verifiable boundaries between the works under dispute. But the expert opinion itself is valuable as the basis of a work recognition system. This project will focus on the development of an automated work recognition system using music data signifiers and features based on expert opinions. Research carried out as part of this project will lead to a usable system for resolving music copyright disputes and may also serve to inform a wider understanding of musical identity.
The successful candidate will investigate, propose and develop machine learning and AI methods for musical work recognition applied to copyright disputes. This research project will be carried out in collaboration with Channel 4’s Rights, Royalties and Reporting Systems team.
Suggested supervisor: Dr Andrew McPherson
in collaboration with Bela
Musical interaction places strong requirements on real-time computing systems, including strict specifications on latency, reliability and I/O bandwidth. Advances in computing power over the past decade have meant that tasks that once required high-end workstations are now computationally feasible on mobile and embedded hardware. However, I/O architectures have not always kept up with advances in core computation speed.
This project aims to develop infrastructure for intelligent signal processing applications, where information is extracted from high-bandwidth sensor data. The challenge will be, given a fixed total I/O bandwidth on an embedded device, how to maximise the salience of information sampled from multiple simultaneous sources, and how to synchronise data from external sources that might be clocked independently. The project aims to produce general architectural principles specifically suited to real-time interactive systems on embedded hardware alongside specific demonstrators that can be used to evaluate performance from both technical and human-centred perspectives.
The ideal student on this project will have experience in digital signal processing on embedded hardware and familiarity with electronic hardware design. Experience with musical instrument performance is also desirable.
ROLI is building towards a vision of promoting the joy of music making, by applying technologies that lower the entry barriers to music learning and empower everyone to express their unique self through music creation. With the fast development of machine learning and artificial intelligence algorithms, many of the manual tasks in the music teaching and creation process can be effectively automated and/or deeply customised to individual creators.
This PhD project focuses on a well-known and challenging problem in the field of music information retrieval (MIR): automatically separating individual instrument tracks from raw audio data, which allows one to reverse the recording/mixing process of music creation and enables new possibilities for instrument-based content extraction, transcription and augmentation.
The PhD research project will focus on the following tasks and areas:
1) Gaining a deep understanding of the state-of-the-art solutions for audio source separation found in both academic research and commercial applications
2) Developing, applying and optimising analytical and deep learning models to improve the performance and accuracy of audio source separation tasks, including vocal separation and musical instrument separation, with a focus on piano, drums and string instrument sounds
3) Applying the successfully trained models to audio data collected in noisy environments and/or far-field scenarios, and optimising their performance for specific application contexts
4) Experimenting with transcription techniques on the source-separated audio data for melody tracking, rhythm detection and MIDI/MPE conversion, collaborating with fellow PhD students and ROLI engineers
5) Generating both world-class publications and commercial impact on ROLI’s software/hardware products
Supervisor: Prof Josh Reiss
This PhD topic aims to intelligently generate an automatic sound mix out of an unknown set of multichannel inputs. The input channels can be analysed to determine preferred settings for gain, equalization, compression, reverb, and so on. The research explores the possibility of reproducing the mixing decisions of a skilled audio engineer with minimal or no human interaction. It has application to live music concerts, remote mixing, recording and postproduction as well as live mixing for interactive scenes.
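As a rough illustration of what "analysing the input channels to determine gain" can mean in its simplest form, the sketch below balances channels by matching their RMS levels. This is an assumed baseline written for this page, not the method used in prior work on the topic; the function names and the two test signals are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def equal_level_gains(channels, floor=1e-9):
    """Return one linear gain per channel so that every channel ends up
    at the same RMS level (the mean RMS of the inputs)."""
    levels = [max(rms(ch), floor) for ch in channels]  # floor avoids division by zero
    target = sum(levels) / len(levels)
    return [target / level for level in levels]

def mix(channels, gains):
    """Sum the gain-scaled channels sample by sample."""
    return [sum(g * ch[i] for g, ch in zip(gains, channels))
            for i in range(len(channels[0]))]

# Two hypothetical input channels: a loud and a quiet sine wave.
loud = [0.8 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
quiet = [0.1 * math.sin(2 * math.pi * 3 * n / 100) for n in range(100)]

gains = equal_level_gains([loud, quiet])
balanced = mix([loud, quiet], gains)
```

A real automatic mixer would replace RMS with a perceptual loudness measure and would also set equalisation, compression and reverb; the sketch only shows the analyse-then-set-parameter pattern.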
The justification for this research is the need of non-expert audio operators and musicians to be able to achieve a quality mix with minimal effort. Mixing music is a task which requires great skill and practice, and can sometimes be tedious. Currently, mixing consoles and audio workstations do not adapt to a different room or to a different set of inputs, and lack the ability to make mixing decisions automatically.
For the professional mixing engineer this kind of tool will reduce sound check time and will prove useful at multi-act concerts and festivals, where changing from one group to another must be done very quickly. Large audio productions tend to have hundreds of channels, so being able to put some of those channels into an automatic mode will ease the mixing task of an audio engineer. There is also the possibility of applying this technology to remote mixing applications where latency is too large to be able to interact with all aspects of the mix.
There are many approaches to design and evaluation of such systems, including deep learning, knowledge engineering, grounded theory and psychoacoustic studies. PhD students pursuing this topic will have the freedom to take the approach that most inspires them.
This research topic builds on previous, successful work by researchers within the Centre for Digital Music, but is broad enough in scope that it could be taken in new and exciting directions.
Supervisor: Dr George Fazekas
This PhD explores the synergies between computational analysis of audio and the abundance of big music data available today. The broader aim is to discover and analyse patterns in music data to enable the study of how musical style develops and evolves over time, identify leaders, innovators and influencers in artist or listener communities, or facilitate the use of music to address societal challenges. For instance, can music bring together social media users and reduce the adverse effects of isolated communities and information bubbles? There are several key AI challenges in answering these questions. These include how to use the available information effectively and how to deal with sparsity and uncertainty in data (see https://arxiv.org/pdf/1706.02361.pdf for some relevant challenges). While the Semantic Web is an excellent source of Linked Open Data about music artists, albums, tracks, styles etc., the data is incomplete and heavily biased by popularity. Data about listener communities is also sparse because of the varying degrees to which users and services provide access to personal information. The combination of different data sources and inference mechanisms can therefore become key in addressing these challenges. The PhD research will investigate: 1) How to complete knowledge graphs obtained from Semantic Web resources such as Wikipedia, DBpedia, Wikidata, etc. using other information sources such as musical features extracted from audio, music similarity, playlists or listening data. 2) How to infer new knowledge from the analysis of relationship patterns in music data, both at the level of instances (e.g. artist networks) and music ontologies (http://musicontology.com). A proposed approach will use Graph Convolutional Networks (https://arxiv.org/pdf/1609.02907.pdf) to obtain an embedding space in which new relationships between tracks or artists may be uncovered.
These may stem from common musical patterns or other relationships, e.g., common appearance in playlists or common social tags. The new relationships will enhance the knowledge graph and may also yield generic musical knowledge encoded as rules or web ontologies to support logical inference. There is scope to explore both of these aspects of the topic. The research has industry-relevant applications (e.g. music recommendation) and may enable new musicological research in digital humanities.
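To make the Graph Convolutional Network idea concrete, here is a minimal sketch of a single propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), as described in the GCN paper linked above. The three-node "artist" graph, the feature values and the untrained weight matrix are all made-up assumptions for illustration; a real system would learn the weights and use a numerical library.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalised_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph-convolution step followed by a ReLU non-linearity."""
    propagated = matmul(matmul(normalised_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

# Hypothetical graph: artist 0 -- artist 1 -- artist 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0],   # made-up per-artist audio descriptors
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[0.5, -0.2],   # untrained, illustrative weights
           [0.3, 0.8]]

embeddings = gcn_layer(adjacency, features, weights)
```

Each row of `embeddings` mixes a node's own features with those of its neighbours, which is what allows nearby points in the embedding space to suggest candidate relationships missing from the knowledge graph.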
Suggested supervisor: Dr Andrew McPherson
in collaboration with The OHMI Trust
As with any practiced skill, control of musical instruments is not achieved entirely consciously, but by kinaesthetic learning; by the development of unconscious intuition through repetition with the desired end always in mind. In the case of the trumpet, the unconscious mind, driven by conscious will, manipulates the embouchure (the formation of a player’s lips and tongue on the mouthpiece) to produce the desired sounds. These are then adjusted reflexively according to what is heard. The valves of a trumpet adjust the length of tubing and, thereby, the pitch of a harmonic sequence. The selection of notes within that sequence is achieved with simultaneous adjustments to the embouchure and use of valves.
A central element of this project will be to create an artificial embouchure. For the purpose of this research, it is assumed that the player does not have fine and repeatable control of the embouchure, hand and fingers. The control of the embouchure and note selection must then be aided by AI techniques to supply the required control signals. The player would provide crude inputs which the machine would learn to interpret into the exact desires of the player and transmit them to the artificial embouchure and note control system for their performance.
The skills required of the researcher are, then, digital signal processing and basic familiarity with machine learning and AI techniques. Musical experience (especially on a wind instrument) is desirable but not necessary.
Supervisor: Dr Mathieu Barthet
Mixed reality (MR) merges real and virtual worlds to produce new environments and visualisations where physical and digital objects co-exist and interact in real time. Mixed reality offers promising potential for live music to augment the experiences of audiences and performers and deliver new aesthetic narratives.
The emerging field of Internet of Musical Things sets the grounds for networked musical instruments and wearable devices with embedded intelligence to support co-located or remote audience-performer interactions. This PhD will investigate how AI can facilitate the design of smart musical instruments and immersive 3D content (visuals, audio) enhancing music making and listening in a live context. The candidate will establish and assess (i) methods to automate the production of creative metadata characterising performers’ musical expression, and (ii) mapping techniques for computer-generated immersive content on head-mounted mixed reality displays.
An edge computing approach will be followed by developing AI components for smart musical instruments and/or mixed reality devices. Technical and aesthetic challenges will be considered, including mixed reality content personalisation, latency and scalability. Prototypes will be developed using embedded systems and the Unity cross-platform engine. These will be evaluated in live performance scenarios with performers and audiences.
Supervisor: Prof Simon Dixon
The ever-increasing accessibility of large music datasets makes it feasible to create data-driven models of musical styles using machine learning algorithms. Such models can be used to characterise the style(s) represented in the collection, increasing our understanding of the art form, or for computational tasks such as classification of new data, or for creative tasks such as the generation of new music in a similar style to that which was modelled. Building on the work of the “Dig That Lick” project, which focusses on the extraction of melodic patterns from jazz recordings, and a new partnership with a major jazz festival, this PhD will develop models of improvisational style, relating the melodic material to the underlying harmonic and rhythmic structure of the piece. The research will shed light on the relationship between individual and collective style, and the extent to which novelty is a feature of improvisation.
Suggested supervisor: Dr Mathieu Barthet
in collaboration with Sensing Feeling Ltd and Ovomind
Live music poses a challenge for affective computing due to the complex and multidimensional nature of perceptual modalities involved and social human factors. The production and reception of live music result in a wide range of intangible expressions which pertain to the emotional states of performers and audiences. These intangible expressions can be explicit (e.g. the strumming gesture of a guitar chord, an audience dancing and cheering), or implicit (e.g. an increase in heart rate), and highly depend on context and performance social norms. This PhD research aims to establish multimodal deep learning methods to predict the emotional states of audiences or performers dynamically over the course of live music performances based on computational representations of intangible expressions. A deep learning-based sensor fusion system will be developed by combining several modalities (e.g. visual, audio, inertial, physiological) using data captured with state-of-the-art Internet of Things (IoT) sensors. Studies will be conducted to investigate how to improve current visual emotion recognition based on camera sensors with audio (e.g. crowd sounds, music), inertial (linked to audience/performer motion), and/or physiological (e.g. electrodermal activity, pulse) attributes measured using wearables. Different models will be devised to infer emotional states at individual or collective (crowd, band) levels. Data captured with the sensors will also be used to better understand the correspondence between visual valence and arousal of audiences in a number of musical styles.
With support from Universal Music Group (UMG), the world leader in music-based entertainment, this PhD will investigate the use and development of advanced multi-modal machine learning models with music industry applications, leveraging the large amount of data available in the modern digital music ecosystem. The adoption of data-driven and machine learning based methods, often using deep learning, is already significant in many areas of the music industry such as music identification, music discovery, personalisation of fan experience, catalogue management etc. However, some challenges remain open. For example, many approaches are built upon limited and sometimes simplified representations of music. For instance, audio genre classification of global music collections is typically done using a single flat taxonomy, thereby disregarding hierarchy and discrepancies between local territories. Moreover, typical machine learning models that have music industry applications tend to rely on a single type of data. For example, recommendation engines rely on music consumption data, and automatic music tagging systems rely solely on audio. There is now evidence to suggest that multi-modal models are a promising avenue for further development. In particular, by connecting data sources of different natures, we expect multi-modal models to have the potential to help better understand, extract and analyse the structure and trends present in large, often unstructured, datasets. The PhD research will investigate: 1) How can multi-modal models help learn more complete, more relevant and more effective representations? 2) How can such models help extract more and/or better knowledge from large amounts of data? A proposed approach is the investigation of multi-modal machine learning models, with a particular interest in deep learning approaches in which audio is one of the modalities.
Examples of potential applications are: using multi-modal models to learn/discover latent structure in unstructured data (e.g. a hierarchical genre/sub-genre classifier) or detecting leading indicators of trends (e.g. identifying the emergence of a new “sound”, genre or influencers). This research project will be supported by a unique collection of assets and co-supervised by UMG.
Supervisor: Dr Emmanouil Benetos
Music signals and music representations incorporate and express several concepts: pitches, onsets/offsets, chords, beats, instrument identities, sound sources, and key, to name but a few. In the field of music information retrieval, methods for automatically extracting information from audio typically focus on isolated concepts and tasks, thus ignoring the interdependencies and connections between musical concepts. Recent advances in machine and deep learning have shown the potential of multi-task learning (MTL), where multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks.
This research project will investigate methods for multi-task learning for music information retrieval. The successful candidate will investigate, propose and develop novel machine learning methods and software tools for jointly estimating multiple musical concepts from complex audio signals. This is expected to result in improved learning efficiency and prediction accuracy when compared to task-specific models, and will help gain a deeper understanding of the connections between musical concepts.
Suggested supervisor: Dr Mathieu Barthet
in collaboration with Holonic Systems Oy.
The concept of the smart city refers to the use of data and technology to enhance efficiency and quality of life for people living and working in the city. Although applications of the Internet of Things (IoT) to smart cities have burgeoned in recent years, very little has been done to date to exploit the audio modality. The vision for musical smart city systems underpins customisable open data environments that enable users to experience urban nature, social and cultural life through perceptual coupling by mapping data to sound. Smart city musification could help humans make the IoT, and consequently their urban environment and activities, more understandable.
The field of auditory display investigates how to convey signals, make sense of data, and augment our experiences through the audio modality. Wireless sensor networks (WSNs) typically generate a vast quantity of data which can be hard to interpret, especially as it accumulates over time. Audio can be an efficient source of information and a medium for aesthetic narratives. Musical smart city systems aim to produce musical content related to places, local conditions and other nearby users, complementing or substituting visual information. Adopting a two-way model, local changes may trigger global (musical) phenomena, and vice versa, both on a personal and city-wide scale.
This PhD project will research, develop and assess methods for the musical sonification of web-based data streams generated by urban WSNs. AI will be used to map multidimensional multi-user sensor data to meaningful musical attributes through generative audio synthesis and/or audio samples (e.g. using Audio Commons, https://www.audiocommons.org/). Techniques for cross-modal sensor-to-sound mapping will be devised using deep learning, considering different learning strategies (e.g. unsupervised, semi-supervised, reinforcement learning). The ability of the techniques to generate smart soundscapes providing information about environmental or contextual factors will be assessed. Use cases will be designed following human-computer interaction methodologies, addressing for example the musical sonification of real-time traffic data coupled with geographic information systems (e.g. OpenStreetMap).
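For orientation, the simplest form of sensor-to-sound mapping is a fixed, hand-written parameter mapping, sketched below: each reading is scaled onto a pentatonic scale and returned as a MIDI note number. This is only an assumed baseline against which learned mappings could be compared; the scale choice, the value range and the traffic counts are all hypothetical.

```python
PENTATONIC = [0, 2, 4, 7, 9]  # major pentatonic scale, in semitones above the root

def value_to_midi(value, lo, hi, base_note=60, octaves=2):
    """Map a sensor reading in [lo, hi] to a MIDI note on a pentatonic scale.

    Readings outside the range are clamped; base_note=60 is middle C.
    """
    clamped = min(max(value, lo), hi)
    position = (clamped - lo) / (hi - lo)       # normalised to 0.0 .. 1.0
    steps = len(PENTATONIC) * octaves           # number of available notes
    index = min(int(position * steps), steps - 1)
    octave, degree = divmod(index, len(PENTATONIC))
    return base_note + 12 * octave + PENTATONIC[degree]

# Hypothetical vehicles-per-minute readings from one traffic sensor.
traffic_counts = [3, 13, 25, 40, 18]
melody = [value_to_midi(count, lo=0, hi=40) for count in traffic_counts]
```

Constraining the output to a scale keeps the result musically consonant whatever the data does, at the cost of quantising the readings into ten steps.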
Supervisor: Prof Mark Sandler
The web contains millions of separate sources of information about all sorts of topics and music is no exception. Many of those sources are included in the Semantic Web, Tim Berners-Lee’s second important web invention, thanks largely to pioneering work in C4DM stretching back to 2006. This has culminated in the application known as MusicLynx (https://musiclynx.github.io/#/dashboard & http://www.semanticaudio.ac.uk/demonstrators/16-musicweb/). The important features of the Semantic Web are that it enables multiple, separate sources of information to be unified, and it supports AI over this information. MusicLynx opens up some very exciting possibilities when it involves acoustic features extracted from recordings. This PhD will focus on including more audio features as well as lyrics and related information to provide more compelling music experiences for amateurs and professionals. Skills needed for this PhD include, but are not limited to: audio feature extraction, graph theory, user interaction and data science, though applicants are not expected to be familiar with all of these from the start of the project.
Suggested supervisor: Dr George Fazekas
in collaboration with Steinberg Media Technologies GmbH
This PhD will design and develop a music recognition engine capable of ingesting, optically correcting, processing and recognising multiple pages of handwritten or printed music from images captured by mobile phone, or from low-resolution copyright-free scans from the International Music Score Library Project (IMSLP), outputting semantic mark-up that identifies as many notational elements and as much text as possible, along with their position in the original image. Prior solutions have used algorithmic approaches; an opportunity exists to develop and evaluate approaches based on deep neural networks (DNNs) and machine learning techniques. The PhD aims to investigate and demonstrate a novel approach to converting images of sheet music into a semantic representation, such as MusicXML. Musicians, composers, arrangers, orchestrators and other users of music notation have long dreamed of simply taking a photo or using a scan of sheet music and bringing it into a music notation application (such as Dorico) to make changes, rearrange, transpose, or simply listen to it being played by the computer. Previous approaches to solving the problem have involved layers of algorithmic rules applied to traditional feature detection techniques such as edge detection. State-of-the-art Optical Music Recognition (OMR) is already able to recognise sheet music with up to 95% accuracy, but fixing the remaining errors may take just as long as, if not longer than, transcribing the music into notation software by hand. A new method that can improve recognition rates will allow users who are not so adept at inputting notes into a music notation application to get better results more quickly. Another challenge to tackle is the variability in quality of input (particularly images captured with smartphones) and how best to preprocess the images to improve the quality of recognition for subsequent stages of the pipeline.
The application of cutting-edge techniques in data science, including machine learning and particularly convolutional neural networks, may yield better results than traditional methods. The same techniques may also prove useful in earlier stages of the pipeline such as document detection and feature detection. It would be desirable to achieve recognition rates close to 99% of individual objects in the score. One of the first objectives will be to establish the methodology for determining the differences between the reference data and the recognised data. The ideal candidate would have previous experience of training machine learning models using relatively sparse data sets. Being well versed in image acquisition, processing techniques, and computer vision would be a significant advantage.
Suggested supervisor: Dr George Fazekas
in collaboration with MUSIC Tribe Brands UK Limited
This PhD project is aimed at exploring the synergies between optimal sparse representation of audio signals, psychoacoustics and deep learning. The broader aim is to investigate and develop novel end-to-end neural network architectures able to emulate low-level human hearing tasks including event detection, source separation, timbre characterisation and instrument detection, taking advantage of adaptive and perceptually based signal transformations. The proposed approach will investigate:
1. the state of the art in harmonic analysis techniques for audio (multiresolution time-frequency and time-scale representations of audio signals);
2. how to use established psychoacoustic models to derive novel perceptually based representations of audio signals;
3. how to use end-to-end learning techniques to develop novel decompositions of audio signals based on multiresolution analysis and psychoacoustics, taking advantage of the computational efficiency of neural networks;
4. applications of the developed methods to various audio tasks such as instrument classification, source separation, event detection and timbre classification.
The research has many industry-relevant applications such as intelligent music production and audio transformation. The ideal candidate has a good background in signal processing, machine learning and psychoacoustics, and experience in music production.
Supervisor: Dr Marcus Pearce
This project will use a combination of computational modelling and empirical experiments with human participants to understand how listeners learn the syntactic structure of a musical style and how this learned syntax impacts on music perception and aesthetic experience of music. Our existing research suggests that statistical learning and probabilistic prediction are fundamental processes in music cognition. Predictions about musical events reflect a hierarchical process of top-down prediction at a range of different time-scales based on a learned syntactic model of the sensory environment. The model is acquired adaptively through exposure and the discrepancy between predicted and actual sensory input (the prediction error) is used to drive learning. Predictions are thought to be precision-weighted such that very certain (low entropy) predictions gain a higher weight than less certain (higher entropy) predictions. Within this general approach, there is scope to focus the project on different musical parameters (e.g., melody, rhythm, harmony), different computational methods (e.g., structured probabilistic models, empirical Bayesian methods, neural networks) and different musical styles and cultures. The overall goal is to develop a complete computational model of cognitive music processing from low-level audio to high-level symbolic structure, incorporating models of auditory stream segregation, memory representations for music and inference of latent variables such as tonality, metre and form. There is also scope within this framework to investigate the structure of non-musical auditory sequences.
Barascud, N., Pearce, M. T., Griffiths, T. D., Friston, K. J., & Chait, M. (2016). Brain responses in humans reveal ideal observer-like sensitivity to complex acoustic patterns. Proceedings of the National Academy of Sciences, 113, E616-E625. https://doi.org/10.1073/pnas.1508523113
Pearce, M. T. (2018). Statistical learning and probabilistic prediction in music cognition: mechanisms of stylistic enculturation. Annals of the New York Academy of Sciences, 1423, 378-395. https://doi.org/10.1111/nyas.13654
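The notions of statistical learning, prediction error and predictive uncertainty used in this topic can be illustrated with a toy model. The sketch below learns first-order (bigram) transition counts from a short note sequence and then reports the information content of an event (how unexpected it was) and the entropy of the prediction (how uncertain the model was). The bigram model, the add-one smoothing and the example melody are simplifying assumptions chosen for brevity; they are not the models used in the cited research.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sequence):
    """Count how often each symbol follows each other symbol."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def predictive_distribution(counts, context, alphabet):
    """Add-one smoothed estimate of P(next symbol | context)."""
    total = sum(counts[context].values()) + len(alphabet)
    return {s: (counts[context][s] + 1) / total for s in alphabet}

def entropy(dist):
    """Uncertainty of the prediction, in bits."""
    return -sum(p * math.log2(p) for p in dist.values())

def information_content(dist, event):
    """Unexpectedness (surprisal) of the event that occurred, in bits."""
    return -math.log2(dist[event])

# Hypothetical training melody, one letter per note.
melody = list("CDECDECDEG")
alphabet = sorted(set(melody))
model = train_bigram(melody)

after_c = predictive_distribution(model, "C", alphabet)
expected = information_content(after_c, "D")    # D always followed C in training
unexpected = information_content(after_c, "G")  # G never followed C in training
uncertainty = entropy(after_c)
```

In this toy setting a low-entropy prediction corresponds to a context the model has learned well, which is the situation in which the text above says predictions would receive a higher weight.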
Supervisor: Dr Andrew McPherson
Steady increases in embedded computing power and the availability of high-resolution low-cost sensing have led to a flourishing of digital musical instrument designs. However, the time and difficulty of learning a new instrument mean that relatively few new instruments go on to be performed at expert levels. This project proposes an intelligent signal processing approach to designing new instruments which repurpose the existing skills of trained instrumentalists. Taking the violin family as a case study, AI and machine learning techniques will be used to create real-time models of bow-string interaction, and features from these models will control digital synthesis. The project will work closely with violinists and composers to assess the playability and creative potential of the new instruments.
Supervisor: Dr Nick Bryan-Kinns
Music making is a rich domain in which to explore creativity, engagement, and Human-Computer Interaction (HCI). Manipulating the Interaction Design of Digital Musical Interfaces (DMIs) provides opportunities for systematic evaluation of the effect of user interface features on people’s creativity and engagement with and through music making. However, such evaluation is a non-trivial challenge given the subjective nature of music and creativity and the inherently experiential nature of music making, which reduces the value of conventional efficiency measures. There is the potential to use measures of people’s interaction with DMIs and to assess features of the creative products themselves to evaluate participants’ creativity, engagement, and user experience, in addition to self-reporting by people. This PhD would research the use of AI and Machine Learning techniques to analyse logs of people’s interaction with DMIs to identify correlations between people’s subjective feedback, their patterns of interaction with the DMI, and DMI user interface configurations. The PhD student would need skills in AI and Machine Learning as well as some skills or interest in HCI user study techniques and DMI design and build.
Suggested supervisor: Dr Andrew McPherson
in collaboration with MXX
It has been established in Wiggins et al. [1] that “music, in its own right, does not exist”. Using this principle, Lyske [2] has established a method of meta-composition using generative symbolic primitives that can be tagged by emotional meaning. Based on the principles established in Wiggins et al. [1] and on the framework proposed by Lyske [2], this project will investigate the artificially intelligent automation of the job of composer/analyst. The research will establish how, and to what extent, the principles of composition can be inferred from the artefacts of scores and recordings given a symbolic, mind-based, generative theory of composition.
[1] Wiggins, Geraint A., Daniel Müllensiefen, and Marcus T. Pearce. “On the non-existence of music: Why music theory is a figment of the imagination.” Musicae Scientiae 14.1_suppl (2010): 231-255.
[2] Meta Creation for Film Scores (Joe Lyske PhD thesis)