The following project opportunities are open to Scottish businesses and public sector organisations interested in collaborating with Scottish universities and benefiting from an applied research project with a doctoral student. If any of the following projects are of interest to you or your organisation, please get in touch for details of the next call and how to apply for funding.
FABULON: Faster Data Analytics through Better Language-level Management of NUMA
Most software today is written in programming languages such as Dart, Java, Go and Scala that use automatic dynamic memory management. Many applications, especially in data analytics and AI, are memory intensive, placing significant demands on the memory management system. A second challenge for memory managers is that memory access latency and bandwidth are becoming increasingly non-uniform as the number of general-purpose cores in architectures continues to grow.
While there are good hardware and operating system tools for understanding non-uniform memory access, programming language implementations have far more information about the memory access patterns of an (analytics) program than the operating system, as well as the ability to dynamically adapt memory management. Hence they are better placed both to profile and to optimise access to non-uniform memory. Unfortunately there has been little systematic study of the impact of Non-Uniform Memory Architectures (NUMAs), and especially emergent larger NUMAs, on automatic memory management, and many modern language implementations have limited NUMA adaptation.
High-performance data analytics is an important application domain, and increasingly executes on shared-memory NUMA servers. Our research shows that poor NUMA locality can degrade performance on state-of-the-art machines by a factor of three. Moreover, current techniques for improving NUMA performance have neither been systematically combined, nor scaled to the massive data volumes in modern data analytics applications. In consequence, there are significant potential benefits for data analytics applications on server-sized NUMAs.
The FABULON project aims (1) to reach a deep understanding of the performance challenges posed by emergent mid-size NUMAs for modern language implementations, (2) to investigate how to exploit this knowledge to develop better language implementation technologies, and (3) to evaluate the effectiveness of these technologies in improving the performance of real-life data analytics applications.
Collaboration sought: We are looking for an industry partner (1) with a compute-intensive data analytics application, (2) that runs on a managed runtime environment on server-size NUMA architectures, and (3) that is interested in improving its performance. We envisage a collaboration focused on the characteristics of the application and related workloads, to tune our technologies and improve performance.
Benefit to the industrial sponsor: The sponsor will benefit from (1) expert code review of their application; (2) faster execution of the application on large NUMA architectures; and (3) faster execution of other applications with similar characteristics. Because our techniques operate on the runtime environment, the improvements apply directly to other applications with similar workloads.
Contact the project supervisors, Hans-Wolfgang Loidl (Heriot-Watt University) and Phil Trinder and Jeremy Singer (Glasgow University), for more information.
Data matching in absence of unique identifiers for enhancing predictive modelling
In today’s data-driven economy, massive volumes of data open new opportunities for combining different databases to enhance the accuracy of predictive models, such as predicting cross-selling opportunities or customer churn. Unfortunately, there are often no unique identifiers that can be used to link different sources of information together, or they cannot be used for privacy reasons. This limits the data that can be used for model training.
Various approaches have been developed in different fields of knowledge to overcome this problem, e.g. propensity score matching in statistics and econometrics, or fuzzy matching in computer science. Propensity score matching (PSM) is a collection of statistical algorithms that estimates the missing or unobserved outcome, or some aspect of behaviour, resulting from some intervention. Fuzzy matching combines numerous variables to create approximate matches, which can then be evaluated and ranked.
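As an illustration, fuzzy matching can be sketched in a few lines: records from two sources are compared on several fields, and approximate matches are scored and ranked. The record fields, weights and threshold below are illustrative assumptions, not part of the project.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Normalised edit-based similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_match(record, candidates, threshold=0.7):
    # Score each candidate by a weighted combination of field
    # similarities, keep those above the threshold, best first.
    scored = []
    for cand in candidates:
        score = (0.6 * similarity(record["name"], cand["name"])
                 + 0.4 * similarity(record["address"], cand["address"]))
        if score >= threshold:
            scored.append((score, cand))
    return sorted(scored, key=lambda s: -s[0])

db_a = {"name": "J. Smith", "address": "12 High Street, Edinburgh"}
db_b = [
    {"name": "John Smith",  "address": "12 High St, Edinburgh"},
    {"name": "Jane Smythe", "address": "3 Castle Terrace, Glasgow"},
]
matches = fuzzy_match(db_a, db_b)
print(matches[0][1]["name"])  # best candidate for the record with no shared ID
```

In practice the weights and threshold would themselves be tuned, which is part of what a systematic comparison of matching methods would examine.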
Nevertheless, there has been little investigation of the advantages each approach can offer, or of how they can be adapted to suit specific problems. This PhD project will close this gap by providing a comprehensive comparison of existing methods and proposing a new methodology that incorporates their strengths, in particular by building on Dr Andreeva’s work on clustering combined with collaborative filtering.
Collaborative filtering is an algorithm widely used in recommender systems, e.g. by Amazon or Netflix, where new products are offered based on the similarity of a new user to existing users. Clustering splits the data into homogeneous groups and can potentially reduce the time and effort associated with collaborative filtering.
There are various measures of similarity and various clustering approaches that can be evaluated in the context of a specific task/problem. The innovation consists in combining the benefits of existing approaches into a new algorithm.
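The combination described above might be sketched as follows: users are first split into two groups with a few k-means iterations, and collaborative filtering then searches for neighbours only within the active user's group. The toy rating matrix and the clustering details are illustrative assumptions.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 5],
    [1, 0, 4, 5, 4],
], dtype=float)

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def recommend(R, user):
    # Step 1: split users into two clusters with a few k-means iterations,
    # seeded with two maximally distant users, so the neighbour search is
    # confined to one homogeneous group.
    far = int(np.argmax([np.linalg.norm(r - R[0]) for r in R]))
    centres = np.stack([R[0], R[far]])
    for _ in range(5):
        labels = np.array([int(np.argmin([np.linalg.norm(r - c) for c in centres]))
                           for r in R])
        centres = np.stack([R[labels == k].mean(axis=0) for k in (0, 1)])
    # Step 2: collaborative filtering within the user's cluster only:
    # similarity-weighted sum of the peers' ratings.
    peers = [i for i in range(len(R)) if labels[i] == labels[user] and i != user]
    scores = sum(cosine(R[user], R[p]) * R[p] for p in peers)
    scores[R[user] > 0] = -np.inf          # never re-recommend rated items
    return int(np.argmax(scores))

print(recommend(R, user=0))  # an item user 0 has not yet rated
```

The measures of similarity and the clustering method used here are only one choice among the many the project would compare.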
The project will use vast amounts of data, and data integration/management is one of the key problems of Data Science. The outcomes will be useful to all businesses and public organisations that rely on predictive modelling in their operations. The project will improve the accuracy of predictive modelling, which in turn will lead to increased efficiency and enhanced customer satisfaction.
Collaboration sought: An industrial partner should be willing to offer some proprietary data for analysis that can be matched to other sources. Some sources can be public. The data can be protected by a non-disclosure agreement incorporating the partner’s preferences – this is standard practice for Dr Andreeva’s work. There is a slight preference for financial services, given Dr Andreeva’s expertise in this field, but all other sectors are welcome.
Benefit to the industry sponsor: The project will develop a tailored solution to suit the needs of the industrial partner. Examples include, but are not limited to, improving estimates of financial risk, identifying cross-selling opportunities, and predicting which customers are most likely to switch to a competitor. The sponsor will get first-hand access to all project results ahead of any public presentations or publications. There is the possibility of redacting the project outputs in order to preserve confidentiality and commercial interests.
Contact the project supervisor, Dr Galina Andreeva for more information.
Predicting DNA variants causal for altered disease risk using Machine Learning
Genetic studies reveal causal links between DNA and disease risk. However, such links are to DNA variants rather than to genes, and do not reveal the molecular mechanisms of disease. Our group is combining machine learning applied to Transcription Factor (TF) binding with Mendelian Randomisation in order to pinpoint individual DNA variants that alter both (1) TF binding affinity and (2) disease risk, and are thus causal. In this project, we seek to apply machine learning to experimental TF binding data in order to improve the precision with which TF binding affinity is inferred. This is an essential step toward predicting variants causal for altered disease risk.
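As a hedged illustration of the classical starting point, TF binding affinity is often scored with a position weight matrix (PWM), and the effect of a variant can be estimated as the change in score between the reference and alternate alleles. The PWM values below are invented for illustration; the project would instead learn binding preferences from experimental data with machine learning.

```python
import numpy as np

# Hypothetical position weight matrix for a TF with a 4-bp motif:
# rows are motif positions, columns are log-odds scores for A, C, G, T.
PWM = np.array([
    [ 1.2, -0.8, -0.5, -1.0],   # position 1 favours A
    [-1.0,  1.0, -0.7, -0.6],   # position 2 favours C
    [-0.9, -0.6,  1.1, -1.0],   # position 3 favours G
    [-0.8, -0.9, -0.7,  1.0],   # position 4 favours T
])
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pwm_score(seq):
    # Sum of per-position log-odds: higher = stronger predicted binding.
    return sum(PWM[i, BASE[b]] for i, b in enumerate(seq))

ref = "ACGT"   # reference allele context
alt = "ACAT"   # G->A variant at motif position 3
delta = pwm_score(alt) - pwm_score(ref)
print(f"predicted change in binding affinity: {delta:+.2f}")
```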
Collaboration sought: We are looking for either a drug development company (interested in investigating the genetic support for drug targets) or an AI/ML company (interested in applying new AI/ML approaches to functional genomics data).
Benefit to the industry sponsor: Dialogue and two-way engagement with a research group working at the fertile interface between population size data analysis (e.g. UK Biobank) and functional genomics data (e.g. transcription factor binding data in human primary and cancer cell lines).
Contact the project supervisor, Chris Ponting for more information.
Economic feasibility and environmental impacts of bioenergy in supporting net-zero energy building (NZEB+Bio) in the UK
A net-zero energy building (NZEB) consumes no more energy than the renewable energy generated onsite or elsewhere. NZEBs are expected to play an important role in mitigating greenhouse gas (GHG) emissions and have received significant attention in recent years. Biomass accounts for around 12% of the world’s renewable energy resources, and distributed bioenergy production is a potential way of realising NZEBs.
It is important to understand the economic feasibility and environmental impacts of bioenergy in the design of NZEBs. This project will design a novel configuration of bioenergy-supported NZEB and will determine the profitability and carbon footprint of the configuration using big-data-supported cost-benefit analysis and life cycle assessment. The results will enable policymakers to make informed decisions towards the realisation of NZEBs in the UK.
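A cost-benefit analysis of a candidate configuration reduces, at its simplest, to a net present value (NPV) calculation. The sketch below uses entirely hypothetical figures for capital cost, annual benefit, operating cost and discount rate.

```python
# Illustrative cost-benefit sketch for a bioenergy-supported NZEB
# configuration; all figures are hypothetical placeholders.
def npv(capex, annual_benefit, annual_opex, years, discount_rate):
    # Net present value: discounted net benefits minus upfront capital cost.
    return -capex + sum(
        (annual_benefit - annual_opex) / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )

value = npv(capex=250_000, annual_benefit=32_000, annual_opex=6_000,
            years=20, discount_rate=0.05)
print(f"NPV over 20 years: {value:,.0f} GBP")
```

The project’s analysis would add the life cycle assessment dimension, which this purely financial sketch omits.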
The project is looking for an industry sponsor in sectors including (but not limited to) sustainable/green building development and design, bioenergy technology development, or distributed bioenergy application, that could potentially provide input data on the design of a bioenergy-supported net-zero energy building (NZEB). The partnership will enable the PhD candidate to receive training from an industry supervisor and to design a bioenergy-supported NZEB configuration driven by future building industry standards and market demands.
Recovering losses: transfer and dictionary learning for restoring damaged radar/RF data
Novel techniques from the machine learning community can potentially help restore missing or damaged radar data by drawing on data and knowledge previously acquired (much as people can recognise a known or expected face even if part of it is covered by a hat), while keeping acceptable performance.
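A minimal sketch of the dictionary-based restoration idea, assuming a cosine (DCT) dictionary and a greedy sparse solver (orthogonal matching pursuit): a signal that is sparse in the dictionary can be recovered from a subset of its samples, standing in for the damaged portions of a radar/RF record.

```python
import numpy as np

def dct_dictionary(n, n_atoms):
    # Dictionary of cosine atoms; each column is one normalised atom.
    t = np.arange(n)
    D = np.cos(np.pi * np.outer(t + 0.5, np.arange(n_atoms)) / n)
    return D / np.linalg.norm(D, axis=0)

def omp(D, y, n_nonzero):
    # Orthogonal Matching Pursuit: greedily pick the atom most correlated
    # with the residual, then refit all coefficients by least squares.
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n = 256
D = dct_dictionary(n, n)
true_coef = np.zeros(n)
true_coef[[3, 17, 40]] = [1.0, -0.7, 0.5]   # sparse ground truth
signal = D @ true_coef

mask = rng.random(n) > 0.4                  # ~60% of samples survive damage
coef = omp(D[mask], signal[mask], 3)        # fit on observed samples only
restored = D @ coef                         # reconstruct the full signal

print(np.max(np.abs(restored - signal)))
```

In the project, a dictionary learned from real radar data (rather than a fixed cosine basis) would play the role of the "previously acquired knowledge".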
The student will perform a mix of software development and experimental validation work in this project. These techniques will need to be implemented and adapted to the specific format and characteristics of radar data, and validated with experimental work using the radar and software-defined radio platforms available in the research group.
Mining Arguments from Natural Language Text
Giving machines the ability to understand natural language has been an AI goal for decades. A recent research direction in this area has focussed on “Argument Mining”. This is the automatic identification, extraction, and reuse of arguments from textual resources.
This project will involve a detailed study of the structure of natural language arguments from the industrial partner’s domain, with the aim of devising new and effective computational mining techniques. The successful candidate will be expected to further focus their project, and may choose, for example, to focus on the effective application or extension of existing natural language or machine learning techniques applied to the argument mining domain of the industrial partner.
The core research themes of this project would be to:
- develop & evaluate automated argument mining techniques that can be applied to real-world problems
- extend extant tools for manual argument analysis through the addition of automated mining features so that they can be applied at scale to the creation of training data for supervised machine learning approaches
- research novel techniques for visualising and presenting mined argumentative data to support sense-making of the target domain.
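A deliberately naive baseline for the identification step tags sentences by the discourse markers they contain; the marker lists below are illustrative, and real mining techniques would replace this with statistical or neural models trained on annotated data.

```python
import re

# Discourse markers that often signal an argumentative role (illustrative).
PREMISE_MARKERS = ("because", "since", "given that", "as shown by")
CLAIM_MARKERS = ("therefore", "thus", "hence", "consequently")

def label_sentences(text):
    # Split into sentences and tag each with a coarse argumentative role.
    labels = []
    for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
        low = sent.lower()
        if any(m in low for m in CLAIM_MARKERS):
            labels.append(("claim", sent))
        elif any(m in low for m in PREMISE_MARKERS):
            labels.append(("premise", sent))
        else:
            labels.append(("other", sent))
    return labels

text = ("Emissions keep rising because demand for energy grows. "
        "Therefore, stronger policy intervention is needed.")
for role, sent in label_sentences(text):
    print(role, "->", sent)
```

Such a baseline is useful mainly as a point of comparison, and as a cheap way to pre-label candidate text for manual analysis tools.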
Contact the project supervisor Dr Simon Wells for more information.
A unified approach based on semantic models and continuous deep learning to data uncertainty and inconsistency in smart IoT systems
Smart IoT-based applications, such as the smart city and the smart factory, are characterised by sensor-driven technology, which tends to produce huge volumes of data at increasing velocity. The data produced by these applications are mostly used to support organisation, planning, interpretation and decision-making activities. However, these data come with a number of quality issues that collectively result in uncertainties and inconsistencies.
In this project, we aim to innovatively integrate semantics-based data modelling and analysis with continuous deep learning to provide a novel effective solution to the above problem.
The semantic data model will provide a machine-understandable foundation for the IoT data and its analysis, and will be able to produce near real-time solutions for the detection and correction of IoT data uncertainties. However, such a model may be too static and imprecise to cope with the highly dynamic nature of IoT systems and the data they generate. Therefore, we propose to use deep learning to support the continuous evolution of the semantic model and its data analysis algorithms.
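The semantic-model idea can be sketched as a set of machine-readable constraints checked against each incoming reading; the sensor names and ranges below are illustrative assumptions, and in the proposed approach deep learning would evolve such constraints rather than leave them fixed.

```python
# Minimal sketch: a declarative "semantic model" of sensor constraints,
# checked against incoming readings to flag uncertain or inconsistent data.
# Sensor names, units and ranges are illustrative assumptions.
MODEL = {
    "temperature_c": {"unit": "celsius", "min": -40.0, "max": 60.0},
    "humidity_pct":  {"unit": "percent", "min": 0.0,   "max": 100.0},
}

def validate(reading):
    # Return a list of detected quality issues for one reading.
    issues = []
    for field, spec in MODEL.items():
        value = reading.get(field)
        if value is None:
            issues.append(f"{field}: missing value")
        elif not (spec["min"] <= value <= spec["max"]):
            issues.append(f"{field}: {value} outside [{spec['min']}, {spec['max']}]")
    return issues

print(validate({"temperature_c": 21.5, "humidity_pct": 130.0}))
```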
Collaboration sought: We are looking for an industrial partner in the following area(s):
1. Providers of smart IoT applications, e.g. smart city, smart building, smart factory, smart transport, smart vehicle, etc.;
2. Developers of smart IoT applications, e.g. smart city, smart building, smart factory, smart transport, smart vehicle, etc.;
3. Companies specialising in data modelling and analysis;
4. Companies specialising in smart sensors, IoT networks and devices.
Contact the project supervisor Prof Xiaodong Liu for more information.
Artificial Intelligence Based Communication System for Collaborative and Fault-tolerant Multicast Music Distribution
Advances in data communication now make it easy for a group of users to work remotely and collaboratively, producing rich multimedia content and distributing it to a large audience over the Internet. IP multicast constitutes an effective communication method that saves both network bandwidth and processing overhead, especially when different sources are involved. However, real-time communications are highly sensitive to packet loss. This is especially the case for live music concerts. To address this issue, several traffic engineering and fault tolerance approaches have been devised. These techniques include, but are not limited to, audio/video compression, network buffer management, queuing algorithms, and traffic classification and prioritisation.
Artificial intelligence could further improve the reliability of the transmission, especially when multiple sources are involved in a single broadcast application. By monitoring the communication pattern and the network performance, artificial intelligence processes could be introduced into the communication framework to address audio/video quality degradation or loss. A possible solution consists of creating virtual packets inside the network infrastructure, or injecting artificially generated ones at the user’s end, to replace missing critical data packets.
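The packet-replacement idea can be illustrated with the simplest possible concealment strategy: reconstructing a lost audio frame by interpolating between its neighbours. A learned model, as proposed here, would replace this interpolation with a prediction of the missing content.

```python
import numpy as np

def conceal_lost_packet(prev_frame, next_frame):
    # Simplest possible concealment: reconstruct the lost frame by
    # sample-wise linear interpolation between its neighbours.
    return 0.5 * (prev_frame + next_frame)

# Three consecutive 4-sample audio frames; the middle one is lost in transit.
frames = [np.array([0.0, 0.1, 0.2, 0.3]),
          None,                              # lost packet
          np.array([0.8, 0.9, 1.0, 1.1])]
frames[1] = conceal_lost_packet(frames[0], frames[2])
print(frames[1])
```

Waiting for the next frame adds latency, of course, which is exactly the kind of trade-off a live-concert setting makes acute.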
This project therefore aims to explore how artificial intelligence and deep learning techniques could be applied to improve the reliability and quality of multicast distributed concerts in which a set of musicians collaborate remotely to record or play an album. The newly proposed techniques could be embedded into the advanced audio-visual streaming technology LOLA, which has been developed by Edinburgh Napier University and tested with musicians in Edinburgh, London, and Boston [https://www.napier.ac.uk/about-us/news/word-first-for-transatlantic-real-time-album-recording].
Collaboration sought: A multimedia content publisher or distributor is required. Their expertise in audio/video compression and transmission over IP networks will help to develop new reliability and quality-of-service techniques for broadcasting real-time multimedia content to a large audience.
Contact the project supervisor Dr Imed Romdhani for further information.
Smart algorithms to solve large-scale optimisation problems
Optimisation problems can be found everywhere. Examples include: finding good parameters for a model or process, scheduling and logistics, resource allocation, or finding the shortest paths for a vehicle. Sometimes there is more than one goal (such as reducing monetary cost while improving efficiency), and here the optimisation problem is about finding the trade-off between these goals so an informed choice of solution can be made. These problems are usually also rooted in an underlying data set capturing the specifics of the application (e.g. databases of orders that need to be satisfied, resource demand over time, or regional-scale maps of locations).
The core research themes of this project would be to:
(1) devise methods to intelligently search through possible answers to an optimisation problem, exploiting what human experts already know about the problem and, for large-scale problems in particular, breaking the space of possibilities down to make it easier to solve
(2) develop approaches for communicating the answers to large-scale problems in an intuitive way
(3) research ways of explaining why particular solutions were chosen.
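As a minimal illustration of searching through possible answers, the sketch below applies a simple hill climber to a toy knapsack-style resource allocation problem; the data and the single-flip neighbourhood move are illustrative assumptions, far simpler than the expert-informed search methods the project would develop.

```python
import random

# Toy 0/1 knapsack: choose items to maximise value within a weight budget.
VALUES  = [60, 100, 120, 80, 30]
WEIGHTS = [10,  20,  30, 25,  5]
BUDGET  = 50

def score(x):
    # Total value of selected items, or -1 if the weight budget is exceeded.
    w = sum(wi for wi, xi in zip(WEIGHTS, x) if xi)
    v = sum(vi for vi, xi in zip(VALUES, x) if xi)
    return v if w <= BUDGET else -1

def hill_climb(steps=500, seed=1):
    rng = random.Random(seed)
    x = [0] * len(VALUES)
    for _ in range(steps):
        i = rng.randrange(len(x))      # flip one random item in or out
        y = x[:]
        y[i] ^= 1
        if score(y) >= score(x):       # accept non-worsening moves
            x = y
    return x, score(x)

best, value = hill_climb()
print(best, value)
```

Hill climbing like this can get stuck in local optima, which is precisely where smarter, problem-aware search and decomposition methods earn their keep.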
Contact the project supervisor, Sandy Brownlee (email@example.com), for more information.
Realising a Flexible Quality Framework for Managing Data Assets
This project will explore the following questions:
– How can data veracity measures (metrics) be encoded and enacted within a data ecosystem?
– How can data provenance be used to support new forms of veracity checking and anomaly detection?
– How can data policies be framed to reason about data veracity, and recommend appropriate decision-making actions?
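As an illustration of the first question, veracity measures such as completeness and freshness can be encoded as small, enactable functions over individual records; the field names and thresholds below are illustrative assumptions.

```python
from datetime import datetime, timezone

# Sketch: two simple veracity metrics encoded as per-record functions
# that a data ecosystem could enact. Field names and the freshness
# threshold are illustrative assumptions.
REQUIRED_FIELDS = ("sensor_id", "value", "timestamp")

def completeness(record):
    # Fraction of required fields that are present and non-null.
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f) is not None)
    return present / len(REQUIRED_FIELDS)

def freshness(record, now, max_age_s=3600):
    # 1.0 for brand-new data, decaying linearly to 0.0 at max_age_s.
    age = (now - record["timestamp"]).total_seconds()
    return max(0.0, 1.0 - age / max_age_s)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
record = {"sensor_id": "s1", "value": 3.2,
          "timestamp": datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)}
print(completeness(record), freshness(record, now))
```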
Transparent & Accountable Data Management for the Internet of Things
Building on an existing portfolio of research into data transparency and provenance, the proposed project will examine the following questions: What characteristics of IoT devices and their behaviours are necessary to formulate a model of transparency? How do we represent norms against which devices (and the ecosystems of which they are a part) can be held to account?
Using data science to understand the food system
In this project I propose to look at available data on some aspect of the food system and see whether we can use it to inform modelling and make predictions about how to improve or optimise the system. The area could be at the food production end of the industry or the retail end. It depends on who has data that they think they could make more of, in terms of using it to make forecasts about yield or sales, for example.
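As a minimal illustration, a forecast of yield or sales might start from a least-squares trend fitted to historical data; the figures below are invented for illustration, and real modelling would use richer data and methods.

```python
import numpy as np

# Eight weeks of hypothetical sales data for one product line.
weeks = np.arange(1, 9)
sales = np.array([102, 108, 115, 118, 126, 131, 137, 142], dtype=float)

slope, intercept = np.polyfit(weeks, sales, deg=1)  # least-squares trend
forecast = slope * 9 + intercept                    # extrapolate to week 9
print(f"week 9 forecast: {forecast:.1f}")
```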