Research
My research “Human-Centered AI” aims to develop human-centered computing methods and tools for building trustworthy AI. Tackling the trustworthiness issue of AI requires new methodologies to address both the epistemic and the ethical challenge: what the machine needs to know and what purpose it should serve. My proposition is humans in the loop, that is, including humans as computational agents that form an essential component of the computational system. This includes involving humans as part of the AI system for explaining, diagnosing, and enhancing machine learning models, and as part of the hybrid human-AI decision-making pipeline that calls on people at run-time to align machine behaviors with human values. As contributions, my research develops principled approaches and practical tools for both of these purposes.
My research methodology is both empirical and theoretical, with primary activities characterized by the design, implementation, and analysis of human studies, computational algorithms, and human-in-the-loop systems. I particularly value collaboration with scientists from other disciplines and experts from various application domains, together with whom my research team develops usable tools that make a real-world impact.
Projects
[Model Debugging] ARCH: Know What Your Machine Doesn't Know
Despite their impressive performance, machine learning systems remain largely unreliable in safety-, trust-, and ethics-sensitive domains. Recent discussions in several subfields of AI have reached a consensus on the need for knowledge in machines, yet few have touched upon how to diagnose which knowledge is needed. This project aims to develop human-in-the-loop methods and tools for diagnosing machine unknowns. We consider humans essential to understanding the knowns and unknowns of intelligent machines, through human interpretation of machine behavior and human creation of knowledge requirements. We also see computational algorithms as vital tools that can assist humans in knowledge reasoning at scale and under uncertainty. Knowing machine unknowns is essential both for making AI (debugging the machine) and for using AI (deciding when to trust the machine's output). We envision that this project will have a tremendous scientific and practical impact across all areas where AI and machine learning are applied.
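As a minimal illustration of the "deciding when to trust the machine output" side, the sketch below routes low-confidence predictions of an arbitrary classifier to human review; the threshold, the uncertainty measure, and the function name are illustrative assumptions, not the project's actual method.

```python
# A minimal sketch, assuming a plain softmax classifier: route low-confidence
# predictions to human review so that the machine's "unknowns" can be inspected
# and turned into explicit knowledge requirements.
import numpy as np

CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off; would be tuned per application


def route_unknowns(probabilities: np.ndarray) -> dict:
    """Split predictions into trusted ones and ones sent to human review.

    probabilities: (n_samples, n_classes) softmax outputs of any classifier.
    """
    confidence = probabilities.max(axis=1)  # top-1 probability per sample
    entropy = -(probabilities * np.log(probabilities + 1e-12)).sum(axis=1)
    unknown = confidence < CONFIDENCE_THRESHOLD
    return {
        "trusted_idx": np.where(~unknown)[0],  # machine output can be used as-is
        "review_idx": np.where(unknown)[0],    # ask humans what knowledge is missing
        "entropy": entropy,                    # helps prioritise the review queue
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 3))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print("send to human review:", route_unknowns(probs)["review_idx"])
```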
[Model Explanation] SECA: Know How Your Machine Works
State-of-the-art machine learning systems employ neural models that generally operate as “black boxes”. The opaqueness of these models has become a major obstacle to deploying, debugging, and tuning them. To understand how such models work, it is essential to explain model behavior in human-understandable language. This project introduces a scalable human-in-the-loop approach for the global interpretability of machine learning models. We employ local interpretability methods to highlight salient input units and leverage human intelligence to annotate such units with semantic concepts. These semantic concepts are then aggregated into a tabular representation of images to facilitate automatic statistical analysis of model behavior. Our approach supports multi-concept interpretability needs for both model validation and exploration.
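The sketch below illustrates only the aggregation step, under simple assumptions: human-annotated concepts per image are collected into a table and cross-tabulated with model correctness. The concept names and records are invented for illustration and do not come from the project.

```python
# A minimal sketch of the aggregation step, assuming concept annotations are
# already collected from human annotators.
import pandas as pd

# One row per image: concepts annotators attached to its salient regions,
# plus whether the model classified the image correctly.
records = [
    {"image": "img_001", "stripes": 1, "grass": 1, "water": 0, "correct": 1},
    {"image": "img_002", "stripes": 0, "grass": 1, "water": 1, "correct": 0},
    {"image": "img_003", "stripes": 1, "grass": 0, "water": 1, "correct": 0},
    {"image": "img_004", "stripes": 1, "grass": 1, "water": 0, "correct": 1},
]
table = pd.DataFrame(records).set_index("image")

# Error rate conditioned on the presence of each concept: concepts associated
# with a high error rate hint at global weaknesses of the model.
concepts = [c for c in table.columns if c != "correct"]
error_by_concept = {c: 1.0 - table.loc[table[c] == 1, "correct"].mean() for c in concepts}
print(error_by_concept)
```

[Data Creation] OpenCrowd: Large-scale People Engagement for Data Creation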
Large-scale training data is the cornerstone of machine learning. Creating such data is a long, laborious, and usually expensive process, especially when it requires domain knowledge. Crowdsourcing provides a cost-effective way to find, in a short time, a large number of contributors who together possess broad knowledge in specific domains. This project aims to develop OpenCrowd, a large-scale expert finding and engagement platform for training data creation. The platform models relevant participant properties such as expertise, location, and cultural background, and employs peer routing techniques to scale out the search for experts. It uses peer grading techniques for effective and efficient aggregation and assessment of outcomes. OpenCrowd further allows the data creation process to be steered towards preferred data properties using only a small amount of seed data.
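To make the aggregation idea concrete, here is a deliberately simplified sketch of an expertise-weighted vote; real peer-grading models estimate worker expertise jointly with the answers, so this is only an assumed approximation, not OpenCrowd's algorithm.

```python
# A simplified sketch: aggregate crowd answers with an expertise-weighted vote.
# All names and weights are illustrative rather than OpenCrowd's actual model.
from collections import defaultdict


def weighted_vote(answers: dict, expertise: dict) -> str:
    """answers: {worker_id: label}; expertise: {worker_id: weight in [0, 1]}."""
    scores = defaultdict(float)
    for worker, label in answers.items():
        scores[label] += expertise.get(worker, 0.5)  # unknown workers get a neutral weight
    return max(scores, key=scores.get)


answers = {"w1": "expert", "w2": "novice", "w3": "expert"}
expertise = {"w1": 0.9, "w2": 0.4, "w3": 0.7}
print(weighted_vote(answers, expertise))  # -> "expert"
```

[Data Denoise] Scalpel-CD: Reducing Label Noise in Training Data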
The success of machine learning techniques heavily relies not only on the quantity but also on the quality of labeled training data. Incorrect labels in the training data are generally difficult to identify and have become a major obstacle to developing, deploying, and improving machine learning models. This project introduces Scalpel-CD, a first-of-its-kind system that leverages both human and machine intelligence to debug noisy labels in the training data of machine learning systems. The system identifies potentially wrong labels by exploiting data distributions in the underlying latent feature space, and employs a data sampler to select the instances that would benefit most from inspection by the crowd. The manually verified labels are then propagated to similar instances in the original training data by exploiting the underlying data structure, thus scaling out the contribution from the crowd. Scalpel-CD is designed with a set of algorithmic solutions that automatically search for the optimal configuration for different types of training data, in terms of the underlying data structure, noise ratio, and noise type (random vs. structural).
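The following sketch compresses the described pipeline into two illustrative steps, using raw features as a stand-in for the latent space and k-nearest neighbours for both suspicion scoring and label propagation; all names and parameters are assumptions for illustration, not the actual Scalpel-CD implementation.

```python
# A compressed, illustrative sketch: flag labels that disagree with their
# neighbourhood, send the most suspicious ones to humans, and propagate the
# verified labels to nearby instances.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def find_suspicious_labels(X, y, k=5):
    """Rank instances by how strongly their label disagrees with their neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                       # idx[:, 0] is the point itself
    disagreement = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)
    return np.argsort(-disagreement), disagreement


def propagate_verified(X, y, verified_idx, verified_labels, k=3):
    """Copy human-verified labels to the k nearest neighbours of each verified point."""
    y = y.copy()
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(X[verified_idx])
    for i, label in enumerate(verified_labels):
        y[verified_idx[i]] = label
        y[idx[i]] = label                           # neighbours inherit the verified label
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    y[3] = 1                                        # inject one wrong label
    order, score = find_suspicious_labels(X, y)
    print("most suspicious index:", order[0], "disagreement:", score[order[0]])
```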
[Data Debiasing] Perspective: Identifying and Characterizing Data Atypicality
High-quality data plays an important role in developing reliable machine learning models. Incomplete training data or inherent biases can have negative and sometimes damaging effects, particularly in critical domains such as transport, finance, or medicine. This also implies that high-quality test data is vital for reliable and trustworthy evaluation. Despite that, what makes an item hard to classify remains a largely unstudied topic. This project aims to provide a first-of-its-kind, model-agnostic characterization of data atypicality grounded in human understanding. We consider the setting of data classification “in the wild”, where a large number of unlabeled items are accessible, and introduce a scalable and effective human computation approach for the proactive identification and characterization of atypical data items. Our approach enables a feedback loop in the lifecycle of a machine learning model, thereby supporting a never-ending learning scenario in which model performance can continuously improve. It is effective, cost-efficient, and generic to any machine learning task.
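As a rough illustration of how atypical items could be pre-ranked for human characterization, the sketch below applies a generic outlier score to an unlabeled pool; the project's notion of atypicality is grounded in human understanding, so this is only an assumed pre-filtering step, not the proposed approach.

```python
# A rough sketch: pre-rank a large unlabeled pool by a generic outlier score so
# that human effort is spent on the most unusual-looking items first.
import numpy as np
from sklearn.ensemble import IsolationForest


def rank_for_human_review(features: np.ndarray, budget: int = 50) -> np.ndarray:
    """Return the indices of the `budget` most atypical-looking items."""
    scores = IsolationForest(random_state=0).fit(features).score_samples(features)
    return np.argsort(scores)[:budget]              # lower score = more atypical


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    pool = np.vstack([rng.normal(0, 1, (500, 8)), rng.normal(6, 1, (5, 8))])
    print("send to annotators:", rank_for_human_review(pool, budget=5))
```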
[Systemic Evaluation] ValuableML: Co-Design of Value Metrics for ML
For decades, the primary way to develop and assess machine learning (ML) models has been based on accuracy metrics (e.g. precision, recall, F1, AUC). We have largely forgotten that ML models are applied in an organisational or societal context because they provide value to people. This leads to a significant disconnect between the impressive progress of ML research – with correspondingly sky-high expectations from professionals in every field – and the limited adoption of ML. We see the need for new value-based metrics for the development and evaluation of ML models. These will cater to the actual needs and desires of users and relevant stakeholders, and will be tailored to the cost structure of specific use cases. This project aims to introduce proper value metrics and processes. We will create them by answering two fundamental questions: what makes a model 'good'? And what is the value of a model? We use a co-design methodology that emphasises the importance of involving stakeholders in the creation of metrics, so that they represent the collective interest of all involved.
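To make the idea of a value metric concrete, here is a minimal worked example under the simplifying assumption that stakeholders can price each prediction outcome with a single number; the figures are placeholders for illustration, not results from any real use case.

```python
# A minimal worked example of a value-based metric over a confusion matrix.
import numpy as np


def expected_value(confusion: np.ndarray, value_matrix: np.ndarray) -> float:
    """Average value per prediction; rows = true class, columns = predicted class."""
    return float((confusion * value_matrix).sum() / confusion.sum())


# Example: fraud screening, where a missed fraud case (false negative) costs far
# more than a false alarm. Two models with identical accuracy can differ in value.
confusion = np.array([[900, 50],    # [true negatives, false positives]
                      [30, 20]])    # [false negatives, true positives]
value = np.array([[0.0, -1.0],      # cost of bothering a legitimate customer
                  [-20.0, 5.0]])    # cost of a missed fraud vs. benefit of catching one
print(expected_value(confusion, value))  # -> -0.55 value units per transaction
```

Such a metric can reverse the ranking that accuracy alone would produce, which is exactly the disconnect between ML progress and ML adoption that this project targets.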