- Deep Active Learning from Targeted Crowds (WWW 2018)
Machine learning models are increasingly used for knowledge creation and decision making across various domains. Training these models typically requires data annotated by humans. The entire process can be viewed as intelligence transfer from humans to machines: it complements machine scalability with human intelligence, thereby enabling machine learning models to solve real-world problems.
Crowdsourcing provides a convenient means for data annotation at scale. The current notion of crowdsourced data annotation assumes that crowds are anonymous and disposable and that tasks are of low complexity. This assumption, however, does not hold for knowledge-intensive or subjective tasks, for which only a specific group of workers (often of limited size) is suitable for data annotation. At the same time, the growing complexity of machine learning models (e.g., deep learning models) has greatly increased the demand for data annotation in terms of both quantity and quality. This gap greatly hinders the use of crowdsourced data annotation for effectively training machine learning models to solve complex tasks.
We propose to close the loop between machines and humans, so that the utility of each individual annotation can be optimized towards the end goal of training machine learning models with maximal efficacy. To this end, we study the following distinct yet closely connected components of human-machine loop systems:
Our goal is to show that by optimally designing each of the components above, we will be able to accelerate knowledge transfer from humans to machines in a principled and effective way. In the lists below, you can find our representative publications for the four components.