Pricing in Large Scale Data and Model Markets: Models, Fairness, and Scalability
Data markets are an emerging and promising way to harvest data from many data owners in support of data-driven AI applications and many secondary uses of big data. Pricing plays a central role in data markets. In this talk, I will survey the motivations and the state-of-the-art practice of data and (machine learning) model markets, and review data pricing in end-to-end data analytics and machine learning pipelines. Then, I will focus on models, fairness, and scalability of data pricing using some well-established solution concepts in cooperative game theory, such as the Shapley value. Taking a principled approach, I will illustrate that, under some simple yet practical assumptions about the utility of data products, computing the exact Shapley value for millions of products and tens of owners is highly practical. I will also demonstrate the challenges in modeling and computing fair reward allocation in one-shot cooperative machine learning processes, such as federated learning, as well as in building privacy-preserving model marketplaces.
Jian Pei is a Professor at Duke University. His research focuses on data science, data mining, database systems, information retrieval, and applied machine learning. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications, and in transferring those techniques to products and business practice. He is recognized as a Fellow of the Royal Society of Canada (Canada's national academy), the Canadian Academy of Engineering, ACM, and IEEE. He has received several prestigious awards, including the 2017 ACM SIGKDD Innovation Award, the 2015 ACM SIGKDD Service Award, and the 2014 IEEE ICDM Research Contributions Award. He is a past chair of ACM SIGKDD and a past EIC of IEEE TKDE.