Knowledge Discovery and Data Mining Conference 2017
One of the few top venues that bring together researchers and practitioners from data science, data mining, knowledge discovery, and big data is the SIGKDD Conference on Knowledge Discovery and Data Mining (known as KDD). Last month, I had the pleasure of attending the 23rd edition of this conference, held in Halifax, Canada. In this post and a follow-up, I’ll share some of the highlights of my experience there.
In this first part of my conference memories, I want to highlight two human aspects of machine learning: influence and bias. My purpose is to show you how important it is to think about these aspects when preparing a model.
Influence and bias
The influence aspect is relevant when modeling crowdsourced data and social network data. Regarding crowdsourced data, ML practitioners usually assume that crowdworkers (known as turkers on Amazon Mechanical Turk) work independently, free of influence from one another. Yet it is not hard to uncover social networks among turkers, such as blogs and chat groups. This matters because researchers crowdsource data precisely to collect information from many people as individuals, and influence undermines that purpose. For some applications, such as labeling, the consequences of influence among crowdworkers may be minor; for others, such as behavioral studies, conclusions drawn from this data may be misleading. On the social network side, there are many efforts to understand which features make an ad successful. Underestimating or ignoring the effect of influence in advertising models may be a gross mistake. In short, it is advisable to account for this human aspect when working with these kinds of data.
The other human aspect of machine learning that I want to briefly discuss is human bias. We can grasp why human bias matters by imagining for a moment that we are developing a machine learning model to help administer justice. Our task would amount to emulating a judge’s decision-making given a set of evidence. Now, recall for just a moment how many times we have seen people in the news disputing a judge’s decision. Got it? Then you should agree with me that justice is a very relative matter and that law is subject to interpretation. A model trained to reproduce such decisions will also reproduce whatever biases they contain. This is just one example of how learning from human decisions brings challenges to the ML practitioner.
A relatively new topic in machine learning that covers bias problems along with other societal issues is Fairness, Accountability, and Transparency in Machine Learning (FAT-ML). One case that recently put this topic in the headlines was COMPAS, a recidivism risk assessment tool meant to assist judicial decisions by predicting how likely defendants are to reoffend. An analysis by ProPublica, published last year, found that the tool’s errors were distributed very differently by race: Black defendants who did not reoffend were far more likely than white defendants to be labeled high risk, which many considered clear evidence of bias. Further work on this topic should open exciting paths for new applications in medicine, education, social justice, and policy making. FAT-ML is also concerned with privacy and ethics in the use of ML models. Finally, interest in FAT-ML is clearly growing, as reflected in the number of new workshops and conferences on this topic. It’s an exciting moment to join this growing community!
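To make the idea of a "biased" risk score a little more concrete, here is a minimal Python sketch of one common check: comparing the false positive rate (people who did not reoffend but were flagged as high risk) across groups. The function names, the group labels "A" and "B", and the data are all hypothetical and made up for illustration; this is not real COMPAS data, and equal false positive rates are only one of several competing fairness criteria.

```python
def false_positive_rate(y_true, y_pred):
    """Share of actual negatives (label 0: did not reoffend) predicted positive (1: high risk)."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_on_negatives:
        return float("nan")  # undefined when a group has no negatives
    return sum(preds_on_negatives) / len(preds_on_negatives)


def fpr_by_group(y_true, y_pred, groups):
    """Compute the false positive rate separately for each (hypothetical) group label."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        rates[g] = false_positive_rate([y_true[i] for i in idx],
                                       [y_pred[i] for i in idx])
    return rates


# Made-up toy data: 1 = reoffended / flagged high risk, 0 = otherwise.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

print(fpr_by_group(y_true, y_pred, groups))  # {'A': 0.5, 'B': 0.25}
```

In this toy example, non-reoffenders in group A are flagged twice as often as those in group B, even though both groups have the same number of actual reoffenders. That gap in error rates is the kind of disparity that fairness audits look for.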
Thanks for reading! In the second part of my experience, I’ll talk about more technical stuff.