The increased use of machine learning models in many areas of public life has raised concerns about the biases and unfairness that such models can introduce. In machine learning systems, bias-mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups. Privileged groups receive disproportionately favourable outcomes (e.g., approval of credit requests) based on model outputs, while unprivileged groups receive disproportionately unfavourable outcomes (e.g., denial of credit requests). Bias-mitigation methods can be effective, but they have known “waterfall” effects: mitigating bias in one place may cause bias to manifest elsewhere. In our work, a collaboration between the SFI Centre for Research Training in Machine Learning and IBM, we aim to characterise the cohorts that are impacted when mitigation interventions are applied. To do so, we treat intervention effects as a classification task and learn an explainable meta-classifier that identifies cohorts whose outcomes are altered. We examine a range of bias-mitigation strategies that operate at different stages of the model life cycle. We empirically demonstrate that our meta-classifier is able to uncover impacted cohorts. Further, we show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e., people who receive unfavourable outcomes solely on account of the mitigation effort, despite improvements in aggregate fairness metrics. We use these results to argue for more careful audits of static mitigation interventions that go beyond aggregate metrics.
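To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how intervention effects can be framed as a classification task: each record is labelled by whether the mitigated model changes its outcome relative to the original model, and an interpretable meta-classifier (here a shallow scikit-learn decision tree, chosen for illustration) is fitted so that its rules describe the impacted cohorts. The dataset, feature names, and the original/mitigated predictions are assumed inputs.

```python
# Sketch only: frame "did mitigation alter this case's outcome?" as a
# meta-classification problem and fit an explainable model over it.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def impacted_cohorts(X, original_preds, mitigated_preds, feature_names, max_depth=3):
    """Fit an interpretable meta-classifier over the 'outcome altered' label."""
    altered = (np.asarray(original_preds) != np.asarray(mitigated_preds)).astype(int)
    meta = DecisionTreeClassifier(max_depth=max_depth, class_weight="balanced")
    meta.fit(X, altered)
    # The tree's decision rules serve as human-readable cohort descriptions.
    rules = export_text(meta, feature_names=list(feature_names))
    return meta, rules

def newly_unfavourable(original_preds, mitigated_preds, favourable=1):
    """Cases flipped from a favourable to an unfavourable outcome by mitigation."""
    original_preds = np.asarray(original_preds)
    mitigated_preds = np.asarray(mitigated_preds)
    return (original_preds == favourable) & (mitigated_preds != favourable)
```

In this framing, the fraction of cases flagged by `newly_unfavourable` corresponds to the people who receive unfavourable outcomes solely because of the mitigation step, which aggregate fairness metrics alone would not reveal.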
