Ranking for Decision Making: Fairness and Usability

Kuhlman, Caitlin A.

Etd


Public

Today, ranking is the de facto way that information is presented to users in automated systems, which are increasingly used for high-stakes decision making. Such ranking algorithms are typically opaque, and users have no control over the ranking process. When complex datasets are distilled into simple rankings, the algorithms exploit patterns in the data that may not reflect the user’s true preferences and can even include subtle encodings of historical inequalities. Therefore, it is paramount that the user’s preferences and fairness objectives are reflected in the rankings generated. This research addresses concerns around the fairness and usability of ranking algorithms. The dissertation is organized in two parts. Part one investigates the usability of interactive systems for automatic ranking. The aim is to better understand how to capture user knowledge through interaction design and to empower users to generate personalized rankings. A detailed requirements analysis for interactive ranking systems is conducted. Then alternative preference elicitation techniques are evaluated in a crowdsourced user study. The study reveals surprising ways in which collection interfaces may prompt users to organize more data, thereby requiring minimal effort to create sufficient training data for the underlying machine learning algorithm. Following from these insights, RanKit is presented. This system for personalized ranking automatically generates rankings based on user-specified preferences among a subset of items. Explanatory features give feedback on the impact of user preferences on the ranking model and on the confidence of predictions. A case study demonstrates the utility of this interactive tool. In part two, metrics for evaluating the fairness of rankings are studied in depth, and a new problem of fair ranking by consensus is introduced.
Three group fairness metrics are presented: rank equality, rank calibration, and rank parity, which cover a broad spectrum of fairness considerations, from proportional representation to error rate similarity across groups. These metrics are designed using a pairwise evaluation strategy to adapt algorithmic fairness concepts previously applicable only to classification. The metrics are employed in the FARE framework, a novel diagnostic tool for auditing rankings that exposes tradeoffs between different notions of fairness. Next, different ways of measuring a single definition of fairness are evaluated in a comparative study of state-of-the-art statistical parity metrics for ranking. This study identifies a core set of parity metrics that all behave similarly with respect to group advantage and reflect an intuitive definition of unfairness well. However, this analysis also reveals that, under relaxed assumptions about group advantage, different ways of measuring group advantage yield different fairness results. Finally, I introduce a new problem of fair ranking by consensus among multiple decision makers. A family of algorithms is presented that solves this open problem of guaranteeing fairness for protected groups of candidates while still producing a good aggregation of the base rankings. Exact solutions are presented, as well as a method that guarantees fairness with minimal approximation error. Together, this research expands the utility of ranking algorithms to support fair decision making.

Creator