Large-Scale Pairwise Preference Study

Figure: Snapshots of the raw data collected during the study.

Abstract

I led a pairwise preference study to rank 971 Minecraft-related items according to the community's preferences. Working with a team, I built a website that presented participants with a pair of items and asked which one was 'better'. By leaving the criteria for 'better' open to interpretation, the study aimed to capture the community's holistic sentiment rather than any specific measure of utility. I led community outreach, ultimately engaging over 15,000 volunteer participants. To determine the final ordering, I aggregated the pairwise comparisons and ranked items by their normalized win rates across all trials. Despite the ambiguity of the prompt, the results converged on a clear consensus, identifying a hierarchy of perceived value across the game's ecosystem.

Key contributions

Conceived and led the study design, defining the research question and overall methodology.
Analyzed the SQL database: identified and removed invalid data, computed normalized win rates, and exported the final results.
Coordinated community outreach: contacted community ambassadors, posted calls for participation on major online platforms, and enlisted content creators to amplify the project.
Scraped the official wiki to gather candidate items, then filtered them down to the final set.

Methods

This project aimed to identify every 'thing' in the video game Minecraft and discover the perceived relative value of each. To do this, I ran a pairwise comparison study, querying thousands of participants to collect enough data to rank each item with confidence.

First, I used Selenium, a browser-automation library with Python bindings, to scrape the game's wiki for the title and relevant metadata of each article. This produced a list of 9,000 articles, which I reduced algorithmically to 2,000 by removing articles about specific employees, sub-versions, and other identifiable categories that did not fit the desired data set. I then manually narrowed the list to 971 items, keeping only those that could be sensibly compared in a pairwise matchup.
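The rule-based pass could look something like the sketch below. It assumes each scraped article has already been reduced to a title plus a set of category tags; the regular expressions and the `EXCLUDED_CATEGORIES` tags are hypothetical stand-ins, since the exact rules used in the study are not documented here. Whatever survives this pass would still go to manual review.

```python
import re

# Hypothetical exclusion rules; the study's actual patterns and
# categories are not published, so these are illustrative only.
EXCLUDE_PATTERNS = [
    re.compile(r"^Java Edition \d"),       # version / sub-version pages
    re.compile(r"^Bedrock Edition \d"),
    re.compile(r"^(User|Talk|Category|File|Template):"),  # non-article namespaces
]
EXCLUDED_CATEGORIES = {"People", "Versions"}  # hypothetical metadata tags


def keep_article(title: str, categories: set[str]) -> bool:
    """Return True if an article plausibly describes a comparable in-game 'thing'."""
    if any(p.search(title) for p in EXCLUDE_PATTERNS):
        return False
    # Drop the article if it carries any excluded category tag.
    return not (categories & EXCLUDED_CATEGORIES)


def filter_articles(articles: list[tuple[str, set[str]]]) -> list[str]:
    """Apply the rule-based pass; the survivors then go to manual review."""
    return [title for title, cats in articles if keep_article(title, cats)]


if __name__ == "__main__":
    sample = [
        ("Diamond", {"Items"}),
        ("Java Edition 1.20.1", {"Versions"}),
        ("Notch", {"People"}),
        ("Creeper", {"Mobs"}),
        ("Talk:Creeper", set()),
    ]
    print(filter_articles(sample))  # ['Diamond', 'Creeper']
```

Keeping each rule as a separate pattern or tag makes it easy to audit what a given rule removed before committing to the smaller list.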

I led development of the interactive site: a platform that served each user a pair of items from the data set and let them pick either one as 'better'. I worked with a team of developers to build it using a Flask backend and a React frontend, with Python handling the core data pipeline. To support the study's 15,000+ participants, I hosted the application on a dedicated Linode server and configured it to handle high traffic and maintain consistent uptime throughout data collection.
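A minimal sketch of the two server-side operations the site needs, drawing a pair and storing a vote, using only Python's standard library. The uniform random sampling policy, the `votes` table, and its column names are assumptions for illustration; the site's actual pairing logic and schema are not described here, and the Flask routes that would call these functions are omitted.

```python
import random
import sqlite3

# Hypothetical schema: one row per vote. Table and column names are
# illustrative, not the study's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS votes (
    id        INTEGER PRIMARY KEY,
    item_a    INTEGER NOT NULL,
    item_b    INTEGER NOT NULL,
    winner    INTEGER NOT NULL,
    voter_id  TEXT    NOT NULL,
    CHECK (item_a <> item_b),
    CHECK (winner IN (item_a, item_b))
);
"""


def next_pair(item_ids: list[int], rng: random.Random) -> tuple[int, int]:
    """Draw two distinct items uniformly at random (an assumed sampling policy)."""
    a, b = rng.sample(item_ids, 2)
    return a, b


def record_vote(conn: sqlite3.Connection, a: int, b: int, winner: int, voter_id: str) -> None:
    """Store one comparison; the CHECK constraints reject malformed votes."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO votes (item_a, item_b, winner, voter_id) VALUES (?, ?, ?, ?)",
            (a, b, winner, voter_id),
        )
```

For example, calling `conn.executescript(SCHEMA)` on a fresh `sqlite3` connection creates the table, after which `record_vote(conn, a, b, a, "anon-1")` stores a vote for item `a`. Putting the validity checks in `CHECK` constraints means a malformed submission is rejected with `sqlite3.IntegrityError` at insert time rather than cleaned up later.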

Because of the large number of items being evaluated, a very large number of votes was needed to produce confident results. To reach that scale, I promoted the study through a variety of channels in the Minecraft community; notably, a few posts on major social platforms garnered hundreds of thousands of views, and a popular creator agreed to showcase the project to their audience. Data collection closed after seven days, by which point the project had amassed 417,702 votes from participants in dozens of countries.
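A quick calculation shows why the vote count matters. Assuming pairs are drawn uniformly at random and counting all raw votes before filtering, 971 items yield 470,935 distinct pairs, so 417,702 votes cover the average pair fewer than once (about 0.89 times), while each item still appears in roughly 860 comparisons on average. The ranking therefore has to pool each item's results across many different opponents rather than rely on head-to-head records.

```python
from math import comb

N_ITEMS = 971        # items in the final set (from the study)
N_VOTES = 417_702    # raw votes collected, before filtering (from the study)

unique_pairs = comb(N_ITEMS, 2)            # distinct unordered item pairs
votes_per_pair = N_VOTES / unique_pairs    # average times each pair was shown
# Every vote involves two items, so it counts as one appearance for each.
appearances_per_item = 2 * N_VOTES / N_ITEMS

print(f"unique pairs:          {unique_pairs:,}")           # 470,935
print(f"votes per pair:        {votes_per_pair:.2f}")       # 0.89
print(f"appearances per item:  {appearances_per_item:.0f}") # 860
```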

I wrote SQL scripts to filter out datapoints flagged as suspicious or invalid, then used Python to compute a normalized win rate for each item and rank all 971 items relative to one another. The results were exported to a spreadsheet and released for community review on social platforms, serving as an informal validation and bias check and making the data collection process transparent.
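The filtering and ranking step might look like the sketch below. It assumes a `votes` table with an `is_flagged` column populated by earlier suspicious-activity checks, and it takes 'normalized win rate' to mean wins divided by the number of comparisons an item appeared in; both the schema and that definition are assumptions, since the write-up does not specify them. The CSV output stands in for the spreadsheet export.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical filter: assumes an is_flagged column set by earlier
# suspicious-activity checks. The study's actual criteria are not published.
VALID_VOTES_SQL = """
SELECT item_a, item_b, winner
FROM votes
WHERE is_flagged = 0
  AND item_a <> item_b
  AND winner IN (item_a, item_b)
"""


def win_rates(conn: sqlite3.Connection) -> dict[int, float]:
    """Win rate per item = wins / comparisons the item appeared in (assumed definition)."""
    wins = Counter()
    appearances = Counter()
    for a, b, winner in conn.execute(VALID_VOTES_SQL):
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {item: wins[item] / n for item, n in appearances.items()}


def export_ranking(rates: dict[int, float], out) -> None:
    """Write items sorted by descending win rate as CSV rows (rank, item_id, win_rate)."""
    writer = csv.writer(out)
    writer.writerow(["rank", "item_id", "win_rate"])
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (item, rate) in enumerate(ranked, start=1):
        writer.writerow([rank, item, f"{rate:.4f}"])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE votes (item_a INTEGER, item_b INTEGER, winner INTEGER, is_flagged INTEGER)"
    )
    conn.executemany(
        "INSERT INTO votes VALUES (?, ?, ?, ?)",
        [(1, 2, 1, 0), (1, 3, 1, 0), (2, 3, 3, 0), (2, 3, 2, 1)],  # last row is flagged
    )
    buf = io.StringIO()
    export_ranking(win_rates(conn), buf)
    print(buf.getvalue())
```

A raw win rate like this does not adjust for opponent strength; a model such as Bradley-Terry would, at the cost of an iterative fit. If pairs are drawn uniformly at random and each item sees hundreds of comparisons, the simple ratio is a reasonable first approximation.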