Co-Creative Data Sets

A series of co-creation workshops to teach machine learning literacy.

My Roles

Prepare & Host Workshops
Pre-Process Data Sets
Train ML models

Outcome

A workshop with High School Students ↓
at the SFU Summer School '23
6 Data Sets with up to 34,000 Images
6 trained StyleGAN2 models

Keywords

ML & AI Literacy

Co-Creation

Co-Speculation

Generative Machine Learning

StyleGANs

Training Data Sets: Challenges and Potential

Creating ethical AI tools is a difficult task. Various harmful effects, such as mass extraction, labeling, bypassing consent, and the loss of context, have been recorded in recent research [2,3,10]. Other work investigates explicitly how racist prejudice and misogyny are encoded into training data sets that subsequently inform machine learning models [5,8,9,11]. Moreover, harm results not only from misclassification but also from the failure to include marginalized voices altogether [1,6].

⚪ Focus
As AI tools rapidly disrupt and inform our daily lives, the notion of Machine Learning literacy is becoming more crucial than ever.

Project Framing

In the current data economy, information about us is often appropriated in goal- and outcome-oriented ways. Work across the HCI community argues for alternative avenues to engage with data in ways that are ambiguous [4], soft [12], or slow [7]. Data sets are often very uniform because they serve the purpose of creating efficient outcomes.

◉ Research Question 1
How can we foster machine learning literacy and create meaningful dialogue between archives?

◉ Research Question 2
How can we create data sets as a community?

◉ Research Question 3
How can we intentionally refrain from making sense of data archives and use them in more tactile, irrational, and evocative ways?

Piloting & Refining Workshops

To explore these questions, I undertook a design research expedition using techniques such as co-speculation and co-creation. The preliminary workshops consistently included a tangible device that helped participants create image data sets, both small and large.

Creating Dialogue

Once the StyleGAN2 models completed their training on the co-created data sets, participants began experimenting with the results. The aim was to recognize specific similarities between the input and the output. One participant used the StyleGAN2 model to create a video sequence and then matched it with a custom GIF made from images in the original data set that displayed a visual symmetry.

↑ Creating GIFs of matching input and output
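
The GIF step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the participant's actual workflow: it assumes the Pillow library is installed, and the function name `make_gif`, the frame size, and the frame duration are hypothetical choices made for this example.

```python
from pathlib import Path

from PIL import Image  # assumption: Pillow is installed (pip install pillow)


def make_gif(image_paths, out_path, size=(256, 256), ms_per_frame=120):
    """Write the given images, in order, as a looping animated GIF.

    `size` and `ms_per_frame` are illustrative defaults, not values
    taken from the workshops.
    """
    frames = []
    for path in image_paths:
        # Open each source image, normalise colour mode and size,
        # and keep an in-memory copy so the file can be closed.
        with Image.open(path) as img:
            frames.append(img.convert("RGB").resize(size))
    if not frames:
        raise ValueError("need at least one image")

    # Pillow writes an animated GIF when save_all=True and the
    # remaining frames are passed via append_images.
    frames[0].save(
        out_path,
        save_all=True,
        append_images=frames[1:],
        duration=ms_per_frame,  # display time per frame in milliseconds
        loop=0,                 # 0 means loop forever
    )
    return Path(out_path)
```

Passing a hand-picked list of images from the original data set, ordered to echo the generated video sequence, would produce the "input" half of such a matching pair.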

SFU Summer School Workshops

As part of the 2023 SFU Summer School held in Burnaby, I organized a one-day workshop in collaboration with the Imaginative Methods Lab and Communications Department at SFU. The program focused on developing critical digital literacies among students in grades 10 to 12. The aim was to encourage them to explore data in a fun and critical manner. Specifically, during the data selves week, I hosted a data collection workshop.

How to make a Data Set Fair?

In this first exercise, the goal was to create a data set based on personal objects that the students brought to class and felt represented by (e.g., their skateboard, a doll representing their heritage, or a ribbon for hockey triumphs).

Together we investigated how to ensure equal representation in data sets by discussing the basics of biased training data and harmful effects.
Each student took 300 images with a DSLR camera set to a 1:1 aspect ratio (the standard format for image data sets).
The result was a data set of 3,712 images.
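
The idea of equal representation can be made concrete with a small check on the collected files. The following is a minimal Python sketch, not the code used in the workshop. It assumes a hypothetical folder layout with one subfolder per student, and the function names and the list of image suffixes are illustrative.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored as JPEG or PNG files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_contributor(root):
    """Count image files below each first-level subfolder of `root`.

    Hypothetical layout: root/<student>/<image files>. Files lying
    directly in `root` belong to no contributor and are skipped.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        parts = path.relative_to(root).parts
        if len(parts) >= 2:  # at least <student>/<file>
            counts[parts[0]] += 1
    return counts


def representation_shares(counts):
    """Return each contributor's share of the whole data set (0.0 to 1.0)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}
```

If every student contributes the same number of images, all shares are equal. A share that is much larger or smaller than the others points to a contributor who is over- or under-represented in the data set.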

Exploring a sense of scale

In the second exercise, we discussed the challenges of comprehending the scale of extensive training data archives. Together, we investigated what a unique hand archive of this class would look like.

Students collected expressions, such as ways to count, unique handshakes, or games like thumb-wrestling or patty cake.
We encouraged the students to take a ridiculous number of images.
There were over 19,300 images in the final hand archive.

"Be" a Data Scraper

For the last exercise, we described data scraping and how large image training sets are created. As a metaphor, the students took on the role of a data scraper, extracting images all around the SFU Burnaby Campus.

They used white backdrops and boards to "isolate" objects if necessary.
They received a list of objectives, similar to a scavenger hunt.
A total of 7,800 images were collected.

↓ Find more in my Annotated Portfolio

Sources

[1] Rediet Abebe. Forbes Insights: Why AI Needs To Reflect Society. Forbes. Retrieved January 8, 2022 from https://www.forbes.com/sites/insights-intelai/2018/11/29/why-ai-needs-to-reflect-society/
[2] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems, Curran Associates, Inc. Retrieved February 2, 2022 from https://proceedings.neurips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html
[3] Brian Christian. 2020. The Alignment Problem: Machine Learning and Human Values. W. W. Norton & Company.
[4] Chris Elsden, Mark Selby, Abigail Durrant, and David Kirk. Fitter, Happier, More Productive: What to Ask of a Data-Driven Life. 5.
[5] Adam Harvey. Exposing.ai: Brainwash Dataset. Exposing.ai. Retrieved January 8, 2022 from https://exposing.ai/datasets/brainwash/
[6] Os Keyes. 2018. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. Proc. ACM Hum.-Comput. Interact. 2, CSCW (November 2018), 1–22. DOI:https://doi.org/10.1145/3274357
[7] William Odom, Ron Wakkary, Jeroen Hol, Bram Naus, Pepijn Verburg, Tal Amram, and Amy Yo Sue Chen. 2019. Investigating Slowness as a Frame to Design Longer-Term Experiences with Personal Data: A Field Study of Olly. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, Glasgow Scotland Uk, 1–16. DOI:https://doi.org/10.1145/3290605.3300264
[8] Lauren Rhue. 2018. Racial Influence on Automated Perceptions of Emotions. Social Science Research Network, Rochester, NY. DOI:https://doi.org/10.2139/ssrn.3281765
[9] Rashida Richardson, Jason Schultz, and Kate Crawford. 2019. Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice. Social Science Research Network, Rochester, NY. Retrieved January 8, 2022 from https://papers.ssrn.com/abstract=3333423
[10] Excavating AI. Retrieved January 10, 2022 from https://excavating.ai

Credits

Team: Nico Brand
Consulting: Gillian Russel, Samein Shamsher
University: Simon Fraser University, Fall 2023
Special Thanks: All the Kids & Participants, Samuel Barnett, Matt Desimone
