A.I. Grading Apprentice
Mixed-Methods Case Study
Generative foundational research to understand what problems A.I. could solve in university science courses and what constraints exist on the design space.
The outcome was a roadmap for an A.I. grading apprentice that both supports TA grading efficiency and provides students with individualized feedback.
Student frustrated by the unclear criteria that the A.I. is scoring her work against
"I am an extremely hard-working, motivated student...[BUT] I am still being given dreadful grades...by a COMPUTER."
– University student
Challenge
A.I. is disrupting business models in the EdTech space
There are a few new products leveraging A.I. for student assessment, but there is limited understanding of industry best practices for evaluating the utility of A.I. in grading
Similarly, little is known about how A.I. impacts student, TA, or instructor experiences
Dane setting out to tackle the big challenge of A.I. in university science education
This gave me my most important and unique business challenge to date:
Can I translate user insights into roadmap priorities by understanding:
How A.I. should be used to support lab instruction, and
What negative user experiences we should avoid before developing a proof of concept?
Objective
Leverage foundational research with internal stakeholders and current customers to identify areas of the Labflow experience that could be enhanced with ML/A.I.
Synthesize sentiments from users of a rival product using A.I. to grade student work
Identify salient positive and negative perceptions of A.I.
Work directly with product & engineering to articulate use cases aligned to user research insights and prioritize them for business alignment
Project Outline
This is an ongoing research project that started in Summer 2023 and will involve the creation of a proof of concept implementation of A.I. in the Labflow product.
Stakeholder interviews: Jun. – Jul. 2023
Analysis of User Experience in Rival Product
Questionnaire: Nov. – Dec. 2023
Interviews: Nov. 2023
🔍 Scope
Document internal and external stakeholders' challenges and their understanding of A.I.
Explore user sentiment toward A.I. in a competitor product
Distill user insights into roadmap priorities for proof of concept
📦 Deliverables
Presentation deck of UX insights from market competitor using A.I.
Prioritization activity for scoring A.I. concepts
👥 Roles
Mixed-Methods UX Researcher
How do human conceptions of ML/A.I. shape users' expectations of, and the behavior they desire from, an A.I.-powered grading system?
Research Objectives
Uncover foundational insights about user mental models of ML/A.I. and what tasks A.I. might improve within the context of Labflow
Identify the strengths and weaknesses of competitor products leveraging A.I.
Rank concepts for integrating A.I. into Labflow against business priorities
Research Methods & Findings
Method: In-depth interviews, Questionnaire
Findings:
Large Language Model-powered A.I. technology evokes a strong ELIZA effect, giving the impression of human knowledge and the ability to understand and respond to intent. (#1)
Interviews with internal stakeholders & current customers identified areas where Labflow might leverage A.I.: administrative tasks, content authoring, student assessment, TA coaching, and enhancing analytics (#1)
Analysis of questionnaire data (#2) revealed several important UX factors:
Grading consistency must be improved to combat the stochastic nature of LLMs (see the sketch after this list)
A.I. grading criteria need to be transparent to and understood by students
Students find A.I. to be beneficial in coaching them as they learn
Assessment should not rely on post-hoc human review as a QA mechanism
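To make the consistency finding concrete, here is a minimal sketch of how run-to-run grading variance could be measured. Everything in it is illustrative: grade_response is a hypothetical stand-in for whatever LLM grading call a proof of concept would use (simulated here with random noise), and the repeat-and-compare logic is the only point.

```python
import random
import statistics

def grade_response(response: str, rubric: str) -> float:
    """Hypothetical stand-in for an LLM grading call.

    A real proof of concept would prompt a model with the rubric and the
    student response and parse a numeric score from its output; here,
    random noise simulates the run-to-run variation students reported.
    """
    return round(random.uniform(7.0, 10.0), 1)

def consistency_report(response: str, rubric: str, runs: int = 10) -> dict:
    """Grade the same response several times and summarize the score spread.

    A wide spread is the stochastic-grading problem named in the findings:
    identical work receiving different scores on different runs.
    """
    scores = [grade_response(response, rubric) for _ in range(runs)]
    return {
        "scores": scores,
        "mean": round(statistics.mean(scores), 2),
        "stdev": round(statistics.stdev(scores), 2),
        "spread": round(max(scores) - min(scores), 2),
    }

if __name__ == "__main__":
    print(consistency_report("sample student answer", "sample grading rubric"))
```

In a pilot, a spread near zero across repeated runs of the same submission would be a reasonable acceptance criterion for this finding.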
1. Stakeholder Interviews
Participant explains in her own words how ML/A.I. work, displaying elements of the ELIZA effect
"[Artificial Intelligence] is a way over time to teach a computer how to answer questions for you and learn [what you need]."
– Carmen, internal stakeholder
Synthesizing across all of the internal and external stakeholders led to insights on what areas of the Labflow product could be improved by A.I.
🗓️
Admin tasks
Repetitive operations like setting dates and course initialization could be made easier with A.I.
✏️
Content authoring
Report grading code, PDFs, and graphics with alt text could be generated by an LLM to provide a useful starting point for the content team
👨‍🎓
Student assessment
Student work could be graded by A.I. to speed up the feedback process
👩‍🏫
Grader coaching
TAs often get little guidance on how to be good graders. A.I. could scale up the ability to provide good professional development to graders
📊
Enhanced analytics
Instructors have access to course analytics in Data Insights, but they have to seek them out. A.I. could make finding insights in data more automated
2. Analysis of User Experience in Rival Product
Faculty member who adopted the rival product explains what issues he thought A.I. could help him solve
"We recruit undergraduate TAs to run first year chem labs. Their content knowledge is not the same as senior graduate students, but I don't have enough time to train them properly.
I still want to be able to ask rigorous questions and know responses will get graded correctly for all students."
– Chemistry Professor & Coordinator of Undergraduate Laboratories
Students who used the rival product were also given a questionnaire about their experience. This included fixed-choice and open-ended responses.
Dane, a bit skeptical that A.I. grading is even a good approach, given how strongly some students detest it
To put it bluntly, students had strong feelings about A.I. grading.
Some felt it was a double standard ("why can you use it but I can't?"). Others felt it cheapened the value of their university degree ("I'm paying to be taught by people!")
A.I. grading felt like it might be a risky path.
Q: For graded work within [competitor], do you think you were fairly graded?
YES: 43% · NO: 47% · UNDECIDED: 10%
Qualitative analysis of student responses for reasons why A.I. did not grade them fairly
Underlying reasons why A.I. feedback was helpful or not
(Positive = reason it helped; negative = reason it didn't help)
One "Aha!" moment I had in this data is A.I. was not universally negative for students. Specific aspects of A.I. drove both negative and positive sentiment.
Negative:
Consistency: A.I. did not grade the same inputs identically
Expectations: The grading criteria the A.I. uses to evaluate responses are unclear
Positive:
Coaching: A.I. is good at guiding students through unfamiliar content as they are learning it for the first time
Research Impact
Used stakeholder and market insights to articulate multiple candidate A.I. features in Labflow
Scored candidates with a rubric aligned to company goals and identified top candidates for an initial proof of concept (see the scoring sketch below)
Example of A.I. concepts ranked against a business prioritization rubric
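To illustrate how such a ranking can be operationalized, here is a minimal sketch of a weighted prioritization rubric. The criteria, weights, candidate names, and scores below are hypothetical placeholders, not the actual rubric values used for Labflow.

```python
# Hypothetical weighted rubric: criterion -> weight (weights sum to 1.0).
WEIGHTS = {
    "customer_value": 0.35,
    "business_alignment": 0.25,
    "technical_feasibility": 0.25,
    "ux_risk_mitigation": 0.15,
}

# Hypothetical candidate concepts scored 1-5 against each criterion.
CANDIDATES = {
    "A.I. grading assistant (supports the TA)": {
        "customer_value": 5, "business_alignment": 5,
        "technical_feasibility": 4, "ux_risk_mitigation": 4,
    },
    "Fully automated A.I. grading": {
        "customer_value": 4, "business_alignment": 4,
        "technical_feasibility": 3, "ux_risk_mitigation": 1,
    },
    "A.I.-assisted content authoring": {
        "customer_value": 3, "business_alignment": 4,
        "technical_feasibility": 4, "ux_risk_mitigation": 5,
    },
}

def weighted_score(scores: dict) -> float:
    """Combine a candidate's criterion scores into one weighted priority score."""
    return round(sum(WEIGHTS[c] * s for c, s in scores.items()), 2)

# Rank concepts from highest to lowest weighted score.
for name, scores in sorted(CANDIDATES.items(),
                           key=lambda kv: weighted_score(kv[1]),
                           reverse=True):
    print(f"{weighted_score(scores):.2f}  {name}")
```

With these placeholder numbers the TA-supporting grading assistant comes out on top, which mirrors the direction of the real prioritization outcome described below.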
The prioritization activity yielded some clear winners for an A.I. proof of concept in Labflow.
The insights I distilled from interviews and questionnaire data pointed to an A.I. assistant that helps the TA grade, rather than replacing the TA, as having the highest value.
For now, the implementation specifics are under wraps, but we'll be excited to share something soon!
A.I. grading assistant
Dane getting excited to build some A.I. goodness
The next step is to roll up our sleeves in someone's garage and build this thing!
Stay tuned...