Time on Task
Mixed-Methods Case Study
Customers have expressed a desire for analytics that estimate how much time teaching assistants (TAs) spend grading in Labflow
Time spent grading is a valuable metric for science departments to track whether TAs are staying within their contracted appointments
Once developed, these estimation techniques could also be extended to other user actions, such as how long students spend engaged in learning activities
Time on task analytics make Labflow more competitive as the first entrant in the market to offer them
Dane thinking about this work within the broader social and political context
As a former graduate student in a union, I was tracking the waves of collective action happening on university campuses.
A number of prominent universities using Labflow had TAs or faculty striking, or threatening to do so.
I wanted to be sure that we, at the company level, were fully aware of our legal responsibilities under state and federal law when providing data that could potentially be misused in labor relations.
As a Data Scientist & Mixed-Methods UXR, I worked across management, product, and engineering to define the goals and risks of developing time on task measurement techniques.
I worked closely with some early adopter universities to refine and hone the quantitative approaches to modeling the data and validate its outputs. I also developed a research & development plan for rolling out the functionality and measuring user sentiment.
This is an ongoing project that started in Fall/Winter 2022. The project is currently focused on developing and validating a time on task model for estimating grading time.
Design, iterate, validate time on task model
Identify how graders and instructors define and perceive active time on task
Compare operational definition of time on task with users' perception of effort (planned)
Production-ready quantitative model to estimate grading time
Report establishing the ecological validity of the estimation technique
Mixed-Methods UX Researcher
How can we estimate the time users spend grading, and to what extent does that align with their own self-reports and perceptions of effort?
Develop a fast, reliable, and accurate computational model to quantify user time on task
Validate assumptions about grading time against users' expectations
Explore relationships between users' perceptions of effort, time estimates, and product satisfaction (planned)
Methods: In-Depth Interview, Self-report / Member checking, Survey (planned)
Instructors face pressure from administration to ensure graduate students are working within the bounds of their university contracts (often 10–20 hrs/week)
The data can also facilitate 1:1 coaching conversations with graduate students
Data should be both aggregated as a mean/range and presented by individual grader
Most instructors had not thought of the legal implications of how they use time on task data, but many were sensitive to its potential for misuse
Grading time on task could be modeled with high-resolution event capture combined with a 20-minute sliding-window Kernel Density Estimation (KDE)
Estimates align with self-reports; discrepancies tend to be <10%
"We often get admin [department] pushback. They want to know if we're assigning a balanced workload. I also want to be able to identify my inexperienced TAs."
— Dr. Jackie Powell, University of Pittsburgh
"I want to be able to see summaries of grading time like averages [for activities], but also have it broken down by individual TAs."
— Dr. Angela Bischof, Penn State University
Notes from a one-on-one interview with a faculty member interested in grading time data
Conversations with faculty revealed that there are at least two types of uses for this data.
The use of the data at the departmental level presents the clearest case for legal problems.
Most instructors had not even considered the legal implications of this data and were not aware of their university or state regulations.
Grading events are emitted for discrete actions performed by TAs like:
Typing personalized feedback
Selecting from pre-defined rubrics
General user events augment these grading events, filling in gaps in time not captured by grading actions.
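As an illustrative sketch only (the event names and fields below are hypothetical assumptions, not Labflow's actual schema), the discrete grading events and the general user events might be merged into a single timeline for downstream estimation:

```python
# Hypothetical sketch: merge grading events and general user events
# into one time-ordered stream. Event kinds and fields are
# illustrative assumptions, not Labflow's actual schema.
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float   # Unix seconds
    kind: str          # e.g. "feedback_typed", "rubric_selected", "page_view"

def merged_timeline(grading_events, user_events):
    """Combine both streams, sorted by time, for downstream estimation."""
    return sorted(grading_events + user_events, key=lambda e: e.timestamp)

grading = [Event(100.0, "feedback_typed"), Event(160.0, "rubric_selected")]
user = [Event(130.0, "page_view")]
timeline = merged_timeline(grading, user)
```

Keeping grading and user events in one ordered stream is what lets the later density-based model treat "quiet" stretches between any kind of activity as candidate idle time.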
"Quick Grade" interface in Labflow. Assigning point values, typing feedback, and selecting rubric items all emit individual grading events.
Discrete grading events are captured in BigQuery. I then developed a few computational techniques (Kernel Density Estimation depicted).
Dane considering his model options for estimating time on task
In my R&D, I needed to identify performant computational models for estimating time on task.
I iterated on a few different candidate models and settled on a KDE model because it handles temporally distributed clusters of events well.
Clustered bar chart comparing the self-report grading times for three reports against two estimation methods (histogramming, Kernel Density Estimation)
Built and deployed a time on task model validated against user self-reports
Updated Data Privacy and Terms of Service language with the legal team to ensure proper indemnification against misuse of data
Example of grader data parsed with Kernel Density Estimation to identify divisions between logical clusters in grading events
Prototype showing grading time over time, broken down by individual activity
Prototype showing activity-level grading data, broken down by individual grader