Alegion Video Annotation Tool
Led product design in an Agile environment, resulting in the creation of Alegion's proprietary Video Annotation tool. Established user research practices and held weekly interviews with users. Conducted experiments to validate the efficacy of ML tooling in our platform, leading to a 300% increase in workforce efficiency.
WHO
Data Annotators
WHY
Alegion’s clients have large amounts of image and video data that needs to be labeled accurately at the lowest possible cost. Alegion’s workforce needs to be able to annotate this data efficiently.
WHAT
Video annotation tooling on the Computer Vision Platform, aka the Worker Portal.
Building Video Annotation
Not long after we released Phase 1 of Image Annotation, the market and potential clients began requesting video annotation capabilities. This seemed like a natural progression, since a video is simply a series of images.
Wanting the annotation experience to be as cohesive as possible, I used the Image Annotation mocks as a base and completed a few variations of a timeline structure to validate with users. Collaborating with Product Owners, we came up with the following tooling requirements:
Requirements
- Timeline
- Video player tools
- Support video playback
- Localize a target object
- Edit an existing shape
- Adjust view space to see fine details
- Classify a localized object
- Add other defining details about a localized object
Initial Mock-ups
While the initial feedback on the static mock-ups was positive, we knew that without a prototype that could simulate live video playback it would be difficult to obtain accurate results. Additionally, after many discussions with the Engineering team, it was determined that implementing Video Annotation in the Worker Portal would take months.
Version 0
Instead of building our own tool from scratch in our platform, we decided to customize an open-source annotation tool to deliver results for our first video annotation client. It turned out to be a great learning opportunity from a Product design perspective. We learned what worked and what didn’t.
Key Takeaways
- Annotators tended to advance the video frame by frame rather than just pressing play, both to make sure they weren't missing anything and to check their work.
- VATIC's interface had the user identify the object after drawing it and listed each instance on the right in chronological order. This made it challenging for users to know what they were looking for in the video. The chronological listing also made it difficult to find an instance again without first locating it in the video canvas, which often required scrubbing through frames.
- VATIC's interpolation feature was helpful when tracking objects that moved on a linear trajectory through the video; however, any non-linear movement rendered interpolation useless (see the sketch below).
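To illustrate why non-linear motion breaks this feature, here is a minimal sketch of the kind of keyframe-based linear interpolation that tools like VATIC rely on. The box shape, field names, and the `lerp` and `interpolateBox` helpers are assumptions for illustration, not VATIC's actual implementation: every frame between two annotated keyframes is a straight-line blend, so an object that curves, accelerates, or changes direction drifts away from the interpolated box and the annotator has to correct it frame by frame.

```typescript
// Hypothetical bounding-box keyframe: a frame index plus box coordinates.
interface BoxKeyframe {
  frame: number;
  x: number;      // top-left x
  y: number;      // top-left y
  width: number;
  height: number;
}

// Linear blend between two numbers; t runs from 0 to 1.
const lerp = (a: number, b: number, t: number): number => a + (b - a) * t;

// Interpolate a box for any frame between two annotated keyframes.
// Assumes straight-line, constant-speed motion — the reason curved or
// erratic movement renders the result useless.
function interpolateBox(start: BoxKeyframe, end: BoxKeyframe, frame: number): BoxKeyframe {
  const t = (frame - start.frame) / (end.frame - start.frame);
  return {
    frame,
    x: lerp(start.x, end.x, t),
    y: lerp(start.y, end.y, t),
    width: lerp(start.width, end.width, t),
    height: lerp(start.height, end.height, t),
  };
}

// Example: a box annotated at frame 0 and frame 30; frame 15 is the midpoint.
const midpoint = interpolateBox(
  { frame: 0, x: 10, y: 10, width: 50, height: 50 },
  { frame: 30, x: 100, y: 40, width: 60, height: 60 },
  15
);
console.log(midpoint); // x = 55, y = 25 — correct only if the object actually moved in a straight line.
```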
Version 1
We used the insights from VATIC (Video Annotation Tool from Irvine, California) to shape our first release of video annotation in the Worker Portal, and after a lot of research on video formats and encodings, we launched version one.
The first client project using the new functionality was a large home security camera company. With a tight deadline approaching for a high volume of videos, Alegion called an all-hands-on-deck day to help get the videos annotated. Initially people were upset about the disruption to their planned daily work, but it turned out to be a blessing in disguise for the Product team: we were able to gather a massive amount of authentic feedback from first-time users and identify a number of bugs all in one day. We met the deadline comfortably, and our results met the client's highest expectations for accuracy.
Even with such high marks, the client decided they did not have the budget after all, but assured us they would keep us in mind for the future. Clients wanted high volume and high accuracy at low cost, delivered yesterday. It was an ongoing challenge to communicate the time and effort that goes into setting up and testing a workflow, training workers, and post-processing the results into the format the client needed. Even within the Product team, not everyone truly understood the complexity of completing even a single task. What were we actually asking the worker to do?
Video Annotation Journey Map
I set out to understand and document the cognitive complexity of doing a video annotation task for our biggest client. By watching multiple user sessions in Fullstory and interviewing Annotators about their individual strategies for completing a video annotation task, I was able to document a step-by-step user journey that tracked each action, each decision, and even which part of the screen the user needed to focus on.
- # of frames: 1,145
- # of actions: 108,920
- avg. time to complete: 2,879 minutes
- The green highlights create a heatmap representing the areas of the screen where the annotator is looking and/or has the cursor for each action.
The journey map revealed four different types of decisions.
Using the heatmap information, I mapped each decision type to a defined location on the screen based on the information the user would need to make that decision.
This unbiased account of how Annotators were getting their work done made the points of highest friction and redundancy visible, pinpointing exactly what our design priorities should be: setting, tracking, and reviewing relationships involving one or more entities.
Video Annotation Northstar
The Product Design team was now able to begin ideating on the Northstar, or ideal, Video Annotation experience. Each of us diverged to build low-fidelity mocks of what we thought would help resolve the existing pains related to object relationships. After about a week of iterating on our own, we converged to share ideas. The Head of Product Design conducted a Six Thinking Hats workshop that helped us determine which ideas we thought would work best, what was confusing, and what needed more development.
Initial Mock-up Feedback
- Left side sheet (Independent Permanent/IP) doesn't have a clear hierarchy.
- Timeline affordances made sense, but it was unclear what the black bar represented and how users should interact with it.
- The temporal area (timeline) seemed too restrictive: "I want to compare a group or pair of entities!?"
- Unclear how to create a new object.
Northstar Iterations
Using constructive criticism from my team (see the feedback above), I created another iteration in Figma. It was clear I needed to focus on how someone would set and change relationships in the bottom third of the screen. The next prototype was going to be tested with internal team members as well as actual Annotators, so the experience needed to feel as realistic as possible. In less than a week I created the following iteration.
Second iteration Feedback
- Intuitive flow for creating objects.
- Slight confusion when creating a group on users' first try.
- Temporal relationships not clear.
- Visibility setting was more intuitive than before.
- Filter mode was not clear or obvious.
- What information is the left side sheet providing that isn't in the temporal area?
After the first usability session, I made more tweaks based on feedback to create an updated, higher-fidelity prototype to test with Annotators in Malaysia.
View the clickable Figma prototype here.
Third iteration Feedback
- Feels cleaner with only two main sections.
- Clear hierarchy of parent and children.
- Didn't get to test with users.
Key Takeaways
- Having and socializing a Northstar design creates alignment on the big-picture goal, helping to connect the dots and see the value from sprint to sprint as teams work toward the ideal experience.
- Observing and documenting the existing experience creates understanding and empathy by highlighting main points of friction. This helps to align everyone involved on why design decisions have been made and why specific features need to be prioritized over others.
- Using realistic data and functionality while testing would've helped us learn more earlier in the process, enabling us to iterate faster.
Conclusion
Designing the video annotation experience at Alegion was incredibly rewarding, and it is where I have grown the most professionally. It was exciting to begin the project as the sole designer, teaching others about computer vision and designing the initial mocks, then to help build the design team and see how quickly we were able to iterate together to create an experience that increased annotation efficiency by 300%. I'm proud to say that my designs are currently being used by thousands of people for 8-12 hours a day and became the company's highest-grossing tool. I lived, breathed, and dreamt video annotation for nearly two years, so I was pretty bummed not to be able to continue building and validating the Northstar iterations or to see the final implemented project.