Glossy: Audio Annotation In Virtual Reality
Glossy is an asynchronous audio annotation tool for virtual reality.
Over the past year, I worked with BuzzFeed News reporters and other collaborators on a variety of VR projects, from 360-video to photogrammetry and LIDAR. However, on newsroom projects with short lead times, VR production is hard. Writers and editors are adept at using timestamps and scene descriptions to quickly communicate with video or audio producers on edits, but even VR producers don’t have an established vocabulary for discussing edits in a fully immersive space. I wanted a collaboration tool that would fit into an editorial workflow: not just a tool to make VR production easier, but a way to conceptualize, edit, and polish a VR piece collaboratively within a 3D space.
I also wanted to illustrate a modular approach to VR, one that focuses on the intensity of a particular activity over general immersion and uses VR for specific purposes with minimal visuals.
Glossy is a prototype that I developed to test this concept.
How it Works
Glossy allows users to leave audio annotations anywhere in a VR environment.
Glossy is a webVR project that uses threejs, recorderjs, and websockets, and currently has a mongoDB backend. I chose webVR over a Unity app because the development ecosystem is more transparent and adaptable.
In its current incarnation, Glossy annotations only work with the HTC Vive. A user begins recording by pressing a trigger on an HTC Vive controller. The mic in the Vive’s headset records until the user presses the trigger again to stop the recording. Glossy then deposits an audio clip at the coordinates of the second trigger press.
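The press-to-start, press-to-stop behavior described above can be sketched as a small state machine. This is an illustrative sketch only, not Glossy's actual code: the names `Vec3`, `Annotation`, and `TriggerRecorder` are hypothetical, and the real prototype would also start and stop the microphone capture at the same two moments.

```typescript
// Hypothetical types for illustration; none of these names come from Glossy's source.
type Vec3 = { x: number; y: number; z: number };

interface Annotation {
  position: Vec3;    // controller position at the second trigger press
  startedAt: number; // timestamp (ms) of the first press
  stoppedAt: number; // timestamp (ms) of the second press
}

class TriggerRecorder {
  private startedAt: number | null = null;

  get isRecording(): boolean {
    return this.startedAt !== null;
  }

  // Call once per trigger press. The first press starts a recording and
  // returns null; the second press stops it and returns where the clip
  // should be placed.
  onTriggerPress(controllerPosition: Vec3, now: number): Annotation | null {
    if (this.startedAt === null) {
      this.startedAt = now;
      return null;
    }
    const annotation: Annotation = {
      position: {
        x: controllerPosition.x,
        y: controllerPosition.y,
        z: controllerPosition.z,
      },
      startedAt: this.startedAt,
      stoppedAt: now,
    };
    this.startedAt = null;
    return annotation;
  }
}
```

Copying the position at the second press, rather than the first, matches the description above: the clip lands where the controller is when the recording ends.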
When the recording is complete, Glossy captures the x, y, z coordinates of the controller and writes the audio to a binary object. It then stores that bundle in a central database and sends it out to any other clients running Glossy.
The first time a user opens the prototype in a browser window, the 3D space is populated with all existing recordings from the database. After that, the prototype listens for new recordings as they are made.
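A minimal sketch of that store-and-broadcast step follows. It is an assumption-laden stand-in rather than the real backend: an in-memory array takes the place of the mongoDB collection, plain callbacks take the place of websocket connections, and `AnnotationBundle` and `AnnotationHub` are hypothetical names.

```typescript
// Hypothetical sketch: an array stands in for the database and callbacks
// stand in for websocket connections. Not Glossy's actual server code.
type Vec3 = { x: number; y: number; z: number };

interface AnnotationBundle {
  id: number;
  position: Vec3;    // controller coordinates when recording stopped
  audio: Uint8Array; // the recorded audio as raw bytes
}

type Listener = (bundle: AnnotationBundle) => void;

class AnnotationHub {
  private stored: AnnotationBundle[] = [];
  private clients: Listener[] = [];

  // Register a client; returns a function that disconnects it.
  connect(listener: Listener): () => void {
    this.clients.push(listener);
    return () => {
      this.clients = this.clients.filter((c) => c !== listener);
    };
  }

  // Store the bundle, then send it to every client except the sender.
  publish(position: Vec3, audio: Uint8Array, sender?: Listener): AnnotationBundle {
    const bundle: AnnotationBundle = { id: this.stored.length + 1, position, audio };
    this.stored.push(bundle);
    for (const client of this.clients) {
      if (client !== sender) client(bundle);
    }
    return bundle;
  }

  // Everything stored so far, e.g. for a client that has just joined.
  all(): AnnotationBundle[] {
    return this.stored.slice();
  }
}
```

Skipping the sender reflects the phrase "any other clients": the client that made the recording already has it locally.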
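The load-then-listen sequence can be sketched on the client side as follows. The names here (`Recording`, `AnnotationScene`, `joinSession`) and the two callback parameters are hypothetical; in the real prototype the initial fetch and the live subscription would go over the network and would place audio sources in the threejs scene.

```typescript
// Hypothetical client-side sketch; fetchExisting and subscribe stand in for
// a database query and a websocket subscription.
interface Recording {
  id: number;
  x: number;
  y: number;
  z: number;
}

class AnnotationScene {
  private byId: { [id: number]: Recording } = {};

  // Place a recording in the scene. Returns false if it was already there,
  // so a clip delivered both by the initial load and the live feed is kept once.
  place(rec: Recording): boolean {
    if (this.byId[rec.id] !== undefined) return false;
    this.byId[rec.id] = rec;
    return true;
  }

  get count(): number {
    return Object.keys(this.byId).length;
  }
}

function joinSession(
  scene: AnnotationScene,
  fetchExisting: () => Recording[],
  subscribe: (onRecording: (rec: Recording) => void) => void
): void {
  // Step 1: populate the space with everything already in the database.
  for (const rec of fetchExisting()) {
    scene.place(rec);
  }
  // Step 2: keep listening for recordings made from now on.
  subscribe((rec) => {
    scene.place(rec);
  });
}
```

Deduplicating by id is a design choice of this sketch, not something the original describes; it guards against a recording arriving twice around the moment a client joins.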
Anyone who is using the prototype can walk through the environment, hear one another’s recordings in space and leave new recordings. Clients don’t encounter one another as avatars, but rather as recordings or traces.
I wanted to make a flexible tool, useful for quickly prototyping ideas for sound-based spatial narratives, as well as something that could potentially be expanded for a production-level project. Projects that enable VR-native composition (like Google’s Tilt Brush) are interesting because they let you work and create in the space your project is intended to live in.
This project is also an exploration of a new perspective on presence, specifically how users in a VR environment experience the presence of other people. Facebook's social VR demo and Altspace VR are examples of co-presence that relies on avatars. I wanted to try introducing presence by registering attention and focus, in a way that allows people to work asynchronously. Audio recordings work surprisingly well, because so much context about identity, personality, mood, environment, and more is conveyed through sound.
Speech and Sound
I’m interested in how speech commands, speech to text, and vice versa will be integrated with VR work. Given the attention being directed towards voice commands on phones and IoT devices, I think it makes sense to think of voices as quite powerful controllers. I really like this VR natural language project, and the ability to create speech-to-text transcripts that also function as a timeline, like this project for speech/transcript alignment.
I have been thinking about “public speaking”, when it’s not a conversation and it’s not for a clear audience. There's a spectrum, from systems that listen only for specific commands and don't act until they hear them, to systems that constantly present new information based on what they hear.
The UI for this project is still rudimentary, and I’d like to make the recording process a bit smoother. I’m also aiming to support more platforms, and to add locomotion that works without a headset so that someone outside of VR can use the tool as well.
Another thing I want to try is to support movement in the audio recording, so that the audio also has an “animated” component. If you move and speak, it is recorded as moving with you.
The most important upcoming changes will be the ability to create chains of recordings, like an email thread, and a more robust backend so that I can maintain a live demonstration site. Following that, I want to start experimenting with speech commands and speech to text. It would also be great to start using the tool in a non-empty environment, like a photogrammetry model or architectural rendering.
I'm very grateful to the awesome projects that inspired and helped me, especially the webVR projects of Arturo Paracuellos, Brian Chirls, EleVR, and Boris Smus as well as the greater threejs and webVR communities. I'm glad so many talented people are making VR weird and wonderful (and making it work in web browsers)!
P.S. This is my last post for BuzzFeed as an Open Lab fellow. It's been a great year and a singular opportunity. If you want to keep in touch as I continue to develop the projects that got off the ground this year, you can find me at this web address or on Twitter.