Paper Abstracts for Monday, August 20

11:30 - 11:55 Monday, August 20 Room: Michelangelo, Paper Session: Production & Tools

AUTHORS: Michael Costagliola, Yale University

TITLE: Multi-user shared augmented audio spaces using motion capture systems

This paper describes a method for creating multi-user shared augmented reality audio spaces. By using a system of infrared cameras and motion capture software, it is possible to provide accurate low-latency head tracking for many users simultaneously, and stream binaural audio representing a realistic, shared virtual environment to each user. Participants can thus occupy and navigate a shared virtual aural space without the use of head-mounted displays, only headphones (with passive markers affixed) connected to lightweight in-ear monitor beltpacks. Potential applications include installation work, classroom use, and museum audio tours.

12:00 - 12:25 Monday, August 20 Room: Michelangelo, Paper Session: Production & Tools

AUTHORS: Giordano Jacuzzi, Sennheiser Electronic GmbH

TITLE: Augmented Audio: An Overview of the Unique Tools and Features Required for Creating AR Audio Experiences

What a user sees in augmented reality is only part of the experience. To create a truly compelling journey, we must augment what a user hears in reality as well. In this paper, we consider Augmented Audio to be the sound of AR, and discuss a technique by which the binaural rendering of virtual sounds (Augmented) is combined with the manipulation of the real-world sound surrounding a listener (Reality). We outline the unique challenges that arise when designing audio experiences for AR, and document the current state-of-the-art for Augmented Audio solutions. Using the Sennheiser AMBEO Smart Headset as a case study, we describe the essential features of an Augmented Audio device and its integration with an AR application.

12:30 - 12:55 Monday, August 20 Room: Michelangelo, Paper Session: Production & Tools

AUTHORS: Jukka Holm and Mark Malyshev, Tampere University of Applied Sciences, Finland

TITLE: Spatial Audio Production for 360-Degree Live Music Videos: Multi-Camera Case Studies

This paper discusses the different aspects of mixing for 360-degree multi-camera live music videos. We describe our two spatial audio production workflows, which were developed and fine-tuned through a series of case studies including rock, pop, and orchestral music. The different genres were chosen to test whether the production tools and techniques were equally efficient for mixing different types of music. In our workflows, one of the most important parts of the mixing process is to match the Ambisonics mix with a stereo reference. Among other things, the process includes automation, proximity effects, creating a sense of space, and managing transitions between cameras.

2:00 - 2:25 Monday, August 20 Room: Michelangelo, Paper Session: Perception & Evaluation

AUTHORS: Angela McArthur, Mark Sandler and Rebecca Stewart, Queen Mary University of London, UK

TITLE: Perception of mismatched auditory distance - cinematic VR

This study examines auditory distance discrimination in cinematic virtual reality. Using controlled stimuli with audiovisual distance variations, it determines whether mismatched stimuli are detected. It asks whether visual conditions (either equally or unequally distanced from the user) and environmental conditions (either a reverberant space or a freer field) affect accuracy in discriminating between congruent and incongruent aural and visual cues. A Repertory Grid Technique-derived design, whereby participant-specific constructs are translated into numerical ratings, is used. Discrimination of auditory event mismatch was improved for stimuli with varied visual-event distances, though not for equidistant visual events. This may demonstrate that visual cues alert users to matches and mismatches, but can lead responses toward both greater and lesser accuracy.

2:30 - 2:55 Monday, August 20 Room: Michelangelo, Paper Session: Perception & Evaluation

AUTHORS: Hanne Stenzel, Philip J.B. Jackson, University of Surrey, UK, Jon Francombe, BBC Research and Development, UK

TITLE: Reaction times of spatially coherent and incoherent signals in a word recognition test

Using conventional sound design, the audio signal in VR applications is often reduced to a static stereophonic signal that is accompanied by a visual signal that allows for interactive behavior such as looking around. In the current test, the influence of spatial offset between the audio and visual signals is investigated using reaction time measurements in a word recognition task. The audio-visual offset is introduced by a video presented at horizontal offset angles between +/-20 degrees, accompanied by static central audio. Measurements are compared to reaction times from a test where both the audio and visual signals are presented with the same offset. Results show that the spatial offset introduces changes in the reaction times exhibiting greater within-participant differences.

3:00 - 3:25 Monday, August 20 Room: Michelangelo, Paper Session: Perception & Evaluation

AUTHORS: G. Christopher Stecker, Vanderbilt University

TITLE: Toward objective measures of auditory co-immersion in virtual and augmented reality

"Co-immersion" refers to the perception of real or virtual objects as contained within or belonging to a shared multisensory scene. Environmental features such as lighting and reverberation contribute to the experience of co-immersion even when awareness of those features is not explicit. Objective measures of co-immersion are needed to validate user experience and accessibility in augmented-reality applications, particularly those that aim for "face-to-face" quality. Here, we describe an approach that combines psychophysical measurement with virtual-reality games to assess users' sensitivity to room-acoustic differences across concurrent talkers in a simulated complex scene. Eliminating the need for explicit judgments, odd-one-out tasks allow psychophysical thresholds to be measured and compared directly across devices, algorithms, and user populations. Supported by NIH-R41-DC16578.

3:30 - 3:55 Monday, August 20 Room: Michelangelo, Paper Session: Perception & Evaluation

AUTHORS: Olli Rummukainen, Thomas Robotham, Sebastian J. Schlecht, Axel Plinge, Jürgen Herre and Emanuël A. P. Habets, Fraunhofer, Germany

TITLE: Audio Quality Evaluation in Virtual Reality: Multiple Stimulus Ranking with Behavior Tracking

Virtual reality systems with multimodal stimulation and up to six degrees-of-freedom movement pose novel challenges to audio quality evaluation. This paper adapts classic multiple stimulus test methodology to virtual reality and adds behavioral tracking functionality. The method is based on ranking by elimination while exploring an audiovisual virtual reality. The proposed evaluation method allows immersion in multimodal virtual scenes while enabling comparative evaluation of multiple binaural renderers. A pilot study is conducted to evaluate the feasibility of the proposed method and to identify challenges in virtual reality audio quality evaluation. Finally, the results are compared to a non-immersive off-line evaluation method.

4:00 - 4:25 Monday, August 20 Room: Michelangelo, Paper Session: Perception & Evaluation

AUTHORS: Gregory Reardon, Andrea Genovese, Gabriel Zalles, and Agnieszka Roginska, New York University, Patrick Flanagan, THX Ltd.

TITLE: Evaluation of Binaural Renderers: Multidimensional Sound Quality Assessment

A multi-phase subjective experiment evaluating six commercially available binaural audio renderers was carried out. This paper presents the methodology, evaluation criteria, and main findings of the tests, which assessed the perceived sound quality of the renderers. Subjects appraised a number of specific sound quality attributes (timbral balance, clarity, naturalness, spaciousness, and dialogue intelligibility) and ranked the renderers, in terms of preference, for a set of music and movie stimuli presented over headphones. Results indicated that differences between the perceived quality and preference for a renderer are discernible. Binaural renderer performance was also found to be highly content-dependent, with significant interactions between renderers and individual stimuli, making it difficult to determine an "optimal" renderer for all settings.