Telepresence Options Magazine


Highly Immersive Telepresence - The Keys to Creating Immersion

November 14, 2011 | Howard Lichtman
A rendering of the upcoming highly immersive SurroundPresence environment from Array Telepresence

Organizations are spending hundreds of thousands of dollars per room for highly immersive telepresence conferencing environments.

What are the keys to creating highly immersive telepresence environments? What is the ROI for end-users? What is the future of highly immersive conferencing?

This new Research Brief by Human Productivity Lab President Howard Lichtman examines what makes highly immersive telepresence environments "highly immersive" and gives a sneak peek at the first new highly immersive telepresence environment to come to market in years: SurroundPresence. The new environment features a new patent-pending optics and image processing system called Equal-i that brings remote participants "up close and personal."

A Human Productivity Lab Research Brief sponsored by Array Telepresence
By Howard S. Lichtman, President - Human Productivity Lab


Background and History
My short definition of telepresence conferencing is "Visual collaboration solutions that address the human factors of participants and attempt to replicate, as closely as possible, an in-person experience." Highly immersive telepresence is a subset of telepresence.
The father of modern immersive telepresence conferencing is Herold Williams, developer of TeleSuite (now the Polycom RPX). He was the first visual collaboration architect to significantly address the human factors of the visual experience, hide the technology and integrate it into a format that would create immersion. The result: usage went through the roof. While traditional videoconferencing systems averaged 5-15 hours per endpoint per month, TeleSuite systems were averaging 60, 70, 80, even 100+ hours per system per month! By focusing on the human factors of participants and creating a highly immersive experience, Herold had dramatically improved end users' acceptance and preference over what a traditional, observant videoconferencing experience alone could achieve.

An early TeleSuite Enterprise 408 Telepresence Environment circa 2004. Herold Williams sits center, to the right of Scott Allen, now CEO of telepresence managed service provider Iformata Communications.

Early adopters included AOL, Cigna, 3Com, PricewaterhouseCoopers, GlaxoSmithKline and Capital One. They proved that enterprises would pay hundreds of thousands of dollars for visual collaboration solutions that met their needs.

Polycom bought the TeleSuite intellectual property and manufacturing capabilities for over $50 million in 2007. Renamed the Polycom RPX, the offering has gone on to be one of the most successful visual collaboration solutions in history. Hundreds of Polycom RPX systems are deployed globally, ranging from $299,999 for a four-seat environment to over $700,000 for a 28-seat environment.

The Return on Investment

By greatly improving end-user acceptance, highly immersive telepresence dramatically increases usage, ROI and customer satisfaction over traditional videoconferencing. In surveys, employees consistently say they prefer the more humanistic, natural experience that highly immersive environments provide over traditional videoconferencing. They like the life-size body language and non-verbal cues that come across so clearly from remote participants, and the way the comfortable environments let their meetings run longer without fatigue. Remote participants are represented more faithfully in the local space and are included in discussions as equal participants. Employees use the systems more willingly, without mandates. As a result, they travel less and produce more. This gives organizations a time-to-market advantage from accelerated decision making, faster mergers and acquisitions, more flexible business models, and improved personal productivity of individual executives, who can be leveraged more effectively around the world at the speed of light.

What Makes a Telepresence Environment Highly Immersive?

Anyone who has ever fallen in love on a first date can understand immersion. When immersion happens between two people, they connect so well they block out all other stimuli in their environments. This isn't to say that executives need to fall in love to improve business communications, but they'll produce better work in more immersive environments.

When the brain isn't distracted by the Medium (the visible screen, the obvious camera, low-quality audio, space, etc.) it's freed up to focus on the Message (what's being said, body language and social cues). Immersive environments produce superior end-user acceptance, compelling participants to stay longer without fatigue.

In telepresence and visual collaboration, immersion is best thought of as a continuum. Each gradation is not noticeably different from its neighbors, although the extremes are very different from each other. The more elements you address, the greater the immersion you achieve.

The most important components of creating immersive environments include the following:

Wide horizontal seamless displays to address humans' wide horizontal field of view and peripheral vision -- Peripheral vision is the part of vision that occurs outside the center of gaze (or foveal vision). It provides the brain a sense of context and space in support of foveal vision. Human beings have a fairly narrow vertical field of view, about 135 degrees, and a much wider horizontal field of view of about 200 degrees.

A visualization of the vertical vs. horizontal field of view and visual overview of the eye.

Using a wide horizontal seamless display, telepresence architects can better immerse participants in the remote scene by duplicating the usual and expected visual cues. 

The Polycom RPX 400 Series features a 16' x 4' wide-format display to supplement participants' peripheral vision and a hidden eye-level camera.
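The sketch below gives a rough idea of how much of that ~200-degree horizontal field a wide display can fill. It computes the angle a flat screen subtends for a viewer centered in front of it. It assumes a flat (not curved) 16-foot-wide screen, like the RPX display in the caption, and uses the 8-to-10-foot viewing distances mentioned later in this brief. The function name is hypothetical.

```python
import math

HUMAN_HORIZONTAL_FOV_DEG = 200  # approximate human horizontal field of view cited above


def horizontal_angle_deg(screen_width_ft: float, viewing_distance_ft: float) -> float:
    """Angle (in degrees) a flat screen subtends for a viewer centered in front of it."""
    half_angle = math.atan((screen_width_ft / 2) / viewing_distance_ft)
    return math.degrees(2 * half_angle)


# Assumed: a flat 16 ft wide screen viewed from 8 ft and from 10 ft.
for distance_ft in (8, 10):
    angle = horizontal_angle_deg(16, distance_ft)
    share = angle / HUMAN_HORIZONTAL_FOV_DEG
    print(f"16 ft screen at {distance_ft} ft: {angle:.1f} degrees ({share:.0%} of ~200)")
```

Under these assumptions, the screen spans about 90 degrees at 8 feet and about 77 degrees at 10 feet. That reaches well past the center of gaze into peripheral vision. A narrower screen at the same distance subtends a smaller angle.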

Absence of visible cameras -- Another key aspect of immersion is hiding or removing cameras, screen bezels, and electronics that create a sense of artificiality in the environment.   A visible camera leads to a phenomenon known as the "Documentarian's Curse," the tendency of people to behave differently on camera.  Most people are familiar with this experience from family events where friends and relatives behave differently when being recorded with a video camera.  In visual collaboration environments, participants may focus on how they look on camera or be less open as the brain registers the potential that the conference could be recorded. A camera out in the open also makes things seem artificial.

Solutions from Digital Video Enterprises (DVE) and TelePresence Tech hide the camera behind a piece of silvered glass called a beam splitter. The Polycom RPX hides the camera behind the screen. 

Absence of Screen Bezels -- Since birth, human beings have been socialized to interact with visible screens in a very particular way. Visible screen bezels around flat-panel displays reinforce an observant experience. Dr. Steve McNelley, the Co-CEO of DVE, is a clinical psychologist who did his doctoral research on videoconferencing. He calls the visible bezel "the chief psychological cue that you are looking at a television screen."

DVE's telepresence solutions use seamless beam splitters to eliminate the visible bezel. Polycom's RPX uses a rear projection screen. 

The Digital Video Enterprises Huddle 70 eliminates the bezel, hides the camera at eye-level and replicates architectural cues where possible. Note the placement of the plant in both locations. 

Stand-Up Environment -- Nothing destroys an immersive experience faster than headless participants. Traditional videoconferencing rooms (and many telepresence group systems) can't capture participants standing to enter, leave, stretch, make a point, or use a whiteboard.

The author standing in a DVE Immersion Room with a 120-inch seamless beam splitter and a hidden eye-level camera behind it.

Eye contact and excellent eye lines -- Eye contact is chief among the body's non-verbal cues. From infancy, we are biologically drawn to the gaze of our parents, establishing a preference for personal communication that continues throughout life. Eye contact between humans is physiologically powerful, eliciting changes in blood pressure and heart rate and increasing brain activity. The information transmitted through eye contact is rich and varied:

  • Eye gaze provides many communication fundamentals, including feedback, conversational regulation (turn taking) and the expressions that punctuate emotion.
  • Mutual eye gaze has been described by psychologists as "the key to the awareness of the thoughts of another." People with strong eye contact are perceived to be more honest, attractive and successful. Conversely, psychologists call people with poor eye contact "gaze-avoidant personalities," rated less favorably in the eyes of others.

Addressing Eye-Contact in Highly Immersive Environments
Beam-splitter solutions from DVE and TelePresence Tech can almost perfectly replicate eye contact between single participants sitting directly in front of the display/hidden eye-level camera. Even so, it's impossible to maintain truly perfect eye contact in multi-party conferences. However, because eye contact is made up of a vertical Y component and a horizontal X component, optimal camera placement can go a long way toward promoting it. The most annoying aspect of poor eye contact is a mismatch in the Y component. Placing the camera at eye level behind a screen or beam-splitter display significantly reduces the gaze-angle differential in the vertical Y component. The horizontal X component, however, cannot be matched for every participant in a multi-party conference. Some patents propose using multiple cameras to capture different angles of the head and then averaging them so that the person appears to look straight ahead. This does not appear effective. You want to see people turn their heads to address someone in their room -- our heads are perfect panning mechanisms already, so the X component is much less important than the Y.

Significant gaze-angle differential using standard cameras mounted over flat panel displays makes participants appear to look down.
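The size of the vertical mismatch is easy to estimate. It is the angle between a viewer's line of sight to the remote person's on-screen eyes and the line to the camera. The sketch below assumes a camera mounted a hypothetical 12 inches above the on-screen eye line and a viewer 8 feet (96 inches) away. The same function applies to a horizontal (X) offset.

```python
import math


def gaze_angle_deg(camera_offset_in: float, viewing_distance_in: float) -> float:
    """Angle (degrees) between looking at the on-screen eyes and looking into the camera.

    camera_offset_in: distance from the displayed eye line to the camera lens.
    viewing_distance_in: distance from the viewer to the screen plane.
    """
    return math.degrees(math.atan2(camera_offset_in, viewing_distance_in))


# Hypothetical setups, both viewed from 96 in (8 ft):
above_panel = gaze_angle_deg(12, 96)  # camera 12 in above the displayed eyes
eye_level = gaze_angle_deg(0, 96)     # camera hidden at eye level behind the screen
print(f"camera above panel: {above_panel:.1f} degrees; eye-level camera: {eye_level:.1f} degrees")
```

In this example, the over-the-panel camera produces a vertical differential of roughly 7 degrees. The hidden eye-level camera brings the vertical component to zero.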

Augmented Reality -- Another powerful technique to create a sense of immersion is augmented reality. It layers virtual images into a physical space to create the illusion that participants share the same physical space. Digital Video Enterprises and TelePresence Tech use beam-splitter displays and an extremely low-reflectance black velour background to achieve this effect. The black background effectively disappears because it absorbs light. With no visible background as a frame of reference, the remote participants appear as volumetric images layered into physical space with the right proportions.

The TelePresence Tech TPT 4000 layers virtual images of remote conferees into the physical space of the local environment while hiding the camera at eye-level.

High-definition Video with a High Frame Rate -- High-definition video is generally classified as video with a resolution of at least 1280x720 pixels at a 16:9 aspect ratio (referred to in industry parlance as 720p). The highest resolution offered in commercial telepresence and videoconferencing is 1920x1080 pixels at a 16:9 aspect ratio (or 1080p). The higher the camera, codec, and screen resolution, the more realistic the remote scene, especially with respect to facial features and minute nuances of individual expression. The larger the screen, the more important higher resolutions become in order to faithfully scale the image over a larger area. The videoconferencing industry's move to high-definition images is one of the leading reasons visual collaboration has caught on over the past five years.

Video conferencing resolutions

Frame rate is the frequency at which a videoconferencing system produces unique consecutive images, known as frames. The higher the frame rate, the smoother and more realistic the motion. The current standard for most videoconferencing systems is 30 frames per second (fps), though a significant percentage of newer systems are capable of 60 fps.
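Some simple arithmetic shows why resolution and frame rate matter together. The sketch below computes two things for the formats named above: uncompressed pixel throughput (before any codec compression) and the interval between frames. It also shows the horizontal pixel density when the same 1080p image is spread across a hypothetical 60-inch-wide versus 120-inch-wide image area. The format table and function names are illustrative assumptions.

```python
# Illustrative formats: (width, height, frames per second)
FORMATS = {
    "720p30": (1280, 720, 30),
    "720p60": (1280, 720, 60),
    "1080p30": (1920, 1080, 30),
    "1080p60": (1920, 1080, 60),
}


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Uncompressed pixel throughput, before any codec compression."""
    return width * height * fps


def pixels_per_inch(horizontal_pixels: int, image_width_in: float) -> float:
    """Horizontal pixel density when an image is stretched across a given width."""
    return horizontal_pixels / image_width_in


for name, (w, h, fps) in FORMATS.items():
    frame_interval_ms = 1000 / fps
    print(f"{name}: {pixels_per_second(w, h, fps):,} pixels/s, "
          f"new frame every {frame_interval_ms:.1f} ms")

# The same 1080p image on two hypothetical image widths
for width_in in (60, 120):
    print(f"1920 px across {width_in} in: {pixels_per_inch(1920, width_in):.0f} px/in")
```

A 1080p frame carries 2.25 times as many pixels as a 720p frame. Doubling the frame rate doubles the pixel rate again, so 1080p at 60 fps represents 4.5 times the raw pixel rate of 720p at 30 fps. Doubling the image width halves the pixel density, which is why larger screens benefit most from higher resolutions.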

Life-size Images of Remote Participants -- Life-size does not mean full human scale. In a typical conference room, we sit about five to six feet apart, which establishes our perception of what is life size. In most immersive telepresence environments, we sit about eight to 10 feet from the screen plane. Accordingly, "life size" can be defined as 80 to 85 percent of human scale to maintain a sense of realism.

Engineered environments, including architectural elements, colors, furniture, etc. -- Engineered environments address many of the subtle nuances that improve the quality of video and audio capture, including room color, lighting, and acoustics. Most environments precisely position participants so they are ideally captured for effective display on the other side. People feel more comfortable occupying a space that maintains the attributes of a conference room environment.

In this picture of a TeleSuite circa 2004, you can see all the elements of an engineered environment: participants are properly lit and precisely positioned for video capture, the furniture and architectural elements are mirrored on both sides, including attaching the table to the screen to create the illusion that both sides share the same physical space.

Lighting -- Lighting has always been, and will always be, critical to creating immersive environments, even with advances in sensor and camera technology. Too little light adds noise to the image; too much adds glare and washes out parts of the scene. Direct overhead light casts a shadow over the brow, darkening the eyes. Proper lighting includes key, fill, and back lighting. A balance of all three is critical. It should be applied with the subtlety of an office environment so the room doesn't look and feel like a television studio.

Audio -- Accurate voice capture is the most important aspect of telepresence, yet it often gets the least attention. Reverberation is the real killer. You can have the best audio compression and algorithms, but poor voice capture and poor reproduction are hard to overcome.

Microphones -- Much of the problem boils down to microphone placement: where the microphone is in relation to the source voice. Lapel microphones achieve the best possible quality. With today's technology, however, they require "miking up," which raises problems with naturalness, battery life, capacity and recharging. Steering microphone arrays are in the works. Because immersive rooms can become pretty active, table microphones can be problematic for standing participants. That leaves multiple overhead placements as the best option. This requires a high-quality DSP mixer, preferably with a de-reverberation algorithm.
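One common way a DSP mixer handles several overhead microphones is automatic mixing. It turns up the microphones nearest the active talker and turns down the rest, so that reverberant room sound picked up by idle microphones is not summed into the mix. The sketch below is a simplified gain-sharing automixer, one well-known approach and not necessarily what any product mentioned here uses. Each microphone's gain is its share of the total RMS level in a block of samples. Real mixers smooth levels over time and work on continuous audio. The functions and sample values here are hypothetical.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_sharing_gains(blocks: list[list[float]]) -> list[float]:
    """Per-microphone gains proportional to each mic's level; the gains sum to 1."""
    levels = [rms(block) for block in blocks]
    total = sum(levels)
    if total == 0.0:
        return [1.0 / len(blocks)] * len(blocks)  # silence: share gain equally
    return [level / total for level in levels]


def automix(blocks: list[list[float]]) -> list[float]:
    """Mix one block from each microphone into a single output block."""
    gains = gain_sharing_gains(blocks)
    return [sum(g * block[i] for g, block in zip(gains, blocks))
            for i in range(len(blocks[0]))]


# Hypothetical blocks from three overhead mics: the talker is under mic 0.
mic_blocks = [
    [0.5, -0.5, 0.5, -0.5],
    [0.05, -0.05, 0.05, -0.05],
    [0.0, 0.0, 0.0, 0.0],
]
print(gain_sharing_gains(mic_blocks))
print(automix(mic_blocks))
```

Because the gains always sum to one, adding more overhead microphones does not raise the overall pickup of room noise and reverberation. The microphone closest to the talker simply takes most of the gain.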

Spatial Acoustics -- Another aspect of highly immersive environments is faithfully replicating the direction from which sound is produced. If a remote participant on the left-hand side of the screen speaks, the sound should come from that direction to effectively mimic an in-person experience. 
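A simple way to make a remote talker's voice come from the right part of the room is to pan each remote audio source according to where that talker appears on the screen. The sketch below uses constant-power (sine/cosine) stereo panning, a standard mixing technique. It assumes just two loudspeakers at the left and right ends of the screen. Rooms with more loudspeakers, such as one per display segment, would distribute gain across more channels. The normalized position x and the function name are hypothetical.

```python
import math


def constant_power_pan(x: float) -> tuple[float, float]:
    """Map a talker's horizontal screen position to (left_gain, right_gain).

    x is normalized: 0.0 = far left edge of the screen, 1.0 = far right edge.
    Gains follow a quarter sine/cosine curve so left**2 + right**2 == 1,
    keeping perceived loudness roughly constant as the talker moves.
    """
    x = min(max(x, 0.0), 1.0)  # clamp positions outside the screen
    angle = x * math.pi / 2
    return math.cos(angle), math.sin(angle)


# Hypothetical remote talkers seated left, center, and right on the screen.
for position in (0.1, 0.5, 0.9):
    left, right = constant_power_pan(position)
    print(f"x={position}: left={left:.2f}, right={right:.2f}")
```

A talker on the left side of the screen is heard mostly from the left loudspeaker. A talker at center is reproduced equally from both, at a gain of about 0.71 each.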

Acoustics -- Good acoustics lower reverberation. Immersive environments use acoustical materials to absorb sound and keep it from reflecting off the ceiling, walls and floor.
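Sabine's classic reverberation formula can estimate how much acoustical treatment helps: RT60 = 0.161 V / A. Here V is the room volume in cubic meters and A is the total absorption (each surface area in square meters multiplied by its absorption coefficient). The sketch below applies the formula to the 13' x 19' room footprint mentioned later for SurroundPresence. It assumes a 9-foot ceiling and illustrative mid-frequency absorption coefficients, and it ignores furniture and people. It is not a model of any actual room. Sabine's formula is a rough estimate and becomes less accurate in very absorptive rooms.

```python
FT_TO_M = 0.3048


def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine estimate of RT60 in seconds.

    surfaces: list of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)  # metric sabins
    return 0.161 * volume_m3 / total_absorption


# Hypothetical room: 13 ft x 19 ft footprint with an assumed 9 ft ceiling.
length, width, height = 19 * FT_TO_M, 13 * FT_TO_M, 9 * FT_TO_M
floor = ceiling = length * width
walls = 2 * (length + width) * height
volume = length * width * height

# Illustrative (assumed) absorption coefficients
DRYWALL, CARPET, ACOUSTIC_TILE = 0.05, 0.30, 0.70

untreated = sabine_rt60(volume, [(walls, DRYWALL), (ceiling, DRYWALL), (floor, CARPET)])
treated = sabine_rt60(volume, [(walls, DRYWALL), (ceiling, ACOUSTIC_TILE), (floor, CARPET)])
print(f"untreated ceiling: {untreated:.2f} s; acoustic tile ceiling: {treated:.2f} s")
```

With these assumptions, replacing a hard ceiling with absorptive tile cuts the estimated reverberation time from roughly 0.95 seconds to roughly 0.4 seconds.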

The Future of Highly Immersive Environments

The "Holy Grail" of immersive telepresence is creating an environment with photo-realistic 3D representations of remote participants projected into a physical space. Ideally, these spaces would be mirrored so both local and remote participants would be able to sit together and collaborate on the same information. A participant in one location would work on a shared whiteboard, and his digital persona would replicate the content on an identically positioned whiteboard in real time at the other location(s).

This requires capturing the image of the remote participants in three dimensions, compressing that information, sending it across a network, decompressing and then projecting that representation as an ultra-bright, high resolution three-dimensional photo-realistic image that appears solid. 

Here's what makes the procedure even more complicated. To have a natural, humanistic interaction with the projected remote participant, you must align the local camera that captures the image so that it provides eye contact, or an approximation of eye contact. This satisfies the innate expectations that human beings have with respect to interpersonal communications. A system might achieve this effect with vast arrays of low-cost, high-quality cameras ringing the space. These cameras would adjust their capture based on predictively tracking the position of participants' heads in real time. Suffice it to say, the technology isn't there yet.

Cisco TelePresence did a product placement of its vision for the future in the 2009 movie G.I. Joe: The Rise of Cobra, in which photorealistic 3D representations of remote participants were projected into physical space.

Interim Solutions

Until photo-realistic 3D avatars are a reality, Herold Williams is at it again... His company, Array Telepresence, has filed patents on a number of new technologies that capture and process the images pre-compression and has designed the SurroundPresence telepresence environment to optimize the effect.

The SurroundPresence SP-8 environment is optimized for the Equal-i system to create a highly immersive meeting experience for up to eight participants. The environment incorporates all of the techniques for creating high immersion:
- a wide-format screen that fills much of participants' peripheral vision
- high-definition, life-size remote participants
- a hidden eye-level camera
- stand-up presentation
- an engineered environment with optimum lighting, acoustics, and architecture

SurroundPresence is also unique in that it doubles as a conventional conference room (albeit with enhanced data collaboration capabilities). It can be deployed at 1/3 the cost and in a fraction of the space required by other highly immersive telepresence environments.

SurroundPresence uses many of the techniques listed above: high-definition life-size images, wide-format seamless screens, a hidden eye-line camera, stand-up presentation and an engineered environment that addresses acoustics, positioning, and lighting among other factors. The big difference is that Herold has found a way to deliver the experience at 1/3 the cost of competing highly immersive solutions. His environment requires only a 13' x 19' room -- a fraction of the space required by competing highly immersive environments. Herold's patent-pending optical and image pre-processing technology, Equal-i, can also improve the experience of any visual collaboration session held in a room with an elongated table, with or without the other aspects of the SurroundPresence environment. At the Wainhouse Research 2010 Summit, Cisco's Chuck Stucki succinctly summed up the challenge of a typical boardroom configuration:

"How can manufacturers and installers solve the problem of supplying high-quality video, where facial gestures are clearly visible, across a long boardroom table?  Assuming that the design of boardrooms is going to change, then the equipment serving it will have to.  One solution is cameras with zoom facilities, but what is not without its problems -- who will control the zoom, for example, and what happens if two people speak at once?"

The Equal-i system brings the farthest participants at an elongated boardroom table up close and personal (without pan/tilt/zoom cameras). It distributes them across either a wide-format seamless screen or a standard flat panel monitor. This can help improve the conference experience of organizations that have non-negotiable space or furniture constraints, like $100,000 mahogany boardroom tables.

Williams is currently looking for a distribution partner to help take his products to market by the time he showcases SurroundPresence and Equal-i at InfoComm 2012.

About the Author
Howard Lichtman is the President of the Human Productivity Lab, a telepresence consultancy and research firm that helps organizations design telepresence and visual collaboration strategies and deploy and future-proof investments.  He is also the publisher of Telepresence Options, the #1 website on the Internet covering telepresence and visual collaboration technologies and the Editor of the monthly Telepresence Options Telegraph and the bi-annual Telepresence Options Magazine, the world's most widely read publication covering telepresence technologies.

Mr. Lichtman is also the author and/or co-author of The Telepresence Options 2011 Yearbook, The Inter-Company Telepresence and Videoconferencing Handbook (2009), The Telepresence and Videoconferencing Exchange Review (2010), Telepresence, Effective Visual Collaboration and the Future of Global Business at the Speed of Light (2006), and Emerging Technologies for Teleconferencing and Telepresence (2005).  He is currently working on Telepresence Options 2011.

Mr. Lichtman is a frequent commentator on telepresence, videoconferencing, and effective visual collaboration and his writings on and analysis of the industry have been featured by US News and World Report, Telephony Magazine, CXO Magazine, The Chicago Tribune, Reuters, Pro AV Magazine, Killer App Magazine, ABA Banking Journal, Bank Systems and Technology Magazine and CFO magazine among others.


