Hey everyone, thanks for coming.
It's my pleasure to introduce Peter Lincoln
from UNC Chapel Hill, where he's worked with Henry Fuchs on
designing some very cool,
very low latency augmented reality displays.
Peter?
>> Thank you.
Thanks for the introduction.
So now I'd like to talk to you about developing
low latency displays for augmented reality.
And over the course of this talk I'd like you all to learn about
some of the problems faced by AR displays and
how to go about solving them.
More specifically, by the end of this talk,
you should have an idea of how we turn the results on the left
into those on the right.
These two checkerboards you see moving back and
forth, one physical and
one virtual, become aligned as the display turns.
And, in addition, how we transform a display with
inconsistent brightness into one that adapts to changing lighting
conditions in the physical scene,
adjusting the brightness of the virtual objects in that scene.
But what do I mean by virtual?
Some of this may be a little bit of
low-level background, but just to give a framework.
I'm sure you're all familiar with the VR side.
It presents a purely synthetic environment
created in the computer, and more often than not these days VR
has taken the form of games, a few of which are pictured here.
This recent increase has resulted from the recent release
of a few consumer grade VR headsets which have brought VR
out of the lab or museum into the home.
However a common element of VR displays is a total blockage of
the real world.
The head-mounted display covers the user's
field of view, disconnecting him or her from reality.
Other types of displays, on the other hand, allow for
the mixing of the real and the virtual.
And these types of displays comprise AR, or
augmented reality.
They keep most of the real environment visible,
adding little components here and there.
As a result, the user is able to maintain
a real world context in the mixed reality environment.
AR displays are also a lot more diverse;
a few kinds are pictured here.
You might be most familiar with optical see-through, as on
a HoloLens.
But video see-through
has been popularized by things like Pokemon Go,
which has been downloaded 500 million times onto people's smartphones.
Spatial or projective AR, like the one on the left, uses
digital projectors to illuminate an otherwise blank white object
into a different, more colorful synthetic appearance.
But optical see-through is going to be the focus of
the work I'm presenting here.
Optical see-through doesn't block the real world in any way.
The real and virtual optically combine with translucent
displays, mirrors, and/or lenses.
There are several commercial products.
I'm sure you're all familiar with the one on the left,
but some other ones can be either self-contained or
require tethering to a PC.
Now, there are two main challenges in optical see-through AR that I
focused on in my research.
The first is registration error, where the physical and
virtual worlds become separated or misaligned.
And when a user experiences this problem it can be distracting or
cause nausea.
Neither of which is very good.
A second issue is appearance inconsistency.
And this is where the physical and
virtual objects don't appear to exist in the same world.
In this example, the rightmost teapot is way too bright for
its background,
and this can be distracting or
blinding. If the virtual object is instead too dark,
then it can be indiscernible.
Now, the combination of these issues breaks presence, or
the user's sense that they are in the mixed environment.
And we want people to feel like they're
in that mixed environment.
So the question is, how do we alleviate these problems?
Now, the thesis statement from my PhD was:
In augmented reality displays,
the combination of performing post-rendering warps in display
hardware to mitigate latency-induced registration error and
using scene-adaptive, high-dynamic-range color
to improve the consistency of virtual and
real imagery decreases the disagreement between the two worlds.
Now in order to demonstrate my thesis,
I wanted to do the following.
First I wanted to resolve registration error.
And for that, I needed a low latency, high-frequency display.
And I performed post-rendering warp in the display.
Second, to maintain appearance consistency,
I needed an HDR color display system.
And so I did.
I created a low latency, optical see-through, augmented reality,
color high-dynamic range, head-mounted display.
And this particular display had an average motion to photon
latency, that's when the user turns and the display updates,
for a grayscale of 80 microseconds, yes, microseconds.
And for 16-bit RGB HDR, 126 microseconds on average.
For comparison, a normal conventional display is on
the order of 10 milliseconds, or 10,000 microseconds.
So, in order to discuss this implementation and the design
considerations behind it, I'm gonna cover the following topics:
Mitigating Registration Error, Generating Appearance,
Maintaining Appearance Consistency, and
then a concluding summary.
So let's begin with mitigating registration error.
Here's what we'll be outlining.
We'll start with some causes and effects of registration error,
Some prior work that others have done in trying to mitigate it.
Talk about the render cascade pipeline,
implementation details, and some visual results,
and some latency results.
Let's first take a look at registration error.
Now, there's a lot of sources of registration error in systems.
Static error, which tends to be related to calibration or
physical properties of the system.
Dynamic error, perhaps the quality of the tracking system.
But the greatest source of error,
outweighing all others combined, is latency.
And that's the tracking system latency,
the rendering system latency, the system overall latency and
getting the images out onto display.
But overall, latency is the greatest source of error.
Now, how do users perceive this kind of error?
Basically, virtual objects seem to move inconsistently with
how the user moves.
So let's take a look at an example, or
a pair of examples rather.
Suppose that we have a blue ball that we're gonna add an orange
outline to.
And now suppose that the user is looking at this ball and
turning their heads,
such that it seems that the ball is moving to the left.
Now in the best case,
the real and virtual object would seem to move together.
However, there's a couple of issues in reality that makes it
not so simple.
The first sort of way that this error can appear is swimming,
and that's where the virtual object seems to lag
the real object.
So here's our physical and virtual objects again.
We're gonna take a look at it and how it moves over time.
And so, as that ball moves to the left,
the virtual outline around it is delayed behind it.
And a judder is sort of a different effect,
that's where the virtual object seems to smear or strobe.
And so let's take a look at an example of that.
And so now as the oop, sorry, jumped ahead.
So now as the ball is moving,
that virtual appearance seems to catch up to it in stutter steps.
And it doesn't take very much latency to cause these effects.
Say, with just 10 milliseconds of lag, looking at an object 2
meters away while turning 150 degrees per second,
which is not very fast for
a head turn, that will result in five centimeters
of error, which I'd say is quite visible to the user.
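As a quick sanity check on those numbers, the error works out in a few lines of Python (the function name is just illustrative; this is the simple arc-length approximation):

```python
import math

def registration_error_m(latency_s, head_speed_deg_s, distance_m):
    """Approximate on-screen misalignment caused by latency.

    During the latency interval the head turns through
    latency * speed degrees; at a given viewing distance that
    angle subtends an arc whose length is the visible error.
    """
    angular_error_deg = latency_s * head_speed_deg_s
    return distance_m * math.radians(angular_error_deg)

# 10 ms of lag, a 150 deg/s head turn, an object 2 m away:
print(f"{registration_error_m(0.010, 150.0, 2.0) * 100:.1f} cm")  # -> 5.2 cm
```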
So how much latency is too much though?
Now, for VR,
Oculus has recommended less than 20 milliseconds.
But for optical see-through AR, the tolerances are a lot
smaller, because the user can still see the real world behind
the virtual, and so pose differences are very noticeable.
Now, there is an article that argued, for
aviation purposes, that the latency should result
in no more than 5 milliradians of discrepancy.
Now for a very slow head turn speed, okay,
that's pretty good latency tolerance.
But as the user starts turning their head faster and
faster, and 437 degrees per second is
about the speed at which someone would turn and
look if they know where they're turning to look,
that's a very small latency tolerance.
So how have people in the past gone about trying to resolve
these issues?
I'm gonna talk about three main methods.
The first is frameless rendering and
the key detail in this is that while normal rendering
is a series of complete frames, in frameless rendering
you're doing fast incremental updates to the scene.
It's less like a movie and
more like a bunch of small tweaks to the image.
And in those tweaks, the aggregation as perceived
by the user, yields a perceived image that should be better.
Another idea is a post-rendering warp, and
the idea in this one is, after someone performs a normal slow,
expensive 3D rendering operation,
they follow it up with a much faster warping operation that
makes a correction to the perspective.
The key idea is that while the slow rendering operation used
old pose data, the correction, which can operate quickly, can
just grab the latest tracking information and
hopefully produce a nicer scene.
And as far as I know, the Oculus, the Vive, and
NVIDIA's graphics drivers all support
post-rendering warps in the driver for VR purposes.
And these post-rendering warps are made possible because of
recent GPU improvements.
More specifically, GPU preemption.
As a result, we're able to delay that post-rendering warp
operation as late as possible in the GPU, minimizing the latency
between the measurement and the transmission to the display.
But this is still happening in the PC.
And we can benefit by making this correction, even later.
So, to combine each of these ideas,
conventional rendering is expensive.
And it delays the output.
And if we perform incremental updates,
we can reduce the complexity and be able to update faster and
more frequently.
And lastly, performing that post-rendering warp operation
as late as possible reduces errors due to latency.
So from all this,
I present the Low Latency Render Cascade Pipeline.
In this, we start with, say,
conventional rendering on the PC and
follow it up with a series of post-rendering warp operations
like before. But after that, we could do even more warps, each
of which operates faster than the one before.
Certain sorts of warps, like 2D skew,
rotation, and offset translation,
can happen later because they operate very quickly.
At the very end, we'd like to perform radial distortion
correction to correct for any optical issues in the display.
Now, this is sort of a complicated notional pipeline.
In order to actually test this idea, I took this notional
pipeline and simplified it into a subset of all these warps.
The simplification was made in order to examine post-rendering warps
in the context of the display hardware, and partly due
to the low latency tracking systems that we had available.
So more specifically, we're gonna do normal rendering in
the PC, do radial distortion correction on the PC, then
perform a 2D offset correction in the display hardware itself.
So let's take a look at some of the hardware used to implement
our prototype system.
The first was the tracking system.
For low latency, we used a pair of shaft encoders,
so we could measure the user's pitch and the user's yaw.
And these particular shaft encoders gave us a latency of
about 30 microseconds, at most.
As you would expect, the system was intended for use by a person, but
for all the videos that follow,
we put a camera where the user's eye would be.
All the post rendering transforms I described,
are operating in an FPGA that's attached to the display.
And more specifically,
the operation we perform is a 2D image translation.
And lastly, the display itself, for the high frequency,
low latency display, we used a DMD projector, which was capable
of a binary frame rate of 16 kHz.
What exactly is a DMD?
A DMD is a MEMS device, a Micro-Electro-Mechanical Systems
device, and it's basically a grid of mirrors,
each of which represents a pixel on a chip.
And each of those mirrors can be tilted electromechanically to
two different positions.
Either to on, where the light bounces off the mirror and
out the lens, or off, where the light bounces into an absorber
or back towards the lamp itself.
Now typically in these systems,
the light source is constantly on.
So in order to get gray levels out of the device,
one must flip these mirrors back and forth very quickly.
In order to get color out of the device,
one often divides time by using a color wheel or
multiple colored lamps, like LEDs,
or uses multiple DMD chips, each one with a separate lamp.
But how do we go about controlling a DMD device?
Well, a DMD is basically a double buffered device
where you load the back buffer with a series of binary pixels,
and then at some point commit all those pixels to mirrors and
they all flip together.
But there are some constraints with that sort of operation,
because it's a physical system.
Those mirrors require a little bit of time to move and
stabilize.
And the mirrors update in regions of the display.
So as a result, the optimal strategy for
updating this display is basically raster order,
top to bottom, left to right, for loading, and
interleaving the commits of those mirrors to the display.
Now all of this brings us to the overall data flow in the system.
So there's some items to note in here, specifically that
the tracking data from our tracking system is supplied in
excess of both the PC and the display's frame rates.
The PC is supplying an image that's a lot larger than
the output image of the display.
And the display's FPGA system is transforming the image and
feeding binary pixels to the DMD.
Let's take a look at how that transformation works, and
why we need the PC to render a larger field of view.
So this is how we're generating the images.
Suppose first, that there was no difference between the live
tracking pose and the GPU rendered image tracking pose.
Now in this case,
we're gonna just take the center of the image, and
that data is gonna get passed on to the modulator.
So they're split up into multiple regions to illustrate
that incremental update of the DMD.
On the other hand, if the poses were different, mainly because
the user has moved, and users are gonna often be moving
since the PC rendered that image,
we're gonna pick a different region and
send that onto the modulator.
Essentially, it's very straightforward.
We take the difference in the pose angles from
the tracking system,
and multiply it by a calibration ratio that's
dependent on the display's resolution, field of view,
and so forth, to get the offset from the center, in pixels.
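As a sketch, that per-frame offset computation might look like this in Python (the names and calibration values here are illustrative, not taken from the actual FPGA implementation):

```python
def pixel_offset(live_yaw, rendered_yaw, live_pitch, rendered_pitch,
                 px_per_deg_x, px_per_deg_y):
    """2D image translation, in pixels, to apply in the display hardware.

    The calibration ratios (pixels per degree) depend on the display's
    resolution and field of view, as described in the talk.
    """
    dx = (live_yaw - rendered_yaw) * px_per_deg_x
    dy = (live_pitch - rendered_pitch) * px_per_deg_y
    return round(dx), round(dy)

# If the head has yawed 1.5 degrees since the PC rendered the frame,
# at a hypothetical 34 px/deg we shift the selected region 51 pixels:
print(pixel_offset(1.5, 0.0, 0.0, 0.0, 34.0, 34.0))  # -> (51, 0)
```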
Now, the field of view needed to actually
over-render the image
depends on the frame rate of the PC,
the latency in the system, the field of view of the display,
the output resolution of the system, and the maximum
head turn speed that we're expecting the user to move with.
So in this case,
we were rendering 1080p video at 60 hertz,
which gave the user the ability to
turn their head left and
right at 440 degrees per second,
which is not a bad speed, and
vertically at 150 degrees per second.
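A simplified model of that over-render requirement (illustrative only; the real budget also accounts for output resolution and the other latencies in the list above):

```python
def required_render_fov(display_fov_deg, max_speed_deg_s, staleness_s):
    """Field of view the PC must render so the newest rendered frame
    still contains whatever region the user has turned toward.

    The head can swing up to max_speed * staleness degrees to either
    side before a fresh frame arrives, so the display's field of view
    is padded on both sides by that margin.
    """
    margin_deg = max_speed_deg_s * staleness_s
    return display_fov_deg + 2.0 * margin_deg

# A 30-degree display, 440 deg/s yaw, and a 60 Hz PC frame
# (~16.7 ms of worst-case staleness):
print(round(required_render_fov(30.0, 440.0, 1 / 60), 1))  # -> 44.7
```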
So what does all this look like in practice?
So the first system I'm gonna present
is a 6-bit-per-pixel grayscale system,
which is the first one we got working.
The physical scene consists of a small checkerboard that's
located about two meters away from the display.
And the virtual image is the corresponding checkerboard.
So if there were no errors in the system,
those two checkerboards should be exactly aligned.
Well, let's first take a look at what an uncompensated system
looks like with conventional latency effects.
So in this example, I'm moving the viewpoint back and
forth at about 50 degrees per second.
And this bottom portion of the video is recorded with
the camera looking through an optical see-through display.
And the other two are just some observational cameras.
So in that, oop, let's play that again.
Come on, let's play it again.
So in this video, note at the bottom that the virtual
overlay is lagging significantly behind its physical counterpart.
Now here's the result using our algorithm, running at the same
back and forth speed of 50 degrees per second.
And notice how those physical and virtual overlays stay much
better aligned then they did before.
>> Peter.
>> Yes?
>> Do you make any effort to predict what's gonna happen?
>> Nope, it's all live data.
So there's no prediction in the system.
And here's the same two played back at one-eighth speed.
So in this video, the top one is the uncompensated system, and
the bottom one is our algorithm.
And so take note of the difference in divergence
while the HMD is slewing and
the similarity in convergence while the HMD is idle.
Also note that in the conventional display that
the overlay shows both swimming and judder effects.
Whereas in our display the overlay seems to move smoothly
and consistently.
>> You could try to predict and stick the virtual up there
at the predicted place where the camera is gonna be right?
That should possibly help things.
>> So in my experience,
when I was trying to do prediction stuff before,
it always ended up looking worse than not actually predicting.
>> Okay, I mean, it's just tough because you don't know where
you're gonna stop, accelerate and then go back.
>> Yeah, if someone is moving at a consistent speed,
then you can jump ahead very quickly.
But it's when someone is changing their acceleration
a lot, that you tend to overshoot or undershoot.
Undershoot when they're starting and
overshoot as they're stopping.
And so this particular system doesn't use any prediction
in either system.
There is actually some possibility for
prediction on a different time scale.
That might be helpful in that if you predict where it's going to
be, you might be able to make a smaller compensation
operation in the display.
Though you'd still run into the same issues
of overshooting or undershooting, and
you'd end up picking different regions anyway, so.
>> [INAUDIBLE] what was the motion photon latency?
>> The top one was 50 milliseconds, and
the bottom one was around 80 microseconds.
>> Can you talk a little bit about why even in your algorithm
when you switch directions you see some swimming?
>> Yes, I can actually.
>> Can you go back?
I didn't notice that, sorry.
>> Okay, I'm sorry.
>> Yeah, some of them [INAUDIBLE].
>> [LAUGH] >> Is there an adjustment
camera?
>> So this is done with a GoPro.
We're running at 240 hertz by the way.
>> See, they're swimming right now?
>> Yeah, I see it.
>> Now it's [INAUDIBLE].
>> Same thing, let me pause it, yeah.
>> Now it's- >> So,
you're talking about this?
>> Yes.
>> Right there?
>> Yeah.
>> So in this particular system we used all aluminum pieces for
mounting everything.
And as a result, the display was moving along axes that weren't
being tracked,
because the aluminum
it was mounted on was deforming.
And so, we later replaced that with steel [LAUGH] to
fix that particular issue.
So if we had actually stopped moving and
paused in these particular cases,
both images would diverge by a similar amount.
So this particular offset that you're seeing is actually
because of the camera:
the viewpoint pose is not where the tracking
system thinks it is, even live.
And so it's because of that deformation that
the viewpoint isn't aligned.
>> Looks good on the right but not on the left. Yeah.
>> Yeah.
>> That's what you'd expect.
>> And similarly,
when the image is also disappearing on the right,
that's cuz we're leaving the field of view of the display.
It's only 30 degrees along the horizontal.
>> So were you able to fix this issue?
>> Fix that issue.
Once we were actually tracking directly where
the display is moving, that fixed that issue.
Any other questions at this point? Yes.
>> You said the registration
model was 2D translational.
>> Yes. >> That's just one 2D
translation for the image?
>> Yes. >> Not any per pixel.
>> [INAUDIBLE] >> Every
pixel moves by the same amount.
>> [INAUDIBLE] >> The idea is we wanna,
in order to be able to run the system very fast,
we wanted a one to one mapping of pixels.
>> All right, I mean, in realistic systems, though?
>> Right. >> Is it okay if you comment
a little bit on if you wanted a more complex warping model-
>> Sure, so
the idea there is more along the lines of that notional
pipeline I showed earlier, in that while this
operation could be the very last operation you perform,
you'd probably want something that's more depth-dependent.
This also doesn't support the roll axis
of your head,
so you would want other operations to occur.
Now, those operations may be more expensive, computationally.
And so they can't operate as often.
In this system, we were performing this operation
on every binary frame that went to the DMD.
So it's 16 kilohertz.
So if you were trying to do a depth-dependent warp
that required more complicated math on a per-pixel basis-
>> It would drop the frame rate?
>> It would drop the frame rate on those operations.
But the advantage is it brings things closer into alignment, so that
in theory, the 2D correction at the very end, compensation, rather;
correction's a little bit of a strong word.
Compensation, which should then have a smaller operation to make.
The whole idea of these [INAUDIBLE]
ideas is that as long as
the operation that you're making is a small enough change,
then it should be better than not making the change.
And we only explored this 2D operation
because the tracking system only supported the 2D operations.
And we wanted to be able to look with a sort of gold standard
tracking system
as opposed to one that adds a lot of latency into the system.
>> Any questions at this point?
>> Okay.
>> So, there's that image again, in a still.
Of course, I picked one of the better-looking ones.
But you guys caught the right issues,
and I had an explanation for them.
All right, so how fast is our system?
And I seem to have lost focus.
Come on there we go.
So a typical way of measuring motion-to-photon
latency is by using some repeating event that can be
accurately measured at multiple points in the pipeline.
And for this system, we used an oscilloscope and
measured the time between crossing 0 degrees on
the tracking system and the change in the output.
Now, in order to see a change in the output,
we modified the system so that it would draw black
if the angle was negative and white if it was positive.
And that used just the actual measured angle, not
a difference in angles between live and rendered
time, but all other components of the system remained the same.
Now, in order to classify the measurement, we separate out
several components of latency and
capture four different events.
The first is the time when the zero crossing occurred;
this is the initiation of the motion.
The time when the display system received confirmation
the zero crossing occurred.
The time when the display system commanded the DMD in
order to change the output intensity.
And lastly a time when the DMD actually changed the emitted
light in response to that command.
Now, in the example on the left, which is a single sample, each
of these events is represented by arrows in the waveform.
And in this particular one, the end-to-end latency was
97 microseconds.
Now, in order to examine the range of possible latencies,
we used a different mode of the oscilloscope, where the bands of
vertical risers, like for example this one, represent the minimum and
maximum possible situations.
And our particular system actually has a uniform
distribution of latencies,
so the average is actually in the middle.
And the average latency of this particular implementation
was 80 microseconds.
Now, a reminder: the uncompensated system
was 50 milliseconds, or 50,000 microseconds.
Our system, on the other hand: merely 80 microseconds.
Now, it's worth noting a few things.
First, a significant portion of the end to end latency was still
due to the tracking system we were using,
which ranged between 15 and 30 microseconds of lag.
And the variation in latency was mostly due to
the phasing between when the motion was detected and
transmitted, and when the frame was going to be transmitted.
It operated best when the motion occurred
just before we were about to send the frame to the DMD,
and operated worst when the motion
event was received just after we started transmitting.
We didn't change the image partway through;
every binary frame was a single pose's information.
We didn't wanna deal with any image tearing artifacts.
So, returning to that system data flow diagram,
the next problem is: how do we take these PC-generated
images and generate a series of binary frames on the DMD?
And that brings us to generating appearance.
Now, generating appearance
on this binary display is a task of modulation,
and we'll discuss some classic modulation schemes first before
we talk about my new algorithm and its results.
But why do we care about modulation at all?
Recall that a DMD is a binary device,
each pixel can only be on or off.
And so in order to generate that grayscale we need
to flip that mirror back and forth at a high frequency.
Now there's a bunch of prior strategies for modulating these.
A very simple one is Pulse Width Modulation, or PWM.
Now in these graphs at the bottom, the orange color
represents the on times and the black represents the off times.
And so I'm trying three cycles of four bit
Pulse Width Modulation.
Now, Pulse Width Modulation works by performing two-part
pulse groups: the on pulse for some amount of time and
then the off pulse for some amount of time.
And those on and off times are proportional to the amount of
brightness that you want.
So it's directly proportional to the exact
intensity.
Now, Pulse Width Modulation is very simple to implement.
The disadvantage of Pulse Width Modulation is that
it produces low frequency harmonics; more specifically,
you can see a visible flicker at particular operational rates.
It also doesn't require any additional memory.
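One cycle of this scheme can be sketched as follows (illustrative Python, one binary frame per list element):

```python
def pwm_frames(intensity, bit_depth=4):
    """One pulse-width-modulation cycle as a list of binary frames.

    A 4-bit cycle has 2**4 - 1 = 15 frames: `intensity` consecutive
    on-frames, then the remaining off-frames -- one long on pulse
    followed by one long off pulse, which is what produces the
    low-frequency harmonics.
    """
    period = 2 ** bit_depth - 1
    return [1] * intensity + [0] * (period - intensity)

print(pwm_frames(10))  # -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```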
Delta-Sigma Modulation, which is a type of
Pulse Density Modulation, tries to optimize the pulse sizes.
It still has the same numbers of ons and offs
as Pulse Width Modulation, but each of those on and
off transitions occurs more frequently.
So if we compare these two graphs,
you can see the amount of on and off time is the same between
them, just we're changing it more frequently.
We're maximizing those transitions.
Now, there are some issues with this:
it requires additional memory;
you now need to keep an accumulator value in order
to operate.
But it moves those harmonics to high frequencies, so
you get rid of that potential flicker issue.
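A first-order version of this idea can be sketched like so (illustrative Python; the running accumulator is exactly the extra per-pixel memory just mentioned):

```python
def delta_sigma_frames(intensity, bit_depth=4):
    """One cycle of first-order delta-sigma (pulse density) modulation.

    Same count of on-frames as PWM for a given intensity, but spread
    as evenly as possible across the cycle, pushing the harmonics to
    higher frequencies.
    """
    period = 2 ** bit_depth - 1
    frames, accumulator = [], 0
    for _ in range(period):
        accumulator += intensity
        if accumulator >= period:   # emit an on-frame, carry the remainder
            frames.append(1)
            accumulator -= period
        else:
            frames.append(0)
    return frames

print(delta_sigma_frames(10))  # same ten ons as PWM, but interleaved
```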
So how can we improve upon these?
We wanted a new algorithm that was compatible with the low
latency rendering operation.
It should have the memory cost of pulse width modulation,
meaning no additional memory,
but behavior similar to pulse density modulation:
high frequency modulation to keep that flicker away.
And so we created pseudo-random pulse density modulation.
Where, essentially, we use a looping random number generator.
It produces the same numbers of ons and
offs as either of those previous two methods, but
it produces a random permutation of those ons and offs.
So we get good numbers of transitions, but
it doesn't require any additional memory.
So the way this works is,
we compare a random number with the desired intensity.
If it's less than the desired intensity, we choose on.
If it's more than the desired intensity, we choose off.
Or if it equals the desired intensity, we also choose off.
That makes it very simple to implement, and
it produces high frequency harmonics,
so we avoid that flicker issue.
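A software sketch of that comparison (illustrative Python; a shuffled list stands in for the hardware's looping random number generator):

```python
import random

def prpdm_frames(intensity, bit_depth=4, seed=1):
    """One cycle of pseudo-random pulse density modulation.

    The looping generator emits each value 0 .. 2**n - 2 exactly once
    per cycle, so comparing it to the desired intensity yields a random
    permutation of the same ons and offs PWM would produce -- good
    transition density with no per-pixel accumulator.
    """
    period = 2 ** bit_depth - 1
    values = list(range(period))            # every value exactly once
    random.Random(seed).shuffle(values)     # stand-in for the hardware generator
    # on if the random value is below the intensity, off otherwise
    return [1 if v < intensity else 0 for v in values]

frames = prpdm_frames(10)
print(sum(frames), len(frames))  # -> 10 15
```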
And so with this particular modulation scheme, for 6-bit-per-pixel
grayscale, we got a 15,552 Hz binary frame rate.
For 6 bits,
we only required 63 binary frames per integration cycle;
that's the cycle required to produce a grayscale value.
And that was, for this particular system, about four milliseconds.
And so visually, it looks pretty good.
So this is a demo of PR-PDM with a grayscale image
and a more exciting checkerboard.
In this video, we're still moving back and
forth at 50 degrees per second, but
the virtual object is registered to that checkerboard, so
you can still see the tracking.
Note that the portion of the overlay that's intended to be
aligned continues to remain aligned,
despite this fast motion.
Now, while we now have a low latency display and
algorithms for low latency operation,
we still want the images to look appropriate for the real world.
And we want them to be a little more exciting.
And so that brings us to one of the last major topics, and
that's maintaining appearance consistency.
And the motivation is high-dynamic range.
So I'm gonna talk a little bit about what it is,
some prior work in high-dynamic range,
another new algorithm for
modulation, and scene-adaptive brightness.
So why do we care about color and HDR?
Well we want to match the real world.
The image on the left isn't quite going to cut it,
it's kind of a dull grayscale.
And the image on the right isn't quite right either because that
right-most teapot is way too bright for its dark background,
it would kind of blind the users who are looking at the display.
But the real world is also a high-dynamic range environment,
with great variation in scene brightness.
And the brightness can vary spatially, typically in
the form of shadows; temporally, with the time of day; or
extremely, with unique environments, or
perhaps the eclipse coming up.
Now for compelling augmented reality,
the virtual image must match the real world.
Otherwise the virtual is invisible, blinding, or
simply just disconcerting.
Even this simple meeting room, pictured here,
has vastly different brightnesses across the scene,
across multiple orders of magnitude.
But how have people created HDR displays in the past?
Typically, by combining spatial light modulators.
The example on the left uses a low resolution LED backlight
with a high resolution LCD panel.
It's actually similar to, I think, how current TVs work.
And basically is they split a 16-bit HDR response between that
LED grid and the high resolution LCD.
The system on the right combined a high resolution projector and
a high resolution LCD panel.
In this particular example, they specifically misalign the two so
that you could see where those are each contributing.
And they were able to get a contrast ratio of about 700 to 1
with their system.
Now, all the previous modulation schemes I talked about,
pulse width modulation, pulse-density modulation,
the pseudo-random pulse-density modulation.
They all require exponential time in order to modulate.
Now, that begins to be a bit of an issue when you wanna start
doing HDR.
Say, for 16-bit RGB HDR,
well, then almost 200,000 binary frames will be needed
to show all the possible variations you could get.
And an integration time of 12 seconds [LAUGH] is not very good
for an AR display.
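The arithmetic behind those numbers (assuming roughly the 16 kHz binary frame rate from earlier; the exact rate in the real system may differ slightly):

```python
# An exponential-time scheme needs 2**16 - 1 binary frames per 16-bit
# channel, and sequential RGB triples that.
frames_per_cycle = 3 * (2 ** 16 - 1)
print(frames_per_cycle)                    # -> 196605, "almost 200,000"
print(round(frames_per_cycle / 16000, 1))  # integration time in seconds, ~12.3
```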
So what can we do to alleviate this problem?
Well, we need another new algorithm for
controlling the display.
We want it to be still compatible with low latency
rendering.
We want it to still have the zero additional memory cost.
And we need it to be faster than exponential time with regards to
the bit depth.
So, enter direct digital synthesis, or DDS,
which combines the binary modulation of a DMD with a fast
variable brightness illuminator.
And with this system, we were able to get an integration
interval that was linear with regard to the bit depth.
So let's take a look at what the algorithm looks like.
Let's take a look at a small example.
Basically you start with the most significant bit,
and for every binary frame, for every pixel,
we examine the bit value for that.
If it's on, if it's a 1,
then we turn the mirror to an on position.
If it's off, then we turn it to the off position.
Set the illuminator that we're gonna shine on this DMD to
the appropriate power-of-2 brightness.
For that MSV, we want maximum brightness, then half,
then quarter, and
so on and so forth, and then advance the bit rate in cycles.
So let's say we're trying to produce the intensity 10 as
a 4-bit number, which could have a range between 0 and 15.
Well in binary, that's 1010.
And so we need to flip the mirrors and control the brightness of
the illuminator, and that controls what the eye is gonna see.
And so basically,
the illuminator's gonna step through the power-of-2 intensities.
We're gonna cycle on/off, on/off based on that binary value, and
so the user's gonna see a combined brightness that is
about five-eighths of the peak illuminator output,
which is ten-fifteenths of the 4-bit maximum.
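A minimal sketch of that integration, assuming the illuminator's power-of-2 levels are normalized so that an all-ones value integrates to 1 (the function name is mine, not from the talk):

```python
def dds_integrated_intensity(value, bits=4):
    """Simulate one DDS cycle: for each bit (MSB first) the mirror is on
    iff that bit of `value` is 1, while the illuminator outputs a
    power-of-2 brightness (full, half, quarter, ...)."""
    energy = 0.0
    for k in reversed(range(bits)):          # MSB -> LSB, one frame per bit
        mirror_on = (value >> k) & 1         # mirror position from bit value
        brightness = 2**k / (2**bits - 1)    # normalized power-of-2 level
        energy += mirror_on * brightness     # the eye integrates mirror*light
    return energy

# Intensity 10 as a 4-bit number (binary 1010) integrates to 10/15.
print(dds_integrated_intensity(10))  # ten-fifteenths, about 0.667
```

Note the loop runs `bits` times per cycle, which is where the linear-in-bit-depth integration interval comes from.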
So let's compare all of these modulation schemes and
kind of what they look like.
In each of these graphs, we're seeing 4 cycles,
of 4-bit pulse width modulation, delta sigma, and PRPDM.
But in that same amount of time, we're able to actually perform
16 cycles or rather 15 cycles, of direct digital synthesis.
And so note that the heights of all these graphs have been
normalized so that all four algorithms are outputting
the same actual energy intensity.
And while these schemes all use a constant illuminator,
this one uses a variable brightness illuminator.
The basic idea is that direct digital synthesis is linear time
as opposed to exponential time.
Now, because we're starting to mess with the illuminator,
direct digital synthesis required a few hardware changes.
We needed that high frequency controllable light source.
So very fast pulses of light, variable brightness,
and synchronized to the DMD.
We really get some good results with it.
More specifically for a 16-bit RGB we were able to achieve
a 50- >> Sure.
>> How the hell do you do that?
>> [LAUGH] >> Is this commercial stuff,
is the stuff that you totally design yourself?
>> So my focus was on programming the display to take
advantage of it.
And my fellow grad student,
Alex Blate designed the hardware that actually drives the LED.
That's going to be part of his dissertation.
But some basic ideas on it: we're
not using pulse width modulation to control the light source.
That's normally how you control an LED's brightness, switching
it on and off very fast.
But we'd need a five or six gigahertz modulator, and
most LEDs don't like changing that fast.
So it's a series of very precise op amps combined with a DAC
that basically drives exactly the current we want.
In order to be able to turn the LED on and off very quickly,
we keep it in a nearly-on state.
So that we aren't gonna get the initial overshoot that you get
when turning on an LED.
Or the delay before shutting off, cuz when we want to turn
it off, we take it to that nearly-on state again.
And so the LEDs are brightness controlled via the current.
And so we're able to basically produce the exact brightness
that we want after a color calibration step.
>> So why do people typically just vary these on and
off versus doing this thing.
Does it take more power?
What's the [CROSSTALK] >> If you were only trying to do
an eight-bit display, then you can
time-division multiplex it, so
you're spending, say, half the time on one single bit.
Then a quarter of the time on the next single bit
and so forth.
And just keep the LED at a particular brightness.
And with a 16 or 32 kilohertz display,
I think you can get about seven bits out of that.
You could just have half the time be at full brightness, and
half the time it be at a different brightness.
And so you don't have to pulse the LED very quickly;
you can just do it, again, with time division multiplexing.
That's mainly how most of those DMD projectors work.
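The capacity of that plain time-division scheme can be estimated with a small calculation. The rates below are assumptions for illustration: a binary display rate in the 16-32 kHz range the talk mentions, and a flicker-free integration window around 120 Hz.

```python
def max_bits(frame_rate_hz, integration_rate_hz):
    """Largest bit depth b such that the 2^b - 1 binary frames needed by
    time-division (PWM-style) multiplexing with a constant-brightness LED
    fit in one integration window."""
    frames = frame_rate_hz // integration_rate_hz
    b = 0
    while (2**(b + 1)) - 1 <= frames:
        b += 1
    return b

# A 16 kHz binary display refreshed at an assumed flicker-free 120 Hz:
print(max_bits(16_000, 120))  # 7 bits, roughly matching the talk's estimate
```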
>> [CROSSTALK].
>> If you keep increasing the power of the lights,
at some point,
you're not going to be able to modulate it effectively, right?
>> We were able to make it uncomfortably bright,
at least within 11 hertz.
>> Because the sun is pretty uncomfortably bright, right?
>> Yep, [LAUGH] if I recall correctly,
we're not quite bright enough to overwhelm the sun.
>> That's not what I mean though.
I mean, can the mirrors still, can you find detail around that
stuff or does the energy- >> Are you talking about
the light spilling off of the other mirrors?
No, that really wasn't an issue actually.
The bigger issue with regards to getting diffusion
of the image was when we were using a back projection surface
for the projection.
We ended up eventually doing direct view.
So that you're looking essentially through just lenses
at the DMD itself.
Originally the back projected version was a little bit
blurrier.
But we didn't have any blurriness issues resulting from
a very bright light source.
Because when the mirror's in the off position, it's
reflecting that light actually back at the lamp itself.
So it's not leaving through the lens system.
>> The mirror is, but
there's stuff around the edges of the mirror and all that too.
I don't know if that [INAUDIBLE], but
don't worry about it.
>> I didn't notice any issues.
But with this we were actually able
to get 16 bits per color and
48 binary frames, as opposed to 6-bit grayscale and 63.
So the integration cycle is actually shorter on this system.
So it was faster than PR-PDM and had greater color depth.
So, for this particular example, the desk that we're gonna
show some virtual objects on is about three meters away.
We've got a series of light sources around the room
including a very bright LED.
>> Wait, 16 bit per channel >> Per channel
>> So 48 bits.
>> Yes.
>> Okay.
>> Yep.
>> That's great.
>> And then that's for the lamp that you're gonna see
in the room that's a Turner 50 watt equivalent LED light
that was driven by a DC power source so
it was actually even higher watt equivalency.
Now in the first demo the input to the system was a pair of
colorful static teapots of identical appearance.
And this is just to show the maximum range of the system.
So the viewpoint is static and the system is configured such
that the left teapot is using the upper 8 bits and
the lower teapot is using the lower 8 bits.
Now the range is exceeding that of a normal camera.
This is a standard dynamic range camera or
low dynamic range camera.
And so in order to be able to film it,
we use a neutral density filter that we move back and
forth in front of the lens.
So you're able to see both teapots.
But the one on the left is generally 256 times brighter.
The particular filter that we're using was cutting 98% of
the light.
And so here's some photos again of the same experiment.
So you're able to see the Lower 8 Bits, the Upper 8 Bits and
both together with the neutral density filter.
Now the next demo that I'm gonna show, in order to verify that
we still have a low-latency-capable display,
uses a different input.
The teapots are now 3D rendered, textured,
colored, and spinning around their own axes.
And the display pose is unlocked so
that the display is able to move.
Now in this demo, the tea pots have fixed brightness, and
we're moving the display at different speeds.
Starting slow.
So latency compensation is disabled on the top,
latency compensation is enabled on the bottom. And so
you can see a little bit of jitter on the top one, and
that they're starting to shake on the table.
But it's very stable on the bottom one.
And I remind you that 50 degrees per second is not a very fast
head turn speed.
That's about like this.
So the misalignment doesn't increase with that
rotational speed, and
it's just like those checker boards that we saw earlier.
Now the latency analysis for the system was very similar
to that of the previous implementation.
The latency varies mostly as a result of waiting for
the transmission and
stabilization of the mirrors to occur.
Though in contrast the PR-PDM system had a constant on
illuminator so
as soon as the mirror moved you could see the change.
Now we have to wait for all the mirrors to move to the right
position before we pulse the light for a very short time.
So these changes take more time.
So, overall, this HDR system had an average motion-to-photon
latency of about 126 microseconds.
This isn't quite everything yet.
Now that we have HDR support,
how can we make sure that the virtual and
real objects co-exist well, via adaptation?
With one more modification of the system:
we use HDR light sensors to observe the real environment.
These are shown up here at the top right.
These are just in a horizontal array of four sensors.
Each sensor is observing a separate horizontal
quarter of the display's field of view.
These sensors, though, only operated at 40 Hz, but
we interpolate their data at 1,514 Hz.
That was limited by how fast the CPU controlling them could run.
And in order to interpolate them we use five piecewise linear
equations spatially and an error decay function temporally.
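The spatial half of that interpolation can be sketched as follows. The sensor placement (one per horizontal quarter, centered) is my assumption from the description, and the temporal error-decay term is omitted here.

```python
import bisect

def interpolate_brightness(sensor_values, x):
    """Piecewise-linear estimate of scene brightness at normalized display
    position x in [0, 1], from four sensors each observing one horizontal
    quarter of the field of view (assumed centered on their quarters)."""
    centers = [0.125, 0.375, 0.625, 0.875]   # assumed sensor centers
    if x <= centers[0]:                       # flat segments at the edges
        return sensor_values[0]
    if x >= centers[-1]:
        return sensor_values[-1]
    i = bisect.bisect_right(centers, x) - 1   # segment containing x
    t = (x - centers[i]) / (centers[i + 1] - centers[i])
    return (1 - t) * sensor_values[i] + t * sensor_values[i + 1]

# Midway across the display, between the 2nd and 3rd sensors:
print(interpolate_brightness([1.0, 2.0, 4.0, 8.0], 0.5))  # 3.0
```

With the two flat edge segments plus three interior segments, this gives five piecewise-linear pieces, matching the count in the talk.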
Excuse me.
Now, this conversion operation is occurring in the display on
every binary frame.
We take the GPU's 8-bit per color output, so standard or
limited dynamic range.
The display is gonna scale it based on the measured
scale factor.
That's basically a comparison of the measured scene
brightness in that region
with the maximum supported brightness.
So if the scene was at the maximum supported brightness, we're
gonna basically shift all of those eight bits into the upper
eight. If it was half as bright, shift up only seven bits, and
any fraction in between accordingly.
So the resulting modulation output is a product of the GPU
pixel and that scale factor.
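A sketch of that per-pixel conversion; the function and its arguments are hypothetical stand-ins, not the talk's exact FPGA arithmetic.

```python
def modulated_output(gpu_pixel_8bit, scene_brightness, max_brightness,
                     bits=16):
    """Scale an 8-bit GPU pixel into the display's 16-bit range by the
    measured scene brightness in that region. At full scene brightness
    the 8 bits land in the upper 8 of the output; at half brightness they
    land one bit lower; fractional ratios fall in between."""
    scale = scene_brightness / max_brightness          # in (0, 1]
    per_unit = scale * (2**bits - 1) / (2**8 - 1)      # output per input LSB
    return round(gpu_pixel_8bit * per_unit)

print(modulated_output(255, 1.0, 1.0))   # 65535: full 16-bit range
print(modulated_output(255, 0.5, 1.0))   # 32768: about half the range
```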
And to demonstrate this, I have a pair of demos to show.
They use the same spinning teapots as before.
In the first of these demos,
we move a pair of narrow flashlights of differing
brightness around the otherwise dark physical scene.
Note how each teapot brightens and
darkens relative to its proximity to the spotlights.
The brightness also remains aligned to
each teapot as the display rotates.
>> And GPUs can do this themselves in other ways, right?
>> Yeah, but then the adaptation
wouldn't be as low latency.
So that's why it's on the display as well.
We did talk about running the GPUs in a high dynamic
range mode, and it would require a little bit of change
to the memory architecture we have in the system.
Cuz now the system would need to store all 16 bits as opposed to
computing it on demand.
But there are some other reasons you'd want HDR-
>> Instead.
The focus on this was just the scene-adaptive part.
And one more demo of scene-adaptive brightness.
So it's the same teapots,
but I'm gonna be moving a very bright desk lamp back and forth.
And again, note the matched response
of the teapots to the light.
>> What was the resolution on your light sensor?
>> There were four, in the brightness resolution, or the,
it's a- >> The lateral resolution.
>> There were four sensors.
Each is a one value sensor.
They each give them an ambient light value as measured.
The reason- >> Is it four just lateral-
>> Four lateral
horizontal sensors.
And that was probably because we hadn't found an HDR camera that
we liked, that we'd be able to put in a particular spot
for it.
We'd get a more per-pixel basis then, instead of just
interpolating.
And each light sensor is basically physically limited
as to what it can see.
So nearing the end,
let's return to that original two part thesis statement.
Augmented reality displays: the combination of performing
post-rendering warps in display hardware to mitigate latency
registration error, and of using scene-adaptive, high-dynamic-
range color to improve the consistency of virtual and
real imagery, decreases the disagreement between the two worlds.
The main idea
was that of performing incremental operations.
First, by incrementally re-rendering the scene in
the display hardware.
And second, incremental aggregation of the appearance
over multiple steps, or binary frames.
Now again, generating that multi-pass aggregate
requires modulation; the classic algorithms were simple, but
they can cause flicker, or they cost memory.
PR-PDM reduces that flicker and memory cost, and while it was
good enough for an 80-microsecond grayscale display,
it wasn't good enough for color HDR.
And direct digital synthesis also reduced that flicker and
the memory cost and was able to achieve much greater color depth
while still having low latency capabilities:
126 microseconds for a color HDR display.
And with DDS and
HDR light sensors we can scale the brightness of the virtual
scene by matching it to the physical scene.
But what are some other limitations in
the current system and where can we go from here?
Well there's several.
[LAUGH] Let me talk about some of them.
First off, with regards to inertia,
the system's way too heavy.
It's more a display that wears you instead of you wearing it.
>> [LAUGH] >> Instead, if we weren't using
these standardized boards and were using more specialized,
task-optimized hardware, we could probably shrink this down a bit.
And currently the system uses shaft encoders for
low latency, but that limits motion to yaw and pitch.
At the time, no known commercial tracking system provided the
6-DoF equivalent at the latencies that we were targeting.
Excuse me, the latency we're targeting.
[LAUGH] There are accurate systems out there
that'll give you six DoF, but not at 30 microseconds.
>> What was the fastest you did find?
>> We tried the HiBall tracking system, and
that was best case 16 milliseconds.
And the OptiTrack had low latency, but
it's only a 100-hertz tracking system.
The Optotrak tracking system had no latency characterization,
but the way it operates is that a given set of LEDs fires at
200 hertz, and then you divide that by the number of LEDs
you're using to track with,
as well as dividing by 1.6 or 1.3.
And so.
So we're now mechanical 60,
cuz I think there used to be one, right.
>> We had a fire alarm which- >> That's rusty.
>> But it would only operate at 25 hertz.
>> Wow, okay. >> Summer time I thought
they were fast.
>> There might be faster ones now, but for
the equipment that we had this was the best we could do.
>> Do you have any idea about how we can make a system that
has no tracking system that are [INAUDIBLE].
Like a super high version of the [INAUDIBLE]
>> I'd be willing
to talk with you on that offline but
not on a setting that is going on public video, sorry.
[LAUGH] >> Probably.
>> [LAUGH] >> In fact, with respect to that
point, these systems could be paired with a predictive
dynamic model which would run at a much higher frame rate.
It would be doing a prediction out here, right?
That sets up the software to
work at a high frame rate.
So I guess you're not offering those solutions?
>> So if you had a tracking system that operated at least
>> Faster than display,
right, than a conventional PC display, right?
My issue with prediction is sort of, how long are you gonna
try and predict into the future?
The smaller you can make that window,
the less error you're gonna get from your prediction.
It might not be quite the answer you're looking for, but.
I was thinking when people are doing visual inertial slam and
seeing that their systems can run at a really high frame rate,
what they're basically saying is that the visual subsystem runs
at 30 hertz or so.
>> You're talking about sensor fusion.
It goes through where you have a visual system that's running
a camera rate, and an inertial system that runs at a kilohertz.
>> Right. >> Yeah, okay, yeah.
That has some possibilities, but.
I hesitate to say more right now.
I'm sorry.
So then some more future work potential.
Algorithmically, the current system was only exploring this
2D offset, again limited because of the way we were
tracking. And the radial distortion correction
was operating on the GPU, which actually meant that the 2D
translation occurring after the GPU could shift it worse than if
we hadn't performed radial distortion correction at all.
Because it might shift
one pixel six pixels one way,
and then the next pixel seven pixels another way.
And a global 2D offset like that can cause some issues.
We'd like to put radial distortion correction at the end.
Or even more stages so that if we had a better tracker,
we could explore doing more of these warps in between.
And it's possible that some of the selection of these warps
might be situationally dependent as to what the use case is.
That's sort of another open question as to what could be seen. Now-
>> [INAUDIBLE]
>> If the user is likely to be
moving translationally and
the objects that they're looking at vary greatly in depth,
they're more likely to want a depth-based correction.
But if you're generally looking at an overlay that is at one
depth, then you might not need that depth correction as much, or
you may get to do a single depth-dependent correction that
operates globally as opposed to on a per-pixel basis.
Now, part of the reason this operated very fast is because it
was very highly parallelizable, given the one-to-one relationship.
And some of these other corrections might
begin to require some crafty memory algorithms there.
16 bits of dynamic range is nice, but for that photograph I
showed in the meeting room, it's not quite enough.
The current system used fixed-width pulses to illuminate
the DMD, and ten microseconds was the longest pulse we
could put in there before it would start interfering with
the mirrors changing on the next frame.
But by adding additional shorter pulses of light we could,
even with a 16-bit DAC, there's a 16-bit part there,
get even deeper darks out of it and widen the dynamic range.
Possibly as good as 20 bits per color,
based on the minimum time we can pulse the light on and off.
It was about 600 nanoseconds rise and
fall time on that light by the way.
And then scene brightness detection,
as you commented earlier, is very low resolution
spatially.
Which produced occasional artifacts in those transition
zones, because we're doing a linear interpolation
between them.
So alternatives would be some higher spatial resolution
sensor arrays, HDR cameras,
or a second DMD used as a camera, where you're doing some crafty
math with a single light sensor placed where the LED would be.
And so last, scene relighting as opposed to just trying to scale
the brightness, so that we're not either blinding the user or
hiding the image completely.
Where a system could take a look at what the room's lighting
conditions are to figure out where the lights are.
And adapt and relight the scene,
which gets into some computer vision research.
So to conclude, I presented some work focused on algorithms
for optical see-through AR systems that use, say, DMDs,
or other spatial light modulators that are fast.
And VR systems could also benefit from low latency
HDR displays despite the less stringent requirements.
So I'd like to acknowledge some funding sources that I was
funded by for my Ph.D.
More specifically for the [INAUDIBLE] place, the work was
funded by NSF and the BeingThere Centre, or the BeingTogether
Centre.
I was advised by Henry Fuchs and Greg Welch, and my
dissertation committee was Oliver Bimber, Anselmo Lastra,
Montek Singh, Bruce Thomas, and Turner Whitted.
There were two first author papers by me for
this particular work.
And several other papers I did while at UNC.
And so I'd like to thank all these people that I worked with
during my time at UNC on various parts of the projects.
And thank you for your attention.
>> [APPLAUSE] >> So
if you were gonna build a VR type,
you'd still use these digital mirror devices?
>> [INAUDIBLE].
>> See through.
>> The issue with these DMDs is sort of the bulkiness,
cuz the DMD, you got a sensor, a lens system on it, and so forth.
Now, there are OLED displays that can be operated in a binary
fashion at a kilohertz.
There was, I think, a patent or publication by nVidia where they
were controlling six lines of an OLED display at a kilohertz.
And they can shake this device back and forth and
the image stayed locked on.
So DMDs are nice for being able to program nicely,
and to sort of set up these actual systems, but
they are kind of bulky.
>> Well it's nice for HDR, too, OLED [INAUDIBLE].
>> Yeah, with that OLED you're running it in binary mode.
And so you'd start to run into that integration time issue
as well.
>> Yeah, but
the Lumis Play actually will control the brightness-
>> Of that Lumis Play, yeah.
But other HDR displays used a backlight and
an LCD panel, modulating both independently,
to get their 16 bits, because they had the 8 bits of
the backlight and the 8 bits of the modulator in front.
>> Can you do something with an [INAUDIBLE] type display?
That being feasible in a- >> So I haven't worked with
those tests, so I'm not really familiar with them.
>> Okay.
>> Have you found any other prior art relating
to your changing illumination for
high dynamic range?
>> I think they have been used for the control mechanisms.
So I don't think we invented the DDS label for it, but
I have not seen it used at these rates before.
>> So for the second part of the talk,
you focused on easier displays.
And you're trying to address this problem of brightness
consistency in the virtual objects that are rendering.
They seem orthogonal problems to me,
meaning that if you didn't have a HDR display,
you still have to solve this problem somehow.
And am I right in saying that?
>> If we didn't have an HDR display with
the brightness constancy?
>> You still want to solve this problem in the eight
bit frame that you have.
And you are still trying to solve that problem to a certain
extent.
>> Yeah, I mean- >> That's something you
are writing Do it.
>> I mean, you can- >> What I mean is that it's
modulating, frame by frame, the colors that you're rendering and
adapting them to the screen brightness based on some light
sensing.
That would still be done in a non-HDR environment.
>> Yeah, think of your cellphone, for instance, where
it's adapting to how bright the sun is that shines on it.
That's actually the main
example that I thought of.
Instead of having a constant on, full intensity backlight, you
could have a variable intensity but not pulsing backlight.
And so you could cycle either with a normal modulation scheme
or still with, well, if you're doing DDS,
you might as well get all the bits you can.
You can get faster if you're only doing it in 8-bits.
Or even if you knew that the scene was globally consistently
bright, and
that you're gonna pick these particular brightnesses, and
all the other binary pixels were always zero, skip them.
But if you're gonna do that dynamically,
now you need to rescale your brightnesses so that you're
not getting brighter because you're doing the brighter ones
more frequently, versus because you're only doing a subset.
So there's a lot of potential adaptations in there.
>> What's your idea of what you wanna do in the next five years,
research wise?
>> So from a general perspective,
I'm interested in ARVR and something to do with games.
And so I like being able to work on projects
that involve hardware and software combinations.
And that you're able to see something visually at the end
of the result, so that instead of just seeing a number on
a screen, you can have a system that somebody can take a look
and see, yeah, I see why that's better than what came before.
But that's sort of a general answer.
For a more near term,
if I was able to keep working on this type of display, I'd be
looking into some of these issues I was talking about.
>> Do you think that this particular design actually
has legs in the sense of being able to really reduce it,
turn it into a viable device for [INAUDIBLE].
>> I'd need to do some more exploring on sort of what other
miniaturization and other sort of display
technologies exist as alternatives.
So I'm aware of DMDs, but they're a little bulky.
I'm aware of OLEDs.
They don't operate quite as fast.
And I'd like to keep looking to see if there's some
alternatives that- >> So if you had the path,
you would take it?
>> Yeah.
>> I see, okay.
>> A DMD is the main limiting factor when it comes to bulk?
Are there other factors involved here that [INAUDIBLE]?
>> Well, on these FPGA boards, most of the components on those
are related to power for those FPGAs.
Or they contain components that we weren't even using on
the board.
For instance, in this particular picture, the only thing we're
using the original FPGA board that controls that DMD for,
the FPGA that's nominally in control,
is just connecting the inputs to the outputs.
There's no computational logic on that FPGA.
It's all on the Virtex-7, that's the outside board.
So if one was able to create a custom board that didn't
require a custom interconnect board,
an FPGA that's only acting as wires, and
another FPGA that's a black box that TI programs.
If that was all integrated together,
it could be a [LAUGH] smaller package.
[LAUGH] >> And then of course,
as John mentioned, the tracking system, right?
>> Right.
>> You need something that's at the latency levels you want.
>> Yeah.
>> Did you take the system and artificially increase latency
to see where things got uncomf- >> So that would be another
interesting thing to do from a user perceptual standpoint.
We never did any user perception studies because we
figured that the inertia in this system would be too much
of a confounding factor.
But if this is enough, then one could start, yes, injecting
delay into the system to figure out what is that threshold.
How much is enough?
And it might also guide what sort of operations you need to run
at that time scale then.
But I don't have any numbers for you on that at this point.
>> When you say you did 50 degrees per second or
300 degrees per second, how did you measure that?
>> We used a metronome and-
>> [LAUGH] >> And it stops for the-
>> Yeah.
>> Well, the stops were
us looking at this protractor.
>> Okay.
So it's all just kind of.
>> It was approximate.
[CROSSTALK] >> So
it was 50 degrees per second on average.
So it was probably faster at the fastest and slower at the ends.
It was always human controlled, so there was no.
We weren't using a robot to do it that we could then predict.
There's still a human in the loop there,
because the camera was recording that.
>> Did you try to have anyone to try and use their neck for that,
do you know where the camera is?
>> Correct.
But if you were to visit UNC,
there's a system in place to try out.
>> Warm up my neck first.
>> Well it's all supported by all this aluminium framing.
It's a rig that's about this wide.
>> No way it's just inertia.
>> Yeah, all this stuff, if I detach this bar,
which this is all supported on, I think it was seven kilograms.
It was either seven pounds or seven kilograms.
I don't remember which. >> We're going
to replace the aluminum with the steel?
>> This component right here that was the shaft.
It was basically the shaft could wobble and
that's where all the wobble came from.
So the shaft is now a steel shaft.
It has different, better ball bearings.
>> UNC grad students [INAUDIBLE] >> That's actually the logos
at a particular height so I'm a chair for the chair of the board
and kind have [INAUDIBLE] the height the maximum height.
>> So you would offer more boundary?
>> So up and lower boundary.
>> [LAUGH] >> So
you've actually looked through this thing.
We've seen the videos that you've ran at high speeds so
all the artifacts are obvious.
What's the perceptual effect?
>> So you can see the difference; you can
definitely see them diverge versus stay stuck together.
The bigger issue is it's a very small eye box, and
so the hard part is keeping your eye in the right spot so
that you can see any image.
Now that's mostly a result of switching
from a rear projection surface to direct viewing the DMD.
The rear projection surface
gave you a larger eye box where you could see the image, but
then you could also put your eye in the wrong spot more often.
When you're direct viewing,
you can only put your eye in the exact spot, and
as long as you can see, you can see them line up.
Without the brightness adaptation,
which is why you can see my left eye is so bright here.
[LAUGH]
It's very bright to look at if the adaptation is not turned on,
which we can also simulate by just moving a flashlight in
front of these light sensors to dazzle people.
But I'd say one of the bigger factors is those light sensors
with regards to the scene-adaptive brightness.
But everyone that has looked at it and
seen it turned on and off can see the difference.
>> Did you or Alex have anything to do with
the hardware for the illumination or the display card?
>> So Alex created the boards.
The custom boards for the interconnect as well as
the LED driver board for the illumination.
>> But in terms of sourcing LEDs for
the [CROSSTALK] >> We explored several,
and the main thing we were after was that each one had
an independently controlled anode and cathode.
And that the R, G, and B were very close together in a common
package, so that they could all fit through the hole of
the collimator for the DMD.
>> So you collaborated on that?
>> Yeah.
>> The system designs and stuff.
>> Yeah. >> I mean the optics and
LED and.
>> Well, the optics are mostly the stock optics that came with
the projector.
>> Gotta change the LED and- >> We removed the LED package, and
Alex developed that part, and then my focus was mostly on
the control software on both the FPGA and the PC.
>> [APPLAUSE]