Description of problem:
(courtesy Steven Rostedt)
When X writes to the xen frame buffer, there's no page currently
mapped. So this is the trick that Xen tries to pull off. It maps the
page and flags it as faulted in.
There also exists a worker thread that wakes up via a timer that gets
set when we need to do a refresh. This thread goes down all the
mappings looking for the mappings that have faulted in, and calls
zap_page_range on those mappings. It then goes through the event
channel to get Xen to update the buffer for what the user sees.
What Xen is trying to do is that the next time X writes to that page,
it faults in again (since the mappings no longer exist) and we go
through this process again.
Now the problem is in the details. To check for a fault, in
xenfb_vm_nopage (called by do_no_page) it increments a variable
associated to the mapping (map->fault++). So that it knows that it
needs to update this page and zap the pages for it.
In the worker thread code, there's a loop that goes down all the
mappings looking for any that needs to be updated. This is checked by
seeing if the mapping map->fault is greater than zero, it then zaps
the pages, and clears the map->fault.
There was locking issues with this code that the patches by Markus and
Stephen address, but they don't solve this current problem.
The problem is that from the time in memory.c: do_no_page calls the
xenfb_vm_nopage to the time it actually puts the page into X's pte,
the xen worker thread can run. In which case it will zap the pages
(removing the pte from X that doesn't yet exist). But what does
happen, is that the code in do_no_page, puts the page into the pte
anyway. And that means further writes to that page by X will not be
trapped, and then that info will not be transfered through xen so the
user sees the updates.
Basically we have this:
X -> writes to some address in the Xen frame buffer
Fault - call do_no_page
grab xen page lock
get page mapping - call xenfb_vm_nopage
release xen page lock
return back to do_no_page
(preempted, or SMP)
Xen worker thread runs.
grabs xen page lock
Looks at mappings
Finds this mapping
Zaps its pages (but page not in pte yet)
Sets map->faults = 0
releases xen page lock
(back to X process)
Put page in X's pte
Oh well, we wont be updating the writes to this page anytime soon.
Stephen Tweedie adds:
I suspect that we need a much deeper think about the
problem, to see if we can rely better on core VM infrastructure for
this. For example, we're marking the dirty rectangle in
xenfb_vm_nopage, even if the area is only getting read, not written.
Can we fix that? And we should replace the zap_page_range() call, whose
locking we're unsure about, with a call to unmap_mapping_range(), which
has several advantages:
* It is already fully exported
* It deals with locking automatically via the address_space
* It automatically iterates over all the vmas on the address_space
without us having to loop over them ourselves
but that requires adding the address_space concept to xenfb, which I
think is desirable but which requires a bit more work.
The race has not been reproduced in the wild.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time. This request will be
reviewed for a future Red Hat Enterprise Linux release.
The best long-term option could be a conversion to deferred I/O (see
Documentation/fb/deferred_io.txt). Might not be the right thing for RHEL-5, though.
Upstream kernel went with the solution mentioned in comment#4, in 2.6.26. We could replace RHEL-5's xenfb by a backport of upstream's. However, taking a new, untried backport to fix a race condition that hasn't been reproduced in the lab, and hasn't generated any user complaints so far seems like a really bad idea to me. Therefore, I'm closing this WONTFIX.
If you think we should fix it, please reopen the bug.