Description of problem:

(courtesy Steven Rostedt)

When X writes to the xen frame buffer, there's no page currently mapped, so this is the trick that Xen tries to pull off: it maps the page and flags it as faulted in. There is also a worker thread that wakes up via a timer which gets set when we need to do a refresh. This thread walks all the mappings looking for the ones that have faulted in, calls zap_page_range on them, and then goes through the event channel to get Xen to update the buffer for what the user sees. The idea is that the next time X writes to that page, it faults again (since the mapping no longer exists) and we go through this process once more.

Now the problem is in the details. To record a fault, xenfb_vm_nopage (called by do_no_page) increments a counter associated with the mapping (map->faults++), so that the worker knows it needs to update this page and zap its pages. The worker thread has a loop that goes down all the mappings looking for any that need to be updated: if map->faults is greater than zero, it zaps the pages and clears map->faults. There were locking issues with this code that the patches by Markus and Stephen address, but they don't solve this particular problem.

The problem is that between the point where memory.c:do_no_page calls xenfb_vm_nopage and the point where it actually puts the page into X's pte, the xen worker thread can run. In that case it will zap the pages (removing a pte from X that doesn't exist yet). But the code in do_no_page then puts the page into the pte anyway, which means further writes to that page by X will not be trapped, and that information will never be transferred through xen, so the user never sees the updates. Basically we have this:

=====
X -> writes to some address in the Xen frame buffer

  Fault - call do_no_page
    grab xen page lock
    get page mapping - call xenfb_vm_nopage
      map->faults++;
    release xen page lock
    return back to do_no_page

  (preempted, or SMP)

  Xen worker thread runs.
    grabs xen page lock
    Looks at mappings
    Finds this mapping
    Zaps its pages (but page not in pte yet)
    Sets map->faults = 0
    releases xen page lock

  (back to X process)
    Put page in X's pte

Oh well, we won't be updating the writes to this page anytime soon.
=====

Stephen Tweedie adds:

I suspect that we need a much deeper think about the problem, to see if we can rely better on core VM infrastructure for this. For example, we're marking the dirty rectangle in xenfb_vm_nopage even if the area is only getting read, not written. Can we fix that?

And we should replace the zap_page_range() call, whose locking we're unsure about, with a call to unmap_mapping_range() (see the sketch below), which has several advantages:

* It is already fully exported
* It deals with locking automatically via the address_space i_mmap_lock spinlock
* It automatically iterates over all the vmas on the address_space without us having to loop over them ourselves

but that requires adding the address_space concept to xenfb, which I think is desirable but which requires a bit more work.

How reproducible:
The race has not been reproduced in the wild.
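For illustration, here is a minimal sketch of what the unmap_mapping_range() approach could look like. It assumes xenfb gains a struct address_space for its mmap()ed device node; the fb_mapping field, the simplified struct xenfb_info, and the xenfb_zap_mappings() name are made up for this sketch, and none of this is tested code:

#include <linux/fs.h>
#include <linux/mm.h>

/*
 * Hypothetical container; the real struct xenfb_info looks different.
 * The fb_mapping pointer (the address_space backing the mmap()ed
 * framebuffer node) is the piece that would have to be added.
 */
struct xenfb_info {
	struct address_space *fb_mapping;
};

static void xenfb_zap_mappings(struct xenfb_info *info)
{
	/*
	 * Drop every pte in every vma backed by this address_space.
	 * unmap_mapping_range() takes the i_mmap_lock itself and walks
	 * the vma tree for us, so xenfb no longer needs its own list of
	 * mappings, or its own lock, for this step.
	 *
	 * holebegin = 0 with holelen = 0 means "unmap to end of file",
	 * i.e. the whole framebuffer; even_cows = 1 also zaps private
	 * COW copies.
	 */
	unmap_mapping_range(info->fb_mapping, 0, 0, 1);

	/*
	 * The worker would then notify the backend over the event
	 * channel as it does today, and the next write from X faults
	 * back in through the normal ->nopage path.
	 */
}

Whether this alone closes the nopage-vs-worker window would still need checking; mainly it moves the teardown onto locking that the core VM already provides, per Stephen's comment above.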
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
The best long-term option could be a conversion to deferred I/O (see Documentation/fb/deferred_io.txt). Might not be the right thing for RHEL-5, though.
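For reference, a deferred-I/O conversion would look roughly like the sketch below, following Documentation/fb/deferred_io.txt (the fbdefio API only exists upstream from 2.6.21 on, so RHEL-5's 2.6.18 kernel would need the framework backported too). The xenfb_deferred_io()/xenfb_defio names, the delay value, and the way the dirty band is computed are illustrative, not the actual upstream code:

#include <linux/fb.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mm.h>

/*
 * Called by the defio framework 'delay' jiffies after userspace (X)
 * first dirties a page of its framebuffer mmap; pagelist holds the
 * pages written in that interval, linked through page->lru.
 */
static void xenfb_deferred_io(struct fb_info *info,
			      struct list_head *pagelist)
{
	struct page *page;
	unsigned long beg, end;
	int y1, y2, miny = INT_MAX, maxy = 0;

	/* Collapse the dirtied pages into one band of scanlines. */
	list_for_each_entry(page, pagelist, lru) {
		beg = page->index << PAGE_SHIFT;
		end = beg + PAGE_SIZE - 1;
		y1 = beg / info->fix.line_length;
		y2 = end / info->fix.line_length;
		if (y1 < miny)
			miny = y1;
		if (y2 > maxy)
			maxy = y2;
	}

	/*
	 * Report rows miny..maxy to the backend over the event channel;
	 * a xenfb_refresh() helper is assumed here and not shown.
	 */
	/* xenfb_refresh(info->par, miny, maxy); */
}

static struct fb_deferred_io xenfb_defio = {
	.delay		= HZ / 20,	/* batch updates, roughly 50 ms */
	.deferred_io	= xenfb_deferred_io,
};

/*
 * In the probe path, before register_framebuffer(info):
 *	info->fbdefio = &xenfb_defio;
 *	fb_deferred_io_init(info);
 * and fb_deferred_io_cleanup(info) on teardown.  The defio framework
 * then does the nopage/mkwrite bookkeeping itself, which is what gets
 * rid of the hand-rolled map->faults tracking and the race above.
 */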
The upstream kernel went with the solution mentioned in comment #4, in 2.6.26. We could replace RHEL-5's xenfb with a backport of upstream's. However, taking a new, untried backport to fix a race condition that hasn't been reproduced in the lab and hasn't generated any user complaints so far seems like a really bad idea to me. Therefore, I'm closing this WONTFIX. If you think we should fix it, please reopen the bug.