Bug 666005
Summary: | Dom0 crashes on high I/O when using DRBD from suspected network driver bug | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | prickett233
Component: | xen | Assignee: | Xen Maintainance List <xen-maint>
Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Priority: | low
Version: | 5.5 | CC: | drjones, leiwang, lersek, minovotn, mrezanin, mshao, qwan, xen-maint, yuzhang
Target Milestone: | rc | Target Release: | ---
Hardware: | x86_64 | OS: | Linux
Doc Type: | Bug Fix | Story Points: | ---
Last Closed: | 2011-03-02 15:33:11 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | Category: | ---
oVirt Team: | --- | Cloudforms Team: | ---
Bug Blocks: | 514499, 570796 | |
Description
prickett233
2010-12-28 11:53:38 UTC
My understanding is this:

- DRBD (a block device driver) causes the network stack to add references to pages that were temporarily granted to dom0 (blkback).
- DRBD reports "I'm done" to blkback, violating the block device interface contract (because "background references" remain).
- blkback gives up its grants on the pages (correctly), and the pages are unmapped.
- The network stack blows up.

Blkback works on any block device, so it can rely on nothing other than the block device interface contract. Even if blkback knew about DRBD-specific "background references", it could only deal with them by deferring the unmapping. It would then have to poll to see whether the references are gone, or the network stack would have to notify blkback. Alternatively, the network stack would have to unmap the pages itself and clean up the grants. These are unreasonable mix-ups of responsibilities. Additionally, if blkback waited for background references to vanish, they could stall blkback indefinitely (by holding back pages and ring entries) without any actual block device activity.

This problem should be fixed in DRBD. At least DRBD should add an option to disable the use of the zero-copy network interface. Ultimately, dom0 crashes due to the non-conforming dom0 block driver.

For the time being, the RHEL5 dom0 should protect itself with the "ethtool -K xenbr0 sg off" workaround, as verified in http://lists.linbit.com/pipermail/drbd-user/2009-March/011652.html. The command could be added to the Xen script that brings up xenbr0. The performance effects of disabling scatter/gather on xenbr0 should be measured, and based on that we should decide whether the ethtool command should run unconditionally or be configurable.

(In reply to comment #5)
> This problem should be fixed in DRBD. At least DRBD should add an option to
> disable the use of the zero-copy network interface. Ultimately, dom0 crashes
> due to the non-conforming dom0 block driver.

I meant: This problem should be fixed in DRBD. At least DRBD should add an option to disable the use of the zero-copy network interface. Ultimately, dom0 crashes due to *this* non-conforming dom0 block driver.

As there is no DRBD package in RHEL that could fix this problem, and an easy workaround is available, we closed this bz as WONTFIX.
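The comment above proposes adding the workaround to the Xen script that brings up xenbr0, possibly behind a switch. A minimal sketch of one way to do that without editing the stock script follows. It assumes the RHEL5 xend layout, where `/etc/xen/xend-config.sxp` selects the bridge script with `(network-script network-bridge)` and the scripts live in `/etc/xen/scripts`. The wrapper name `network-bridge-drbd`, the `XENBR_SG_OFF` switch and the `XEN_SCRIPT_DIR` override are hypothetical, and `xenbr0` as the default bridge is taken from the comment rather than from the stock script's own logic. The wrapper runs the stock `network-bridge` first, then applies `ethtool -K <bridge> sg off` and checks the result with `ethtool -k`.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. /etc/xen/scripts/network-bridge-drbd, selected in
# /etc/xen/xend-config.sxp with:  (network-script network-bridge-drbd)
# It runs the stock network-bridge script, then applies the DRBD workaround
# ("ethtool -K xenbr0 sg off") to the bridge.

XEN_SCRIPT_DIR="${XEN_SCRIPT_DIR:-/etc/xen/scripts}"
# Hypothetical switch: set to "no" to keep scatter/gather enabled on the bridge.
XENBR_SG_OFF="${XENBR_SG_OFF:-yes}"

# Succeed only when the bridge is being brought up and the switch is on.
want_sg_off() {
    [ "$1" = "start" ] && [ "$2" = "yes" ]
}

# Print the bridge name from xend-style "bridge=..." arguments.
# Assumption: fall back to xenbr0 when no bridge= argument is given.
bridge_from_args() {
    br=xenbr0
    for arg in "$@"; do
        case "$arg" in
            bridge=*) br="${arg#bridge=}" ;;
        esac
    done
    printf '%s\n' "$br"
}

# Succeed if "ethtool -k" output on stdin reports scatter-gather as off.
sg_reported_off() {
    grep -q '^scatter-gather: off'
}

main() {
    # Let the stock script set up the bridge first; stop if it fails.
    "$XEN_SCRIPT_DIR/network-bridge" "$@" || return
    if want_sg_off "$1" "$XENBR_SG_OFF"; then
        br=$(bridge_from_args "$@")
        ethtool -K "$br" sg off
        if ! ethtool -k "$br" | sg_reported_off; then
            echo "warning: scatter-gather still enabled on $br" >&2
        fi
    fi
}

# Act only on the actions xend passes to network scripts; running the file
# without arguments just defines the functions above.
case "${1:-}" in
    start|stop|status) main "$@" ;;
esac
```

Keeping the workaround in a separate wrapper leaves the stock `network-bridge` untouched, and the switch makes it easy to compare throughput with scatter/gather on and off, which is the measurement the comment asks for before deciding whether the command should run unconditionally.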