Bug 683171 - Windows PV netfront driver spams event channel/xenstored on startup
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xenpv-win
Version: 5.6
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Paolo Bonzini
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 584249
Blocks: 518407 807971
 
Reported: 2011-03-08 18:13 UTC by Jacob Hunt
Modified: 2018-11-28 21:46 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-01 11:00:33 UTC
Target Upstream Version:



Description Jacob Hunt 2011-03-08 18:13:46 UTC
Description of problem:

The Windows PV netfront driver has this little nugget of code:

do {
    prevState = curState;
    curState = xenbus_read_driver_state(info->xbdev->otherend);
    RhelDbgPrint(TRACE_LEVEL_INFORMATION,
                 ("Device state is %s.\n", xenbus_strstate(curState)));

    if (prevState != curState) {
        backend_changed(info->xbdev, curState);
    }
} while (curState == XenbusStateInitWait ||
         curState == XenbusStateInitialised);

This means the driver polls the backend state in a tight loop until it reads XenbusStateConnected. On the dom0 side, this causes extreme event channel traffic, with xenstored spending unusual amounts of CPU time returning nodes from the xenstore (e.g., /local/domain/0/backend/vif/144/0/state).

If for some reason the hotplug scripts fail to bring up the backend side correctly, the Windows driver never recovers: xenstored continues to be pummeled with requests, wasting precious dom0 CPU cycles.

Why isn't this using a xenbus_watch* function?

Comment 1 Paolo Bonzini 2011-03-09 11:08:37 UTC
I've never seen it in practice.  This piece of code is _very_ heavily hit by WHQL tests and they pass (though admittedly they do so on an otherwise idle machine).  Still, I am aware of it.

Historically it was because this code was run with interrupts disabled.  It was even worse, because the Windows guest would have been completely hung.  Newer versions of the drivers changed it so that at least the Windows guest will "just" hammer on dom0.

The block drivers have the same problem too (and with the complete-hang behavior, unfortunately).

Comment 3 Matt Wilson 2011-03-09 17:46:37 UTC
This may be triggered by some problem in the backend bringup. When the Windows driver is in this state, the backend is stuck at state = "2" (XenbusStateInitWait) while the frontend has already written state = "4" (XenbusStateConnected):

# xenstore-ls /local/domain/0/backend/vif/488 
0 = "" 
domain = "dom_93116181" 
handle = "0" 
uuid = "8fca2ad4-5cc0-034a-5d3a-ee3f7860ba0a" 
script = "/etc/xen/scripts/vif-route" 
state = "2" 
frontend = "/local/domain/488/device/vif/0" 
mac = "12:31:3D:04:69:F4" 
online = "1" 
frontend-id = "488" 
type = "front" 
feature-sg = "1" 
feature-gso-tcpv4 = "1" 
feature-rx-copy = "1" 
hotplug-status = "connected" 

# xenstore-ls /local/domain/488/device/vif/0 
backend-id = "0" 
mac = "12:31:3D:04:69:F4" 
handle = "0" 
state = "4" 
backend = "/local/domain/0/backend/vif/488/0" 
tx-ring-ref = "916" 
rx-ring-ref = "797" 
event-channel = "5" 
request-rx-copy = "1" 
feature-rx-notify = "1" 
feature-sg = "0" 
feature-gso-tcpv4 = "0"

Comment 23 RHEL Program Management 2012-04-02 10:29:33 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 25 Paolo Bonzini 2012-08-01 11:00:33 UTC
Destabilizing change, closing as WONTFIX.

