Bug 683171

Summary: Windows PV netfront driver spams event channel/xenstored on startup
Product: Red Hat Enterprise Linux 5
Reporter: Jacob Hunt <jhunt>
Component: xenpv-win
Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Docs Contact:
Version: 5.6
CC: drjones, jwest, jzheng, leiwang, msw, pbonzini, qwan, tburke, yuzhou
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-01 11:00:33 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 584249
Bug Blocks: 518407, 807971

Description Jacob Hunt 2011-03-08 18:13:46 UTC
Description of problem:

The Windows PV netfront driver has this little nugget of code:

do {
    prevState = curState;
    curState = xenbus_read_driver_state(info->xbdev->otherend);
    RhelDbgPrint(TRACE_LEVEL_INFORMATION,
                 ("Device state is %s.\n", xenbus_strstate(curState)));

    if (prevState != curState) {
        backend_changed(info->xbdev, curState);
    }
} while (curState == XenbusStateInitWait ||
         curState == XenbusStateInitialised);

This means that the driver polls the backend state in a tight loop until it leaves XenbusStateInitWait/XenbusStateInitialised (normally by reaching XenbusStateConnected). On the dom0 side, this results in extreme event channel traffic, with xenstored spending unusual amounts of CPU time returning nodes from the xenstore (e.g., /local/domain/0/backend/vif/144/0/state).

If for some reason hotplug scripts fail to bring up the backend side correctly, the Windows driver will never recover. xenstored will continue to be pummeled with requests, spending precious dom0 CPU cycles.

Why isn't this using a xenbus_watch* function?
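
For comparison, a watch-driven variant would let the backend's state change wake the init path instead of re-reading xenstore in a loop. The sketch below is only illustrative: it assumes Linux-style xenbus and wait-queue primitives, and the names backend_state_wq, wait_for_backend_connected and netfront_info are hypothetical; the Windows port's actual synchronization primitives may differ.

/*
 * Hypothetical watch-driven variant (Linux-style API; names are
 * illustrative, not the Windows driver's actual ones).  xenbus already
 * watches the backend's state node and invokes backend_changed() on
 * every write, so the init path can sleep instead of polling xenstore.
 */
#include <linux/wait.h>
#include <linux/jiffies.h>
#include <xen/xenbus.h>

static DECLARE_WAIT_QUEUE_HEAD(backend_state_wq);

static void backend_changed(struct xenbus_device *dev,
                            enum xenbus_state backend_state)
{
        /* ... existing state-machine handling ... */
        wake_up(&backend_state_wq);   /* let any waiter re-check the state */
}

/* Sleep until the backend leaves InitWait/Initialised, or give up. */
static int wait_for_backend_connected(struct netfront_info *info)
{
        long ret = wait_event_timeout(backend_state_wq,
                xenbus_read_driver_state(info->xbdev->otherend) !=
                        XenbusStateInitWait &&
                xenbus_read_driver_state(info->xbdev->otherend) !=
                        XenbusStateInitialised,
                30 * HZ);

        return ret ? 0 : -ETIMEDOUT;
}

Even a timeout-and-fail path like this would avoid the unrecoverable hammering of dom0 when the hotplug scripts never bring the backend up.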

Comment 1 Paolo Bonzini 2011-03-09 11:08:37 UTC
I've never seen it in practice.  This piece of code is _very_ heavily hit by WHQL tests and they pass (though admittedly they do so on an otherwise idle machine).  Still, I am aware of it.

Historically, the polling was done this way because this code ran with interrupts disabled.  It was even worse then, because the Windows guest would be completely hung while waiting.  Newer versions of the drivers changed it so that at least the Windows guest will "just" hammer on dom0.

The block drivers have the same problem too (and with the complete-hang behavior, unfortunately).

Comment 3 Matt Wilson 2011-03-09 17:46:37 UTC
This may be triggered by some problem in the backend bringup. When the Windows driver is in this state, we see:

# xenstore-ls /local/domain/0/backend/vif/488 
0 = "" 
domain = "dom_93116181" 
handle = "0" 
uuid = "8fca2ad4-5cc0-034a-5d3a-ee3f7860ba0a" 
script = "/etc/xen/scripts/vif-route" 
state = "2" 
frontend = "/local/domain/488/device/vif/0" 
mac = "12:31:3D:04:69:F4" 
online = "1" 
frontend-id = "488" 
type = "front" 
feature-sg = "1" 
feature-gso-tcpv4 = "1" 
feature-rx-copy = "1" 
hotplug-status = "connected" 

# xenstore-ls /local/domain/488/device/vif/0 
backend-id = "0" 
mac = "12:31:3D:04:69:F4" 
handle = "0" 
state = "4" 
backend = "/local/domain/0/backend/vif/488/0" 
tx-ring-ref = "916" 
rx-ring-ref = "797" 
event-channel = "5" 
request-rx-copy = "1" 
feature-rx-notify = "1" 
feature-sg = "0" 
feature-gso-tcpv4 = "0"
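
For reference, the numeric state values above map to the xenbus protocol states defined in Xen's public io/xenbus.h header: the backend's state = "2" is XenbusStateInitWait and the frontend's state = "4" is XenbusStateConnected, which is exactly the combination that keeps the polling loop from the description spinning indefinitely.

enum xenbus_state {
        XenbusStateUnknown       = 0,
        XenbusStateInitialising  = 1,
        XenbusStateInitWait      = 2,  /* early init done; waiting on peer/hotplug */
        XenbusStateInitialised   = 3,
        XenbusStateConnected     = 4,
        XenbusStateClosing       = 5,
        XenbusStateClosed        = 6,
        XenbusStateReconfiguring = 7,
        XenbusStateReconfigured  = 8
};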

Comment 23 RHEL Program Management 2012-04-02 10:29:33 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 25 Paolo Bonzini 2012-08-01 11:00:33 UTC
Destabilizing change, closing as WONTFIX.