Description of problem:

The Windows PV netfront driver has this little nugget of code:

    do {
        prevState = curState;
        curState = xenbus_read_driver_state(info->xbdev->otherend);
        RhelDbgPrint(TRACE_LEVEL_INFORMATION,
                     ("Device state is %s.\n", xenbus_strstate(curState)));
        if (prevState != curState) {
            backend_changed(info->xbdev, curState);
        }
    } while (curState == XenbusStateInitWait ||
             curState == XenbusStateInitialised);

This means that the driver will check the backend state in a tight loop for as long as it reads XenbusStateInitWait or XenbusStateInitialised, that is, normally until it shows as XenbusStateConnected.

On the dom0 side, this results in extreme event channel traffic and xenstored spending unusual amounts of CPU time returning nodes from the xenstore (e.g., /local/domain/0/backend/vif/144/0/state).

If for some reason hotplug scripts fail to bring up the backend side correctly, the Windows driver will never recover. xenstored will continue to be pummeled with requests, spending precious dom0 CPU cycles.

Why isn't this using a xenbus_watch* function?
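For illustration, one less aggressive shape for this wait, short of a real xenbus watch, is a bounded poll with exponential backoff and a timeout. The following is only a minimal sketch under stated assumptions, not the driver's code: `wait_for_backend`, `MAX_DELAY_MS`, the callback types, and the `fake_backend` stand-in are all hypothetical names introduced here, and the real driver would also call `backend_changed()` on each state change, which is omitted for brevity. The state values mirror the `xenbus_state` enum from the Xen public headers.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical hooks: how the state is read and how the caller sleeps. */
typedef enum xenbus_state (*read_state_fn)(void *ctx);
typedef void (*sleep_fn)(void *ctx, unsigned ms);

#define MAX_DELAY_MS 100u  /* assumed cap on the backoff interval */

/*
 * Poll the backend state with exponential backoff.
 * Returns 0 once the state is neither InitWait nor Initialised,
 * or -1 if timeout_ms elapses first. *final_state holds the last read.
 */
int wait_for_backend(read_state_fn read_state, sleep_fn sleep_ms, void *ctx,
                     unsigned timeout_ms, enum xenbus_state *final_state)
{
    unsigned delay = 1, waited = 0;

    assert(read_state != NULL && sleep_ms != NULL && final_state != NULL);

    for (;;) {
        enum xenbus_state cur = read_state(ctx);
        *final_state = cur;

        if (cur != XenbusStateInitWait && cur != XenbusStateInitialised)
            return 0;            /* backend moved on (or went away) */
        if (waited >= timeout_ms)
            return -1;           /* give up instead of spinning forever */

        if (delay > timeout_ms - waited)
            delay = timeout_ms - waited;   /* never sleep past the deadline */
        sleep_ms(ctx, delay);
        waited += delay;

        delay = (delay * 2 > MAX_DELAY_MS) ? MAX_DELAY_MS : delay * 2;
    }
}

/* Simulated backend, for demonstration only. */
struct fake_backend {
    unsigned reads_until_connected;  /* 0 means "never connects" */
    unsigned reads;                  /* number of state reads so far */
    unsigned slept_ms;               /* total simulated sleep time */
};

enum xenbus_state fake_read(void *ctx)
{
    struct fake_backend *b = ctx;
    b->reads++;
    if (b->reads_until_connected != 0 && b->reads >= b->reads_until_connected)
        return XenbusStateConnected;
    return XenbusStateInitWait;
}

void fake_sleep(void *ctx, unsigned ms)
{
    struct fake_backend *b = ctx;
    b->slept_ms += ms;  /* record the time instead of actually sleeping */
}
```

With a backend that never leaves XenbusStateInitWait, this issues a few dozen reads per timeout period at most, rather than an unbounded stream, and the caller gets an error it can act on. Whether sleeping is permitted at this point in the driver depends on the context the code runs in.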
I've never seen it in practice. This piece of code is _very_ heavily hit by WHQL tests and they pass (though admittedly they do so on an otherwise idle machine). Still, I am aware of it. Historically it was because this code was run with interrupts disabled. It was even worse, because the Windows guest would have been completely hung. Newer versions of the drivers changed it so that at least the Windows guest will "just" hammer on dom0. The block drivers have the same problem too (and with the complete-hang behavior, unfortunately).
This may be triggered by some problem in the backend bringup. When the Windows driver is in this state, we see:

# xenstore-ls /local/domain/0/backend/vif/488
0 = ""
 domain = "dom_93116181"
 handle = "0"
 uuid = "8fca2ad4-5cc0-034a-5d3a-ee3f7860ba0a"
 script = "/etc/xen/scripts/vif-route"
 state = "2"
 frontend = "/local/domain/488/device/vif/0"
 mac = "12:31:3D:04:69:F4"
 online = "1"
 frontend-id = "488"
 type = "front"
 feature-sg = "1"
 feature-gso-tcpv4 = "1"
 feature-rx-copy = "1"
 hotplug-status = "connected"

# xenstore-ls /local/domain/488/device/vif/0
backend-id = "0"
mac = "12:31:3D:04:69:F4"
handle = "0"
state = "4"
backend = "/local/domain/0/backend/vif/488/0"
tx-ring-ref = "916"
rx-ring-ref = "797"
event-channel = "5"
request-rx-copy = "1"
feature-rx-notify = "1"
feature-sg = "0"
feature-gso-tcpv4 = "0"
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Destabilizing change, closing as WONTFIX.