Description of problem:
A virtual interface is created and plugged into the VM. Within the VM it is given an IP address using ifconfig. A ping is then attempted, but there appears to be no network connectivity. This happens infrequently during stress testing.

/var/log/messages shows that the occurrences coincide with drivers/xen/netfront/netfront.c:network_connect() being called twice. This suggests that the netfront driver is receiving a duplicate "backend_changed" signal, the second of which it should be ignoring.

The duplicate "backend_changed" signals are a known issue, and there is a xen-3.1 guest kernel patch to protect against them. However, it looks like this patch hasn't been applied in either the RHEL4.6 or RHEL5.2 kernel-xen packages. I'll attach the patch to this bug report.

Version-Release number of selected component (if applicable):
RHEL4.6: kernel-2.6.9-67.0.20.EL
RHEL5.2: kernel-2.6.18-92.1.6.el5

How reproducible:
Sporadic; difficult to reproduce.

Steps to Reproduce:
1. Create a VIF (in the VM management tool).
2. Assign the virtual interface an IP address (in the VM).
3. Try to ping a known IP address on the same network.

Actual results:
ping reports the host as "Unreachable".

Expected results:
ping reaches the host.

Additional info:
Created attachment 310658: [NET] front: Fix crashes when xenstore watches fire multiple times.
Alex, many thanks for the patch. I can see the first and third patch hunks in the drivers I got from http://xenbits.xensource.com/linux-2.6.18-xen.hg, but not the second. How come? Could you point me to the relevant upstream changeset(s)?
Alex is away at the moment, but let me try to answer. The upstream changeset is http://xenbits.xensource.com/xen-unstable.hg?rev/79315be2c9b9 The second hunk is indeed no longer present. I had a dig and found that it was subsequently removed by http://xenbits.xensource.com/xen-unstable.hg?rev/e99ba0c6c046 which came out of http://lists.xensource.com/archives/html/xen-devel/2006-12/msg00843.html We haven't observed that failure, though (I don't know whether we test for it).
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
The patch is included in kernel-2.6.18-115.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
This bug has been marked for inclusion in the Red Hat Enterprise Linux 5.3 Release Notes. To aid in the development of relevant and accurate release notes, please fill out the "Release Notes" field above with the following 4 pieces of information:

Cause: What actions or circumstances cause this bug to present.
Consequence: What happens when the bug presents.
Fix: What was done to fix the bug.
Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: A race condition exists in the Xenbus protocols for device creation and destruction. The netfront driver didn't cope with it.

Consequence: Network device creation can result in a device that is hung. Happens rarely, typically when stress testing.

Fix: Backport fix from upstream.

Result: Network device creation works reliably, even when stress testing.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,10 +1 @@
-Cause: A race condition exists in the Xenbus protocols for device
+A race condition could occur when creating and destroying virtual network devices. In some circumstances — especially high load situations — this would cause the virtual device to not respond. In this update, the state of the virtual device is checked to prevent the race condition from occurring.
-creation and destruction. The netfront driver didn't cope with it.
-
-Consequence: Network device creation can result in a device that is
-hung. Happens rarely, typically when stress testing.
-
-Fix: Backport fix from upstream.
-
-Result: Network device creation works reliably, even when stress
-testing.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html