Bug 591548

Summary: netback does not properly get to the Connected state after it's been Closed
Product: Red Hat Enterprise Linux 5 Reporter: Paolo Bonzini <pbonzini>
Component: kernel-xen Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 5.5 CC: byu, dhoward, jarod, llim, pbonzini, xen-maint, yuzhang
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 21:31:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 518435, 526393, 643345    
Attachments:
Description    Flags
patch          none

Description Paolo Bonzini 2010-05-12 14:32:48 UTC
The netback driver fails to transition from InitWait to Connected after it has
been closed once.  The reason is that when netdev_state_change() is called the
interface is still down, so the NETDEV_CHANGE event is never raised.
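
The failure mode can be illustrated with a small userspace model. This is only a sketch, not the netback code: ToyBackend, on_netdev_change and reconnect are hypothetical names, and the assumption that the backend enters Connected from its NETDEV_CHANGE handler is inferred from the description above. The IFF_UP check mirrors the kernel's netdev_state_change() helper, which runs the notifier chain only for interfaces that are up.

```python
from enum import IntEnum

IFF_UP = 0x1  # same bit value as the Linux IFF_UP interface flag


class XenbusState(IntEnum):
    # Subset of the XenbusState values from Xen's public io/xenbus.h.
    INIT_WAIT = 2
    CONNECTED = 4
    CLOSED = 6


class ToyBackend:
    """Hypothetical stand-in for a netback interface; not the real driver."""

    def __init__(self):
        self.flags = 0                       # interface starts out down
        self.state = XenbusState.INIT_WAIT   # state after a close/reopen
        self.events = []                     # notifier events actually delivered

    def netdev_state_change(self):
        # Modelled on the kernel helper: the notifier chain is only run
        # when the interface is up, otherwise the call is a no-op.
        if self.flags & IFF_UP:
            self.events.append("NETDEV_CHANGE")
            self.on_netdev_change()

    def on_netdev_change(self):
        # Assumption: the backend moves to Connected from this handler.
        if self.state == XenbusState.INIT_WAIT:
            self.state = XenbusState.CONNECTED


def reconnect(backend, bring_up_first):
    """Replay a reconnect and return the state the backend ends up in."""
    if bring_up_first:
        backend.flags |= IFF_UP
    backend.netdev_state_change()
    return backend.state


if __name__ == "__main__":
    print(reconnect(ToyBackend(), bring_up_first=False).name)
    print(reconnect(ToyBackend(), bring_up_first=True).name)
```

In this model, calling netdev_state_change() while the interface is down delivers no event and the backend stays in InitWait; bringing the interface up first lets the event through and the backend reaches Connected.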

This is visible with the xenpv-win drivers by disabling and enabling the 
adapters repeatedly.  Without the patch, the drivers hang about 1 in 50 times
(and that is because of some hacks in the drivers; if I make the drivers talk
the correct xenbus protocol they will hang 100% of the time).

Upstream ties the Connected transition to the completion of the hotplug scripts, so it doesn't have this issue.

Comment 1 Paolo Bonzini 2010-05-12 14:35:53 UTC
Created attachment 413444 [details]
patch

Comment 2 RHEL Program Management 2010-05-20 12:41:54 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Jarod Wilson 2010-06-29 13:35:48 UTC
in kernel-2.6.18-205.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 6 Jarod Wilson 2010-06-29 13:39:59 UTC
Not sure yet what went wrong w/the release script, but that should have been "in kernel-2.6.18-204.el5" (in build 204, not 205).

Comment 10 Binbin Yu 2010-12-22 08:42:36 UTC
Tested with:
i386 and x86_64 host
Win2008-32 guest
Win2003-64 guest

Component version:
xen-3.0.3-120.el5
xenpv-win-1.3.1-1.el5

Steps:
1. install xenpv-win-1.3.1-1 on Windows guest
2. disable then enable the PV NIC from Device Manager
3. repeat step 2

Reproduced the bug with kernel-xen-2.6.18-194.el5:
Both the Win2008-32 and Win2003-64 guests hang after a single disable/enable cycle.

Verified the bug with kernel-xen-2.6.18-231.el5:
For both guests, disable/enable works smoothly, and after 6 disable/enable
cycles the guests still work fine without hanging.

According to the test result above, set bug to VERIFIED.

The steps above are taken from https://bugzilla.redhat.com/show_bug.cgi?id=643345

Comment 11 Binbin Yu 2010-12-24 08:18:56 UTC
Also verified with kernel-xen-2.6.18-238.el5

Comment 13 errata-xmlrpc 2011-01-13 21:31:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html