Bug 734110

Summary: rhevh - upgrade ovirt node fails due to nonexistent breth0
Product: Red Hat Enterprise Linux 5
Component: ovirt-node
Version: 5.7
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Pavel Stehlik <pstehlik>
Assignee: Mike Burns <mburns>
QA Contact: Virtualization Bugs <virt-bugs>
CC: apevec, cshao, gouyang, leiwang, mburns, moli, ovirt-maint, ycui
Target Milestone: rc
Keywords: TestOnly
Hardware: All
OS: Linux
Doc Type: Bug Fix
Doc Text: mburns-reviewed -- no tech note needed
Last Closed: 2012-02-21 05:04:23 UTC

Description Pavel Stehlik 2011-08-29 13:28:44 UTC
Description of problem:
 When upgrading RHEV-H (rhevh-5.7-20110725.1 to rhevh-5.7-20110824.0) from the RHEV-M UI, the host reboots in a loop. During boot, gathering information for the 'breth0' interface fails. Upgrading from 5.6-11.1 works without problems.

Version-Release number of selected component (if applicable):
rhevh-20110824.0

How reproducible:
100%

Steps to Reproduce:
1. Try to upgrade from rhevh-5.7-20110725.1 to rhevh-5.7-20110824.0 via the RHEV-M UI.
Actual results:
The host reboots in a loop; during boot, gathering information for the 'breth0' interface fails.

Expected results:
The upgrade completes and the host boots normally.
Additional info:
I have already reinstalled the host, so please forgive me for not providing any logs (if you need them, please let me know).

Comment 1 Mike Burns 2011-08-29 15:18:09 UTC
I'm unable to reproduce the reboot loop using the exact steps provided in the description.  

Pavel will attempt to reproduce and provide either logs or serial console trace.


Note:  I did see the failure when trying to start breth0 and saw that both ifcfg-rhevm and ifcfg-breth0 were persisted, but it did not cause an issue with the upgrade.

Comment 2 Ying Cui 2011-08-30 10:20:12 UTC
We cannot reproduce it in our test environment.

Test build:
Upgrade rhevh-20110725.1 to rhevh-20110824.0 via the RHEV-M UI (RHEV-M build SM107).

Test steps:
1. Install rhevh-20110725.1 manually from the menu.
   RHEV-H is installed on a local SATA disk.
   RHEV-H is in an iSCSI storage domain connected to a software iSCSI LUN.
2. rhevh-20110725.1 comes up in the RHEV-M UI.
3. Upgrade rhevh-20110725.1 to rhevh-20110824.0 via the RHEV-M UI.
4. RHEV-H reboots as expected after the upgrade.
5. rhevh-20110824.0 comes up automatically after the upgrade, and the iSCSI domain is up as well.

The issue was not reproduced.

Note:
SM107 is the latest build announced on the mailing list.
SM109 is ready for QA testing in the TLV team but has not been announced; its only change is a new bootstrap.

Comment 3 Alan Pevec 2011-08-30 22:08:04 UTC
Backtrace from the vmcore found on Pavel's machine:

PID: 9868   TASK: ffff810369a06820  CPU: 7   COMMAND: "iscsiadm"
 #0 [ffff81035046f8b0] crash_kexec at ffffffff800afef5
 #1 [ffff81035046f970] __die at ffffffff80065127
 #2 [ffff81035046f9b0] do_page_fault at ffffffff80067474
 #3 [ffff81035046faa0] error_exit at ffffffff8005dde9
    [exception RIP: netpoll_send_skb_on_dev+34]
    RIP: ffffffff8024121e  RSP: ffff81035046fb58  RFLAGS: 00010002
    RAX: 0000000000000006  RBX: ffff810377484000  RCX: ffffffff80000000
    RDX: ffff810377484000  RSI: ffff81036abbbdc0  RDI: ffffffff8862adc0
    RBP: 0000000000000000   R8: 0000000000000000   R9: ffffffff885b7f95
    R10: 0000000080000000  R11: 0000000000000004  R12: ffff81036abbbdc0
    R13: ffffffff804efdd0  R14: 0000000000000020  R15: ffff81036874520c
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff81035046fb70] br_dev_queue_push_xmit at ffffffff885b8154
 #5 [ffff81035046fb80] br_forward_finish at ffffffff885b81e1
 #6 [ffff81035046fb90] __br_deliver at ffffffff885b8377
 #7 [ffff81035046fbb0] br_dev_xmit at ffffffff885b7294
 #8 [ffff81035046fbd0] netpoll_send_skb_on_dev at ffffffff802412ab
 #9 [ffff81035046fbf0] write_msg at ffffffff8862a0e1
#10 [ffff81035046fc20] __call_console_drivers at ffffffff8009350f
#11 [ffff81035046fc40] release_console_sem at ffffffff80017354
#12 [ffff81035046fc70] vprintk at ffffffff80093d04
#13 [ffff81035046fcf0] printk at ffffffff80093dbb
#14 [ffff81035046fde0] sd_revalidate_disk at ffffffff88123635
#15 [ffff81035046fea0] sd_rescan at ffffffff88122534
#16 [ffff81035046feb0] scsi_rescan_device at ffffffff880199a1
#17 [ffff81035046fec0] store_rescan_field at ffffffff8801b649
#18 [ffff81035046fed0] sysfs_write_file at ffffffff80110ade
#19 [ffff81035046ff10] vfs_write at ffffffff80016b92
#20 [ffff81035046ff40] sys_write at ffffffff8001745b
#21 [ffff81035046ff80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00002b40725897d0  RSP: 00007fff5286bfb8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000000001  RSI: 000000000044cd08  RDI: 0000000000000003
    RBP: 0000000000443461   R8: 0000000000000001   R9: 00002b40725dd100
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000003
    R13: 00007fff5286c050  R14: 00007fff5286c4b0  R15: 000000000044cd08
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

netpoll in the trace indicates this is netconsole related.

Ying, did you have netconsole configured?
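For reference, one quick way to answer this on the host itself is to check whether the netconsole module is loaded and whether the initscript configuration exists. This is a hedged sketch: /etc/sysconfig/netconsole is the stock RHEL location for the initscript configuration and may differ on a RHEV-H image.

```shell
#!/bin/sh
# Sketch: detect whether netconsole is active (standard RHEL paths assumed).

# Count loaded netconsole module entries (0 means not loaded).
STATUS=$(lsmod 2>/dev/null | grep -c '^netconsole' || true)

if [ "$STATUS" -gt 0 ]; then
    echo "netconsole: loaded"
else
    echo "netconsole: not loaded"
fi

# The initscript configuration, if present, names the remote collector.
if [ -f /etc/sysconfig/netconsole ]; then
    grep -v '^#' /etc/sysconfig/netconsole
fi
```

If the module is loaded, the crash path through netpoll_send_skb_on_dev in the backtrace above becomes reachable whenever a kernel message is emitted.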

Comment 6 Ying Cui 2011-08-31 08:20:40 UTC
> netpoll in the trace indicates this is netconsole related.
> 
> Ying, did you have netconsole configured?

Yes, I configured netconsole for this testing.

Ying

Comment 7 Alan Pevec 2011-09-28 10:28:14 UTC
We'll look at this for 5.8; it is likely a kernel issue with netconsole on this particular NIC.
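If the crash is confirmed to be netconsole-triggered, disabling netconsole until a fixed kernel is available would be a plausible interim workaround. This is a sketch, not a resolution documented in this bug; it assumes the stock RHEL netconsole initscript, and is shown in dry-run form (the real commands would need root on the affected host).

```shell
#!/bin/sh
# Dry-run sketch: disable netconsole on a RHEL 5 host (stock initscript assumed).
# Commands are echoed rather than executed so they can be reviewed first.

WORKAROUND="service netconsole stop
chkconfig netconsole off
modprobe -r netconsole"

# Prefix each command with a marker instead of running it.
OUT=$(echo "$WORKAROUND" | sed 's/^/would run: /')
echo "$OUT"
```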

Comment 12 Mike Burns 2012-01-13 00:31:29 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
mburns-reviewed -- no tech note needed

Comment 13 errata-xmlrpc 2012-02-21 05:04:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0168.html