Bug 863077

Summary: Soft lockup on reboot with an active VG
Product: Red Hat Enterprise Linux 6 Reporter: Nenad Peric <nperic>
Component: kernelAssignee: Chris Leech <cleech>
Status: CLOSED ERRATA QA Contact: Bruno Goncalves <bgoncalv>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.4CC: agrover, bdonahue, bgoncalv, bprakash, cleech, coughlan, czhang, eddie.wai, edwardn, jbrier, juzhang, lukasz.dorau, robert.evans, sluo
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-329.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 06:49:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nenad Peric 2012-10-04 11:58:40 UTC
Description of problem:

The system locks up during reboot (stopping of iscsi) if a VG is still active. 
No FS needs to be mounted. 

Version-Release number of selected component (if applicable):

Packages installed:
iscsi-initiator-utils-6.2.0.872-41.el6.x86_64
kernel-2.6.32-315.el6.x86_64

lvm2-libs-2.02.97-3.el6.x86_64
lvm2-2.02.97-3.el6.x86_64
lvm2-cluster-2.02.97-3.el6.x86_64
device-mapper-event-libs-1.02.76-3.el6.x86_64
device-mapper-1.02.76-3.el6.x86_64
device-mapper-libs-1.02.76-3.el6.x86_64
device-mapper-event-1.02.76-3.el6.x86_64


How reproducible:
Everytime
 

Steps to Reproduce:

1. Create VG - vgcreate test_vg /dev/sda1 /dev/sda2
2. Create LV - lvcreate test_vg -L10M
3. Reboot - reboot
  
Actual results:

The reboot locks up with the following error:

Stopping iscsi:  connection10:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294823269, last ping 4294828269, now 4294833269
 connection10:0: detected conn error (1011)
 connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826294, last ping 4294831294, now 4294836294
 connection8:0: detected conn error (1011)
 connection9:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826312, last ping 4294831312, now 4294836312
 connection9:0: detected conn error (1011)
 connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826319, last ping 4294831319, now 4294836319
 connection4:0: detected conn error (1011)
 connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826326, last ping 4294831326, now 4294836326
 connection3:0: detected conn error (1011)
 connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826330, last ping 4294831330, now 4294836330
 connection1:0: detected conn error (1011)
 connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826350, last ping 4294831350, now 4294836350
 connection5:0: detected conn error (1011)
 connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826352, last ping 4294831352, now 4294836352
 connection7:0: detected conn error (1011)
 connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826352, last ping 4294831352, now 4294836353
 connection6:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294826353, last ping 4294831353, now 4294836354
 connection2:0: detected conn error (1011)
BUG: soft lockup - CPU#0 stuck for 67s! [scsi_wq_11:1363]
Modules linked in: sctp libcrc32c autofs4 sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
CPU 0
Modules linked in: sctp libcrc32c autofs4 sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
 
Pid: 1363, comm: scsi_wq_11 Not tainted 2.6.32-315.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff8150d7c7>]  [<ffffffff8150d7c7>] _spin_unlock_irqrestore+0x17/0x20
RSP: 0018:ffff88003b135d80  EFLAGS: 00000286
RAX: 0000000000000286 RBX: ffff88003b135d80 RCX: 0000000000000000
RDX: ffff88003b308050 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffffffff8100bc0e R08: dead000000200200 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b135d80
R13: ffffffff8100bc0e R14: ffff88003b135d00 R15: ffff88003b135d10
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000017eaba0 CR3: 0000000037ed3000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_wq_11 (pid: 1363, threadinfo ffff88003b134000, task ffff88003b12c040)
Stack:
 ffff88003b135db0 ffffffff813769c7 ffffffffffffff10 ffff880037541c00
<d> ffff88003b308000 ffff88003b30f930 ffff88003b135e00 ffffffff8137a435
<d> 000000000000fb88 ffff88003dd21000 ffffffff81a8d020 ffff88003b30f800
Call Trace:
 [<ffffffff813769c7>] ? scsi_target_reap+0x77/0xe0
 [<ffffffff8137a435>] ? scsi_remove_target+0x175/0x180
 [<ffffffffa01698e1>] ? __iscsi_unbind_session+0xa1/0x150 [scsi_transport_iscsi]
 [<ffffffffa0169840>] ? __iscsi_unbind_session+0x0/0x150 [scsi_transport_iscsi]
 [<ffffffff8108e050>] ? worker_thread+0x170/0x2a0
 [<ffffffff81093e40>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108dee0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81093ad6>] ? kthread+0x96/0xa0
 [<ffffffff8108dee0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81093a40>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 00 00 00 01 74 05 e8 b9 7a d7 ff c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 48 89 fa c7 02 00 00 00 00 48 89 f7 57 9d <0f> 1f 44 00 00 c9 c3 66 90 55 48 89 e5 0f 1f 44 00 00 3e ff 07
Call Trace:
 [<ffffffff813769c7>] ? scsi_target_reap+0x77/0xe0
 [<ffffffff8137a435>] ? scsi_remove_target+0x175/0x180
 [<ffffffffa01698e1>] ? __iscsi_unbind_session+0xa1/0x150 [scsi_transport_iscsi]
 [<ffffffffa0169840>] ? __iscsi_unbind_session+0x0/0x150 [scsi_transport_iscsi]
 [<ffffffff8108e050>] ? worker_thread+0x170/0x2a0
 [<ffffffff81093e40>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108dee0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81093ad6>] ? kthread+0x96/0xa0
 [<ffffffff8108dee0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81093a40>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

--------------------------------------------------------------------------------

If VG is deactivated prior to reboot (vgchange -an test_vg) the system reboots printing out only the following warnings during stopping of the iscsi:

Stopping iscsi:  connection2:0: detected conn error (1020)
 connection10:0: detected conn error (1020)
 connection3:0: detected conn error (1020)
 connection4:0: detected conn error (1020)
 connection1:0: detected conn error (1020)
 connection5:0: detected conn error (1020)
 connection6:0: detected conn error (1020)
 connection7:0: detected conn error (1020)
 connection8:0: detected conn error (1020)
 connection9:0: detected conn error (1020)
[  OK  ]


Expected results:

reboot as exptected. 

Additional info:

Comment 2 Bruno Goncalves 2012-10-04 13:08:18 UTC
It is a regression as it is not reproducible on RHEL6.3

2.6.32-279.el6.x86_64
iscsi-initiator-utils-6.2.0.872-41.el6.x86_64

Comment 4 Chris Leech 2012-10-05 00:02:24 UTC
This isn't an iscsi-initiator-utils issue, it's a regression in the kernel.  I reproduced this using only the newer kernel builds on top of a 6.3 install, and bisected it down to a change in kernel-2.6.32-300.el6 (and still reproducible through the latest build 2.6.32-320.el6).

There are no iSCSI initiator changes there either, but 300 did have a large set of device mapper and SCSI changes to look at.

Comment 5 Chris Leech 2012-10-05 06:15:15 UTC
Assigning back to myself, I took a quick look upstream and found a likely fix.  Testing now and will submit the patch for kernel inclusion.

Comment 6 RHEL Program Management 2012-10-05 16:20:44 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 7 David Milburn 2012-10-05 16:53:48 UTC
*** Bug 862024 has been marked as a duplicate of this bug. ***

Comment 8 David Milburn 2012-10-05 16:57:34 UTC
*** Bug 862930 has been marked as a duplicate of this bug. ***

Comment 9 Tomas Henzl 2012-10-06 19:37:00 UTC
*** Bug 863523 has been marked as a duplicate of this bug. ***

Comment 10 Jarod Wilson 2012-10-12 05:00:15 UTC
Patch(es) available on kernel-2.6.32-329.el6

Comment 13 Sibiao Luo 2012-10-15 03:16:57 UTC
*** Bug 865349 has been marked as a duplicate of this bug. ***

Comment 14 Bruno Goncalves 2012-10-15 07:54:50 UTC
Problem fixed on 2.6.32-329.el6.x86_64.


vgcreate test_vg /dev/sdb
  No physical volume label read from /dev/sdb
  Physical volume "/dev/sdb" successfully created
  Volume group "test_vg" successfully created

lvcreate test_vg -L10M
  Found duplicate PV Qj58vQT59Yb1KlgV5qioVCk0ERYfAB0d: using /dev/sdc not /dev/sdb
  Rounding up size to full physical extent 12.00 MiB
  Logical volume "lvol0" created

reboot

system rebooted without problem.

Comment 15 Tony Camuso 2012-10-24 14:38:04 UTC
*** Bug 857360 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2013-02-21 06:49:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html