Description of problem: System does not shut down Version-Release number of selected component (if applicable): 3.5.1-1.fc17 How reproducible: I don't know. It happens whenever I try to reboot. This did not happen with version 3.4.3-1.fc17. There is an active LVM volume group on the iSCSI target, and the target is unreachable (because it is actually a KVM guest living on the same machine). Actual results: Hangs forever, spitting out stack traces once in a while. Expected results: Clean shutdown. Additional info: After upgrading my F17 system from kernel 3.4.3-1.fc17 to 3.5.1-1.fc17, it is unable to shutdown or reboot at all. It gets stuck in some iscsi wind-down code. Ironically, I was having another shutdown-related problem with NFS (#830862), which I hoped would get resolved by the upgrade, but now it's much worse than it was before :(. Pid: 745, comm: iscsid Not tainted 3.5.1-1.str.fc17.x86_64 RIP __scsi_host_match+0x20 Call Trace: ? scsi_target_reap+0x28 scsi_remove_target+0x199 __iscsi_unbind_session+0xbf iscsi_remove_session+0xe2 iscsi_destroy_session+0x16 iscsi_session_teardown+0xa6 iscsi_sw_tcp_session_destroy+0x4b iscsi_if_rx+0xab0 ? __kfree_skb+0x47 ? kfree+0x49 ? kmem_cache_alloc_node+0x38 ? __kmalloc_node_track_caller+0x63 ? netlink_sendmsg+0x22a netlink_unicast+0x19d netlink_sendmsg+0x2bd sock_sendmsg+0xf8 ? sock_recvmsg+0xfd ? move_addr_to_kernel+0x5a ? verify_iovec+0x56 __sys_sendmsg+0x3ac ? sysfs_refresh_inode+0x90 ? sysfs_getattr+0x52 ? cp_new_stat+0x120 sys_sendmsg+0x49 system_call_fastpath+0x16 The exact location varies from slightly, but it's always below scsi_remove_target. Meanwhile, I've become quite adept at typing in stack traces from screen photographs ;).
I don't know the actual message because it scrolled off the screen, and I couldn't scroll back. After a while, multiple "CPU#x stuck for xxs" messages appear.
A few more observations: • ssh sessions stay alive; I can still watch "top". • systemd gets stuck in "D" state (as does apparently every other process trying to access a disk). • The volume group must be activated for this to happen. Fortunately, this means that I now know how to work around this issue (deactivate it before shutting down).
I can reproduce it reliably now, using only two KVM virtual guests. One is the iSCSI target. For this I'm using CentOS 6.2 (c6). The other is the problematic iSCSI consumer (initiator) running F17 (f17): c6$ sudo service tgtd start f17$ sudo vgchange -ay && sudo mount -o barrier=0 /dev/mapper/vg_scsi-lvtest1 /xfs f17$ sudo umount /xfs c6$ sudo service tgtd force-stop (wait for 120 seconds for f17 to complain that it cannot reconnect) f17$ sudo reboot The mounted file system is (predictably) XFS.
(In reply to comment #2) > Fortunately, this > means that I now know how to work around this issue (deactivate it before > shutting down). ... which also doesn't work :(.
It is caused by this commit [1]. This is the stable series, shortly after v3.5. Mainline mailing list thread seems to be [2]. What happens is this. Don't be confused by the version number; it's actually a build of the aforementioned version (10f8d5b86743): [ 136.160148] session1: session recovery timed out after 120 secs [ 159.903518] sd 2:0:0:1: [sda] Synchronizing SCSI cache [ 159.904013] sd 2:0:0:1: [sda] [ 159.904249] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK [ 184.023076] BUG: soft lockup - CPU#0 stuck for 22s! [iscsid:648] [ 184.023076] Modules linked in: xfs lockd sunrpc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter(-) ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack tpm_bios snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd microcode pcspkr virtio_balloon virtio_net soundcore i2c_piix4 i2c_core virtio_blk floppy [last unloaded: scsi_wait_scan] [ 184.023076] CPU 0 [ 184.023076] Modules linked in:[ 184.023076] xfs lockd sunrpc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter(-) ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack tpm_bios snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd microcode pcspkr virtio_balloon virtio_net soundcore i2c_piix4 i2c_core virtio_blk floppy [last unloaded: scsi_wait_scan] [ 184.023076] Pid: 648, comm: iscsid Not tainted 3.4.0-0.rc0.git10.1.el6.x86_64 #1 Red Hat KVM [ 184.023076] RIP: 0010:[<ffffffff81608221>] [<ffffffff81608221>] _raw_spin_unlock_irqrestore+0x11/0x20 [ 184.023076] RSP: 0018:ffff88007b1c38d8 EFLAGS: 00000286 [ 184.023076] RAX: 0000000000000286 RBX: ffffffff81077022 RCX: 000000000000002a [ 184.023076] RDX: ffff88007b1f1018 RSI: 0000000000000286 RDI: 0000000000000286 [ 184.023076] RBP: ffff88007b1c38d8 R08: ffffffff81cd8060 R09: 0000000000000080 [ 184.023076] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88007b2ec810 [ 184.023076] R13: ffff88007b2ed138 R14: ffff8800368a53b8 R15: ffff8800368a5078 [ 184.023076] FS: 00007f38dfc53740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 184.023076] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 184.023076] CR2: 0000000000f369f0 CR3: 000000007b1ae000 CR4: 00000000000006f0 [ 184.023076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 184.023076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 184.023076] Process iscsid (pid: 648, threadinfo ffff88007b1c2000, task ffff88007b1a5c80) [ 184.023076] Stack: [ 184.023076] ffff88007b1c3908 ffffffff813dc447 0000000000000296 ffff88007b1f1000 [ 184.023076] ffff88007b2ec7f0 ffff88007b2ec800 ffff88007b1c3958 ffffffff813dfb59 [ 184.023076] 0000000000000286 ffff88007b2ec800 0000000000000000 ffff88007b2ed000 [ 184.023076] Call Trace: [ 184.023076] [<ffffffff813dc447>] scsi_target_reap+0x77/0xe0 [ 184.023076] [<ffffffff813dfb59>] scsi_remove_target+0x1a9/0x1c0 [ 184.023076] [<ffffffffa011bf34>] __iscsi_unbind_session+0xb4/0x170 [scsi_transport_iscsi] [ 184.023076] [<ffffffffa011c0ce>] iscsi_remove_session+0xde/0x1b0 [scsi_transport_iscsi] [ 184.023076] [<ffffffffa011c1b6>] iscsi_destroy_session+0x16/0x60 [scsi_transport_iscsi] [ 184.023076] [<ffffffffa0132739>] iscsi_session_teardown+0xa9/0xd0 [libiscsi] [ 184.023076] [<ffffffffa013e160>] iscsi_sw_tcp_session_destroy+0x50/0x70 [iscsi_tcp] [ 184.023076] [<ffffffffa011c725>] iscsi_if_rx+0x295/0xd10 [scsi_transport_iscsi] [ 184.023076] [<ffffffff814f9ec1>] ? __skb_recv_datagram+0x121/0x2f0 [ 184.023076] [<ffffffff811768ec>] ? kfree+0x2c/0x120 [ 184.023076] [<ffffffff814f4b5b>] ? __alloc_skb+0x4b/0x170 [ 184.023076] [<ffffffff8152bb58>] netlink_unicast+0x1a8/0x220 [ 184.023076] [<ffffffff8152c94c>] netlink_sendmsg+0x20c/0x300 [ 184.023076] [<ffffffff814ec5a8>] sock_sendmsg+0xf8/0x130 [ 184.023076] [<ffffffff811a794b>] ? mntput_no_expire+0x4b/0x130 [ 184.023076] [<ffffffff811a7a53>] ? mntput+0x23/0x40 [ 184.023076] [<ffffffff811950a2>] ? path_put+0x22/0x30 [ 184.023076] [<ffffffff814eb07e>] ? move_addr_to_kernel+0x4e/0x90 [ 184.023076] [<ffffffff814f92c5>] ? verify_iovec+0x85/0xf0 [ 184.023076] [<ffffffff814edcdd>] __sys_sendmsg+0x3fd/0x420 [ 184.023076] [<ffffffff811f9f30>] ? sysfs_refresh_inode+0x90/0xa0 [ 184.023076] [<ffffffff8118f436>] ? cp_new_stat+0x116/0x130 [ 184.023076] [<ffffffff814edf09>] sys_sendmsg+0x49/0x90 [ 184.023076] [<ffffffff816104e9>] system_call_fastpath+0x16/0x1b [ 184.023076] Code: 40 00 f3 90 0f b6 07 38 c2 75 f7 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 80 07 01 48 89 f7 57 9d <66> 66 90 66 90 c9 c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 [ 159.9] is when I issued reboot. [1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=10f8d5b86743b33d841a175303e2bf67fd620f42 [2] https://lkml.org/lkml/2012/6/22/31
This should solve it: http://marc.info/?t=134621728400002&r=1&w=2&n=3
The fix is in stable kernel 3.5.6: http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=7cea667518d3c3719fa05264b3c195e74ea137e8
3.5.6 is build in koji and filed as an update. It hasn't hit the repos yet, but if you could test it that would be appreciated: https://admin.fedoraproject.org/updates/kernel-3.5.6-1.fc17
I left an anonymous comment. Fixed.
(In reply to comment #8) > 3.5.6 is build in koji and filed as an update. It hasn't hit the repos yet, > but if you could test it that would be appreciated: > > https://admin.fedoraproject.org/updates/kernel-3.5.6-1.fc17 This is in the stable repo. Closing.