Description of problem: RHEL 5.5 server hangs during a reboot after mapping Luns to a Dell MD3000i target through iSCSI Version-Release number of selected component (if applicable): How reproducible: Always. Steps to Reproduce: 1. connect a RHEL 5.5 server to an iSCSI Dell MD300i target. 2. use software initiator to establish iSCSI sessions. 3. create few luns on the target and map them to the server. 4. check that the host sees the luns using lsscsi 5. reboot the host Actual results: After issuing a reboot, the host hangs with the messages below: -------------------------------------------------------- 21:08:13:742 TS1 >md: stopping all md devices. 21:08:14:835 TS1 >Synchronizing SCSI cache for disk sdi: 21:08:15:054 TS1 >usb 4-1: new full speed USB device using uhci_hcd and address 2 21:08:15:163 TS1 >usb 4-1: not running at top speed; connect to a high speed hub 21:08:15:163 TS1 >usb 4-1: configuration #1 chosen from 1 choice 21:08:15:163 TS1 >hub 4-1:1.0: USB hub found 21:08:15:163 TS1 >hub 4-1:1.0: 4 ports detected 21:08:17:568 TS1 > connection1:0: ping timeout of 5 secs expired, recv timeout 10, last rx 4295260241, last ping 4295270241, now 4295275241 21:08:17:568 TS1 > connection1:0: detected conn error (1011) 21:08:17:786 TS1 > connection4:0: ping timeout of 5 secs expired, recv timeout 10, last rx 4295260484, last ping 4295270484, now 4295275484 21:08:17:786 TS1 > connection4:0: detected conn error (1011) 21:08:22:704 TS1 >hub 1-5:1.0: activate --> -19 21:08:22:814 TS1 >usb 1-5: USB disconnect, address 2 21:12:12:874 TS1 >INFO: task reboot:4573 blocked for more than 120 seconds. 21:12:12:874 TS1 >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:12:12:874 TS1 >reboot D 0000000000000001 0 4573 1 3888 (NOTLB) 21:12:12:874 TS1 > ffff81011e081be8 0000000000000086 0000000000000092 ffffffff8009eaed 21:12:12:874 TS1 > ffff81013b53ba90 0000000000000007 ffff810138e017e0 ffff81013fc5a820 21:12:12:983 TS1 > 0000009de31171f1 000000003b9aca00 ffff810138e019c8 000000073b3ae800 21:12:12:983 TS1 >Call Trace: 21:12:12:983 TS1 > [<ffffffff8009eaed>] __queue_work+0x43/0x53 21:12:12:983 TS1 > [<ffffffff8001cb9f>] __mod_timer+0x100/0x10f 21:12:12:983 TS1 > [<ffffffff80064167>] wait_for_completion+0x79/0xa2 21:12:12:983 TS1 > [<ffffffff8008e16d>] default_wake_function+0x0/0xe 21:12:12:983 TS1 > [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a 21:12:12:983 TS1 > [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0 21:12:12:983 TS1 > [<ffffffff80028bbd>] get_request_wait+0x21/0x11f 21:12:12:983 TS1 > [<ffffffff8807abbb>] :scsi_mod:scsi_execute+0xd1/0xeb 21:12:12:983 TS1 > [<ffffffff8807ac7a>] :scsi_mod:scsi_execute_req+0xa5/0xce 21:12:12:983 TS1 > [<ffffffff880a747d>] :sd_mod:sd_sync_cache+0x56/0xb4 21:12:12:983 TS1 > [<ffffffff80050000>] __filemap_fdatawrite_range+0x42/0x5b 21:12:12:983 TS1 > [<ffffffff801c9074>] class_device_get+0x14/0x1b 21:12:12:983 TS1 > [<ffffffff880a7c55>] :sd_mod:sd_shutdown+0x46/0xc9 21:12:12:983 TS1 > [<ffffffff801cc2ed>] device_shutdown+0x56/0x88 21:12:12:983 TS1 > [<ffffffff8009c4fd>] kernel_restart+0x9/0x46 21:12:12:983 TS1 > [<ffffffff8009c687>] sys_reboot+0x146/0x1c7 21:12:12:983 TS1 > [<ffffffff8003b153>] hrtimer_try_to_cancel+0x4a/0x53 21:12:12:983 TS1 > [<ffffffff8005aa74>] hrtimer_cancel+0xc/0x16 21:12:12:983 TS1 > [<ffffffff80064d05>] do_nanosleep+0x47/0x70 21:12:12:983 TS1 > [<ffffffff8005a961>] hrtimer_nanosleep+0x58/0x118 21:12:12:983 TS1 > [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3 21:12:12:983 TS1 > [<ffffffff8005e229>] tracesys+0x71/0xe0 21:12:13:093 TS1 > [<ffffffff8005e28d>] tracesys+0xd5/0xe0 21:12:13:093 TS1 > 21:14:12:976 TS1 >INFO: task reboot:4573 blocked for more than 120 seconds. 21:14:12:976 TS1 >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:14:12:976 TS1 >reboot D 0000000000000001 0 4573 1 3888 (NOTLB) 21:14:12:976 TS1 > ffff81011e081be8 0000000000000086 0000000000000092 ffffffff8009eaed 21:14:12:976 TS1 > ffff81013b53ba90 0000000000000007 ffff810138e017e0 ffff81013fc5a820 21:14:12:976 TS1 > 0000009de31171f1 000000003b9aca00 ffff810138e019c8 000000073b3ae800 21:14:12:976 TS1 >Call Trace: 21:14:12:976 TS1 > [<ffffffff8009eaed>] __queue_work+0x43/0x53 21:14:12:976 TS1 > [<ffffffff8001cb9f>] __mod_timer+0x100/0x10f 21:14:12:976 TS1 > [<ffffffff80064167>] wait_for_completion+0x79/0xa2 21:14:13:085 TS1 > [<ffffffff8008e16d>] default_wake_function+0x0/0xe 21:14:13:085 TS1 > [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a 21:14:13:085 TS1 > [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0 21:14:13:085 TS1 > [<ffffffff80028bbd>] get_request_wait+0x21/0x11f 21:14:13:085 TS1 > [<ffffffff8807abbb>] :scsi_mod:scsi_execute+0xd1/0xeb 21:14:13:085 TS1 > [<ffffffff8807ac7a>] :scsi_mod:scsi_execute_req+0xa5/0xce 21:14:13:085 TS1 > [<ffffffff880a747d>] :sd_mod:sd_sync_cache+0x56/0xb4 21:14:13:085 TS1 > [<ffffffff80050000>] __filemap_fdatawrite_range+0x42/0x5b 21:14:13:085 TS1 > [<ffffffff801c9074>] class_device_get+0x14/0x1b 21:14:13:085 TS1 > [<ffffffff880a7c55>] :sd_mod:sd_shutdown+0x46/0xc9 21:14:13:085 TS1 > [<ffffffff801cc2ed>] device_shutdown+0x56/0x88 21:14:13:085 TS1 > [<ffffffff8009c4fd>] kernel_restart+0x9/0x46 21:14:13:085 TS1 > [<ffffffff8009c687>] sys_reboot+0x146/0x1c7 21:14:13:085 TS1 > [<ffffffff8003b153>] hrtimer_try_to_cancel+0x4a/0x53 21:14:13:085 TS1 > [<ffffffff8005aa74>] hrtimer_cancel+0xc/0x16 21:14:13:085 TS1 > [<ffffffff80064d05>] do_nanosleep+0x47/0x70 21:14:13:085 TS1 > [<ffffffff8005a961>] hrtimer_nanosleep+0x58/0x118 21:14:13:085 TS1 > [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3 21:14:13:085 TS1 > [<ffffffff8005e229>] tracesys+0x71/0xe0 21:14:13:085 TS1 > [<ffffffff8005e28d>] tracesys+0xd5/0xe0 21:14:13:085 TS1 > 21:16:13:050 TS1 >INFO: task reboot:4573 blocked for more than 120 seconds. 21:16:13:050 TS1 >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:16:13:050 TS1 >reboot D 0000000000000001 0 4573 1 3888 (NOTLB) 21:16:13:050 TS1 > ffff81011e081be8 0000000000000086 0000000000000092 ffffffff8009eaed 21:16:13:159 TS1 > ffff81013b53ba90 0000000000000007 ffff810138e017e0 ffff81013fc5a820 21:16:13:159 TS1 > 0000009de31171f1 000000003b9aca00 ffff810138e019c8 000000073b3ae800 21:16:13:159 TS1 >Call Trace: 21:16:13:159 TS1 > [<ffffffff8009eaed>] __queue_work+0x43/0x53 21:16:13:159 TS1 > [<ffffffff8001cb9f>] __mod_timer+0x100/0x10f 21:16:13:159 TS1 > [<ffffffff80064167>] wait_for_completion+0x79/0xa2 21:16:13:159 TS1 > [<ffffffff8008e16d>] default_wake_function+0x0/0xe 21:16:13:159 TS1 > [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a 21:16:13:159 TS1 > [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0 21:16:13:159 TS1 > [<ffffffff80028bbd>] get_request_wait+0x21/0x11f 21:16:13:159 TS1 > [<ffffffff8807abbb>] :scsi_mod:scsi_execute+0xd1/0xeb 21:16:13:159 TS1 > [<ffffffff8807ac7a>] :scsi_mod:scsi_execute_req+0xa5/0xce 21:16:13:159 TS1 > [<ffffffff880a747d>] :sd_mod:sd_sync_cache+0x56/0xb4 21:16:13:159 TS1 > [<ffffffff80050000>] __filemap_fdatawrite_range+0x42/0x5b 21:16:13:159 TS1 > [<ffffffff801c9074>] class_device_get+0x14/0x1b 21:16:13:159 TS1 > [<ffffffff880a7c55>] :sd_mod:sd_shutdown+0x46/0xc9 21:16:13:159 TS1 > [<ffffffff801cc2ed>] device_shutdown+0x56/0x88 21:16:13:159 TS1 > [<ffffffff8009c4fd>] kernel_restart+0x9/0x46 21:16:13:159 TS1 > [<ffffffff8009c687>] sys_reboot+0x146/0x1c7 21:16:13:159 TS1 > [<ffffffff8003b153>] hrtimer_try_to_cancel+0x4a/0x53 21:16:13:159 TS1 > [<ffffffff8005aa74>] hrtimer_cancel+0xc/0x16 21:16:13:159 TS1 > [<ffffffff80064d05>] do_nanosleep+0x47/0x70 21:16:13:159 TS1 > [<ffffffff8005a961>] hrtimer_nanosleep+0x58/0x118 21:16:13:268 TS1 > [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3 21:16:13:268 TS1 > [<ffffffff8005e229>] tracesys+0x71/0xe0 21:16:13:268 TS1 > [<ffffffff8005e28d>] tracesys+0xd5/0xe0 21:16:13:268 TS1 > 21:18:13:189 TS1 >INFO: task reboot:4573 blocked for more than 120 seconds. ---------------------------------------------------------------- Expected results: Host reboots successfully and sessions are re-established. Additional info:
(In reply to comment #0) > 1. connect a RHEL 5.5 server to an iSCSI Dell MD300i target. > 2. use software initiator to establish iSCSI sessions. > 3. create few luns on the target and map them to the server. > 4. check that the host sees the luns using lsscsi > 5. reboot the host > Does #3 have anything to do with this problem? If you have luns setup before you do the iscsi login, then do #2, #4, and #5, do you hit the same problem? If so the problem is that the session are not being shutdown during reboot/shutdown. If the disks need a cache sync to be sent during kernel shutdown then we hang on that command because the network is down at this time. I am working on a fix here: https://bugzilla.redhat.com/show_bug.cgi?id=583218 My original fix did not handle iscsi root properly, so still working on it.
(In reply to comment #1) > (In reply to comment #0) > > 1. connect a RHEL 5.5 server to an iSCSI Dell MD300i target. > > 2. use software initiator to establish iSCSI sessions. > > 3. create few luns on the target and map them to the server. > > 4. check that the host sees the luns using lsscsi > > 5. reboot the host > > > > > Does #3 have anything to do with this problem? If you have luns setup before > you do the iscsi login, then do #2, #4, and #5, do you hit the same problem? > > If so the problem is that the session are not being shutdown during > reboot/shutdown. If the disks need a cache sync to be sent during kernel > shutdown then we hang on that command because the network is down at this time. > > I am working on a fix here: > https://bugzilla.redhat.com/show_bug.cgi?id=583218 > > My original fix did not handle iscsi root properly, so still working on it. Mike; #3 doesn't have to do anything with it. We did test it where the luns were already created before doing #2, #4 and #5 and we hit the same issue. also we did test the workaround provided in 583218 and it worked. below are the steps: 1) Established iSCSI sessions. 2) Created volumes and mapped. 3) Ran IO for sometime and stopped. 4) Logged out all sessions using the command iscsiadm –m node –U all 5) Rebooted the host. The host rebooted without any issues. So, does that mean that this bug is a dup of 583218?
(In reply to comment #2) > So, does that mean that this bug is a dup of 583218? Yep, dup. *** This bug has been marked as a duplicate of bug 583218 ***