Bug 590173 - [LSI-CR177223] - RHEL 5.5 server hangs during a reboot after mapping Luns to a Dell target through iSCSI
Summary: [LSI-CR177223] - RHEL 5.5 server hangs during a reboot after mapping Luns to ...
Keywords:
Status: CLOSED DUPLICATE of bug 583218
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: iscsi-initiator-utils
Version: 5.5
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Mike Christie
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-05-07 21:48 UTC by Abdel Sadek
Modified: 2010-05-10 17:44 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-10 17:44:18 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Abdel Sadek 2010-05-07 21:48:55 UTC
Description of problem:
RHEL 5.5 server hangs during a reboot after mapping Luns to a Dell MD3000i target through iSCSI
Version-Release number of selected component (if applicable):


How reproducible:
Always.

Steps to Reproduce:
1. connect a RHEL 5.5 server to an iSCSI Dell MD300i target.
2. use software initiator to establish iSCSI sessions.
3. create few luns on the target and map them to the server.
4. check that the host sees the luns using lsscsi
5. reboot the host
  
Actual results:
After issuing a reboot, the host hangs with the messages below:
--------------------------------------------------------
21:08:13:742  TS1  >md: stopping all md devices.
21:08:14:835  TS1  >Synchronizing SCSI cache for disk sdi: 
21:08:15:054  TS1  >usb 4-1: new full speed USB device using uhci_hcd and address 2
21:08:15:163  TS1  >usb 4-1: not running at top speed; connect to a high speed hub
21:08:15:163  TS1  >usb 4-1: configuration #1 chosen from 1 choice
21:08:15:163  TS1  >hub 4-1:1.0: USB hub found
21:08:15:163  TS1  >hub 4-1:1.0: 4 ports detected
21:08:17:568  TS1  > connection1:0: ping timeout of 5 secs expired, recv timeout 10, last rx 4295260241, last ping 4295270241, now 4295275241
21:08:17:568  TS1  > connection1:0: detected conn error (1011)
21:08:17:786  TS1  > connection4:0: ping timeout of 5 secs expired, recv timeout 10, last rx 4295260484, last ping 4295270484, now 4295275484
21:08:17:786  TS1  > connection4:0: detected conn error (1011)
21:08:22:704  TS1  >hub 1-5:1.0: activate --> -19
21:08:22:814  TS1  >usb 1-5: USB disconnect, address 2
21:12:12:874  TS1  >INFO: task reboot:4573 blocked for more than 120 seconds.
21:12:12:874  TS1  >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:12:12:874  TS1  >reboot        D 0000000000000001     0  4573      1                3888 (NOTLB)
21:12:12:874  TS1  > ffff81011e081be8 0000000000000086 0000000000000092 ffffffff8009eaed
21:12:12:874  TS1  > ffff81013b53ba90 0000000000000007 ffff810138e017e0 ffff81013fc5a820
21:12:12:983  TS1  > 0000009de31171f1 000000003b9aca00 ffff810138e019c8 000000073b3ae800
21:12:12:983  TS1  >Call Trace:
21:12:12:983  TS1  > [<ffffffff8009eaed>] __queue_work+0x43/0x53
21:12:12:983  TS1  > [<ffffffff8001cb9f>] __mod_timer+0x100/0x10f
21:12:12:983  TS1  > [<ffffffff80064167>] wait_for_completion+0x79/0xa2
21:12:12:983  TS1  > [<ffffffff8008e16d>] default_wake_function+0x0/0xe
21:12:12:983  TS1  > [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a
21:12:12:983  TS1  > [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0
21:12:12:983  TS1  > [<ffffffff80028bbd>] get_request_wait+0x21/0x11f
21:12:12:983  TS1  > [<ffffffff8807abbb>] :scsi_mod:scsi_execute+0xd1/0xeb
21:12:12:983  TS1  > [<ffffffff8807ac7a>] :scsi_mod:scsi_execute_req+0xa5/0xce
21:12:12:983  TS1  > [<ffffffff880a747d>] :sd_mod:sd_sync_cache+0x56/0xb4
21:12:12:983  TS1  > [<ffffffff80050000>] __filemap_fdatawrite_range+0x42/0x5b
21:12:12:983  TS1  > [<ffffffff801c9074>] class_device_get+0x14/0x1b
21:12:12:983  TS1  > [<ffffffff880a7c55>] :sd_mod:sd_shutdown+0x46/0xc9
21:12:12:983  TS1  > [<ffffffff801cc2ed>] device_shutdown+0x56/0x88
21:12:12:983  TS1  > [<ffffffff8009c4fd>] kernel_restart+0x9/0x46
21:12:12:983  TS1  > [<ffffffff8009c687>] sys_reboot+0x146/0x1c7
21:12:12:983  TS1  > [<ffffffff8003b153>] hrtimer_try_to_cancel+0x4a/0x53
21:12:12:983  TS1  > [<ffffffff8005aa74>] hrtimer_cancel+0xc/0x16
21:12:12:983  TS1  > [<ffffffff80064d05>] do_nanosleep+0x47/0x70
21:12:12:983  TS1  > [<ffffffff8005a961>] hrtimer_nanosleep+0x58/0x118
21:12:12:983  TS1  > [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3
21:12:12:983  TS1  > [<ffffffff8005e229>] tracesys+0x71/0xe0
21:12:13:093  TS1  > [<ffffffff8005e28d>] tracesys+0xd5/0xe0
21:12:13:093  TS1  >
21:14:12:976  TS1  >INFO: task reboot:4573 blocked for more than 120 seconds.
21:14:12:976  TS1  >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:14:12:976  TS1  >reboot        D 0000000000000001     0  4573      1                3888 (NOTLB)
21:14:12:976  TS1  > ffff81011e081be8 0000000000000086 0000000000000092 ffffffff8009eaed
21:14:12:976  TS1  > ffff81013b53ba90 0000000000000007 ffff810138e017e0 ffff81013fc5a820
21:14:12:976  TS1  > 0000009de31171f1 000000003b9aca00 ffff810138e019c8 000000073b3ae800
21:14:12:976  TS1  >Call Trace:
21:14:12:976  TS1  > [<ffffffff8009eaed>] __queue_work+0x43/0x53
21:14:12:976  TS1  > [<ffffffff8001cb9f>] __mod_timer+0x100/0x10f
21:14:12:976  TS1  > [<ffffffff80064167>] wait_for_completion+0x79/0xa2
21:14:13:085  TS1  > [<ffffffff8008e16d>] default_wake_function+0x0/0xe
21:14:13:085  TS1  > [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a
21:14:13:085  TS1  > [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0
21:14:13:085  TS1  > [<ffffffff80028bbd>] get_request_wait+0x21/0x11f
21:14:13:085  TS1  > [<ffffffff8807abbb>] :scsi_mod:scsi_execute+0xd1/0xeb
21:14:13:085  TS1  > [<ffffffff8807ac7a>] :scsi_mod:scsi_execute_req+0xa5/0xce
21:14:13:085  TS1  > [<ffffffff880a747d>] :sd_mod:sd_sync_cache+0x56/0xb4
21:14:13:085  TS1  > [<ffffffff80050000>] __filemap_fdatawrite_range+0x42/0x5b
21:14:13:085  TS1  > [<ffffffff801c9074>] class_device_get+0x14/0x1b
21:14:13:085  TS1  > [<ffffffff880a7c55>] :sd_mod:sd_shutdown+0x46/0xc9
21:14:13:085  TS1  > [<ffffffff801cc2ed>] device_shutdown+0x56/0x88
21:14:13:085  TS1  > [<ffffffff8009c4fd>] kernel_restart+0x9/0x46
21:14:13:085  TS1  > [<ffffffff8009c687>] sys_reboot+0x146/0x1c7
21:14:13:085  TS1  > [<ffffffff8003b153>] hrtimer_try_to_cancel+0x4a/0x53
21:14:13:085  TS1  > [<ffffffff8005aa74>] hrtimer_cancel+0xc/0x16
21:14:13:085  TS1  > [<ffffffff80064d05>] do_nanosleep+0x47/0x70
21:14:13:085  TS1  > [<ffffffff8005a961>] hrtimer_nanosleep+0x58/0x118
21:14:13:085  TS1  > [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3
21:14:13:085  TS1  > [<ffffffff8005e229>] tracesys+0x71/0xe0
21:14:13:085  TS1  > [<ffffffff8005e28d>] tracesys+0xd5/0xe0
21:14:13:085  TS1  >
21:16:13:050  TS1  >INFO: task reboot:4573 blocked for more than 120 seconds.
21:16:13:050  TS1  >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:16:13:050  TS1  >reboot        D 0000000000000001     0  4573      1                3888 (NOTLB)
21:16:13:050  TS1  > ffff81011e081be8 0000000000000086 0000000000000092 ffffffff8009eaed
21:16:13:159  TS1  > ffff81013b53ba90 0000000000000007 ffff810138e017e0 ffff81013fc5a820
21:16:13:159  TS1  > 0000009de31171f1 000000003b9aca00 ffff810138e019c8 000000073b3ae800
21:16:13:159  TS1  >Call Trace:
21:16:13:159  TS1  > [<ffffffff8009eaed>] __queue_work+0x43/0x53
21:16:13:159  TS1  > [<ffffffff8001cb9f>] __mod_timer+0x100/0x10f
21:16:13:159  TS1  > [<ffffffff80064167>] wait_for_completion+0x79/0xa2
21:16:13:159  TS1  > [<ffffffff8008e16d>] default_wake_function+0x0/0xe
21:16:13:159  TS1  > [<ffffffff801459cc>] blk_execute_rq_nowait+0x86/0x9a
21:16:13:159  TS1  > [<ffffffff80145a78>] blk_execute_rq+0x98/0xc0
21:16:13:159  TS1  > [<ffffffff80028bbd>] get_request_wait+0x21/0x11f
21:16:13:159  TS1  > [<ffffffff8807abbb>] :scsi_mod:scsi_execute+0xd1/0xeb
21:16:13:159  TS1  > [<ffffffff8807ac7a>] :scsi_mod:scsi_execute_req+0xa5/0xce
21:16:13:159  TS1  > [<ffffffff880a747d>] :sd_mod:sd_sync_cache+0x56/0xb4
21:16:13:159  TS1  > [<ffffffff80050000>] __filemap_fdatawrite_range+0x42/0x5b
21:16:13:159  TS1  > [<ffffffff801c9074>] class_device_get+0x14/0x1b
21:16:13:159  TS1  > [<ffffffff880a7c55>] :sd_mod:sd_shutdown+0x46/0xc9
21:16:13:159  TS1  > [<ffffffff801cc2ed>] device_shutdown+0x56/0x88
21:16:13:159  TS1  > [<ffffffff8009c4fd>] kernel_restart+0x9/0x46
21:16:13:159  TS1  > [<ffffffff8009c687>] sys_reboot+0x146/0x1c7
21:16:13:159  TS1  > [<ffffffff8003b153>] hrtimer_try_to_cancel+0x4a/0x53
21:16:13:159  TS1  > [<ffffffff8005aa74>] hrtimer_cancel+0xc/0x16
21:16:13:159  TS1  > [<ffffffff80064d05>] do_nanosleep+0x47/0x70
21:16:13:159  TS1  > [<ffffffff8005a961>] hrtimer_nanosleep+0x58/0x118
21:16:13:268  TS1  > [<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3
21:16:13:268  TS1  > [<ffffffff8005e229>] tracesys+0x71/0xe0
21:16:13:268  TS1  > [<ffffffff8005e28d>] tracesys+0xd5/0xe0
21:16:13:268  TS1  >
21:18:13:189  TS1  >INFO: task reboot:4573 blocked for more than 120 seconds.
----------------------------------------------------------------


Expected results:
Host reboots successfully and sessions are re-established.

Additional info:

Comment 1 Mike Christie 2010-05-08 02:07:54 UTC
(In reply to comment #0)
> 1. connect a RHEL 5.5 server to an iSCSI Dell MD300i target.
> 2. use software initiator to establish iSCSI sessions.
> 3. create few luns on the target and map them to the server.
> 4. check that the host sees the luns using lsscsi
> 5. reboot the host
> 


Does #3 have anything to do with this problem? If you have luns setup before you do the iscsi login, then do #2, #4, and #5, do you hit the same problem?

If so the problem is that the session are not being shutdown during reboot/shutdown. If the disks need a cache sync to be sent during kernel shutdown then we hang on that command because the network is down at this time.

I am working on a fix here:
https://bugzilla.redhat.com/show_bug.cgi?id=583218

My original fix did not handle iscsi root properly, so still working on it.

Comment 2 Abdel Sadek 2010-05-10 16:35:07 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > 1. connect a RHEL 5.5 server to an iSCSI Dell MD300i target.
> > 2. use software initiator to establish iSCSI sessions.
> > 3. create few luns on the target and map them to the server.
> > 4. check that the host sees the luns using lsscsi
> > 5. reboot the host
> > 
> 
> 
> Does #3 have anything to do with this problem? If you have luns setup before
> you do the iscsi login, then do #2, #4, and #5, do you hit the same problem?
> 
> If so the problem is that the session are not being shutdown during
> reboot/shutdown. If the disks need a cache sync to be sent during kernel
> shutdown then we hang on that command because the network is down at this time.
> 
> I am working on a fix here:
> https://bugzilla.redhat.com/show_bug.cgi?id=583218
> 
> My original fix did not handle iscsi root properly, so still working on it.    

Mike;
#3 doesn't have to do anything with it. We did test it where the luns were already created before doing #2, #4 and #5 and we hit the same issue. 
also we did test the workaround provided in 583218 and it worked. below are the steps:
1) Established iSCSI sessions.
2) Created volumes and mapped.
3) Ran IO for sometime and stopped.
4) Logged out all sessions using the command iscsiadm –m node –U all
5) Rebooted the host. The host rebooted without any issues. 

So, does that mean that this bug is a dup of 583218?

Comment 3 Mike Christie 2010-05-10 17:44:18 UTC
(In reply to comment #2)

> So, does that mean that this bug is a dup of 583218?    

Yep, dup.

*** This bug has been marked as a duplicate of bug 583218 ***


Note You need to log in before you can comment on or make changes to this bug.