Bug 470212 - System hang caused by fcport bounce (link down/up)
Status: CLOSED WONTFIX
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: x86_64 All
Priority: medium  Severity: high
Assigned To: Red Hat Real Time Maintenance
Reported: 2008-11-06 05:01 EST by IBM Bug Proxy
Modified: 2014-08-19 16:53 EDT

Doc Type: Bug Fix
Last Closed: 2014-08-19 16:53:51 EDT

Attachments: None

Description IBM Bug Proxy 2008-11-06 05:01:07 EST
=Comment: #0=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
While blast is hammering on 6 LUNs, I have an expect script bouncing an FC port.

This FC port bounce causes one of the LUN paths to go away and come back.

Five hours into this test, an FC port going offline caused the system to hang. I have collected the dump.
I will place the dump on KBIC and update this defect.
=Comment: #1=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
Crash uploaded to KBIC.

ssh USER@kernel.beaverton.ibm.com

cd /home/services/opensource/realtime/bugzilla/49395

./crash ./vmlinux ./vmcore 
=Comment: #2=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
Figured out that this is a deadlock.

PID: 16879  TASK: ffff81013a06a140  CPU: 3   COMMAND: "pdflush"
Owns inode_lock and waiting for kernel_sem

PID: 8299   TASK: ffff81024f03e100  CPU: 2   COMMAND: "multipathd"
Owns kernel_sem and waiting for inode_lock.
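For illustration only, this is the classic ABBA lock-order inversion. A minimal user-space sketch (pthread mutexes standing in for inode_lock and the BKL, with hypothetical thread names; not the actual kernel code) blocks the same way:

/* Build with: gcc -pthread abba.c
 * Two threads take the same pair of locks in opposite order and deadlock,
 * mirroring pdflush and multipathd in the dump below.
 */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t kernel_sem = PTHREAD_MUTEX_INITIALIZER;

static void *pdflush_like(void *arg)
{
	pthread_mutex_lock(&inode_lock);   /* writeback path holds inode_lock   */
	sleep(1);
	pthread_mutex_lock(&kernel_sem);   /* blkdev_put() -> lock_kernel()     */
	pthread_mutex_unlock(&kernel_sem);
	pthread_mutex_unlock(&inode_lock);
	return NULL;
}

static void *multipathd_like(void *arg)
{
	pthread_mutex_lock(&kernel_sem);   /* already under the BKL             */
	sleep(1);
	pthread_mutex_lock(&inode_lock);   /* /proc lookup needs inode_lock     */
	pthread_mutex_unlock(&inode_lock);
	pthread_mutex_unlock(&kernel_sem);
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, pdflush_like, NULL);
	pthread_create(&b, NULL, multipathd_like, NULL);
	pthread_join(a, NULL);             /* never returns: ABBA deadlock */
	pthread_join(b, NULL);
	return 0;
}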


PID: 8299   TASK: ffff81024f03e100  CPU: 2   COMMAND: "multipathd"
 #0 [ffff81024ad338c8] schedule at ffffffff8128534c
 #1 [ffff81024ad33980] rt_spin_lock_slowlock at ffffffff81286d15 << Waiting for inode_lock
 #2 [ffff81024ad33a40] __rt_spin_lock at ffffffff812873b0
 #3 [ffff81024ad33a50] rt_spin_lock at ffffffff812873bb
 #4 [ffff81024ad33a60] ifind_fast at ffffffff810c32a1
 #5 [ffff81024ad33a90] iget_locked at ffffffff810c3be8
 #6 [ffff81024ad33ad0] proc_get_inode at ffffffff810ebaff
 #7 [ffff81024ad33b10] proc_lookup at ffffffff810f082a
 #8 [ffff81024ad33b40] proc_root_lookup at ffffffff810ec312
 #9 [ffff81024ad33b70] do_lookup at ffffffff810b78f7
#10 [ffff81024ad33bc0] __link_path_walk at ffffffff810b9a93
#11 [ffff81024ad33c60] link_path_walk at ffffffff810b9fc1
#12 [ffff81024ad33d30] path_walk at ffffffff810ba073
#13 [ffff81024ad33d40] do_path_lookup at ffffffff810ba37a
#14 [ffff81024ad33d90] __path_lookup_intent_open at ffffffff810baeb0
#15 [ffff81024ad33de0] path_lookup_open at ffffffff810baf60
#16 [ffff81024ad33df0] open_namei at ffffffff810bb071
#17 [ffff81024ad33e80] do_filp_open at ffffffff810ae610
#18 [ffff81024ad33f30] do_sys_open at ffffffff810ae67f
#19 [ffff81024ad33f70] sys_open at ffffffff810ae729


PID: 16879  TASK: ffff81013a06a140  CPU: 3   COMMAND: "pdflush"
 #0 [ffff810023063aa0] schedule at ffffffff8128534c
 #1 [ffff810023063b58] rt_mutex_slowlock at ffffffff81286ac5  << Waiting for Kernel Lock
 #2 [ffff810023063c28] rt_mutex_lock at ffffffff81285fb4
 #3 [ffff810023063c38] rt_down at ffffffff8105fec7
 #4 [ffff810023063c58] lock_kernel at ffffffff81287b8c
 #5 [ffff810023063c78] __blkdev_put at ffffffff810d5d31
 #6 [ffff810023063cb8] blkdev_put at ffffffff810d5e68
 #7 [ffff810023063cc8] close_dev at ffffffff8819e547
 #8 [ffff810023063ce8] dm_put_device at ffffffff8819e579
 #9 [ffff810023063d08] free_priority_group at ffffffff881c0e86
#10 [ffff810023063d58] free_multipath at ffffffff881c0f11
#11 [ffff810023063d78] multipath_dtr at ffffffff881c0f73
#12 [ffff810023063d98] dm_table_put at ffffffff8819e347
#13 [ffff810023063dc8] dm_any_congested at ffffffff8819d074
#14 [ffff810023063df8] sync_sb_inodes at ffffffff810cd451
#15 [ffff810023063e38] writeback_inodes at ffffffff810cd7b5
#16 [ffff810023063e68] background_writeout at ffffffff8108be38
#17 [ffff810023063ed8] pdflush at ffffffff8108c79a
#18 [ffff810023063f28] kthread at ffffffff81051477
#19 [ffff810023063f48] kernel_thread at ffffffff8100d048

crash> kernel_sem
kernel_sem = $5 = {
  count = {
    counter = 0
  }, 
  lock = {
    wait_lock = {
      raw_lock = {
        slock = 34952
      }, 
      break_lock = 0
    }, 
    wait_list = {
      prio_list = {
        next = 0xffff810023063b88, 
        prev = 0xffff81014941fdc0
      }, 
      node_list = {
        next = 0xffff810023063b98, 
        prev = 0xffff81014d04ddd0
      }
    }, 
    owner = 0xffff81024f03e102 << multipathd owns it. (task 0xffff81024f03e100) 
                                                   << last two bits are flags, so mask them to 0.
  }
}
crash> inode_lock
inode_lock = $6 = {
  lock = {
    wait_lock = {
      raw_lock = {
        slock = 3341
      }, 
      break_lock = 0
    }, 
    wait_list = {
      prio_list = {
        next = 0xffff81024ad339a0, 
        prev = 0xffff8100784fdd78
      }, 
      node_list = {
        next = 0xffff81024ad339b0, 
        prev = 0xffff81024afb3a70
      }
    }, 
    owner = 0xffff81013a06a142 << pdflush owns it. (task 0xffff81013a06a140)
  }, 
  break_lock = 0
}
crash> 
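As the annotations on the owner fields above note, the low two bits of the rt-mutex owner word are flag bits; masking them off recovers the owning task pointer. A quick stand-alone check (illustrative only, using the values from this dump):

#include <stdio.h>

int main(void)
{
	unsigned long kernel_sem_owner = 0xffff81024f03e102UL; /* from the dump */
	unsigned long inode_lock_owner = 0xffff81013a06a142UL; /* from the dump */

	/* mask off the two flag bits to get the task_struct addresses */
	printf("kernel_sem owner: 0x%lx\n", kernel_sem_owner & ~0x3UL); /* multipathd, 0xffff81024f03e100 */
	printf("inode_lock owner: 0x%lx\n", inode_lock_owner & ~0x3UL); /* pdflush,    0xffff81013a06a140 */
	return 0;
}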



=Comment: #3=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
The try patch given by Chandra did not appear to address the issue.
I ran into the deadlock after a 3-hour run.
The only difference in this crash is that sendmail is holding the kernel lock and waiting for the
inode lock, instead of multipathd (as in the previous crash).

This makes me wonder: is the lock sequence taken by pdflush correct?
Is there an unwritten rule that the sequence should be kernel_lock followed by inode_lock?

[root@elm3c24 ~]# crash /test/jvrao/linux-2.6.24.7-ibmrt2.12-view/vmlinux
/var/crash/2008-11-04-07\:34/vmcore

crash> inode_lock
inode_lock = $3 = {
  lock = {
    wait_lock = {
      raw_lock = {
        slock = 32639
      }, 
      break_lock = 0
    }, 
    wait_list = {
      prio_list = {
        next = 0xffff81024c4c19a0, 
        prev = 0xffff810243953dd0
      }, 
      node_list = {
        next = 0xffff81024c4c19b0, 
        prev = 0xffff810243953de0
      }
    }, 
    owner = 0xffff81024e238042
  }, 
  break_lock = 0
}

crash> kernel_sem
kernel_sem = $4 = {
  count = {
    counter = 0
  }, 
  lock = {
    wait_lock = {
      raw_lock = {
        slock = 41634
      }, 
      break_lock = 0
    }, 
    wait_list = {
      prio_list = {
        next = 0xffff81014aaef820, 
        prev = 0xffff810144037dc0
      }, 
      node_list = {
        next = 0xffff81014aaef830, 
        prev = 0xffff810205801dd0
      }
    }, 
    owner = 0xffff81024f072b62
  }
}


PID: 26543  TASK: ffff81014d8c9540  CPU: 3   COMMAND: "pdflush"
 #0 [ffff810064f65bd0] schedule at ffffffff8128531c
 #1 [ffff810064f65c88] rt_spin_lock_slowlock at ffffffff81286ce5
 #2 [ffff810064f65d48] __rt_spin_lock at ffffffff81287380
 #3 [ffff810064f65d58] rt_spin_lock at ffffffff8128738b
 #4 [ffff810064f65d68] __writeback_single_inode at ffffffff810cd01b
 #5 [ffff810064f65df8] sync_sb_inodes at ffffffff810cd4ba
 #6 [ffff810064f65e38] writeback_inodes at ffffffff810cd78d
 #7 [ffff810064f65e68] background_writeout at ffffffff8108be10
 #8 [ffff810064f65ed8] pdflush at ffffffff8108c772
 #9 [ffff810064f65f28] kthread at ffffffff8105144f
#10 [ffff810064f65f48] kernel_thread at ffffffff8100d048

PID: 10196  TASK: ffff81024f072b60  CPU: 6   COMMAND: "sendmail"
 #0 [ffff81024c4c18c8] schedule at ffffffff8128531c
 #1 [ffff81024c4c1980] rt_spin_lock_slowlock at ffffffff81286ce5
 #2 [ffff81024c4c1a40] __rt_spin_lock at ffffffff81287380
 #3 [ffff81024c4c1a50] rt_spin_lock at ffffffff8128738b
 #4 [ffff81024c4c1a60] ifind_fast at ffffffff810c3279
 #5 [ffff81024c4c1a90] iget_locked at ffffffff810c3bc0
 #6 [ffff81024c4c1ad0] proc_get_inode at ffffffff810ebad7
 #7 [ffff81024c4c1b10] proc_lookup at ffffffff810f0802
 #8 [ffff81024c4c1b40] proc_root_lookup at ffffffff810ec2ea
 #9 [ffff81024c4c1b70] do_lookup at ffffffff810b78cf
#10 [ffff81024c4c1bc0] __link_path_walk at ffffffff810b9a6b
#11 [ffff81024c4c1c60] link_path_walk at ffffffff810b9f99
#12 [ffff81024c4c1d30] path_walk at ffffffff810ba04b
#13 [ffff81024c4c1d40] do_path_lookup at ffffffff810ba352
#14 [ffff81024c4c1d90] __path_lookup_intent_open at ffffffff810bae88
#15 [ffff81024c4c1de0] path_lookup_open at ffffffff810baf38
#16 [ffff81024c4c1df0] open_namei at ffffffff810bb049
#17 [ffff81024c4c1e80] do_filp_open at ffffffff810ae5e8
#18 [ffff81024c4c1f30] do_sys_open at ffffffff810ae657
#19 [ffff81024c4c1f70] sys_open at ffffffff810ae701




=Comment: #4=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
(In reply to comment #3)
The pdflush stack in the previous comment was wrong; here is the correct one.

PID: 26687  TASK: ffff81024e238040  CPU: 3   COMMAND: "pdflush"
 #0 [ffff8101562b7c20] schedule at ffffffff8128531c
 #1 [ffff8101562b7cd8] rt_write_slowlock at ffffffff8105fcbf
 #2 [ffff8101562b7d98] rt_mutex_down_write at ffffffff8105f169
 #3 [ffff8101562b7da8] __rt_down_write at ffffffff8105fdf1
 #4 [ffff8101562b7db8] rt_down_write at ffffffff8105fe09
 #5 [ffff8101562b7dc8] dm_any_congested at ffffffff8819d642
 #6 [ffff8101562b7df8] sync_sb_inodes at ffffffff810cd429
 #7 [ffff8101562b7e38] writeback_inodes at ffffffff810cd78d
 #8 [ffff8101562b7e68] wb_kupdate at ffffffff8108bf1b
 #9 [ffff8101562b7ed8] pdflush at ffffffff8108c772
#10 [ffff8101562b7f28] kthread at ffffffff8105144f
#11 [ffff8101562b7f48] kernel_thread at ffffffff8100d048




=Comment: #5=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
A brief discussion with Steven Rostedt revealed the following:

In the latest git tree, writeback_inodes() in fs/fs-writeback.c does not hold inode_lock around the
sync_sb_inodes() call. I will try this:

	if (down_read_trylock(&sb->s_umount)) {
		if (sb->s_root) {
			spin_lock(&inode_lock);      <<< Take this out.
			sync_sb_inodes(sb, wbc);
			spin_unlock(&inode_lock);    <<< Take this out.
		}
		up_read(&sb->s_umount);
	}
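For reference, this is how that call site would read with the two inode_lock lines dropped (a sketch derived from the snippet above, not the exact upstream code):

	if (down_read_trylock(&sb->s_umount)) {
		if (sb->s_root)
			sync_sb_inodes(sb, wbc);
		up_read(&sb->s_umount);
	}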

=Comment: #6=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
(In reply to comment #5)

Well, a little closer look revealed that they just moved the inode_lock into the called function:

sync_sb_inodes() -> generic_sync_sb_inodes()

generic_sync_sb_inodes()
{
	spin_lock(&inode_lock);
	...
}

So this change is not useful for us.
=Comment: #7=================================================
Venkateswarara Jujjuri <jvrao@us.ibm.com> - 
Chandra provided a new patch, and it appears to work fine.
I successfully ran blast overnight while the FC port was bounced every 10 minutes.

Here is the patch.

---
Index: linux-2.6.24.7-ibmrt2.11-view/drivers/md/dm.c
===================================================================
--- linux-2.6.24.7-ibmrt2.11-view.orig/drivers/md/dm.c
+++ linux-2.6.24.7-ibmrt2.11-view/drivers/md/dm.c
@@ -876,16 +876,22 @@ static void dm_unplug_all(struct request

 static int dm_any_congested(void *congested_data, int bdi_bits)
 {
-	int r;
+	int r = bdi_bits;
 	struct mapped_device *md = (struct mapped_device *) congested_data;
-	struct dm_table *map = dm_get_table(md);
+	struct dm_table *map;

-	if (!map || test_bit(DMF_BLOCK_IO, &md->flags))
-		r = bdi_bits;
-	else
-		r = dm_table_any_congested(map, bdi_bits);
+	atomic_inc(&md->pending);

-	dm_table_put(map);
+	if (test_bit(DMF_BLOCK_IO, &md->flags))
+		goto done:
+
+	map = dm_get_table(md);
+	if (map) {
+		r = dm_table_any_congested(map, bdi_bits);
+		dm_table_put(map);
+	}
+done:
+	atomic_dec(&md->pending);
 	return r;
 }
Comment 1 IBM Bug Proxy 2008-11-06 12:41:21 EST
(In reply to comment #7)
> +       if (test_bit(DMF_BLOCK_IO, &md->flags))
> +               goto done:
Typo:
It should be "goto done;"

I have pasted the entire patch with that correction below...

Index: linux-2.6.24.7-ibmrt2.11-view/drivers/md/dm.c
===================================================================
--- linux-2.6.24.7-ibmrt2.11-view.orig/drivers/md/dm.c
+++ linux-2.6.24.7-ibmrt2.11-view/drivers/md/dm.c
@@ -876,16 +876,22 @@ static void dm_unplug_all(struct request

 static int dm_any_congested(void *congested_data, int bdi_bits)
 {
-	int r;
+	int r = bdi_bits;
 	struct mapped_device *md = (struct mapped_device *) congested_data;
-	struct dm_table *map = dm_get_table(md);
+	struct dm_table *map;
 
-	if (!map || test_bit(DMF_BLOCK_IO, &md->flags))
-		r = bdi_bits;
-	else
-		r = dm_table_any_congested(map, bdi_bits);
+	atomic_inc(&md->pending);
 
-	dm_table_put(map);
+	if (test_bit(DMF_BLOCK_IO, &md->flags))
+		goto done;
+
+	map = dm_get_table(md);
+	if (map) {
+		r = dm_table_any_congested(map, bdi_bits);
+		dm_table_put(map);
+	}
+done:
+	atomic_dec(&md->pending);
 	return r;
 }
Comment 2 Alasdair Kergon 2008-11-06 19:20:39 EST
My comment on dm-devel was for someone to confirm that this sets bdi_bits appropriately in all cases, including scenarios that fail the DMF_BLOCK_IO test. This should go into the patch header.
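For what it's worth, a minimal user-space model of the patched decision logic (stand-in booleans rather than the real dm structures; illustrative only) suggests bdi_bits comes back unchanged whenever the table cannot be consulted, which matches the pre-patch behaviour:

#include <stdbool.h>
#include <stdio.h>

/* Model of dm_any_congested() after the patch: block_io stands in for
 * DMF_BLOCK_IO, have_map for a non-NULL table, table_bits for the
 * dm_table_any_congested() result.
 */
static int model_any_congested(bool block_io, bool have_map,
                               int table_bits, int bdi_bits)
{
	int r = bdi_bits;          /* default: report the queried bits as congested */

	if (block_io)
		return r;          /* DMF_BLOCK_IO set: table never consulted */
	if (have_map)
		r = table_bits;    /* table available: use its answer */
	return r;                  /* no table: bdi_bits unchanged */
}

int main(void)
{
	printf("%d\n", model_any_congested(true,  true,  0, 3)); /* 3: blocked I/O  */
	printf("%d\n", model_any_congested(false, false, 0, 3)); /* 3: no table     */
	printf("%d\n", model_any_congested(false, true,  0, 3)); /* 0: table result */
	return 0;
}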
Comment 3 Clark Williams 2010-10-21 10:34:55 EDT
We need to see if this happens on our current kernel (2.6.33.7-rt29.45.el5rt).
Comment 4 Clark Williams 2012-01-05 16:16:03 EST
Has anyone tried the latest kernel with this BZ?
Comment 5 Beth Uptagrafft 2014-08-19 16:53:51 EDT
MRG-1 on Red Hat Enterprise Linux 5 reached its end of life on March 31, 2014. Because this issue is against the MRG-1 release, we are closing it WONTFIX. If you believe this is still an issue on our most recent MRG-2.5 3.10 kernel, please file a new issue for tracking.
