Bug 129160 - pool_mp -r hard locks the system after pool_mp -m w/ failed paths
Status: CLOSED WONTFIX
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 3
Hardware: i686 Linux
Priority: medium  Severity: high
Assigned To: Jonathan Earl Brassow
QA Contact: GFS Bugs
Reported: 2004-08-04 12:05 EDT by Adam "mantis" Manthei
Modified: 2010-01-11 21:55 EST

Doc Type: Bug Fix
Last Closed: 2005-10-04 13:11:33 EDT


Attachments: None
Description Adam "mantis" Manthei 2004-08-04 12:05:56 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7)
Gecko/20040706 Firefox/0.9.1

Description of problem:
If pool_mp is used to change the multipathing type while a path has
failed, a subsequent pool_mp -r to reactivate the bad path will hard
lock the machine.  There were no console messages.

Version-Release number of selected component (if applicable):
GFS-modules-smp-6.0.0-1.2, GFS-6.0.0-1.2

How reproducible:
Always

Steps to Reproduce:
1. I have three GNBD servers exporting a shared block device. 

2. On a client node I import the block device from all three servers
and run pool on all of them.  Because it is the same block device
being exported, pool correctly detects this as a multipath
environment with a default mode of "failover". 

3. Mount GFS on top of the pool and begin generating load using fsstress.

4. Kill the active GNBD server.  (I used a Brocade switch to determine
which path was active, using the portperfshow command.)

5. pool fails over to another path 

6. bring the failed gnbd server back online

7. Change the multipath mode while pool still thinks that there is a
failed link.  I used the command "pool_mp -m 4" for this (note that the
minimum stripe size of 64 is used instead of 4).

8. Run "pool_mp -r" to reactivate the failed path.  At this point the
node will lock up.  There is no console output, but the node does
respond to pings.
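The steps above can be condensed into a shell sketch.  It is defined as a
function only and never invoked here, since actually running it requires
three GNBD servers exporting the same shared block device; the pool alias
/dev/poolbn comes from the pool_info output in this report, while the
mount point /mnt/gfs and the fsstress invocation are placeholders.

```shell
# Sketch of the reproduction sequence (definition only; the commands need
# live GNBD servers).  /mnt/gfs is a placeholder mount point.
reproduce_lockup() {
    gnbd_import                  # verify all three imports are visible
    pool_assemble                # pool detects multipath, defaults to failover
    mount -t gfs /dev/poolbn /mnt/gfs
    fsstress -d /mnt/gfs &       # generate load (flags vary by fsstress version)
    # ... kill the active GNBD server; pool fails over to another path ...
    # ... bring the failed server back online ...
    pool_mp -m 4                 # change multipath mode while a path is failed
                                 # (pool_mp clamps 4 up to the minimum of 64)
    pool_mp -r                   # reintegrate the failed path -> node locks hard
}
```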
    

Additional info:
I've only tried this with GNBD so far.
Comment 1 Adam "mantis" Manthei 2004-08-04 15:25:50 EDT
interesting....

On another node I tried setting the stripe size to 1024 kB, then ran
pool_mp -r, and it worked without issue:

[root@trin-07 root]# pool_info -v
Pool name             : trin-gnbd
Pool alias            : /dev/poolbn
Number of subpools    : 1
Total Capacity        : 1048575968
In use                : YES
Multipathing Type     : failover
Pool ID               : 0x7fffe000457256b3
Subpool 0
Number of devices     : 1
Stripe size           : 0
Type                  : gfs_data
Device 0
Path count            : 3
Good path count       : 2
Path 0                : /dev/gnbd/trin-03.lab.msp.redhat.com:gnbd1 (GOOD)
Path 1                : /dev/gnbd/trin-02.lab.msp.redhat.com:gnbd1 (GOOD)
Path 2                : /dev/gnbd/trin-01.lab.msp.redhat.com:gnbd1 (BAD )
Blocks                : 1048575968

[root@trin-07 root]# pool_mp -m 1024
Multipathing for trin-gnbd changed to 'round-robin'
[root@trin-07 root]# pool_info -v
Pool name             : trin-gnbd
Pool alias            : /dev/poolbn
Number of subpools    : 1
Total Capacity        : 1048575968
In use                : YES
Multipathing Type     : round-robin
Multipathing Stripe   : 1024
Pool ID               : 0x7fffe000457256b3
Subpool 0
Number of devices     : 1
Stripe size           : 0
Type                  : gfs_data
Device 0
Path count            : 3
Good path count       : 2
Path 0                : /dev/gnbd/trin-03.lab.msp.redhat.com:gnbd1 (GOOD)
Path 1                : /dev/gnbd/trin-02.lab.msp.redhat.com:gnbd1 (GOOD)
Path 2                : /dev/gnbd/trin-01.lab.msp.redhat.com:gnbd1 (BAD )
Blocks                : 1048575968

[root@trin-07 root]# pool_mp -r
Successfully reintegrated paths for trin-gnbd
[root@trin-07 root]# 
[root@trin-07 root]# pool_info -v
Pool name             : trin-gnbd
Pool alias            : /dev/poolbn
Number of subpools    : 1
Total Capacity        : 1048575968
In use                : YES
Multipathing Type     : round-robin
Multipathing Stripe   : 1024
Pool ID               : 0x7fffe000457256b3
Subpool 0
Number of devices     : 1
Stripe size           : 0
Type                  : gfs_data
Device 0
Path count            : 3
Good path count       : 3
Path 0                : /dev/gnbd/trin-03.lab.msp.redhat.com:gnbd1 (GOOD)
Path 1                : /dev/gnbd/trin-02.lab.msp.redhat.com:gnbd1 (GOOD)
Path 2                : /dev/gnbd/trin-01.lab.msp.redhat.com:gnbd1 (GOOD)
Blocks                : 1048575968
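The path states in the transcript above can be checked mechanically before
attempting reintegration.  A minimal sketch, using the canned path lines
from the pool_info -v output above (on a live node you would pipe
`pool_info -v` in directly):

```shell
# Count GOOD vs BAD paths in pool_info -v output.  The sample below is the
# path listing from this report; 2 good paths, 1 bad.
sample='Path 0                : /dev/gnbd/trin-03.lab.msp.redhat.com:gnbd1 (GOOD)
Path 1                : /dev/gnbd/trin-02.lab.msp.redhat.com:gnbd1 (GOOD)
Path 2                : /dev/gnbd/trin-01.lab.msp.redhat.com:gnbd1 (BAD )'
good=$(printf '%s\n' "$sample" | grep -c '(GOOD)')
bad=$(printf '%s\n' "$sample" | grep -c '(BAD')
echo "good=$good bad=$bad"       # prints: good=2 bad=1
```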
Comment 2 Adam "mantis" Manthei 2004-08-12 13:12:24 EDT
The problem also occurs when adding new paths after changing the
multipathing type from "none" to "failover":


#
# Start with a single gnbd import and run pool_assemble
#

[root@trin-05 root]# gnbd_import
Device name : trin-02.lab.msp.redhat.com:gnbd1
----------------------
    Minor # : 1
  Proc name : gnbdb
         IP : 192.168.44.172
       Port : 14243
      State : Open Connected Pending
   Readonly : No

#
# We are in non-failover mode
#
[root@trin-05 root]# pool_info -v
Pool name             : trin-gnbd
Pool alias            : /dev/poolbn
Number of subpools    : 1
Total Capacity        : 1048575968
In use                : YES
Multipathing Type     : none
Pool ID               : 0x7fffe000457256b3
Subpool 0
Number of devices     : 1
Stripe size           : 0
Type                  : gfs_data
Device 0
Path count            : 1
Good path count       : 1
Path 0                : /dev/gnbd/trin-02.lab.msp.redhat.com:gnbd1 (GOOD)
Blocks                : 1048575968


#
# .....
#
# Start GNBD server on trin-01.lab.msp.redhat.com and import
# on our node again
#
# .....
#

[root@trin-05 root]# gnbd_import
Device name : trin-02.lab.msp.redhat.com:gnbd1
----------------------
    Minor # : 1
  Proc name : gnbdb
         IP : 192.168.44.172
       Port : 14243
      State : Open Connected Clear
   Readonly : No

Device name : trin-01.lab.msp.redhat.com:gnbd1
----------------------
    Minor # : 2
  Proc name : gnbdc
         IP : 192.168.44.171
       Port : 14243
      State : Close Disconnected Clear
   Readonly : No

#
# Change to failover mode
#
[root@trin-05 root]# pool_mp -m failover
Multipathing for trin-gnbd changed to 'failover'

[root@trin-05 root]# pool_info -v
Pool name             : trin-gnbd
Pool alias            : /dev/poolbn
Number of subpools    : 1
Total Capacity        : 1048575968
In use                : YES
Multipathing Type     : failover
Pool ID               : 0x7fffe000457256b3
Subpool 0
Number of devices     : 1
Stripe size           : 0
Type                  : gfs_data
Device 0
Path count            : 1
Good path count       : 1
Path 0                : /dev/gnbd/trin-02.lab.msp.redhat.com:gnbd1 (GOOD)
Blocks                : 1048575968


#
# I should be able to reintegrate paths here, but the node locks hard.
# All I can do is ping the node.  There is no console or ssh access.
#
[root@trin-05 root]# pool_mp -r

<---  node deadlocks here  --->
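This second variant can be condensed to the same shape as the first.
Again a definition-only sketch: the gnbd_import arguments for adding the
second server are elided because the report never shows them, and the
commands require a live cluster.

```shell
# Comment-2 variant, condensed (definition only; needs live GNBD servers).
# The pool starts in "none" mode with a single path.
variant_none_to_failover() {
    gnbd_import                  # confirm the single import is visible
    pool_info -v                 # Multipathing Type: none, 1 path
    # ... start a second GNBD server and import its device here ...
    pool_mp -m failover          # switch modes while the new path is not
                                 # yet part of the assembled pool
    pool_mp -r                   # reintegrate paths -> node locks hard
}
```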
Comment 3 Adam "mantis" Manthei 2004-08-12 19:16:31 EDT
A clue...

So far I have only seen this issue on SMP kernels.  I have not been
able to reproduce it on a UP (uniprocessor) kernel.
Comment 4 Jonathan Earl Brassow 2005-10-04 13:11:33 EDT
No customers have seen this (AFAIK, no one but Adam has seen this), and
pool is EOL in RHEL 3.  Marking WONTFIX.
