Bug 168041 - RAID1 crashes when trying to raidhotremove a few partitions with devlabel
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium, Severity: high
Assigned To: Doug Ledford
QA Contact: Brian Brock
Depends On:
Blocks: 170445
Reported: 2005-09-11 10:54 EDT by Dan Fruehauf
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-10-12 14:15:40 EDT
Attachments
The panic (106.04 KB, image/jpeg)
2005-09-11 10:56 EDT, Dan Fruehauf
Description Dan Fruehauf 2005-09-11 10:54:38 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050623 Fedora/1.0.4-5 Firefox/1.0.4

Description of problem:
I'm using devlabel, and on top of it I have a few RAID1 raidsets configured.
If I run raidsetfaulty and raidhotremove on some device, a kernel panic occurs.
Note that /etc/raidtab uses the symlinks that devlabel created. If it is used without the links, this does not happen. IMHO it looks like a race condition when resolving those links.

My /etc/raidtab:
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part1
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part1
        raid-disk       1

raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part2
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part2
        raid-disk       1

raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part3
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part3
        raid-disk       1

raiddev /dev/md3
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part4
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part4
        raid-disk       1
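
To check which member devices each raidset in a raidtab refers to, something like the following may help. It is only a sketch: the function name list_raid_members is made up for illustration, and it assumes the simple raiddev/device layout shown above.

```shell
# Print "<md device> <member device>" pairs from a raidtab-style file.
# list_raid_members is a hypothetical helper, not part of raidtools.
# The file argument defaults to /etc/raidtab.
list_raid_members() {
    awk '
        $1 == "raiddev" { md = $2 }        # remember the current raidset
        $1 == "device"  { print md, $2 }   # emit each member under it
    ' "${1:-/etc/raidtab}"
}
```

Run against the raidtab above, this would list every /dev/mcs/... link next to the md device that uses it.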

/etc/sysconfig/devlabel seems irrelevant, but the following might be relevant:
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part1 -> /dev/sda1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part2 -> /dev/sda2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part3 -> /dev/sda3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part4 -> /dev/sda4
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part1 -> /dev/sdc1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part2 -> /dev/sdc2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part3 -> /dev/sdc3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part4 -> /dev/sdc4
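
The same link-to-device mapping can be printed in resolved form, which makes it easier to see what the raid tools end up operating on. This is a sketch only: list_link_targets is a hypothetical helper, and it assumes a readlink that supports the -f option.

```shell
# Print "link -> resolved target" for every symlink in a directory.
# list_link_targets is a hypothetical helper; the directory argument
# defaults to /dev/mcs, the path used in this report.
list_link_targets() {
    dir="${1:-/dev/mcs}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```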


I don't think the underlying hardware is related in any way, but if it turns out to be relevant I'll provide the details.

Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Create 4 or more raidsets of RAID1
2. Run raidsetfaulty and raidhotremove on one disk of every raidset:
raidsetfaulty /dev/md0 /dev/sda1; raidhotremove /dev/md0 /dev/sda1;
raidsetfaulty /dev/md1 /dev/sda2; raidhotremove /dev/md1 /dev/sda2;
raidsetfaulty /dev/md2 /dev/sda3; raidhotremove /dev/md2 /dev/sda3;
raidsetfaulty /dev/md3 /dev/sda4; raidhotremove /dev/md3 /dev/sda4;
3. A kernel panic appears
  

Actual Results:  A kernel panic appears (attached).

Expected Results:  The specified disks (partitions) should have been removed from the corresponding raidsets.

Additional info:

It also happened on kernel-smp-2.4.21-37.EL, which is in the RHEL3 beta right now.
Comment 1 Dan Fruehauf 2005-09-11 10:56:51 EDT
Created attachment 118692
The panic
Comment 2 Dan Fruehauf 2005-09-11 11:01:28 EDT
In the steps to reproduce, I forgot to mention that the raidsets should be configured
on top of devlabel or some other linking mechanism.
Comment 3 Dan Fruehauf 2005-09-12 07:43:24 EDT
I tried running the following again with a delay of 1 second between each
command, and it worked:
raidsetfaulty /dev/md0 /dev/sda1; sleep 1; raidhotremove /dev/md0 /dev/sda1; sleep 1
raidsetfaulty /dev/md1 /dev/sda2; sleep 1; raidhotremove /dev/md1 /dev/sda2; sleep 1
raidsetfaulty /dev/md2 /dev/sda3; sleep 1; raidhotremove /dev/md2 /dev/sda3; sleep 1
raidsetfaulty /dev/md3 /dev/sda4; sleep 1; raidhotremove /dev/md3 /dev/sda4; sleep 1

As I suspected, this is probably some kind of race condition.
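
The delayed sequence from this comment could be wrapped in a small helper so the pause is applied consistently. This is a sketch of the workaround, not a fix: remove_members_slowly and the RUN variable are hypothetical names introduced here, and only the raidsetfaulty and raidhotremove invocations come from the report.

```shell
# Mark each member faulty, then hot-remove it, pausing between commands.
# Usage: remove_members_slowly DELAY MD MEMBER [MD MEMBER ...]
# RUN is a hypothetical hook: set RUN=echo to print the commands
# instead of executing them (a dry run).
remove_members_slowly() {
    delay="$1"; shift
    # Remaining arguments are alternating md-device / member pairs.
    while [ "$#" -ge 2 ]; do
        md="$1"; member="$2"; shift 2
        ${RUN:-} raidsetfaulty "$md" "$member" || return 1
        sleep "$delay"
        ${RUN:-} raidhotremove "$md" "$member" || return 1
        sleep "$delay"
    done
}
```

For example, `RUN=echo; remove_members_slowly 1 /dev/md0 /dev/sda1 /dev/md1 /dev/sda2` would print the four commands with one-second pauses, which is a safe way to check the order before running it for real.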
