Bug 168041 - RAID1 crashes when trying to raidhotremove a few partitions with devlabel
Summary: RAID1 crashes when trying to raidhotremove a few partitions with devlabel
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 170445
 
Reported: 2005-09-11 14:54 UTC by Dan Fruehauf
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-12 18:15:40 UTC
Target Upstream Version:
Embargoed:


Attachments
The panic (106.04 KB, image/jpeg)
2005-09-11 14:56 UTC, Dan Fruehauf

Description Dan Fruehauf 2005-09-11 14:54:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050623 Fedora/1.0.4-5 Firefox/1.0.4

Description of problem:
I'm using devlabel, and on top of it I have a few RAID1 raidsets configured.
If I run raidsetfaulty and raidhotremove on some of the devices, the kernel panics.
Note that /etc/raidtab uses the symlinks that devlabel created. If the raidsets are configured without the links, the panic does not happen. IMHO it looks like a race condition when resolving those links.

My /etc/raidtab:
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part1
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part1
        raid-disk       1

raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part2
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part2
        raid-disk       1

raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part3
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part3
        raid-disk       1

raiddev /dev/md3
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part4
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part4
        raid-disk       1

/etc/sysconfig/devlabel seems irrelevant, but the symlinks it created might be relevant:
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part1 -> /dev/sda1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part2 -> /dev/sda2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part3 -> /dev/sda3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part4 -> /dev/sda4
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part1 -> /dev/sdc1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part2 -> /dev/sdc2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part3 -> /dev/sdc3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part4 -> /dev/sdc4


I don't think the underlying hardware is related in any way, but if it turns out to be relevant I'll provide the details on request.

Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Create 4 or more RAID1 raidsets
2. Run raidsetfaulty and raidhotremove on one disk of every raidset:
raidsetfaulty /dev/md0 /dev/sda1; raidhotremove /dev/md0 /dev/sda1;
raidsetfaulty /dev/md1 /dev/sda2; raidhotremove /dev/md1 /dev/sda2;
raidsetfaulty /dev/md2 /dev/sda3; raidhotremove /dev/md2 /dev/sda3;
raidsetfaulty /dev/md3 /dev/sda4; raidhotremove /dev/md3 /dev/sda4;
3. A kernel panic appears
  

Actual Results:  A kernel panic appears (attached).

Expected Results:  The specified disks (partitions) should have been removed from their corresponding raidsets.

Additional info:

It also happens on kernel-smp-2.4.21-37.EL, which is in the RHEL3 beta right now.

Comment 1 Dan Fruehauf 2005-09-11 14:56:51 UTC
Created attachment 118692
The panic

Comment 2 Dan Fruehauf 2005-09-11 15:01:28 UTC
In the steps to reproduce, I forgot to mention that the raidsets should be
configured on top of devlabel or some other linking mechanism.

Comment 3 Dan Fruehauf 2005-09-12 11:43:24 UTC
I ran the following again, this time with a delay of 1 second between each
command, and it worked:
raidsetfaulty /dev/md0 /dev/sda1; sleep 1; raidhotremove /dev/md0 /dev/sda1; sleep 1
raidsetfaulty /dev/md1 /dev/sda2; sleep 1; raidhotremove /dev/md1 /dev/sda2; sleep 1
raidsetfaulty /dev/md2 /dev/sda3; sleep 1; raidhotremove /dev/md2 /dev/sda3; sleep 1
raidsetfaulty /dev/md3 /dev/sda4; sleep 1; raidhotremove /dev/md3 /dev/sda4; sleep 1

As I suspected, this is probably some kind of race condition.
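
For anyone who needs to script this workaround, here is a minimal sketch that wraps the delayed sequence in a loop. The function names (run, remove_member, remove_all) and the DELAY and DRY_RUN variables are hypothetical helpers and not part of raidtools. The md-to-partition mapping is taken from the commands in this report. It only spaces the commands out and does not fix the underlying race:

```shell
#!/bin/sh
# Sketch of the delay workaround from this report (hypothetical helper names).
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DELAY="${DELAY:-1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark one member faulty, wait, hot-remove it, wait again.
remove_member() {
    md="$1"
    part="$2"
    run raidsetfaulty "$md" "$part"
    run sleep "$DELAY"
    run raidhotremove "$md" "$part"
    run sleep "$DELAY"
}

# Walk md0..md3, pairing each with sda1..sda4 as in the report.
remove_all() {
    i=0
    for part in /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4; do
        remove_member "/dev/md$i" "$part"
        i=$((i + 1))
    done
}
```

Calling remove_all as-is prints the sequence for review. Running it as root with DRY_RUN=0 executes it, and DELAY can be raised if 1 second turns out not to be enough on a given machine.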

