Bug 168041

Summary: RAID1 crashes when trying to raidhotremove a few partitions with devlabel
Product: Red Hat Enterprise Linux 3 Reporter: Dan Fruehauf <danfr>
Component: kernel Assignee: Doug Ledford <dledford>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0 CC: coughlan, petrides, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-10-12 18:15:40 UTC Type: ---
Bug Depends On:    
Bug Blocks: 170445    
Attachments:
The panic (flags: none)

Description Dan Fruehauf 2005-09-11 14:54:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050623 Fedora/1.0.4-5 Firefox/1.0.4

Description of problem:
I'm using devlabel, and on top of it I have a few RAID1 raidsets configured.
If I run raidsetfaulty and raidhotremove on some of the devices, the kernel panics.
Note that /etc/raidtab uses the symlinks that devlabel created. When the raidsets are configured without the symlinks, this does not happen. IMHO it looks like a race condition when resolving those links.

My /etc/raidtab:
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part1
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part1
        raid-disk       1

raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part2
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part2
        raid-disk       1

raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part3
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part3
        raid-disk       1

raiddev /dev/md3
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part4
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part4
        raid-disk       1

The contents of /etc/sysconfig/devlabel seem irrelevant, but the following symlink listing might be relevant:
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part1 -> /dev/sda1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part2 -> /dev/sda2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part3 -> /dev/sda3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part4 -> /dev/sda4
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part1 -> /dev/sdc1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part2 -> /dev/sdc2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part3 -> /dev/sdc3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part4 -> /dev/sdc4
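
Since the problem reportedly does not occur when /etc/raidtab names the devices directly, it can help to see what each devlabel link resolves to. The following is a minimal sketch, assuming GNU `readlink -f` is available and the links live in /dev/mcs as in the listing above. The function name `resolve_links` and the `LINK_DIR` variable are made up for illustration and are not part of devlabel or raidtools.

```shell
#!/bin/sh
# Print "link -> resolved target" for every symlink in a directory.
# Assumption: readlink supports -f (GNU coreutils).
resolve_links() {
    dir=$1
    for link in "$dir"/*; do
        # Skip anything that is not a symbolic link
        # (this also covers an unmatched glob).
        [ -L "$link" ] || continue
        # readlink -f follows the link chain to the final path.
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}

# LINK_DIR is a hypothetical override; /dev/mcs comes from the listing above.
resolve_links "${LINK_DIR:-/dev/mcs}"
```

The resolved paths could then be written into /etc/raidtab in place of the links, which matches the configuration reported not to panic.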


I don't think the underlying hardware is related in any way, but if it turns out to be relevant I'll provide the details.

Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Create 4 or more raidsets of RAID1
2. Run raidsetfaulty and raidhotremove on one disk of every raidset, back to back:
raidsetfaulty /dev/md0 /dev/sda1; raidhotremove /dev/md0 /dev/sda1;
raidsetfaulty /dev/md1 /dev/sda2; raidhotremove /dev/md1 /dev/sda2;
raidsetfaulty /dev/md2 /dev/sda3; raidhotremove /dev/md2 /dev/sda3;
raidsetfaulty /dev/md3 /dev/sda4; raidhotremove /dev/md3 /dev/sda4;
3. A kernel panic should appear
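
The steps above can be sketched as a small script. This is only an illustration, assuming the md0..md3 / sda1..sda4 layout shown in this report. The function name `repro_cmds` and the `RUN` variable are hypothetical. By default it prints the commands without running them; actually executing them needs raidtools and root, and is expected to panic an affected kernel.

```shell
#!/bin/sh
# Generate the back-to-back command sequence from step 2.
# Arguments: base disk (e.g. /dev/sda) and number of raidsets.
repro_cmds() {
    disk=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        part=$((i + 1))   # md0 uses partition 1, md1 uses partition 2, ...
        echo "raidsetfaulty /dev/md$i $disk$part"
        echo "raidhotremove /dev/md$i $disk$part"
        i=$((i + 1))
    done
}

# Dry run unless RUN=1 is set (hypothetical switch, for safety).
repro_cmds /dev/sda 4 | while read -r cmd; do
    if [ "${RUN:-0}" = 1 ]; then
        $cmd
    else
        echo "$cmd"
    fi
done
```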
  

Actual Results:  A kernel panic appears (attached).

Expected Results:  The specified disks (partitions) should have been removed from their corresponding raidsets.

Additional info:

It also happens on kernel-smp-2.4.21-37.EL, which is currently in the RHEL3 beta.

Comment 1 Dan Fruehauf 2005-09-11 14:56:51 UTC
Created attachment 118692 [details]
The panic

Comment 2 Dan Fruehauf 2005-09-11 15:01:28 UTC
In the steps to reproduce, I forgot to mention that the raidsets should be configured
on top of devlabel or some other linking mechanism.

Comment 3 Dan Fruehauf 2005-09-12 11:43:24 UTC
I ran the following again, this time with a 1-second delay after each
command, and it worked:
raidsetfaulty /dev/md0 /dev/sda1; sleep 1; raidhotremove /dev/md0 /dev/sda1; sleep 1
raidsetfaulty /dev/md1 /dev/sda2; sleep 1; raidhotremove /dev/md1 /dev/sda2; sleep 1
raidsetfaulty /dev/md2 /dev/sda3; sleep 1; raidhotremove /dev/md2 /dev/sda3; sleep 1
raidsetfaulty /dev/md3 /dev/sda4; sleep 1; raidhotremove /dev/md3 /dev/sda4; sleep 1

As I suspected, this is probably some kind of race condition.
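
The delayed sequence above can be wrapped in a small helper so the pause is applied uniformly. This is a minimal sketch, not a fix: if the cause really is a race, a delay would only make it less likely to trigger. The function name `paced` and the `DELAY` variable are made up for illustration.

```shell
#!/bin/sh
# Seconds to wait after each command; 1 is the value reported to work above.
DELAY=${DELAY:-1}

# Run a command, pause, and pass the command's exit status through.
paced() {
    "$@"
    status=$?
    sleep "$DELAY"
    return "$status"
}

# Hypothetical usage (not executed here; needs raidtools and root):
#   paced raidsetfaulty /dev/md0 /dev/sda1
#   paced raidhotremove /dev/md0 /dev/sda1
```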