168041 – RAID1 crashes when trying to raidhotremove a few partitions with devlabel

Summary: RAID1 crashes when trying to raidhotremove a few partitions with devlabel

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Doug Ledford
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	170445

Reported:	2005-09-11 14:54 UTC by Dan Fruehauf
Modified:	2007-11-30 22:07 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-10-12 18:15:40 UTC
Target Upstream Version:
Embargoed:

Attachments
The panic (106.04 KB, image/jpeg) 2005-09-11 14:56 UTC, Dan Fruehauf

Description Dan Fruehauf 2005-09-11 14:54:38 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050623 Fedora/1.0.4-5 Firefox/1.0.4

Description of problem:
I'm using devlabel, and on top of it I have several RAID1 raidsets configured.
When I run raidsetfaulty and raidhotremove on a device, the kernel panics.
Note that /etc/raidtab uses the symlinks that devlabel created. Without the symlinks, the panic does not occur. IMHO it looks like a race condition in resolving those links.

My /etc/raidtab:
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part1
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part1
        raid-disk       1

raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part2
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part2
        raid-disk       1

raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part3
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part3
        raid-disk       1

raiddev /dev/md3
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part4
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part4
        raid-disk       1

/etc/sysconfig/devlabel seems irrelevant, but these symlinks might be relevant:
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part1 -> /dev/sda1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part2 -> /dev/sda2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part3 -> /dev/sda3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part4 -> /dev/sda4
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part1 -> /dev/sdc1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part2 -> /dev/sdc2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part3 -> /dev/sdc3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part4 -> /dev/sdc4


I don't think the underlying hardware is related in any way, but I can provide the details if needed.

Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Create 4 or more raidsets of RAID1
2. Run raidsetfaulty and raidhotremove on one disk of every raidset:
raidsetfaulty /dev/md0 /dev/sda1; raidhotremove /dev/md0 /dev/sda1;
raidsetfaulty /dev/md1 /dev/sda2; raidhotremove /dev/md1 /dev/sda2;
raidsetfaulty /dev/md2 /dev/sda3; raidhotremove /dev/md2 /dev/sda3;
raidsetfaulty /dev/md3 /dev/sda4; raidhotremove /dev/md3 /dev/sda4;
3. A kernel panic should appear
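
The commands in step 2 could also be written as a small loop, which makes it easier to repeat the test. This is only a sketch: `DRY_RUN`, `run` and `fail_and_remove_all` are names made up for illustration and are not part of raidtools. With the default `DRY_RUN=1` it only prints the commands; it assumes md0..md3 pair with sda1..sda4 as in the raidtab above.

```shell
#!/bin/sh
# Illustrative sketch of step 2. DRY_RUN, run and fail_and_remove_all
# are hypothetical helper names, not raidtools features.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

fail_and_remove_all() {
    # Assumption: /dev/mdN uses /dev/sda(N+1), as in the raidtab above.
    i=0
    while [ "$i" -lt 4 ]; do
        md=/dev/md$i
        part=/dev/sda$((i + 1))
        run raidsetfaulty "$md" "$part"
        run raidhotremove "$md" "$part"
        i=$((i + 1))
    done
}

fail_and_remove_all
```

Setting `DRY_RUN=0` would run the commands back to back, which is the sequence that triggers the panic described here, so it should only be done on a test machine.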
  

Actual Results:  A kernel panic appears (attached).

Expected Results:  The specified partitions should have been removed from their corresponding raidsets.

Additional info:

It also happened on kernel-smp-2.4.21-37.EL, which is currently in the RHEL3 beta.

Comment 1 Dan Fruehauf 2005-09-11 14:56:51 UTC

Created attachment 118692
The panic

Comment 2 Dan Fruehauf 2005-09-11 15:01:28 UTC

In the steps to reproduce, I forgot to mention that the raidsets should be
configured on top of devlabel or some other linking mechanism.

Comment 3 Dan Fruehauf 2005-09-12 11:43:24 UTC

I ran the following again with a delay of 1 second between each command, and
it worked:
raidsetfaulty /dev/md0 /dev/sda1; sleep 1; raidhotremove /dev/md0 /dev/sda1; sleep 1
raidsetfaulty /dev/md1 /dev/sda2; sleep 1; raidhotremove /dev/md1 /dev/sda2; sleep 1
raidsetfaulty /dev/md2 /dev/sda3; sleep 1; raidhotremove /dev/md2 /dev/sda3; sleep 1
raidsetfaulty /dev/md3 /dev/sda4; sleep 1; raidhotremove /dev/md3 /dev/sda4; sleep 1

As I suspected, this is probably some kind of race condition.
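
The delayed sequence could be scripted roughly as follows. This is a sketch of the workaround, not a fix: `DELAY`, `DRY_RUN`, `run` and `fail_and_remove_with_delay` are hypothetical names chosen for illustration, and the 1-second value is simply the delay that happened to work in this test.

```shell
#!/bin/sh
# Illustrative sketch of the delayed sequence. DELAY, DRY_RUN, run and
# fail_and_remove_with_delay are hypothetical names, not raidtools features.
DELAY=${DELAY:-1}       # seconds to wait after each command
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

fail_and_remove_with_delay() {
    # Assumption: /dev/mdN uses /dev/sda(N+1), as in the commands above.
    i=0
    while [ "$i" -lt 4 ]; do
        md=/dev/md$i
        part=/dev/sda$((i + 1))
        run raidsetfaulty "$md" "$part"
        run sleep "$DELAY"
        run raidhotremove "$md" "$part"
        run sleep "$DELAY"
        i=$((i + 1))
    done
}

fail_and_remove_with_delay
```

Lowering `DELAY` step by step might help narrow down how short the window is, although a delay that works on one machine is not guaranteed to work on another.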
