Bug 168041

Summary:

RAID1 crashes when trying to raidhotremove a few partitions with devlabel

Product:

Red Hat Enterprise Linux 3

Reporter:

Dan Fruehauf <danfr>

Component:

kernel

Assignee:

Doug Ledford <dledford>

Status:

CLOSED WONTFIX

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

medium

Version:

3.0

CC:

coughlan, petrides, tao

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2005-10-12 18:15:40 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

170445

Attachments:

Description	Flags
The panic	none

Description Dan Fruehauf 2005-09-11 14:54:38 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050623 Fedora/1.0.4-5 Firefox/1.0.4

Description of problem:
I'm using devlabel, and on top of it I have a few RAID1 raidsets configured.
If I run raidsetfaulty and raidhotremove on some of the devices, a kernel panic occurs.
Note that /etc/raidtab uses the symlinks that devlabel created. Without the symlinks this does not happen. IMHO it looks like a race condition when resolving those links.

My /etc/raidtab:
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part1
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part1
        raid-disk       1

raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part2
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part2
        raid-disk       1

raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part3
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part3
        raid-disk       1

raiddev /dev/md3
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      4
        persistent-superblock   1
        device  /dev/mcs/mcs_disk1_part4
        raid-disk       0
        device  /dev/mcs/mcs_disk7_part4
        raid-disk       1

/etc/sysconfig/devlabel seems irrelevant, but the symlinks that devlabel created might be relevant:
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part1 -> /dev/sda1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part2 -> /dev/sda2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part3 -> /dev/sda3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk1_part4 -> /dev/sda4
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part1 -> /dev/sdc1
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part2 -> /dev/sdc2
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part3 -> /dev/sdc3
lrwxrwxrwx    1 root     root            9 Sep 11 17:46 /dev/mcs/mcs_disk7_part4 -> /dev/sdc4
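
To double-check which block device each label resolves to, something like the following could be used. This is only a sketch: the helper name resolve_labels and the LABEL_DIR variable are made up for illustration, and it assumes a readlink that supports -f (GNU coreutils).

```shell
#!/bin/sh
# resolve_labels DIR: print "link -> resolved target" for every symlink in DIR.
# resolve_labels is a hypothetical helper, not part of devlabel or raidtools.
resolve_labels() {
    for link in "$1"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}

# LABEL_DIR is an assumed override; /dev/mcs is the directory used in this report.
resolve_labels "${LABEL_DIR:-/dev/mcs}"
```

If a label resolves to an unexpected device, that would point at the devlabel configuration rather than at the md driver.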


I don't think the underlying hardware is related in any way, but if it turns out to be relevant I'll provide the details.

Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Create 4 or more RAID1 raidsets
2. Run raidsetfaulty and raidhotremove on one of the disks of every raidset:
raidsetfaulty /dev/md0 /dev/sda1; raidhotremove /dev/md0 /dev/sda1;
raidsetfaulty /dev/md1 /dev/sda2; raidhotremove /dev/md1 /dev/sda2;
raidsetfaulty /dev/md2 /dev/sda3; raidhotremove /dev/md2 /dev/sda3;
raidsetfaulty /dev/md3 /dev/sda4; raidhotremove /dev/md3 /dev/sda4;
3. A kernel panic occurs
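
The commands in step 2 can be wrapped in a small script so that they run back to back with no pause, which appears to be what triggers the panic. This is a minimal sketch: the helper names (run, fail_and_remove, reproduce) and the DRY_RUN variable are invented for illustration, and the device names are the ones from this report. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of step 2: fail and hot-remove one member of each raidset,
# back to back, with no delay between commands.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# to really run them as root on a disposable test machine.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fail_and_remove MD MEMBER: mark MEMBER faulty in MD, then hot-remove it.
fail_and_remove() {
    run raidsetfaulty "$1" "$2"
    run raidhotremove "$1" "$2"
}

# Walk md0..md3 and remove the matching sda1..sda4 partition from each.
reproduce() {
    i=0
    while [ "$i" -le 3 ]; do
        fail_and_remove "/dev/md$i" "/dev/sda$((i + 1))"
        i=$((i + 1))
    done
}

reproduce
```

Running it with DRY_RUN=0 is expected to panic an affected kernel, so it should only be done on a machine that can be lost.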
  

Actual Results:  A kernel panic appears (attached).

Expected Results:  The specified disks (partitions) should have been removed from the corresponding raidsets.

Additional info:

It also happened on kernel-smp-2.4.21-37.EL, which is currently in the RHEL3 beta.

Comment 1 Dan Fruehauf 2005-09-11 14:56:51 UTC

Created attachment 118692
The panic

Comment 2 Dan Fruehauf 2005-09-11 15:01:28 UTC

In the steps to reproduce, I forgot to mention that the raidsets should be
configured on top of devlabel or some other linking mechanism.

Comment 3 Dan Fruehauf 2005-09-12 11:43:24 UTC

I ran the following again with a delay of 1 second between each command,
and it worked:
raidsetfaulty /dev/md0 /dev/sda1; sleep 1; raidhotremove /dev/md0 /dev/sda1; sleep 1
raidsetfaulty /dev/md1 /dev/sda2; sleep 1; raidhotremove /dev/md1 /dev/sda2; sleep 1
raidsetfaulty /dev/md2 /dev/sda3; sleep 1; raidhotremove /dev/md2 /dev/sda3; sleep 1
raidsetfaulty /dev/md3 /dev/sda4; sleep 1; raidhotremove /dev/md3 /dev/sda4; sleep 1

As I suspected, this is probably some kind of race condition.
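
The delayed sequence can be scripted so that the pause is adjustable. This is only a sketch of the workaround, not a fix: the helper names (step, remove_with_delay, remove_all_with_delay) and the DELAY and DRY_RUN variables are invented for illustration, and the 1 second default is simply the value that happened to work here.

```shell
#!/bin/sh
# Workaround sketch: same fail/hot-remove sequence, with a pause after
# every command. DELAY is in seconds.
# DRY_RUN=1 (the default here) only prints what would be run.
DELAY="${DELAY:-1}"
DRY_RUN="${DRY_RUN:-1}"

# step CMD ARGS...: run (or print) one command, then sleep for DELAY seconds.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*; sleep $DELAY"
    else
        "$@"
        sleep "$DELAY"
    fi
}

# remove_with_delay MD MEMBER: mark MEMBER faulty, pause, hot-remove it, pause.
remove_with_delay() {
    step raidsetfaulty "$1" "$2"
    step raidhotremove "$1" "$2"
}

# Apply the delayed removal to md0..md3 and sda1..sda4, as in this comment.
remove_all_with_delay() {
    for n in 0 1 2 3; do
        remove_with_delay "/dev/md$n" "/dev/sda$((n + 1))"
    done
}

remove_all_with_delay
```

Lowering DELAY step by step might help narrow down how small the window for the suspected race is.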