114641 – RAID doesn't recognize mirror disk

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	raidtools
Sub Component:
Version:	9
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Doug Ledford
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-01-30 16:50 UTC by José Vicente Núñez Zuleta
Modified:	2005-10-31 22:00 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-01-31 14:55:22 UTC
Embargoed:

Description José Vicente Núñez Zuleta 2004-01-30 16:50:54 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031030

Description of problem:
When starting, the raidtools complain about the following:

Jan 30 10:25:33 lnxdev0005 kernel: md: autorun ...
Jan 30 10:25:33 lnxdev0005 kernel: md: considering sda1 ...
Jan 30 10:25:33 lnxdev0005 kernel: md:  adding sda1 ...
Jan 30 10:25:33 lnxdev0005 kernel: md: created md2
Jan 30 10:25:33 lnxdev0005 kernel: md: bind<sda1,1>
Jan 30 10:25:33 lnxdev0005 kernel: md: running: <sda1>
Jan 30 10:25:33 lnxdev0005 kernel: md: sda1's event counter: 00000011
Jan 30 10:25:33 lnxdev0005 kernel: md: RAID level 1 does not need chunksize! Continuing anyway.
Jan 30 10:25:33 lnxdev0005 kernel: md2: max total readahead window set to 124k
Jan 30 10:25:33 lnxdev0005 kernel: md2: 1 data-disks, max readahead per data-disk: 124k
Jan 30 10:25:33 lnxdev0005 kernel: raid1: device sda1 operational as mirror 0
Jan 30 10:25:33 lnxdev0005 kernel: raid1: md2, not all disks are operational -- trying to recover array
Jan 30 10:25:33 lnxdev0005 kernel: raid1: raid set md2 active with 1 out of 2 mirrors
Jan 30 10:25:33 lnxdev0005 kernel: md: updating md2 RAID superblock on device
Jan 30 10:25:33 lnxdev0005 kernel: md: sda1 [events: 00000012]<6>(write) sda1's sb offset: 71714048
Jan 30 10:25:33 lnxdev0005 kernel: md: recovery thread got woken up ...
Jan 30 10:25:33 lnxdev0005 kernel: md2: no spare disk to reconstruct array! -- continuing in degraded mode
Jan 30 10:25:33 lnxdev0005 kernel: md: ... autorun DONE.

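I don't think this is a hardware issue, because I can read the missing
mirror directly from the operating system. The raidtab entry for the
array, a dd read test of /dev/sdb1, and both partition tables follow: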

# This disk contains the home directories for the Stamford office
raiddev             /dev/md2
raid-level                  1
nr-raid-disks               2
#chunk-size                  64k
persistent-superblock       1
nr-spare-disks              0
    device          /dev/sda1
    raid-disk     0
    device          /dev/sdb1
    raid-disk     1


[root@lnxdev0005 proc]# dd if=/dev/sdb1 of=/dev/null count=10000
10000+0 records in
10000+0 records out
[root@lnxdev0005 proc]#

Disk /dev/sda: 73.4 GB, 73443143680 bytes
255 heads, 63 sectors/track, 8928 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      8928  71714128+  fd  Linux raid autodetect
 
Disk /dev/sdb: 73.4 GB, 73407900160 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      8924  71681998+  fd  Linux raid autodetect
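
The block counts above can be tied back to the kernel log and to
/proc/mdstat with a little arithmetic. The following is a minimal
sketch, assuming the array uses the version-0.90 persistent superblock,
which is stored in the last 64 KiB-aligned 64 KiB unit of each member.
The function name sb_offset_kb is made up for illustration and is not
part of raidtools or mdadm.

```shell
#!/bin/sh
# Sketch: where a version-0.90 md superblock sits on a member partition.
# Sizes are in 1 KiB blocks, as printed by fdisk and /proc/mdstat.
# sb_offset_kb is a hypothetical helper name used only in this example.
sb_offset_kb() {
    # Round the partition size down to a 64 KiB boundary,
    # then step back one 64 KiB unit to reach the superblock.
    echo $(( ($1 & ~63) - 64 ))
}

sb_offset_kb 71714128   # sda1: prints 71714048
sb_offset_kb 71681998   # sdb1: prints 71681920
```

The first value matches the "sda1's sb offset: 71714048" line in the
kernel log. The second matches the 71681920 blocks that /proc/mdstat
reports for the array, which is consistent with a mirror being limited
to the usable size of its smaller member, /dev/sdb1.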

Version-Release number of selected component (if applicable):
raidtools-1.00.3-2

How reproducible:
Always

Steps to Reproduce:
1. Start the server
2. Watch the raidtools complain about the problem
3. No raid ma' :)
    

Actual Results:  Raidtools is unable to use the mirror, but you can
access it directly from the command line:

[root@lnxdev0005 proc]# dd if=/dev/sdb1 of=/dev/null count=10000
10000+0 records in
10000+0 records out
[root@lnxdev0005 proc]#

This is the status of the raid array:
[root@lnxdev0005 home]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sda1[0]
      71681920 blocks [2/1] [U_]
       
unused devices: <none>


Expected Results:  Both SCSI drives should be used on the mirror.  You
should get something like this instead:

[root@lnxdev0005 root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
      71681920 blocks [2/2] [UU]
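
The difference between the two listings is the status map at the end of
the blocks line: [U_] has a missing member, [UU] does not. As a rough
sketch, a check for that could look like the following. The function
name degraded_arrays is hypothetical, and the parsing assumes the
/proc/mdstat layout shown in this report.

```shell
#!/bin/sh
# degraded_arrays: hypothetical helper. Reads /proc/mdstat-style text on
# stdin and prints the name of every array whose status map (for example
# [U_]) shows a missing member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # header line: remember the array name
        /\[[U_]+\]/ && name != "" {          # line that carries the status map
            if ($NF ~ /_/) print name        # "_" marks a missing member
            name = ""
        }
    '
}

# Example with the degraded listing from this report; prints "md2".
degraded_arrays <<'EOF'
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sda1[0]
      71681920 blocks [2/1] [U_]

unused devices: <none>
EOF
```

On a live system the same function could be fed with
"degraded_arrays < /proc/mdstat".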


Additional info:

This machine is using two Fanthom external drives. They have worked
well for the two months since installation. The drives themselves are
working, because we can mount the affected drive from the command
line:

[root@lnxdev0005 home]# mount /dev/sdb1 /mnt/nfs
[root@lnxdev0005 home]# ls /mnt/nfs
access  ashish  cvs      josevnz  lost+found
[root@lnxdev0005 home]#

Extra information:

[root@lnxdev0005 home]# rpm -q glibc
glibc-2.3.2-27.9
[root@lnxdev0005 home]# uname -r
2.4.20-24.9smp
[root@lnxdev0005 home]# cat /etc/redhat-release
Red Hat Linux release 9 (Shrike)
[root@lnxdev0005 home]#

Comment 1 Doug Ledford 2004-01-31 14:55:22 UTC

Use the command:

mdadm /dev/md2 -a /dev/sdb1

to add the second disk to the array as a hot spare.  It is no doubt
out of sync and needs to be rebuilt.  Once the rebuild finishes, the
superblock on /dev/sdb1 should be updated to match the one on
/dev/sda1, and from that point on the array should start normally.
Most likely, something happened in the past that caused the two disks
to get out of sync, and once that happened /dev/sdb1 was no longer
considered valid for the array.

If, after adding /dev/sdb1 to the array and waiting for it to rebuild
and be marked up to date, the machine still refuses to start the array
properly on reboot, then reopen this bug with the messages the kernel
printed on shutdown after the rebuild and the messages it printed on
the following reboot.  Since this doesn't seem like a real bug yet,
I'm closing this report as NOTABUG.
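
The sequence suggested in this comment could be scripted roughly as
follows. This is only a sketch under stated assumptions: the helper
names step and readd_mirror are invented for illustration, the umount
step assumes the member is still mounted by hand as in the reporter's
test, and the --examine and --detail calls are added here for
inspection and are not part of the original advice. By default the
script only prints the commands.

```shell
#!/bin/sh
# Sketch of the re-add sequence. Each command is printed by default;
# set RUN=1 to execute for real (needs root and the devices from this report).
step() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# readd_mirror ARRAY MEMBER: hypothetical wrapper around the suggested command.
readd_mirror() {
    md=$1
    member=$2
    step umount "$member"             # assumption: member was mounted by hand for testing
    step mdadm --examine "$member"    # inspect the stale superblock and its event counter
    step mdadm "$md" -a "$member"     # the command suggested in this comment
    step cat /proc/mdstat             # rebuild progress shows up here
    step mdadm --detail "$md"         # per-member state once the rebuild is done
}

readd_mirror /dev/md2 /dev/sdb1
```

Once /proc/mdstat shows [2/2] [UU] for md2, the rebuild is complete and
the reboot test described above can be tried.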
