Bug 66558

Summary: both raid and raw devices get mounted at the same time, corrupting filesystems
Product: Red Hat Enterprise Linux 2.1
Component: mount
Version: 2.1
Hardware: i386
OS: Linux
Severity: high
Priority: high
Status: CLOSED ERRATA
Reporter: Red Hat Production Operations <soc>
Assignee: Elliot Lee <sopwith>
QA Contact: Brian Brock <bbrock>
CC: mingo
Doc Type: Bug Fix
Last Closed: 2004-12-13 21:34:34 UTC

Description Need Real Name 2002-06-12 07:54:57 UTC
We have seen several occurrences of something like the following:

# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/md0               2063440    372380   1586244  20% /
/dev/md1                101018     11633     84169  13% /boot
none                   1029544         0   1029544   0% /dev/shm
/dev/md2               5542184    413448   4847204   8% /var
/dev/sda1               101018     11633     84169  13% /boot

Here, /dev/sda1 is a component of /dev/md1, yet both get mounted
onto /boot.  Normally we see this immediately after a kickstart, and it
typically hits /var.  This time, it was after a system crash/restart.
AFAIK, this is the first time we have seen it outside of the kickstart
case, which is particularly alarming.

The "fix" for this in the past has been to catch it before it corrupts the
filesystem that it hits, unmount the 2nd mount, and fsck the affected
filesystem, and remount it.  If you merely unmount the 2nd mount, when you 
reboot, it will come back.  If you do not catch it in time, especially the 
/var case, the filesystem will be damaged beyond repair.
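
A minimal sketch of that recovery for the /boot case shown above (device
names are illustrative; fsck must run on the unmounted filesystem):

# umount /boot             # removes the topmost (2nd) mount, /dev/sda1
# umount /boot             # removes the underlying mount, /dev/md1
# fsck -y /dev/md1         # repair the filesystem on the array, never on the raw component
# mount /dev/md1 /boot     # remount the md device explicitly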

We do not yet know how this happens, but the only theory that makes sense
is that this is some kind of interaction with filesystem labels.  Below is
the /etc/fstab for this machine.

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
LABEL=/var              /var                    ext3    defaults        1 2
/dev/md3                swap                    swap    defaults        0 0
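
One hedged sketch of why mounting by label is ambiguous here: with an md
0.90 persistent superblock, which sits at the end of the device, the
component's filesystem is readable directly, so the array and its mirror
report the same label:

# e2label /dev/md1
/boot
# e2label /dev/sda1
/boot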

Comment 1 Elliot Lee 2002-06-12 12:12:13 UTC
I know I put a fix related to this into util-linux (try the one from 7.3). I'm
pretty sure that fix was primarily for cases where one raid array included
another, but it should also ignore devices that have a persistent RAID
superblock on them when mounting by label.
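
For reference, a hedged sketch of what that check amounts to: the md 0.90
persistent superblock occupies the last 64 KiB-aligned 64 KiB block of the
device and begins with the magic 0xa92b4efc, so a label scan can probe for
it before trusting a component's label (the exact commands are illustrative):

# size=$(blockdev --getsize64 /dev/sda1)        # device size in bytes
# off=$(( size / 65536 * 65536 - 65536 ))       # start of the 0.90 superblock
# dd if=/dev/sda1 bs=1 skip=$off count=4 2>/dev/null | od -An -tx4
 a92b4efc                                       <- MD_SB_MAGIC (read on little-endian i386); skip this device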

One wild theory is that immediately after the restart, the raid array as a whole
is not available, so the first time 'mount -a' gets run, it passes over md1 and
uses sda1.

I'm assuming that /dev/md1 is a RAID-1 array that includes /dev/sda1...
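
To see which device a label resolves to at a given moment, findfs (from
util-linux/e2fsprogs) can help; the output below is illustrative, assuming
the array is not yet assembled when the scan runs:

# findfs LABEL=/boot
/dev/sda1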


Comment 2 Need Real Name 2002-06-12 14:36:11 UTC
The theory about the state of the raid array may be a clue: /var tends not
to be fully synced when the install is done, and in the /boot case here, I
just checked and a resync was required after the reboot.

Comment 3 Elliot Lee 2002-07-11 19:48:04 UTC
Hmm, my theory is bogus, because the kernel doesn't wait for the resync to finish
before making the device available... The most important thing now seems to be being
able to reproduce the problem on any random 7.3 system, which I can't do.

Comment 4 Phil D'Amore 2003-07-01 13:23:27 UTC
This problem had been non-existent on RHEL AS 2.1 machines.  After the recent
upgrade to the e.25 kernel on one of our Dell 1650s, this problem returned on
the next reboot.  The only way to get the double-mounting to stop is to not use
the LABEL=/var syntax, but instead mount the md device directly (/dev/md3).  If
you don't do that, the filesystem mounts twice on every reboot after going to
e.25.  However, we have a new Dell 650 which does not seem to have this problem
after going to e.25.  Makes no sense.
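
The workaround, written as an fstab line in the format shown earlier
(/dev/md3 holds /var on this particular machine, per this comment):

/dev/md3                /var                    ext3    defaults        1 2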

Comment 5 Elliot Lee 2004-06-03 20:41:19 UTC
Old bug. Still present?

Comment 6 Need Real Name 2004-06-03 22:17:41 UTC
Yes, we have hit this in RHEL 2.1 kickstart installs as recently as 3
months ago when we did a buildout.

Comment 7 Matthew Galgoci 2004-06-04 15:57:41 UTC
I just ran into this on a 2.1AS-QU4 install that I performed yesterday.
This time, /boot was double mounted with the raid meta device /dev/md2
and /dev/sdb1.

/dev/sdb1 was mounted on top of /dev/md2, and it appeared that syslog
had started while only /dev/md2 was mounted: I unmounted /boot once
without problems, but had to stop syslog before I could unmount the
first mount.

A recurring theme seems to be the following:

- The system has been installed previously in an identical configuration.

- The partitioning on the new install is identical to the previous
install. 

- The affected mount point involves a raid1 meta device *and* one of
its mirrors.

- The filesystem mount point is mounted by label in /etc/fstab

- The double mount seems to occur somewhere well after the initial
filesystem hierarchy is mounted, but before mtab can be updated.
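
Given those conditions, a quick sketch of a check for the double mount
(prints any mount point that appears more than once in /proc/mounts;
/boot is the illustrative hit for this case):

# awk '{print $2}' /proc/mounts | sort | uniq -d
/boot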

Comment 8 Matthew Galgoci 2004-06-04 16:10:31 UTC
OOOOOOOOOOoooooooooh. I just realized there was a fun error message
in dmesg:

md2: max total readahead window set to 124k
md2: 1 data-disks, max readahead per data-disk: 124k
raid1: device sdb1 operational as mirror 1
raid1: device sda1 operational as mirror 0
raid1: raid set md2 active with 2 out of 2 mirrors
md: updating md2 RAID superblock on device
md: ... autorun DONE.
md: trying to remove sdb1 from md2 ...
md: bug in file md.c, line 2332
 
md:     **********************************
md:     * <COMPLETE RAID STATE PRINTOUT> *
md:     **********************************
md2: <sdb1><sda1> array superblock:
md:  SB: (V:0.90.0) ID:<acfd394f.a9895763.de1d9c99.9ccb85dc> CT:40bf94df
md:     L1 S00136448 ND:2 RD:2 md2 LO:0 CS:65536
md:     UT:40c094cc ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:fc1e41ca E:00000005
     D  0:  DISK<N:0,sda1(8,1),R:0,S:6>
     D  1:  DISK<N:1,sdb1(8,17),R:1,S:6>
md:     THIS:  DISK<N:1,sdb1(8,17),R:1,S:6>
md: rdev sdb1: O:sdb1, SZ:00136448 F:0 DN:1 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<acfd394f.a9895763.de1d9c99.9ccb85dc> CT:40bf94df
md:     L1 S00136448 ND:2 RD:2 md2 LO:0 CS:65536
md:     UT:40c094cc ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:fc1e41f0 E:00000005
     D  0:  DISK<N:0,sda1(8,1),R:0,S:6>
     D  1:  DISK<N:1,sdb1(8,17),R:1,S:6>
md:     THIS:  DISK<N:1,sdb1(8,17),R:1,S:6>
md: rdev sda1: O:sda1, SZ:00136448 F:0 DN:0 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<acfd394f.a9895763.de1d9c99.9ccb85dc> CT:40bf94df
md:     L1 S00136448 ND:2 RD:2 md2 LO:0 CS:65536
md:     UT:40c094cc ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:fc1e41de E:00000005
     D  0:  DISK<N:0,sda1(8,1),R:0,S:6>
     D  1:  DISK<N:1,sdb1(8,17),R:1,S:6>
md:     THIS:  DISK<N:0,sda1(8,1),R:0,S:6>
md1: <sdb2><sda2> array superblock:
md:  SB: (V:0.90.0) ID:<6406712f.9645c33a.00038d48.d1c2d7c4> CT:40bf94bf
md:     L1 S02096384 ND:2 RD:2 md1 LO:0 CS:65536
md:     UT:40bfa0e6 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:f6de1bc9 E:00000003
     D  0:  DISK<N:0,sda2(8,2),R:0,S:6>
     D  1:  DISK<N:1,sdb2(8,18),R:1,S:6>
md:     THIS:  DISK<N:1,sdb2(8,18),R:1,S:6>
md: rdev sdb2: O:sdb2, SZ:02096384 F:0 DN:1 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<6406712f.9645c33a.00038d48.d1c2d7c4> CT:40bf94bf
md:     L1 S02096384 ND:2 RD:2 md1 LO:0 CS:65536
md:     UT:40bfa0e6 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:f6de1c37 E:00000003
     D  0:  DISK<N:0,sda2(8,2),R:0,S:6>
     D  1:  DISK<N:1,sdb2(8,18),R:1,S:6>
md:     THIS:  DISK<N:1,sdb2(8,18),R:1,S:6>
md: rdev sda2: O:sda2, SZ:02096384 F:0 DN:0 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<6406712f.9645c33a.00038d48.d1c2d7c4> CT:40bf94bf
md:     L1 S02096384 ND:2 RD:2 md1 LO:0 CS:65536
md:     UT:40bfa0e6 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:f6de1c25 E:00000003
     D  0:  DISK<N:0,sda2(8,2),R:0,S:6>
     D  1:  DISK<N:1,sdb2(8,18),R:1,S:6>
md:     THIS:  DISK<N:0,sda2(8,2),R:0,S:6>
md0: <sdb3><sda3> array superblock:
md:  SB: (V:0.90.0) ID:<298764ac.b027302b.a8d37f79.8c6c82a7> CT:40bf94c6
md:     L1 S06650816 ND:2 RD:2 md0 LO:0 CS:65536
md:     UT:40bfa0e5 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:39ff9815 E:00000003
     D  0:  DISK<N:0,sda3(8,3),R:0,S:6>
     D  1:  DISK<N:1,sdb3(8,19),R:1,S:6>
md:     THIS:  DISK<N:1,sdb3(8,19),R:1,S:6>
md: rdev sdb3: O:sdb3, SZ:06650816 F:0 DN:1 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<298764ac.b027302b.a8d37f79.8c6c82a7> CT:40bf94c6
md:     L1 S06650816 ND:2 RD:2 md0 LO:0 CS:65536
md:     UT:40bfa0e5 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:39ff9882 E:00000003
     D  0:  DISK<N:0,sda3(8,3),R:0,S:6>
     D  1:  DISK<N:1,sdb3(8,19),R:1,S:6>
md:     THIS:  DISK<N:1,sdb3(8,19),R:1,S:6>
md: rdev sda3: O:sda3, SZ:06650816 F:0 DN:0 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<298764ac.b027302b.a8d37f79.8c6c82a7> CT:40bf94c6
md:     L1 S06650816 ND:2 RD:2 md0 LO:0 CS:65536
md:     UT:40bfa0e5 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:39ff9870 E:00000003
     D  0:  DISK<N:0,sda3(8,3),R:0,S:6>
     D  1:  DISK<N:1,sdb3(8,19),R:1,S:6>
md:     THIS:  DISK<N:0,sda3(8,3),R:0,S:6>
md3: <sdd1><sdc1> array superblock:
md:  SB: (V:0.90.0) ID:<0f30bce0.4bb7c402.033b034e.98f8d807> CT:40bf94e7
md:     L1 S08883840 ND:2 RD:2 md3 LO:0 CS:65536
md:     UT:40bfa0e5 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:224f7092 E:00000003
     D  0:  DISK<N:0,sdc1(8,33),R:0,S:6>
     D  1:  DISK<N:1,sdd1(8,49),R:1,S:6>
md:     THIS:  DISK<N:1,sdd1(8,49),R:1,S:6>
md: rdev sdd1: O:sdd1, SZ:08883840 F:0 DN:1 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<0f30bce0.4bb7c402.033b034e.98f8d807> CT:40bf94e7
md:     L1 S08883840 ND:2 RD:2 md3 LO:0 CS:65536
md:     UT:40bfa0e5 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:224f70ff E:00000003
     D  0:  DISK<N:0,sdc1(8,33),R:0,S:6>
     D  1:  DISK<N:1,sdd1(8,49),R:1,S:6>
md:     THIS:  DISK<N:1,sdd1(8,49),R:1,S:6>
md: rdev sdc1: O:sdc1, SZ:08883840 F:0 DN:0 md: rdev superblock:
md:  SB: (V:0.90.0) ID:<0f30bce0.4bb7c402.033b034e.98f8d807> CT:40bf94e7
md:     L1 S08883840 ND:2 RD:2 md3 LO:0 CS:65536
md:     UT:40bfa0e5 ST:0 AD:2 WD:2 FD:0 SD:0 CSUM:224f70ed E:00000003
     D  0:  DISK<N:0,sdc1(8,33),R:0,S:6>
     D  1:  DISK<N:1,sdd1(8,49),R:1,S:6>
md:     THIS:  DISK<N:0,sdc1(8,33),R:0,S:6>
md:     **********************************
 
md: cannot remove active disk sdb1 from md2 ...
md: updating md2 RAID superblock on device
md: md2 stopped.
md: autorun ...
md: considering sdb1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md: created md2
md: running: <sdb1><sda1>
RAID level 1 does not need chunksize! Continuing anyway.
md2: max total readahead window set to 124k
md2: 1 data-disks, max readahead per data-disk: 124k
raid1: device sdb1 operational as mirror 1
raid1: device sda1 operational as mirror 0
raid1: raid set md2 active with 2 out of 2 mirrors
md: updating md2 RAID superblock on device
md: ... autorun DONE.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on md(9,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.


Comment 9 Elliot Lee 2004-07-16 19:04:25 UTC
Once you get that RAID array sorted out, try mount-2.11g-7 from
beehive (currently in dist-2.1AS-errata-candidate) and let me know how
it goes.
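
A hedged sketch of verifying that the candidate package is in place (the
exact rpm filename/arch is an assumption):

# rpm -Uvh mount-2.11g-7.i386.rpm   # hypothetical filename for the candidate build
# rpm -q mount
mount-2.11g-7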

Comment 10 Elliot Lee 2004-09-28 15:33:04 UTC
Hey guys, ping, any success with the new package? Does it fix the problem?

Comment 11 John Flanagan 2004-12-13 21:34:34 UTC
An erratum has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-401.html