Bug 615709

Summary: Old raid5 does not assemble under f13, and other raid problems
Product: Fedora
Reporter: Edek Pienkowski <spojenie>
Component: mdadm
Assignee: Doug Ledford <dledford>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 13
CC: anton, dledford, dougsland, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-29 13:21:02 UTC
Attachments:
  - Dump of all raid partitions with mdadm -Q -E
  - Dump of all raid partitions with mdadm -Q -E
  - FWIW, last /dev/sd{a,b,c,e}7 raid5 working date

Description Edek Pienkowski 2010-07-18 07:15:23 UTC
Description of problem:

Under F11/F8 a raid5 array on sd{a,b,c,e}7 works. Under F13 it does not assemble. When assembling with mdadm, the kernel sees /dev/sda7; then for each of the remaining devices it outputs two lines along the lines of:

/dev/sdb7 has the same UUID but a different superblock than sda7
/dev/sdb7 has a different UUID than sda7

(yes, two messages with opposite meaning for each device)

This array uses metadata version 0.90, 32K chunks, left-symmetric layout; it was probably created under F8.

Version-Release number of selected component (if applicable):
kernel 2.6.33.3-85.fc13

How reproducible:
Don't really know.

Steps to Reproduce:
1. Assemble an array
  
Actual results:
The raid does not work

Expected results:
Raid works ;)

Additional info:
This is currently happening on a live-usb system, with no mdadm.conf and no modification to udev or other rules. 
Additionally, when this live image boots on this machine, it makes some attempt at assembling the mdraid arrays but fails spectacularly, failing to assemble even the other arrays, which assemble correctly later if stopped and assembled by hand (there are a few of them: raid0, raid1, raid5).
Additionally, one of these partially assembled arrays contains opaquely encrypted data, and to my horror the system tries to use it as swap. The swap never actually shows up later, so I still have some hope that the data is intact.

Since this is a large pile of problems, I have run some hardware tests (RAM, CPU) and found nothing.

What data would you need?

PS. A hint on how to disable md startup during init would be helpful in the meantime...

Comment 1 Chuck Ebbert 2010-07-18 16:27:02 UTC
You should be able to disable MD startup by adding

  rd_NO_MD

to the kernel command line.  (Run 'man dracut' for the full list of options.)
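
For an installed system this would typically go on the kernel line of the bootloader config; a sketch for GRUB legacy as used by F13 (the kernel filename and root= value are placeholders, and on a live image the option can instead be added when editing the boot entry at boot time):

  # /boot/grub/grub.conf - append rd_NO_MD to the kernel line
  kernel /vmlinuz-2.6.33.3-85.fc13.x86_64 ro root=<your root device> rd_NO_MD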

Comment 2 Edek Pienkowski 2010-07-18 17:21:39 UTC
(In reply to comment #1)

Thanks - that option was already set. Dracut starts fine; the problem appears later, during the initscripts phase. I managed to disable that with 'AUTO -all' in mdadm.conf, so it won't do any damage for now.

I also tried to disable raid initialization in the initscripts via the kernel command line, but nothing seemed to work.
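
For reference, a minimal sketch of that mdadm.conf change (assuming the default /etc/mdadm.conf location):

  # /etc/mdadm.conf - do not auto-assemble any array;
  # arrays can still be started explicitly with mdadm --assemble
  AUTO -all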

Comment 3 Chuck Ebbert 2010-07-18 22:50:22 UTC
Can you dump the superblocks with mdadm? I think the command should be:

  mdadm -Q -E <partition>

Comment 4 Edek Pienkowski 2010-07-20 17:04:11 UTC
Created attachment 433215 [details]
Dump of all raid partitions with mdadm -Q -E

Comment 5 Edek Pienkowski 2010-07-20 17:04:53 UTC
Created attachment 433216 [details]
Dump of all raid partitions with mdadm -Q -E

Comment 6 Edek Pienkowski 2010-07-20 17:05:53 UTC
Created attachment 433217 [details]
FWIW, last /dev/sd{a,b,c,e}7 raid5 working date

Comment 7 Edek Pienkowski 2010-07-20 17:23:58 UTC
The raid partitions are organized as follows: there are four drives, partitioned in exactly the same way. The same partition number on all four drives forms a raid (level 0, 1, or 5).

What I see is that sda7 "thinks" it is part of a healthy array, while sdb7, sdc7 and sde7 think that sda7 is removed. 

How it became so, I don't know. Basically, I've been using this setup for a couple of years without trouble. Recently, there were two "events":
- I made a backup of the LVs on the sd{a,b,c,e}7 raid5 and sd{a,b,c,e}6 (they form one VG) onto an external drive. What is weird is that I was resizing ext3 on this external drive and it corrupted the ext3, even though I think I did it as usual: shrink another ext3 to get some free space, shrink its LV with some margin, run resize2fs to grow the filesystem back a bit and reclaim the margin, then enlarge the target LV and resize its ext3 (e2fscks were of course run as resize2fs required, and again after it was all done). This corrupted the enlarged ext3. I count it as an event because this has never happened to me before. It was done on F11.
- I waited until the raid finished rebuilding (or checking, whatever cron does) and then booted F13 live on this box. It failed to bring the raid up - unfortunately I did not save what /proc/mdstat looked like, but I remember that most of the volumes were wrong, e.g. UU__ and U_U_. I am a bit afraid of booting F13 again before I make backups of some volumes I haven't used for about a year but still want to keep, and they are on raid.

Another two or three events were short power outages (no UPS...), but during the day, when nothing heavy was happening on the drives.

I had similar situations on other boxes a long time ago; both were caused by failing DIMMs - but this box has ECC and there are no ECC events in the logs. No MCEs either. It could be the motherboard, but the external drive was AoE, not SATA.

Now F11 also cannot get sd{a,b,c,e}7 up, but other raids are OK.

To recover sd{a,b,c,e}7, should I mount this raid5 with b,c,e and then add a and rebuild an array?

And, which is more important, what went wrong?

Thanks,
Edek

Comment 8 Chuck Ebbert 2010-07-22 01:23:20 UTC
(In reply to comment #7)
> To recover sd{a,b,c,e}7, should I mount this raid5 with b,c,e and then add a
> and rebuild an array?
> 

Yes, get it running without sda7 and then back it up. After you get a good backup, zero out the raid superblock on sda7 and add it back to the array. You can just clear the entire partition if you're not sure how to clear the superblock.
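
A sketch of those steps (the md device name /dev/md3 is a placeholder; double-check the device names before running anything destructive):

  # assemble the array degraded, without sda7, and take the backup
  mdadm --assemble /dev/md3 /dev/sdb7 /dev/sdc7 /dev/sde7
  # once the backup is good, wipe the stale superblock and re-add the disk
  mdadm --zero-superblock /dev/sda7
  mdadm /dev/md3 --add /dev/sda7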

> And, which is more important, what went wrong?

There's probably no way of knowing that.

Comment 9 Edek Pienkowski 2010-07-22 06:14:20 UTC
(In reply to comment #8)
> (In reply to comment #7)

> > And, which is more important, what went wrong?
> 
> There's probably no way of knowing that.    

OK, I guess I'll try to fix what I can. Still, F13 live has problems with most of the arrays (U_U_, if I understand correctly, means two out of four disks are OK), and that happens during auto-detection; manually they can be assembled. Is there anything wrong with the other arrays (besides the one containing sda7) when looking at the superblock dumps?

I'll give it one more try; if something fails like before, I'll gather more data.

Comment 10 Edek Pienkowski 2010-07-24 11:16:40 UTC
Under F11 I assembled /dev/sd{b,c,e}7 and then ran mdadm -Iq /dev/sda7. It rebuilt, writing mostly to sda7.

The state now is:
mdadm --assemble --scan under F11 segfaults after one array. Manually the arrays can be assembled; filesystems/LVM are clean.

Under F13 live:
- dracut does not touch raid
- initscripts fail like before (hardly assemble anything, some are 2/4, some 1/4)
- mdadm --assemble --scan does what it is supposed to and assembles all arrays (provided they have all been stopped manually first). Filesystems are OK.

I can dump the superblocks in binary form, along with the device sizes, if you tell me where the superblocks are.
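
For 0.90 metadata the superblock sits in the last 64 KiB-aligned 64 KiB block of the device; a sketch of dumping it under that assumption (verify the offset before relying on the output):

  # offset = device size rounded down to a 64 KiB boundary, minus 64 KiB
  SIZE=$(blockdev --getsize64 /dev/sda7)
  OFFSET=$(( (SIZE / 65536 - 1) * 65536 ))
  # the v0.90 superblock itself is 4 KiB
  dd if=/dev/sda7 of=sda7-superblock.bin bs=4096 skip=$((OFFSET / 4096)) count=1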

Comment 11 Chuck Ebbert 2010-08-05 19:06:24 UTC
I wonder if this is a bug in mdadm rather than a kernel bug? I'll reassign it and see what the maintainer thinks.

Comment 12 Doug Ledford 2010-08-05 19:31:54 UTC
There have been significant improvements recently in mdadm's handling of hot-plugged devices. In particular, there was a race condition in the handling of the mdadm device map file that is likely the reason F13 is doing such a poor job assembling your arrays. In fact, this race condition is *more* pronounced during the initscripts bring-up than during the dracut bring-up, so using rd_NO_MD on the command line actually makes the situation worse, not better. Regardless, the improved mdadm won't hit an install image until the F14 install images are cut, and fixing things in already-created install images is very difficult. I have built updated mdadm packages for F12, F13, F14, and rawhide. The current, race-fixed package is mdadm-3.1.3-0.git20100804.2, so you need that version or later to have the complete fix for the race condition that is affecting you.

As to your array on sd{a,b,c,e}7, the superblock output clearly indicated that the last three drives were up to date and the first was out of date. An out-of-date drive always thinks it is up to date: once a drive has failed, we don't attempt to write a superblock marking it as failed onto the failed drive itself; we only update the superblocks on the remaining drives to indicate that the failed drive is gone. The fact that the other three drives all showed only three working disks instead of four, and had an events counter higher than sda7's, is how we know this. When we update the superblocks on the other three drives to mark sda7 as bad, we also increment the events counter; because the superblock on sda7 wasn't updated, it still has both the old count of working disks and the old events counter, which signals to the raid stack that it's out of date and should be kicked from the array. The remaining array-assembly problems are likely the race condition I mentioned.
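
As an illustration (a sketch; the field names are those printed by mdadm -E for 0.90 metadata), the mismatch can be seen by comparing the Events and Working Devices lines across the members:

  for d in /dev/sd{a,b,c,e}7; do
      echo "== $d"
      mdadm -Q -E "$d" | grep -E 'Events|Working Devices|State'
  done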

If you could test with the latest mdadm, I would appreciate it. However, make sure you get an mdadm version matching the system you are running. Don't attempt to use an F14 or rawhide mdadm on anything other than F14 or rawhide; F13 and earlier need the F13-or-earlier mdadm packages because of a file-packaging change introduced for F14 (in F14, udev no longer ships a rule file that mdadm now ships itself, so attempting to install the F14 package on F13 will cause a file conflict between mdadm and udev). The latest mdadm package has not yet hit the updates-testing repo, but should within another day or two.

Comment 13 Edek Pienkowski 2010-08-06 05:02:56 UTC
Thanks. I'll try the new mdadm, but please give me some time.

What I noticed in the meantime is that the live image on USB hits the described problems - like 2 out of 4 drives - very often, but the same software booted from a faster source (i.e. a system installed from this live image) has no problems at all. I do not know what the race condition is, but timing clearly affects the result.

Comment 14 Edek Pienkowski 2010-09-13 19:36:10 UTC
Hello, 

sorry it took so long. I updated mdadm on one F13 system; I can boot it now with AUTO +all, at least for a couple of reboots. That machine has "rotational" drives.

I had the same problem on another machine with SSDs; I'll reboot it a couple of times to check, hopefully within the next few days.

Seems to work!

Comment 15 Bug Zapper 2011-06-01 13:34:26 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 16 Bug Zapper 2011-06-29 13:21:02 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.