Description of problem:
After shutting down our R815 in an orderly fashion and removing any hard disk that is not part of the filesystem, the filesystem fails to mount. fsck reports:

    Superblock hint for external superblock should be 0xfd04

The journal for the failing filesystem is external and, again, is not on a drive being removed; the journal device is an lvm2 device. If we put the removed drives back in, the filesystem mounts fine again. If we leave the removed drives out, running fsck fixes the problem and the filesystem mounts OK.

Version-Release number of selected component (if applicable):
2.6.35.13-92.fc14.x86_64

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
It seems the kernel and ext4 lose track of which device is which when enumerating hard drives after something was removed. The problem persists even if a drive irrelevant to the filesystem in question is replaced by another drive.

Expected results:

Additional info:
    int e2fsck_fix_ext3_journal_hint(e2fsck_t ctx)
    {
    ...
            uuid_unparse(sb->s_journal_uuid, uuid);
            journal_name = blkid_get_devname(ctx->blkid, "UUID", uuid);
            if (!journal_name)
                    return 0;

            if (stat(journal_name, &st) < 0)
                    return 0;

            if (st.st_rdev != sb->s_journal_dev) {
                    clear_problem_context(&pctx);
                    pctx.num = st.st_rdev;
                    if (fix_problem(ctx, PR_0_EXTERNAL_JOURNAL_HINT, &pctx)) {

So it looks at the filesystem superblock for s_journal_uuid, and then asks blkid for the name of the device containing that UUID. It then stats the device and checks whether it has the same device number as is stored in the superblock.

This does seem like a recipe for failure if devices are rearranged... I'll try to ask Ted, this seems weird.

(But you said that if you switch one non-fs disk with another non-fs disk you get the same problem? Perhaps they are still enumerated differently...)
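For illustration, the comparison described above can be sketched in isolation. This is a minimal sketch and not the e2fsprogs code: the helper name journal_hint_is_stale and its return convention are hypothetical, and the hint is passed in directly instead of being read from a superblock.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of e2fsprogs): compare the device
 * number of the node at journal_path with the hint value that would
 * be stored in the superblock.
 *
 * Returns  1 if the numbers differ (the hint is stale),
 *          0 if they match,
 *         -1 if the path cannot be stat'ed.
 */
static int journal_hint_is_stale(const char *journal_path, dev_t hint)
{
    struct stat st;

    if (stat(journal_path, &st) < 0)
        return -1;

    /* st_rdev is the device number of the node itself; it is what
     * changes when the devices are enumerated in a different order. */
    return st.st_rdev != hint ? 1 : 0;
}
```

The UUID lookup is left out on purpose. The point is only that a device number remembered from an earlier boot is compared against whatever number the device has now.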
How did mount fail? This may be expected, sadly, if device numbers are rearranged. The mount option

    journal_dev=devnum
        When the external journal device's major/minor numbers have
        changed, this option allows the user to specify the new journal
        location. The journal device is identified through its new
        major/minor numbers encoded in devnum.

could be used to specify a new device number after you have rearranged disks.
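As a rough sketch of what "major/minor numbers encoded in devnum" can look like: the helper below packs a major/minor pair using the same bit layout as the kernel's new_encode_dev(). That the journal_dev option expects exactly this layout is an assumption here, and the function name encode_journal_devnum is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack major/minor into one 32-bit number using
 * the layout of the kernel's new_encode_dev():
 *   bits  0..7   low 8 bits of the minor
 *   bits  8..19  major
 *   bits 20..31  remaining high bits of the minor
 * Assumption: journal_dev=devnum is decoded with the matching layout.
 */
static uint32_t encode_journal_devnum(unsigned int major, unsigned int minor)
{
    return (minor & 0xffu) | (major << 8) | ((minor & ~0xffu) << 12);
}
```

With major 253 and minor 4 this yields 0xfd04 (64772 in decimal), which matches the value in the fsck message from the original report.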
Hi Eric,
yes, it does fail in the same fashion when a non-fs drive is replaced with another non-fs drive. In my case it is a hardware RAID, so I reckon that merely rearranging RAID devices, which likewise bear no relation to the failing filesystem, causes ext4 to fail.
It seems that using journal_dev at mount time is a way around the problem, as is running fsck on the filesystem; journal_dev is just faster, as it does not do all the work fsck does. Used once at mount time it heals the problem; it is not needed next time and the FS mounts OK.
All Red Hat-derived distros seem to suffer from this problem: I have checked Oracle 6 and SL 6.1, but have not checked other distros.
lejeczek, I'm afraid this behavior is by design... rearranging devices does mess up the external journal device location. Without a mount.extN mount helper to call blkid and look for the new location, I'm not sure how we could do this differently...
Sure, it is OK when there is an easy fix for a problem, as there is for this one. But if it is by design, then whether through negligence or oversight the mechanism has ended up somewhat dysfunctional. Surely this cannot have been a deliberate goal; if it was intended, then only as a trade-off against whatever else the designer(s) had at stake. Enumeration of block devices always seemed to be an Achilles heel of Linux; I have come across it in the past (492456). It would be great if this design could be rectified in the near future.
Should we mark this as not-a-bug, or leave it here as info for others?
I'd suggest bringing this up as a feature request upstream at linux-fsdevel.org. We wouldn't introduce something Fedora-specific for this (especially in f14 at this stage), so it would have to have upstream buy-in anyway.