732169 – Superblock hint for external superblock should be .....

Summary: Superblock hint for external superblock should be .....

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	14
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-08-20 08:03 UTC by lejeczek
Modified:	2011-09-26 18:52 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-09-26 18:52:11 UTC
Type:	---
Embargoed:
Dependent Products:

Description lejeczek 2011-08-20 08:03:26 UTC

Description of problem:

After shutting down our R815 in an orderly fashion and removing any hard disk that is not part of the filesystem, the filesystem fails to mount, and fsck reports:

Superblock hint for external superblock should be 0xfd04

The journal for the failing filesystem is external and, again, not on a drive being removed; the journal device is an LVM2 device.

If we then put the removed drives back in, the filesystem mounts fine again.

If we leave the removed drives out, running fsck fixes the problem and the filesystem mounts OK.


Version-Release number of selected component (if applicable):

2.6.35.13-92.fc14.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
It seems the kernel and ext4 lose track of which device is which when enumerating hard drives after something was removed. The problem also persists if a drive irrelevant to the filesystem in question is replaced by another drive.

Expected results:


Additional info:

Comment 1 Eric Sandeen 2011-08-22 17:30:40 UTC

int e2fsck_fix_ext3_journal_hint(e2fsck_t ctx)
{
...

        uuid_unparse(sb->s_journal_uuid, uuid);
        journal_name = blkid_get_devname(ctx->blkid, "UUID", uuid);
        if (!journal_name)
                return 0;

        if (stat(journal_name, &st) < 0)
                return 0;

        if (st.st_rdev != sb->s_journal_dev) {
                clear_problem_context(&pctx);
                pctx.num = st.st_rdev;
                if (fix_problem(ctx, PR_0_EXTERNAL_JOURNAL_HINT, &pctx)) {

so it looks at the filesystem superblock for s_journal_uuid, and then asks blkid to get the device name containing that uuid.

it then stats the device, and checks whether it has the same device number as is stored in the superblock.

This does seem like a recipe for failure if devices are rearranged...  I'll try to ask Ted, this seems weird.

(but - you said if you switch one non-fs disk with another non-fs disk you get the same problem?  Perhaps they are still enumerated differently...)
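
To make the check described above concrete, here is a minimal sketch in C. It is not the e2fsck code: lookup_rdev, hint_is_stale and describe_dev are hypothetical helper names, and the sketch assumes a Linux/glibc system where major() and minor() come from <sys/sysmacros.h>. It only illustrates the idea of stat'ing the journal device and comparing its device number with the value stored in the superblock.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: fetch the device number of a device node.
 * Returns 0 on success, -1 if the path cannot be stat'ed. */
int lookup_rdev(const char *path, dev_t *out)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    *out = st.st_rdev;
    return 0;
}

/* Hypothetical helper: the same kind of comparison as in the excerpt.
 * Returns 1 when the stored hint no longer matches the device number
 * the journal currently has, 0 when they agree. */
int hint_is_stale(dev_t current, uint32_t stored_hint)
{
    return current != (dev_t)stored_hint;
}

/* Print a device number both raw and split into major/minor. */
void describe_dev(const char *label, dev_t dev)
{
    printf("%s: 0x%llx (major %u, minor %u)\n", label,
           (unsigned long long)dev, major(dev), minor(dev));
}
```

With glibc, describe_dev("hint", 0xfd04) reports major 253, minor 4. Major 253 is often the device-mapper major, which would fit a journal on an LVM device, but that mapping depends on the system.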

Comment 2 Eric Sandeen 2011-08-22 17:35:26 UTC

How did mount fail?

this may be expected, sadly, if device numbers are rearranged.

journal_dev=devnum      When the external journal device's major/minor numbers
                        have changed, this option allows the user to specify
                        the new journal location.  The journal device is
                        identified through its new major/minor numbers encoded
                        in devnum.


This option could be used to specify a new device number after you have rearranged disks.
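
As a rough illustration of how such a devnum might be built, here is a sketch in C. It assumes the value uses the kernel's 32-bit "new" device encoding (low 8 bits of the minor, then the major, then the remaining minor bits); encode_journal_devnum and format_journal_dev_option are hypothetical names, not part of any existing tool. Check the result against what fsck reports before relying on it.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed encoding: minor bits 0-7 in the low byte, the major starting
 * at bit 8, and the remaining minor bits shifted up by 12. */
uint32_t encode_journal_devnum(unsigned int maj, unsigned int min)
{
    return (min & 0xffu) | (maj << 8) | ((min & ~0xffu) << 12);
}

/* Hypothetical helper: build a "journal_dev=N" option string with the
 * number in decimal. Returns what snprintf returns. */
int format_journal_dev_option(char *buf, size_t len,
                              unsigned int maj, unsigned int min)
{
    return snprintf(buf, len, "journal_dev=%u",
                    encode_journal_devnum(maj, min));
}
```

Under that assumption, major 253 and minor 4 encode to 0xfd04 (64772 in decimal), the same value as the hint fsck printed in this report.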

Comment 3 lejeczek 2011-09-09 15:14:22 UTC

Hi Eric,
Yes, it fails in the same way when a non-fs drive is replaced with another non-fs drive.
In my case it's hardware RAID, so I'd reckon that merely rearranging RAID devices, which similarly bear no relation to the failing filesystem, causes ext4 to fail.

It seems that using journal_dev at mount time is a way around the problem, as is running fsck on the filesystem; journal_dev is just faster because it does not do all the work fsck does. Used once at mount time, it heals the problem; it is not needed the next time and the filesystem mounts OK.

All Red Hat-derived distros seem to suffer from this problem. I have checked Oracle 6 and SL 6.1, but no other distros.

Comment 4 Eric Sandeen 2011-09-09 15:22:48 UTC

lejeczek, I'm afraid this behavior is by design... rearranging devices does mess up the external journal device location.

Without a mount.extN mount helper to call blkid and look for the new location, I'm not sure how we could do this differently...

Comment 5 lejeczek 2011-09-09 22:22:55 UTC

Sure, it's OK when there is an easy fix for a problem, as there is for this one.
But if it is by design, then whether by negligence or oversight the mechanism has ended up somewhat dysfunctional. Surely this cannot have been the intended goal; if intended, then only as a trade-off against whatever else the designer(s) had at stake.

Enumeration of block devices has always seemed to be an Achilles heel of Linux; I have come across it before (492456).

It would be great if this design could be rectified in the near future.

Should we mark it as not-a-bug, or leave it here as info for others?

Comment 6 Dave Jones 2011-09-26 18:52:11 UTC

I'd suggest bringing this up as a feature request upstream at linux-fsdevel.org

We wouldn't introduce something Fedora specific for this (especially in f14 at this stage), so it would have to have upstream buy-in anyway.
