Description of problem:
M/B failure created error in software RAID = resize inode not valid unexpected inconsistency

Version-Release number of selected component (if applicable):
1.35-12.11 (latest from up2date)

How reproducible:
Run fsck on failed /dev/md0. fsck fails to repair the error. I have no idea how to create the error

Steps to Reproduce:
1. run fsck -y /dev/md0
2. answer yes
3.

Actual results:
no change - error still exists

Expected results:
error repaired

Additional info:
This was posted as fixed in the latest release but apparently was not...
Could you attach the full e2fsck output, as well as the output of the following command, so I can hopefully see what is wrong with the inode:

debugfs -c /dev/md0
debugfs: stat <7>

Thanks, -Eric
Created attachment 299434 [details] fsck report from boot to single user
Created attachment 299435 [details] debug output from boot to single user
The results of fsck -y /dev/md0 seem different when run after booting as a single user than they do when run after the system fails to 'maintenance.' I will try to get another set from the 'maintenance' failure but time presses. The customer wants his server back up.
Hm, so, this time no messages about the resize inode, but a different error? Perhaps making an e2image of the fs for safekeeping & later analysis would be good.

In the meantime... It appears that the only 2 problems are now 2 directory inodes in lost+found/ which have bad (unfixable?) parent entries. So it looks like the rest of the fs is fixed, w.r.t. the pressing time issue.

Can you try "stat <15302615>" and "ls <15302615>" in debugfs, and the same for the other inode (37017175)? I'm guessing they might say "size 0".

Unless you know there's critical data in those lost+found files, we can maybe just nuke them with debugfs, with either kill <inode> or rm <inode>, and perhaps a subsequent repair.

I've found one other report of this problem, but no solution yet, and I can't tell if it's fixed upstream. I hope with the stat information I can recreate it here to investigate.
Can you provide the information requested? Thanks, -Eric
Created attachment 303715 [details]
Data you requested with my apologies

You were right about the length. I just had to get the computer back from the customer for a while.
Thanks! If you do have the computer for a bit now you might make an e2image of the filesystem, something like

e2image -r /dev/sda1 - | bzip2 > sda1.e2i.bz2

and we could do further analysis later on that, if needed. I'll see if I can work out anything from the info provided.
If there's any chance of getting access to the image, it'd be greatly helpful. So far I've not been able to recreate a filesystem with corruption which behaves in the same way...
The thing that's very odd here is that in order to get a message like:

'..' in /lost+found/#15302615 (15302615) is <The NULL inode> (0), should be /lost+found (11).

then, well, the inode nr. needs to be 0, as it says. But that was in pass3, and in pass2, we do this:

static int check_dotdot(e2fsck_t ctx,
                        struct ext2_dir_entry *dirent,
                        struct dir_info *dir, struct problem_context *pctx)
{
        int problem = 0;

        if (!dirent->inode)
                problem = PR_2_MISSING_DOT_DOT;

and since !dirent->inode (this is the inode nr; it's 0) then we'd get:

Pass 2: Checking directory structure
Missing '..' in directory inode 15302615.

I'm just not seeing how we can get

'..' in /lost+found/#15302615 (15302615) is <The NULL inode> (0), should be /lost+found (11).

without first seeing:

Missing '..' in directory inode 15302615.
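To make that argument concrete, here is a minimal sketch in C of the ordering being described. It is a simplified model, not e2fsck's actual code: struct dotdot_entry, enum dotdot_finding, pass2_flags_missing_dotdot() and classify_dotdot() are hypothetical names made up for illustration, and the "wrong parent" branch only approximates the kind of complaint pass 3 prints. The one thing taken from the quoted source is the test itself: a '..' entry whose inode number is 0 is flagged as missing in pass 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ext2_dir_entry:
 * only the inode number stored in the '..' entry matters here. */
struct dotdot_entry {
    uint32_t inode;             /* 0 means "the NULL inode" */
};

/* Hypothetical result codes for this sketch (not e2fsck problem codes). */
enum dotdot_finding {
    DOTDOT_OK,                  /* '..' points at the expected parent */
    DOTDOT_MISSING,             /* pass 2 style: inode number is 0 */
    DOTDOT_WRONG_PARENT         /* pass 3 style: non-zero, but not the parent */
};

/* Mirrors the quoted pass 2 test: a zero inode number is "missing". */
static int pass2_flags_missing_dotdot(const struct dotdot_entry *e)
{
    return e->inode == 0;
}

/* Apply the checks in pass order: the pass 2 test runs first, so a zero
 * '..' never reaches the parent comparison in this model. */
static enum dotdot_finding classify_dotdot(const struct dotdot_entry *e,
                                           uint32_t expected_parent)
{
    assert(e != NULL);

    if (pass2_flags_missing_dotdot(e))
        return DOTDOT_MISSING;
    if (e->inode != expected_parent)
        return DOTDOT_WRONG_PARENT;
    return DOTDOT_OK;
}
```

In this model, calling classify_dotdot() on an entry with inode 0 and an expected parent of 11 (lost+found) always gives DOTDOT_MISSING. That is why a pass 3 complaint about a NULL '..' with no earlier "Missing '..'" message from pass 2 is so surprising.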
The image should fit on a 40 GB or so disk. How about I make the image copy to a USB drive and ship it to you?
Consider this: When we booted the system with /dev/md0 in fstab, the attempt to auto mount tries to clean the hard disks, which /dev/md0 fails. This drops the system into a single user 'maintenance mode.' The time or two we ran e2fsprogs or fsck in this 'maintenance mode,' we received a different, but consistent, report on /dev/md0. I haven't put /dev/md0 back into fstab because I am afraid our problem will change while we are looking at it. The system runs because I manually mount /dev/md0 and ignore the recommended file check.
(sorry, somehow I missed the bug updates for a couple days)

> How about I make the image copy to a USB drive and ship it to you?

Is 40GB compressed? I'd hoped it would be a bit smaller. Before we resort to that, as long as you have the image now, let me see if I can have you run a few other things to try to debug the problem. I'll look into this a bit more, and follow up with some requests for more info.

Another option might be guest access to a machine where I could do some debugging. But, if you are amenable to physically sending the image, perhaps that would be the fastest path.

Thanks, -Eric
I'll see what I can do about a 'guest' system.
Any news about getting access to a filesystem image one way or another? Thanks, -Eric
We had the techs out this AM to review the situation and the possibilities. We have been unable to duplicate the problem. What additional reports or files can we produce and attach to this bug report that would assist you? That would be easier than punching you through two firewalls. If we can arrange for you to view the production system as a last resort, can you learn anything in a NON-destructive mode that would help? Due to the nature of the information, we would also have to have an executed non-disclosure agreement for access. Thanks
Do you still have a copy of the bad fs? And, was e2image really 40GB compressed? When you say you can't duplicate the problem... does that mean you no longer have a filesystem which exhibits this failing fsck behavior, or? Just a note, I'm out of the office 'til next Thurs. Thanks, -Eric
I also might be able to give you a custom e2fsck which would print more info as it goes... that might be the simplest route at first. I'll cook something up when I get back. -Eric
Any more info on this, or will it be lost to the mists of time... :) (do you still have a copy of the bad fs?)
I'm afraid that without more info, we're not going to be able to fix this one. If you are able to provide anything else that might offer a clue, feel free to re-open. thanks, -Eric