Bug 557959 - Extended attributes being cleared by e2fsck
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: 11
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened

Reported: 2010-01-22 16:39 EST by David Shaw
Modified: 2013-08-17 09:05 EDT
Fixed In Version: e2fsprogs-1.42.8-3.fc20
Doc Type: Bug Fix
Last Closed: 2013-08-15 23:45:03 EDT

Description David Shaw 2010-01-22 16:39:31 EST
Description of problem:

Twice now I have rebooted a box and seen a hundred or so unexpected messages from e2fsck about extended attributes being cleared:

disk: Extended attribute in inode 1437565 has a value size (0) which is invalid
CLEARED.

(for many different inodes)

This filesystem has a lot of files with single-byte xattrs (user.test='x').  After the fsck, I looked at a few of the files that correspond to the inodes mentioned by e2fsck, and that xattr was missing.  However, some other files were not touched by e2fsck and still had the single-byte xattr.

The only other clue I have at the moment is that in at least one of the examples, the filesystem had just been resized (online) with resize2fs.

Both boxes are Fedora 11 with kernel-2.6.30.9-102.fc11.i586 and e2fsprogs-1.41.4-12.fc11.i586

Version-Release number of selected component (if applicable):

e2fsprogs-1.41.4-12.fc11.i586
kernel-2.6.30.9-102.fc11.i586

How reproducible:

It does not always happen, unfortunately, but the most recent time it happened was a resize2fs followed by a reboot with /forcefsck set.
Comment 1 Eric Sandeen 2010-01-22 22:23:08 EST
Was this on ext4 or ext3 or ext2?

I'll double-check if anything has been fixed in either kernelspace or userspace since those versions ...

-Eric
Comment 2 Eric Sandeen 2010-01-22 22:25:23 EST
Also is selinux on?  (and of course things have been fixed since then - obviously I meant a relevant fix) :)

-Eric
Comment 3 David Shaw 2010-01-22 22:54:48 EST
This was on ext3, and selinux is off.
Comment 4 Eric Sandeen 2010-01-26 00:19:19 EST
For what it's worth, I just updated e2fsprogs for F11 (it's in testing now) but I am not yet aware of a bugfix in there that would be relevant...
Comment 5 Eric Sandeen 2010-02-17 13:26:47 EST
Is this still happening?
Comment 6 David Shaw 2010-02-17 13:32:38 EST
(In reply to comment #5)
> Is this still happening?    

It's not not happening, but it was always intermittent.
Comment 7 David Shaw 2010-04-15 22:42:03 EDT
Just had another occurrence.

For what it's worth, I noticed that the inodes that were reported on are in consecutive order:

disk: Extended attribute in inode 7972714 has a value size (0) which is invalid
CLEARED.
disk: Extended attribute in inode 7972715 has a value size (0) which is invalid
CLEARED.
disk: Extended attribute in inode 7972716 has a value size (0) which is invalid
CLEARED.
disk: Extended attribute in inode 7972717 has a value size (0) which is invalid
CLEARED.

etc.

After doing some digging with debugfs ncheck to map the inode back to a real file path, and comparing to other files, it seems that despite the error message, the file didn't have any xattrs to start with.
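
A minimal sketch of the inode-to-path lookup described above, assuming the e2fsck messages were saved to a log file. The function names and the device path in the usage note are hypothetical; only "debugfs -R" and its "ncheck" request are real. The command is printed rather than executed, because debugfs needs read access to the block device.

```shell
#!/bin/sh
# Hypothetical helper: read saved e2fsck output on stdin and print
# the inode number from each "value size" message, one per line.
extract_inodes() {
    sed -n 's/.*Extended attribute in inode \([0-9][0-9]*\) has a value size.*/\1/p'
}

# Hypothetical helper: build a debugfs command that maps those inodes
# back to path names. "ncheck" accepts several inode numbers at once.
ncheck_command() {
    dev=$1
    inodes=$(extract_inodes | tr '\n' ' ' | sed 's/ *$//')
    if [ -n "$inodes" ]; then
        printf 'debugfs -R "ncheck %s" %s\n' "$inodes" "$dev"
    fi
}
```

Possible usage, with a placeholder log file and device name: "ncheck_command /dev/sdXN < fsck.log". Comparing the resulting paths against "getfattr -d" output from untouched files is then a manual step.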
Comment 8 Bug Zapper 2010-04-28 07:45:08 EDT
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 9 Bug Zapper 2010-06-28 11:38:10 EDT
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
Comment 10 Harald Reindl 2013-04-06 13:32:43 EDT
The same happens REPEATEDLY on Fedora 17 on several machines:

dracut: system: Extended attribute in inode 303115 has a value size (0) which is invalid
dracut: CLEARED.
dracut: system: Extended attribute in inode 303117 has a value size (0) which is invalid
dracut: CLEARED.
dracut: system: Extended attribute in inode 303122 has a value size (0) which is invalid

3.8.4-101.fc17.x86_64
3.8.6-101.fc17.x86_64
Comment 11 Eric Sandeen 2013-04-06 14:33:11 EDT
Any way to reproduce this?

What happened to this box prior to the fsck above?  (what initiated the fsck?)

Oops or powerloss?  Runtime corruption?

Got system logs or dmesg to attach?

We can't do much with just the end-result in fsck output.  It found corruption, but we don't know why.
Comment 12 Harald Reindl 2013-04-06 14:40:37 EDT
NOTHING

no oops, no power loss, no crash, nothing at all

the fsck was initiated by "touch /forcefsck; reboot" and every few
months it results in the messages above without any reason
Comment 13 Harald Reindl 2013-04-06 14:41:11 EDT
And no, it does not happen on only one machine; it happens randomly on any of 30 machines.
Comment 14 Eric Sandeen 2013-04-06 15:09:13 EDT
So, it's possible corruption may have happened at any time in the past, and now you forced a fsck, and it found latent issues...

Are you using selinux?  Are there other applications creating xattrs?

Does it only happen on the root fs?

It might be interesting to rig initscripts to make an e2image *before* the fsck, so that if this happens, we have a pre-fsck image to look at.

Or, more easily, modify forcefsck to run "-f -n" so that it doesn't actually make the modification, and then we can go look at the inodes in question from a rescue disk.
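
A rough sketch of that idea, under stated assumptions: it is not wired into initscripts or systemd-fsck, the function names are hypothetical, and the image path must live on a different filesystem than the one being checked. It relies only on documented behavior: "e2fsck -f -n" checks without modifying, its exit status is a bit mask in which 4 means errors were left uncorrected, and "e2image device image-file" saves the metadata.

```shell
#!/bin/sh
# e2fsck's exit status is a bit mask; the bit with value 4 means
# "filesystem errors left uncorrected", which is what a read-only
# (-n) run reports when it finds problems.
fsck_left_errors() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Hypothetical wrapper: check without modifying anything, and save a
# metadata image only if the check found problems.
check_and_image() {
    dev=$1
    image=$2
    e2fsck -f -n "$dev"
    status=$?
    if fsck_left_errors "$status"; then
        e2image "$dev" "$image"
    fi
    return "$status"
}
```

Something like "check_and_image /dev/sdXN /mnt/other/rootfs.e2i" (both paths are placeholders) would have to run while the filesystem is unmounted or mounted read-only, for example from a rescue disk.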
Comment 15 Harald Reindl 2013-04-06 15:17:49 EDT
No, I am not using SELinux; "selinux=0" is set as a boot param

cat /etc/selinux/config  | grep SELINUX
SELINUX=disabled
SELINUXTYPE=targeted 

Yes, it happens ONLY on the rootfs, and since these are production servers it's not so easy to debug, because it does not happen every time and until now it has not caused a problem. I found this bug report after typing the message from "dmesg" into Google.

Yes, this was originally ext3 on Fedora 9; the conversion to ext4 happened in 2010 because dracut was able to boot from ext4 from that point on. The messages are relatively new in dmesg, or were suppressed in the past.

tune2fs -l /dev/sdb1
tune2fs 1.42.3 (14-May-2012)
Filesystem volume name:   system
Last mounted on:          /
Filesystem UUID:          918f24a7-bc8e-4da5-8a23-8800d5104421
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file uninit_bg dir_nlink
Filesystem flags:         signed_directory_hash 
Default mount options:    journal_data_writeback user_xattr acl nobarrier
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              393216
Block count:              1572354
Reserved block count:     2
Free blocks:              1089161
Free inodes:              337884
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      383
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Mon Aug 18 06:48:05 2008
Last mount time:          Sat Mar 23 19:47:45 2013
Last write time:          Sat Apr  6 01:15:27 2013
Mount count:              4
Maximum mount count:      -1
Last checked:             Sat Mar  9 01:47:49 2013
Check interval:           31104000 (12 months)
Next check after:         Tue Mar  4 01:47:49 2014
Lifetime writes:          1206 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Journal inode:            8
First orphan inode:       34185
Default directory hash:   half_md4
Directory Hash Seed:      1e9d689f-15fe-4c0d-aaba-9d323049c7f4
Journal backup:           inode blocks
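
As a small worked example of reading this listing: ext2/3/4 number inodes from 1, so the block group that holds an inode follows from the "Inodes per group" value. The function names are hypothetical, and pairing inode 303115 (from the dracut messages earlier in this bug) with this particular filesystem is an assumption, since several machines are involved.

```shell
#!/bin/sh
# Block group holding an inode: (inode - 1) / inodes_per_group,
# using integer division, because inode numbering starts at 1.
inode_group() {
    echo $(( ($1 - 1) / $2 ))
}

# 0-based position of the inode inside that group's inode table.
inode_index() {
    echo $(( ($1 - 1) % $2 ))
}

# 8192 is "Inodes per group" from the tune2fs output.
inode_group 303115 8192   # prints 37
inode_index 303115 8192   # prints 10
```

With 393216 inodes and 8192 per group the filesystem has 48 groups, so group 37 is in range; inodes reported in consecutive order land in the same inode table.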
Comment 16 Eric Sandeen 2013-04-06 16:13:49 EDT
Ugh, converted ext3 - aka "never tested" :(  But I guess it was seen on ext3 as well.

Do your apps use xattrs?  What I'm trying to get at is, should we expect any xattrs on this filesystem at all?

Anyway, you apparently take the extra step to sometimes touch /forcefsck?

If you'd like to help get to the bottom of this, modifying the boxes to change /forcefsck to add "-n" and capture messages would help; you could then make an e2image of the fs in question if it finds errors.  At least then I'd have slightly more to go on.

Otherwise all I have is e2fsck telling you that it found corruption, and it's pretty much impossible to work backwards from that I'm afraid.
Comment 17 Harald Reindl 2013-04-06 16:31:48 EDT
no, my apps do not use xattrs, at least not on the rootfs
"your apps" aka Fedora may but who knows :-)

only the fileserver is using them explicitly and heavily (netatalk on a 5 TB LVM with 5x1 TB vdisks), but that LVM volume never produces the messages above

For years I mounted with xattrs and ACLs disabled in /etc/fstab, until after a kernel update there were warnings that this is deprecated and that the option will be removed sooner or later; "lynis" is also "happier" with the attributes enabled

> modifying the boxes to change /forcefsck to add "-n" and capture messages

PLEASE TELL ME HOW EXACTLY. I could make a clone of one server and wait until it happens on that. Yes, all the machines are running on top of VMware ESXi 5.0, but that should not really matter in this context, as it happens only on the rootfs.
Comment 18 Eric Sandeen 2013-04-06 16:49:54 EDT
Ho hum, well, TBH I don't have any idea how forcefsck is handled anymore.  It used to be in simple initscripts but it's been absorbed by systemd-fsck at this point, so I have no idea how we could make the modification.  Progress!
Comment 19 Eric Sandeen 2013-04-06 16:50:56 EDT
Out of curiosity, why do you do /forcefsck in the first place?
Comment 20 Harald Reindl 2013-04-06 16:55:58 EDT
> I don't have any idea how forcefsck is handled anymore

me neither :-)

> Out of curiosity, why do you do /forcefsck in the first place?

Because this is how I control when fsck is done, instead of the typical intervals, which hit you on a random reboot that is mostly the wrong moment. A Saturday at 23:00, for example, is a better time than business hours after you rebooted for whatever reason and the 30 days are over.
Comment 21 Eric Sandeen 2013-04-06 17:17:28 EDT
Ok, well, I'll try to look into this again; TBH a 0-length value should be ok, there seems to be something else going on here.

# strace getfattr -d -m - mnt/testfile1
...
getxattr("mnt/testfile1", "user.test", "", 0) = 0
getxattr("mnt/testfile1", "user.test", "", 0) = 0
# getfattr -d -m - mnt/testfile1
# file: mnt/testfile1
user.test

#

and the fs checks fine.
Comment 22 Eric Sandeen 2013-04-07 14:26:24 EDT
Ok, I can actually reproduce this now, it may be simpler than I thought.  Sorry (esp. to the original reporter) that this has been around so long!
Comment 23 Eric Sandeen 2013-04-07 14:43:38 EDT
Essentially doing this:

truncate --size 1073741824 fsfile
mkfs.ext4 -F fsfile &>/dev/null
mount -o user_xattr,loop -o context=system_u:object_r:nfs_t:s0 fsfile mnt/
touch mnt/testfile1
setfattr -n "user.test" mnt/testfile1
umount mnt


yields the result:

e2fsck 1.43-WIP (21-Jan-2013)
Pass 1: Checking inodes, blocks, and sizes
Extended attribute in inode 12 has a value size (0) which is invalid
Clear? yes

====

I don't think there needs to be any prohibition against 0-sized values, TBH, we can probably just remove the check.

I've sent a patch upstream; apologies for this bug lingering so long, I had filed it under "weird corruption" not under "bad logic in e2fsck."
Comment 24 Harald Reindl 2013-04-07 15:55:22 EDT
No problem; since it does not damage anything as far as I know, I thought a "me too" could not be a mistake :-)
Comment 25 Eric Sandeen 2013-04-07 16:15:27 EDT
No, I appreciate you bringing it up again. :)

If you have apps which want to set name-only, valueless xattrs, having e2fsck clear them might confuse the app.  I don't know what you have that is setting these xattrs.
Comment 26 Harald Reindl 2013-04-07 16:18:32 EDT
> I don't know what you have that is setting these xattrs

Me neither; there are not many apps on the machines :-)

/dev/sdb1      ext4  5.8G  1.4G  4.5G  23% /

rpm -qa | wc -l
456
Comment 27 Eric Sandeen 2013-06-17 10:16:30 EDT
commit 10fc3a63d9b7efb14e810ee94ad1d2f254d44eae
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Thu Apr 25 00:14:33 2013 -0400

    e2fsprogs: allow 0-length xattr values in e2fsck
    
    e2fsck thinks that this:
    
    # touch mnt/testfile1
    # setfattr -n "user.test" mnt/testfile1
    
    results in a filesystem with corruption:
    
    Pass 1: Checking inodes, blocks, and sizes
    Extended attribute in inode 12 has a value size (0) which is invalid
    Clear? yes
    
    but as far as I can tell, there is absolutely nothing wrong with
    a 0-length value on an extended attribute.  Just remove the check.
    
    Reported-by: David Shaw <dshaw@jabberwocky.com>
    Reported-by: Harald Reindl <h.reindl@thelounge.net>
    Addresses-Red-Hat-Bugzilla: #557959
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Comment 28 Eric Sandeen 2013-08-15 23:45:03 EDT
Well, fixed in F20 now that my commit made it upstream:   e2fsprogs-1.42.8-3.fc20

Better late than never!
Comment 29 Harald Reindl 2013-08-17 08:54:18 EDT
good to hear!
thank you!
Comment 30 Harald Reindl 2013-08-17 09:05:40 EDT
BTW: please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=998121
similar issue with directory depth never cleared
