Bug 578674

Summary: JBD: Spotted dirty metadata buffer
Product: [Fedora] Fedora
Reporter: Norman Gaywood <ngaywood>
Component: kernel
Assignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: anton, dougsland, esandeen, gansalmon, itamar, jonathan, kernel-maint, kmcmartin, L.Bonnaud, ngaywood, pza
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.35.10-72.fc14
Doc Type: Bug Fix
Last Closed: 2010-12-22 19:53:25 UTC
Attachments:
interleaved calls to setquota(8) and JBD messages (flags: none)

Description Norman Gaywood 2010-04-01 02:43:16 UTC
Description of problem:

Changes to quota limits with setquota(8) cause messages like this in syslog:

kernel: JBD: Spotted dirty metadata buffer (dev = dm-0, blocknr = 86532501). There's a risk of filesystem corruption in case of system crash.


Version-Release number of selected component (if applicable):

Any 2.6.32 kernel

How reproducible:

Always happens for me


Expected results:

No warning message.


Additional info:

These are ext4 filesystems with user quotas enabled.
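
When a log contains many of these warnings, the device and block number can be pulled out of each line mechanically. The following is a minimal sketch and not part of the original report: the function name `jbd_dirty_blocks` is hypothetical, and the pattern assumes the message format quoted above (it also accepts a `JBD2:` prefix).

```shell
#!/bin/sh
# Read syslog lines on stdin and print "<device> <blocknr>" for every
# "Spotted dirty metadata buffer" warning. Other lines are ignored.
# Assumption: the message layout matches the example quoted in this bug.
jbd_dirty_blocks() {
  sed -n 's/.*JBD2\{0,1\}: Spotted dirty metadata buffer (dev = \([^,]*\), blocknr = \([0-9]*\)).*/\1 \2/p'
}
```

A possible use, assuming the kernel messages end up in `/var/log/messages`, is `jbd_dirty_blocks < /var/log/messages | sort | uniq -c`, which shows how often each block is reported.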

Comment 1 Norman Gaywood 2010-04-01 02:59:50 UTC
I guess I should be more specific in my kernel versions.

I've seen this in at least:

kernel-2.6.32.9-70.fc12.x86_64
kernel-2.6.32.10-90.fc12.x86_64

Comment 2 Eric Sandeen 2010-04-01 03:23:56 UTC
was this on ext3 or ext4?

Comment 3 Norman Gaywood 2010-04-01 03:29:31 UTC
I have several ext4 filesystems with user quotas enabled and it happens on all of them. Output from mount looks like:


/dev/mapper/SYSTEM-root on / type ext4 (rw,relatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/mapper/SYSTEM-var on /var type ext4 (rw,relatime,usrquota)
/dev/mapper/SYSTEM-opt on /opt type ext4 (rw,relatime)
/dev/xvda1 on /boot type ext3 (rw,relatime)
/dev/mapper/SYSTEM-tmp on /tmp type ext4 (rw,relatime,usrquota)
/dev/mapper/HOME-home on /.automount/turing/disks/turing/home type ext4 (rw,relatime,usrquota)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

Comment 4 Norman Gaywood 2010-04-01 03:34:32 UTC
The syslog message appears whenever I call this script after adding a new user:

#!/bin/sh

uid=$1

# Must have a uid (quoted so the test also works when $1 is empty)
if [ -z "${uid}" ]; then
  echo "$0: No uid provided" >&2
  exit 1
fi

#  setquota ${uid} ${bsoft} ${bhard} ${isoft} ${ihard} ${filesys}
/usr/sbin/setquota "${uid}" 100000 200000 100000 200000 /dev/mapper/SYSTEM-tmp
/usr/sbin/setquota "${uid}" 500000 800000 100000 200000 /dev/mapper/SYSTEM-var
/usr/sbin/setquota "${uid}" 500000 540000 100000 200000 /dev/mapper/HOME-home

Comment 5 Eric Sandeen 2010-04-01 03:38:56 UTC
can you do:

# debugfs -R "icheck 86532501" /dev/dm-0 

(or whatever the device corresponding to dm-0 is, probably your root fs device?)

Comment 6 Norman Gaywood 2010-04-01 03:56:39 UTC
/dev/dm-0 is /dev/mapper/HOME-home

debugfs -R "icheck 86532501" /dev/mapper/HOME-home 
debugfs 1.41.9 (22-Aug-2009)
Block	Inode number
86532501	100727

Comment 7 Norman Gaywood 2010-04-01 04:00:49 UTC
Created attachment 403861
interleaved calls to setquota(8) and JBD messages

I just did some extra checks to see if the messages did occur after setquota, and I'm not so sure now. Attached are the adduser logs where setquota is called and the JBD messages from syslog.

There seems to be a delay if there is a correlation.

Comment 8 Eric Sandeen 2010-04-01 04:02:18 UTC
ok and now:

# find /.automount/turing/disks/turing/home -xdev -inum 100727

but ... if it's not related to setquota after all, then maybe the file containing the block in question is not so interesting.  Still, may offer a clue.

Comment 9 Norman Gaywood 2010-04-01 04:22:41 UTC
100727 aquota.user

and similarly on a smaller filesystem than the home area:

debugfs -R "icheck 5547680" /dev/dm-4
debugfs 1.41.9 (22-Aug-2009)
Block	Inode number
5547680	14050

ls -li /var/aquota.user 
14050 -rw------- 1 root root 12028928 2010-04-01 15:19 /var/aquota.user
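
The lookups in comments 5 through 9 (block to inode with `icheck`, then inode to file name) can be chained. This is a rough sketch, assuming e2fsprogs `debugfs` is installed and the script runs as root; the function names are hypothetical. It uses the debugfs `ncheck` request, which maps an inode number to a path, in place of the `find -inum` step.

```shell
#!/bin/sh
# Read the output of `debugfs -R "icheck <block>"` on stdin and print the
# inode number. The version banner, the "Block / Inode number" header and
# "<block not found>" rows are skipped because their fields are not numeric.
icheck_inode() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Map a block number on a device to the path(s) of the file that owns it.
# Prints nothing if the block does not belong to any inode.
block_to_path() {
  dev=$1
  blk=$2
  ino=$(debugfs -R "icheck $blk" "$dev" 2>/dev/null | icheck_inode)
  [ -n "$ino" ] && debugfs -R "ncheck $ino" "$dev" 2>/dev/null
}
```

With the values from this bug, the call would be `block_to_path /dev/mapper/HOME-home 86532501`.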

Comment 10 Eric Sandeen 2010-04-01 04:32:04 UTC
Ok, that was my guess; excellent clue, I'll see what I can make of this.

Thanks,
-Eric

Comment 11 Eric Sandeen 2010-06-02 21:35:20 UTC
There's a patch proposed upstream from Jan Kara to address this.

http://marc.info/?l=linux-ext4&m=127548861305393&w=2

[PATCH] ext4: Always journal quota file modifications

When journaled quota options are not specified, we do writes
to quota files just in data=ordered mode. This actually causes
warnings from JBD2 about dirty journaled buffer because ext4_getblk
unconditionally treats a block allocated by it as metadata. Since
quota actually is filesystem metadata, the easiest way to get rid
of the warning is to always treat quota writes as metadata...

Signed-off-by: Jan Kara <jack>
---
 fs/ext4/super.c |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

  Ted, this patch fixes some JBD2 warning for me when running XFSQA
with quotas enabled. I think this is a move into a direction you are
trying to achieve as well. Will you merge the patch or should I do it?

								Honza
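
The patch description distinguishes filesystems mounted with journaled quota options from those without. As a hedged illustration, the sketch below classifies `mount` output such as that shown in comment 3: a filesystem with `usrjquota=` (the documented ext3/ext4 journaled-quota option, normally paired with `jqfmt=`) is reported as journaled, and one with plain `usrquota` as non-journaled. The function name `quota_journal_mode` is hypothetical, and the field parsing assumes mount points without spaces.

```shell
#!/bin/sh
# Read `mount` output on stdin ("<dev> on <mnt> type <fs> (<opts>)") and
# print "<mnt> journaled" or "<mnt> non-journaled" for each filesystem
# that has user quota options. Filesystems without quota are skipped.
quota_journal_mode() {
  awk '
    $2 == "on" && $4 == "type" {
      opts = $6
      if (opts ~ /usrjquota=/)            print $3, "journaled"
      else if (opts ~ /[(,]usrquota[,)]/) print $3, "non-journaled"
    }'
}
```

Running `mount | quota_journal_mode` on the reporter's system would be expected to list `/var`, `/tmp` and the home filesystem as non-journaled, which is the configuration the patch addresses.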

Comment 12 Norman Gaywood 2010-06-02 23:59:35 UTC
Thanks Eric, good news.

I guess this patch will make its way into 2.6.32.16 and then into koji.

I'll wait till it hits there and then test.

Comment 13 Eric Sandeen 2010-06-03 00:18:56 UTC
Norman, dunno about 2.6.32.16 but hopefully 2.6.32.x eventually.

Sorry for the long wait on this, too many irons, only one fire ;)

-Eric

Comment 14 Eric Sandeen 2010-07-05 21:58:17 UTC
FWIW I still don't see that it's made it upstream.

I'll poke Ted on it ... it's not a huge problem unless you crash, in which case you can always rebuild quotafiles.
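
For reference, rebuilding a user quota file after a crash typically means turning quotas off, rescanning the filesystem, and turning them back on. The sketch below is one way to do that with the standard quota tools (`quotaoff`, `quotacheck`, `quotaon`); the helper names and the `DRY_RUN` variable are hypothetical, and it assumes quotas are currently on and the filesystem is otherwise quiet while `quotacheck` runs.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Rebuild the user quota file on one mount point. Stops at the first
# failing step so quotas are not re-enabled on a half-rebuilt file.
rebuild_user_quota() {
  mnt=$1
  run quotaoff -u "$mnt" &&
  run quotacheck -c -u -m "$mnt" &&  # -c: new file, -u: user, -m: no read-only remount
  run quotaon -u "$mnt"
}
```

Setting `DRY_RUN=1` before calling `rebuild_user_quota /var` prints the three commands without touching the filesystem, which is a reasonable first step on a production machine.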

Comment 15 Norman Gaywood 2010-07-11 22:48:32 UTC
Eric, thanks for pursuing this.

It so happens that the system I am seeing this problem on, a Xen domU, crashes often!

See bug #550724 for the details of the system crash.

I've had many crashes but have only seen a corrupted quota file reported maybe 5 times. As you say, rebuilding the quota files allows us to continue.

Comment 16 Norman Gaywood 2010-08-11 01:33:42 UTC
Still seeing this message in 2.6.32.17-156.fc12.x86_64 so I guess the patch did not make the cut for 2.6.32.17.

I also see that 2.6.32.18 is out and I don't see anything in the patch list that indicates the patch is included.

I posted an oops to bug #608770 that could have been caused by corruption of the quota file. I probably should have posted that report here.

I guess I'm in an unlucky state in that I'm seeing a combination of rare problems that not many others are seeing. So there is not much pressure to get this particular issue fixed.

I do appreciate the help I am getting though.

Comment 17 Eric Sandeen 2010-08-11 21:03:17 UTC
Patch just fairly recently made it upstream:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=62d2b5f2dcd3707b070efb16bbfdf6947c38c194

I'll see about getting it into .32.y but can put it in fedora in the meantime...

-Eric

Comment 18 Norman Gaywood 2010-08-12 13:11:53 UTC
(In reply to comment #17)

> I'll see about getting it into .32.y but can put it in fedora in the
> meantime...

Thanks Eric for the attention. I was wondering though, is this a serious problem? Is it just an ugly warning message?

Will there always be corruption if there is a crash or does the crash have to happen within a certain time after the message?

I get a lot of crashes and sometimes I get quotafile corruption. Will the patch potentially stop the corruption or will it just hide the error message?

Comment 19 Eric Sandeen 2010-08-12 13:48:14 UTC
I think it's mostly just a warning.  You weren't journaling quota anyway, but bits of code thought you were (because of the way the IO was submitted), so issued this warning when they saw the buffer.

It's not a message that gives users a warm fuzzy though, so fixing would/will be good.

Sorry it's been open so long :)

-Eric

Comment 20 Bug Zapper 2010-11-03 18:09:27 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 21 Bug Zapper 2010-12-03 16:30:43 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 22 Norman Gaywood 2010-12-10 07:03:13 UTC
Just updated the system that is seeing this bug to F14 and the messages are still there.

kernel-2.6.35.9-64.fc14.x86_64

Same event of setting user quotas generates the message.

Comment 23 Kyle McMartin 2010-12-10 14:47:34 UTC
Thanks, will be in the next update of the F-14 kernel.

Comment 24 Fedora Update System 2010-12-17 15:10:46 UTC
kernel-2.6.35.10-68.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.10-68.fc14

Comment 25 Fedora Update System 2010-12-19 23:57:08 UTC
kernel-2.6.35.10-69.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.10-69.fc14

Comment 26 Fedora Update System 2010-12-21 13:55:43 UTC
kernel-2.6.35.10-72.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.10-72.fc14

Comment 27 Phil Anderson 2010-12-21 15:12:00 UTC
I'm seeing this in RHEL6 as well

Comment 28 Phil Anderson 2010-12-21 15:13:08 UTC
Ignore my comment. RHEL6 is addressed in:
http://rhn.redhat.com/errata/RHSA-2010-0842.html
https://bugzilla.redhat.com/show_bug.cgi?id=641454

Comment 29 Fedora Update System 2010-12-22 00:03:56 UTC
kernel-2.6.35.10-72.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/kernel-2.6.35.10-72.fc14

Comment 30 Fedora Update System 2010-12-22 19:52:20 UTC
kernel-2.6.35.10-72.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.