Bug 818087

Summary:	WARNING: at fs/dquot.c:814 dquot_claim_reserved_space() in dmesg
Product:	Red Hat Enterprise Linux 5	Reporter:	Magnus Morén <magnus.moren>
Component:	kernel	Assignee:	Red Hat Kernel Manager <kernel-mgr>
Status:	CLOSED ERRATA	QA Contact:	Petr Beňas <pbenas>
Severity:	high	Docs Contact:
Priority:	urgent
Version:	5.8	CC:	dhoward, eguan, esandeen, jkalliya, jwest, ltroan, moshiro, pbenas, pstehlik, rwheeler, yoguma
Target Milestone:	rc	Keywords:	ZStream
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-01-08 04:31:54 UTC	Type:	Bug
Bug Blocks:	847326

Description Magnus Morén 2012-05-02 07:53:41 UTC

Description of problem:

When running ext4 with user quota on a file server, I frequently get the following warnings in dmesg (i.e. every other minute):

May  2 09:32:04 empok kernel: WARNING: at fs/dquot.c:814 dquot_claim_reserved_space()
May  2 09:32:04 empok kernel:
May  2 09:32:04 empok kernel: Call Trace:
May  2 09:32:04 empok kernel:  [<ffffffff801079bb>] dquot_claim_space+0xa3/0x110
May  2 09:32:04 empok kernel:  [<ffffffff88a8a2af>] :ext4:ext4_da_update_reserve_space+0x12f/0x196
May  2 09:32:04 empok kernel:  [<ffffffff88a9f4a8>] :ext4:ext4_ext_get_blocks+0x14e1/0x1665
May  2 09:32:04 empok kernel:  [<ffffffff8001ab16>] poll_freewait+0x24/0x60
May  2 09:32:04 empok kernel:  [<ffffffff8001ebf7>] __pollwait+0x0/0xe2
May  2 09:32:04 empok kernel:  [<ffffffff8008ee74>] default_wake_function+0x0/0xe
May  2 09:32:04 empok kernel:  [<ffffffff88a8a431>] :ext4:ext4_get_blocks+0x11b/0x1cf
May  2 09:32:04 empok kernel:  [<ffffffff88a8a5f4>] :ext4:mpage_da_map_and_submit+0xb0/0x791
May  2 09:32:04 empok kernel:  [<ffffffff80047c5e>] pagevec_lookup_tag+0x1a/0x21
May  2 09:32:04 empok kernel:  [<ffffffff800f7ce9>] write_cache_pages+0x164/0x334
May  2 09:32:04 empok kernel:  [<ffffffff88a8c88d>] :ext4:__mpage_da_writepage+0x0/0x154
May  2 09:32:04 empok kernel:  [<ffffffff88a8ddee>] :ext4:ext4_da_writepages+0x353/0x4f8
May  2 09:32:04 empok kernel:  [<ffffffff8005a8a3>] do_writepages+0x20/0x2f
May  2 09:32:04 empok kernel:  [<ffffffff8004f765>] __filemap_fdatawrite_range+0x50/0x5b
May  2 09:32:04 empok kernel:  [<ffffffff88a85aa0>] :ext4:ext4_release_file+0x1d/0x94
May  2 09:32:04 empok kernel:  [<ffffffff80012bdd>] __fput+0xd3/0x1bd
May  2 09:32:04 empok kernel:  [<ffffffff80023c62>] filp_close+0x5c/0x64
May  2 09:32:04 empok kernel:  [<ffffffff8001e0c4>] sys_close+0x88/0xbd
May  2 09:32:04 empok kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83


Version-Release number of selected component (if applicable):

# uname -rvp
2.6.18-308.4.1.el5 #1 SMP Wed Mar 28 01:54:56 EDT 2012 x86_64

# rpm -q quota
quota-3.13-5.el5

How reproducible:

100%

Steps to Reproduce:
1. # mkfs.ext4 /dev/mapper/disk1
2. # tune2fs -c 0 -i 0 -m 0 /dev/mapper/disk1
3. Enable user quota

Actual results:

WARNING: at fs/dquot.c:814 dquot_claim_reserved_space()

Expected results:

No Warning

Additional info:

1) Probably related to: https://bugzilla.redhat.com/show_bug.cgi?id=696545

2) Device is over multipath + iSCSI
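
To put a number on "frequently", the warning lines can be counted in the system log. This is a minimal sketch, not part of the original report: the function name count_dquot_warnings is hypothetical, and reading from /var/log/messages assumes the default syslog setup on this host.

```shell
#!/bin/bash
# Hypothetical helper: count dquot_claim_reserved_space() warnings in log
# text read from stdin. The pattern follows the warning line quoted in the
# report; the line number after "dquot.c:" is left open on purpose.
count_dquot_warnings() {
    grep -c 'WARNING: at fs/dquot.c:[0-9]* dquot_claim_reserved_space()'
}
```

Usage, assuming kernel messages are logged to /var/log/messages: count_dquot_warnings < /var/log/messages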

Comment 1 Eric Sandeen 2012-05-02 14:00:33 UTC

(In reply to comment #0)
> Steps to Reproduce:
> 1. # mkfs.ext4 /dev/mapper/disk1
> 2. # tune2fs -c 0 -i 0 -m 0 /dev/mapper/disk1
> 3. Enable user quota


Can you be a little more specific on 3) ?  There must be a mount command in there somewhere, and a quota check possibly ... at what point is the mount exposed to users, and how (samba, nfs, etc?)

Thanks,
-Eric

Comment 2 Magnus Morén 2012-05-02 14:18:30 UTC

# grep /dev/mapper/disk1 /etc/fstab
/dev/mapper/disk1 /home/stud1 ext4 defaults,usrquota,nobarrier,_netdev 0 0

File data is exposed to users by: samba3x and NFS

# rpm -qa|grep samba
samba3x-common-3.5.10-0.109.el5_8
samba3x-winbind-3.5.10-0.109.el5_8
samba3x-winbind-3.5.10-0.109.el5_8
samba3x-3.5.10-0.109.el5_8

Now that I revisit my installation notes, I realize that I copied the aquota.user file from the old SAN disk (ext3 based) to the new SAN disk (ext4).

Can this be the cause of the problem?

# cp <src>/aquota.user <dst>/aquota.user   # copy the quota db
# quotacheck -vu <dst>                     # update the quota db on an empty fs
# quotaon -vu <dst>                        # activate quota

after this I copied all data using rsync (with samba and NFS service stopped)

# rsync -avH --delete --exclude=/aquota.user --exclude=/lost+found /home/stud1/ /new/home/stud1/

Comment 3 Eric Sandeen 2012-05-03 15:01:01 UTC

Just so I don't have to infer anything, can you give me explicit, step-by-step details from mkfs, through mount, with any quota and/or samba/nfs operations, up to rsync, and finally whatever led to the message, as well as you can?

I don't think copying aquota.user is a problem, but I'll have to give that a little more thought.

Comment 4 Magnus Morén 2012-05-03 17:58:33 UTC

- Create fs and enable quota

# yum install e4fsprogs
# mkfs.ext4 /dev/mapper/stud1
# tune4fs -c 0 -i 0 -m 0 /dev/mapper/stud1
# mount /new/home/stud1
# cp /home/stud1/aquota.user /new/home/stud1/
# quotacheck -vu /new/home/stud1
# quotaon -vu /new/home/stud1

then

# rsync ...

then

# /etc/init.d/smb start
# /etc/init.d/nfs start


- /etc/fstab - See above

- /etc/exports
/home/stud  some*.xx.se(rw) other*.xx.se(rw) more*.xx.se(rw,no_root_squash)
/home/stud1 some*.xx.se(rw) other*.xx.se(rw) more*.xx.se(rw,no_root_squash)


/etc/samba/smb.conf
[global]
 server string      = Student home file server
 workgroup          = XX
 security           = ADS
 realm              = XX.SE
 interfaces         = eth0 lo
 bind interfaces only = yes
 name resolve order = host
 smb ports          = 445
 socket options     = TCP_NODELAY
 log level          = 1
 log file           = /var/log/samba/%m.log
 max log size       = 300
 local master       = no
 preferred master   = no
 domain master      = no
 load printers      = no
 printcap name      = /dev/null
[homes]
 comment            = Home Folder
 browseable         = no
 writable           = yes
 create mask        = 0600
 directory mask     = 0700

Comment 10 Eric Sandeen 2012-06-19 18:37:14 UTC

Ok, there are still a couple of upstream commits that are needed to resolve this, I think - they address cases where delayed writes happen before quota is enabled.

Can either reporter confirm or deny that there is a chance for IO to happen before quota is enabled on the filesystem in their use case?

Comment 11 Magnus Morén 2012-06-19 20:46:24 UTC

I do not really know exactly what you are asking...

1) I mount the file system at boot from /etc/fstab (see #2 above).

2) The NFS- and SMB-services are started manually after boot.

3) The volume is accessed via iSCSI + multipath from a Compellent SAN.

No IO is performed before quota is enabled (as far as I can tell).
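
One way to narrow this down is to check the quota state right before the services are started. The following is only a sketch: user_quota_is_on is a hypothetical helper, and it assumes that "quotaon -p" prints lines of the form "user quota on <mountpoint> (<device>) is on". It shows the state at the moment it runs and does not prove that no IO happened earlier.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if "quotaon -p" style output on stdin reports
# user quota as "on" for the mount point given in $1.
# Assumed line format: "user quota on <mountpoint> (<device>) is on"
# Note: $1 is used inside a regular expression, so this sketch assumes a
# plain path without regex metacharacters.
user_quota_is_on() {
    grep -q "^user quota on $1 (.*) is on\$"
}
```

Usage, with the mount point from comment 2: quotaon -pu /home/stud1 | user_quota_is_on /home/stud1 && /etc/init.d/smb start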

Comment 16 RHEL Program Management 2012-07-03 17:38:36 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 17 Eric Sandeen 2012-07-03 21:31:53 UTC

Magnus, there are a couple upstream patches that should take care of this.

Comment 19 Eric Sandeen 2012-07-03 21:49:12 UTC

This can be tested/demonstrated with something like:

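#!/bin/bash
# Writes to a file before quotaon, then again after quotaon, and syncs.

DEV=/dev/null # XXX CHANGEME
MNT=/mnt/test # XXX CHANGEME

umount "$DEV"
mkfs.ext4 "$DEV"
mount "$DEV" "$MNT" -oquota
quotacheck "$MNT"
umount "$DEV"
mount "$DEV" "$MNT" -oquota
# First write: quota is not yet turned on
dd if=/dev/zero of="$MNT/foo" bs=1M count=1
quotaon "$MNT"
# Second write: quota is now on; sync forces the delayed writes out
dd if=/dev/zero of="$MNT/foo" seek=1 bs=1M count=1
sync
dmesg | tail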

In the failing case you'll see the traceback from the warning.
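
Because the script runs mkfs.ext4 on $DEV, a guard against a forgotten CHANGEME placeholder may be worth adding. This is a sketch only; require_block_device is a hypothetical helper and not part of the reproducer above.

```shell
#!/bin/bash
# Hypothetical guard: refuse to continue unless $1 is a block device.
# A leftover placeholder such as /dev/null is a character device and is
# therefore rejected.
require_block_device() {
    if [ ! -b "$1" ]; then
        echo "refusing to run: '$1' is not a block device" >&2
        return 1
    fi
}
```

Usage: call require_block_device "$DEV" || exit 1 before the first umount. Any real block device still passes the check, so the value of DEV needs the same care as before.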

Comment 30 Petr Beňas 2012-08-15 07:34:33 UTC

Reproduced in kernel-2.6.18-323.el5 and verified in kernel-2.6.18-335.el5.
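
A running kernel can be compared against the verified build roughly as follows. This is a sketch under stated assumptions: both function names are hypothetical, the release string is assumed to start with "2.6.18-", and only the main build number is compared, so a z-stream kernel that carries a backport of the fix under a lower build number would be reported as older.

```shell
#!/bin/bash
# Hypothetical helper: print the build number that follows "2.6.18-" in a
# RHEL 5 kernel release string, e.g. 2.6.18-308.4.1.el5 -> 308.
kernel_build() {
    local rel=${1#2.6.18-}   # strip the base version prefix
    echo "${rel%%.*}"        # keep everything before the first dot
}

# Hypothetical helper: succeeds if the build number is at least 335, the
# build in which the fix was verified in this bug.
at_least_verified_build() {
    [ "$(kernel_build "$1")" -ge 335 ]
}
```

Usage: at_least_verified_build "$(uname -r)" && echo "build is 335 or later"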

Comment 32 Larry Troan 2012-08-22 20:15:35 UTC

> Please provide test package? Our partner is willing to test the fix.

I placed what didn't exceed the "quota" on my people.page from the build performed by Eric S. on 8/15. 

It's at http://people.redhat.com/ltroan/fixes/.818087/ and includes src as well as select x86 and x86_64 kernel images.

Please test and report results in your Customer Portal case or directly here in the bugzilla.

Comment 36 Eric Sandeen 2012-08-27 02:10:00 UTC

I will look at the reproducer.  However, this bug was specifically opened for the warning messages, which arise from a mismatch of reserved and claimed quota space.  If that is fixed, this bug will be closed.  If you have discovered a new quota-related problem, we will need a new bug to track that.

(This behavior may be related to delayed allocation, and there is some potential mismatch in limits & actual use, but what you show above is more of an overrun than I would expect).

In any case, I suggest opening a new bug for this other problem.

Thanks,
-Eric

Comment 39 Eric Sandeen 2012-08-28 21:46:43 UTC

For what it's worth, I can't reproduce the 4% over-allocation result above on my test box.  I ran the reproducer 5 times, each with 10 loops, and did not see the error.

It would be good to know whether or not it looks like a regression on the partner's box.

-Eric

Comment 53 errata-xmlrpc 2013-01-08 04:31:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0006.html