Bug 522615 - ext4: mpage_da_map_blocks fails with EDQUOT
Summary: ext4: mpage_da_map_blocks fails with EDQUOT
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: low
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 523201
Depends On:
Blocks:
Reported: 2009-09-10 20:19 UTC by Yehia.Adham
Modified: 2018-10-27 14:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-03-14 18:23:11 UTC
Target Upstream Version:
Embargoed:



Description Yehia.Adham 2009-09-10 20:19:03 UTC
Hello,

A few hours ago, one of my servers encountered an ext4 error. The filesystem
sits on top of LVM and a 2-disk array. The array and all of its drives report
optimal status, so I don't suspect a hardware problem. I was able to reboot
into single-user mode, fsck the filesystem manually, and get it running again,
but I feel that was not the real problem: this has been happening since we
upgraded to RHEL 5.4, kernel version 2.6.18-164.el5. (I believe LVM is not the
problem, as we are seeing the same error on 2 boxes that do not run on LVM.)

kernel: mpage_da_map_blocks block allocation failed for inode 1843701 at
logical offset 0 with max blocks 339 with error -122
Message from syslogd@ at Mon Sep 7 18:03:59 2009 ...
cairoserver kernel: This should not happen.!! Data will be lost

----------------

Mr. Eric Sandeen reported back:

error -122 is EDQUOT /* Quota exceeded */

Do you have quotas in use? (Yes, quotas are in use.)

In short, what happened was that delayed-allocation data found no place to go
when it was time to allocate & flush, presumably due to the quota issue.
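
For anyone triaging many of these messages, here is a minimal Python sketch that pulls the numeric fields out of one log line and looks up the errno name. The pattern is written against the message text quoted in this report only; the function name parse_da_failure is hypothetical, and the errno lookup is platform-specific (122 maps to EDQUOT on Linux).

```python
import errno
import re

# Pattern for the kernel message quoted in this report (an assumption:
# other kernel versions may word the message differently).
DA_FAIL = re.compile(
    r"mpage_da_map_blocks block allocation failed for inode (?P<inode>\d+) "
    r"at\s+logical offset (?P<offset>\d+) with max blocks (?P<max_blocks>\d+) "
    r"with error (?P<error>-?\d+)"
)


def parse_da_failure(line):
    """Return the numeric fields of one failure message, or None if no match."""
    match = DA_FAIL.search(line)
    if match is None:
        return None
    fields = {key: int(value) for key, value in match.groupdict().items()}
    # The kernel logs the error as a negative errno; look up its symbolic name.
    fields["errno_name"] = errno.errorcode.get(abs(fields["error"]), "unknown")
    return fields


sample = (
    "kernel: mpage_da_map_blocks block allocation failed for inode 1843701 at "
    "logical offset 0 with max blocks 339 with error -122"
)
print(parse_da_failure(sample))
```

The inode number it returns can then be fed to find -inum, as done further down in this report.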

----------------

cairoserver kernel: mpage_da_map_blocks block allocation failed for inode 1172923 at logical offset 0 with max blocks 1 with error -122
Message from syslogd@ at Thu Sep 10 22:07:15 2009 ...
cairoserver kernel: This should not happen.!! Data will be lost
------------

root@cairoserver [~]# find /home -inum 1172923 -print
/home/ekramy/mail/elasyl.com/hossam/maildirsize
-----------


fstab configuration 
LABEL=home              /home                   ext4	defaults,noatime,usrquota,barrier=1             1 2

uname -a

Linux cairoserver.cairoserver.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


---------------
box207 kernel: mpage_da_map_blocks block allocation failed for inode 4137265 at logical offset 0 with max blocks 1 with error -122
Message from syslogd@ at Thu Sep 10 22:36:10 2009 ...
box207 kernel: This should not happen.!! Data will be lost

---------------
root@box207 [~]# find /home -inum 4137265 -print
/home/suholc00/public_html/me/modules/Weather/cache/SAXX0017.dat
--------------
/etc/fstab | grep home

LABEL=home               /home                    ext4dev    defaults,noatime,usrquota,barrier=1        1 2
--------------
root@box207 [~]# uname -a
Linux box207.exaservers.com 2.6.18-160.el5 #1 SMP Mon Jul 27 17:28:29 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
--------------

The quota tools version in use is:
http://downloads.sourceforge.net/linuxquota/quota-3.17.tar.gz


Please advise.

Thank You!
Yehia

Comment 1 Eric Sandeen 2009-09-10 20:30:24 UTC
Unfortunately quota support for ext4 is not all there in RHEL5.4.  Upstream worked out most issues in 2.6.30, and RHEL5.4's ext4 codebase is more or less 2.6.29.

Because ext4 is still tech preview in RHEL5.4, I don't expect that this will be addressed until RHEL5.5, but please feel free to talk w/ your support folks.

Just a note, quota tools in RHEL5.4 don't yet support ext4 either, so it's somewhat consistently unavailable.

-Eric

Comment 2 Yehia.Adham 2009-09-10 21:01:26 UTC
Hello,

Thank you, Eric. Since I disabled quotas I am no longer getting any errors, which shows that it is a quota problem with ext4.

Thank you very much.

Comment 3 Eric Sandeen 2009-09-14 16:48:14 UTC
*** Bug 523201 has been marked as a duplicate of this bug. ***

Comment 4 procaccia 2009-09-14 19:41:25 UTC
OK, https://bugzilla.redhat.com/show_bug.cgi?id=523201 is a duplicate of this one, so I will follow the discussion here ...

So ext4 is indeed still tech preview in RHEL5.4, and so is quota support on it,
but what about those alarming "error" messages:
kernel: This should not happen.!! Data will be lost

Will data really be lost?

Is running quota on ext4 really harmful, or are these just annoying bug messages?

What do you recommend: should I remove quota on the ext4 FS, or update to a Fedora 11 source package and recompile it for RHEL 5.4?
I think these messages started when I updated to kernel 2.6.18-164.el5; perhaps I should boot back into the previous kernel?

Quota support is mandatory for our 2000 students, and moving my ext4 FS back to ext3 seems very difficult; moreover, ext4 seems to perform noticeably better.
Is there a scheduled date for RHEL 5.5?

thanks.

Comment 5 Eric Sandeen 2009-09-14 19:51:50 UTC
If there is delalloc data, and no space to put it, then yes it will be lost.  This would refer to buffered writes that have not yet made it to disk.
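
Because buffered writes can fail after write() has already returned, an application that cares about its data can force it to disk and check for errors at that point. Below is a minimal Python sketch of that pattern. The helper name write_durably is hypothetical, and this thread does not state whether the affected RHEL5.4 kernel reports this particular failure through fsync, so treat it as a general precaution rather than a fix.

```python
import errno
import os
import tempfile


def write_durably(path, data):
    """Write bytes and fsync them; return None on success or an errno name."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested; loop until done.
                view = view[os.write(fd, view):]
            # Ask the kernel to flush the buffered data to disk now, so that
            # an allocation failure has a chance to surface as an error here.
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.dat")
    result = write_durably(demo_path, b"example\n")
    with open(demo_path, "rb") as fh:
        stored = fh.read()
print(result, stored)
```

A caller would treat any non-None return value (for example "EDQUOT" or "ENOSPC") as "the data may not be on disk".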

Yes, remove quota from ext4; it's not implemented in the 5.4 tech preview and in fact stock, supported RHEL5 quota userspace will not support it at all.

Earlier RHEL5 kernels do not have better quota support; this is not a regression since then.

Full quota support will come, but it is not complete yet; this is one of the reasons that ext4 is still in tech preview in RHEL5.

Thanks,
-Eric

Comment 6 procaccia 2009-09-15 08:12:05 UTC
Damn, I didn't realize it was that much "tech preview" :-(
Quota is so useful for a large-scale file server that I didn't imagine it could be a minor feature not yet implemented ...
Anyway, now that I am in production on this,
is there a gentle way to keep quota with minimal risk?
Backport a more recent quota package (Fedora 11) to RHEL 5.4?
Go back to the previous kernel (2.6.18-128.1.6.el5)?
Do these frightening "Data will be lost" messages appear only for users who are over quota, or for anyone?

regards .

Comment 7 Eric Sandeen 2009-09-15 15:54:16 UTC
Tech Preview is used for those technologies that are becoming mature, but which we deem not quite ready for commercial support. Red Hat Enterprise Linux subscribers can test and evaluate these future features designated as Tech Preview.

I didn't think our shipped quota tools recognized ext4; did you already grab a newer quota userspace than is shipped with RHEL5? If so, at that point, it's not a supported RHEL5 configuration at all, I'm afraid.  Or were you able to make stock RHEL5.4 quota userspace manage ext4 quotas...

Anyway, quota + delalloc is a bit tricky, which is why it wasn't ready yet.

The kernel code in RHEL5.4 just doesn't work properly with quotas yet; your best bet for now would be to turn quotas off I think, or move back to ext3 if robust quota support is necessary.
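
To find the mounts this advice applies to, one option is to scan /etc/fstab for ext4 (or ext4dev) entries that carry quota options. The Python sketch below is illustrative only: the function name ext4_quota_mounts is hypothetical, and it checks just the usrquota and grpquota options, since usrquota is the one shown in the fstab lines in this report.

```python
# Quota mount options checked by this sketch (an assumption: extend as needed).
QUOTA_OPTS = {"usrquota", "grpquota"}
EXT4_TYPES = {"ext4", "ext4dev"}


def ext4_quota_mounts(fstab_text):
    """Return (mountpoint, fstype, quota options) for ext4 mounts using quota."""
    hits = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        _device, mountpoint, fstype, options = fields[:4]
        # Drop any "=value" part so "barrier=1" becomes "barrier".
        names = {opt.split("=", 1)[0] for opt in options.split(",")}
        used = sorted(names & QUOTA_OPTS)
        if fstype in EXT4_TYPES and used:
            hits.append((mountpoint, fstype, used))
    return hits


sample_fstab = (
    "LABEL=home  /home  ext4     defaults,noatime,usrquota,barrier=1  1 2\n"
    "LABEL=data  /data  ext3     defaults,usrquota                    1 2\n"
)
print(ext4_quota_mounts(sample_fstab))
```

The /data entry in the sample is made up to show that ext3 mounts are not flagged.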

-Eric

Comment 8 procaccia 2009-09-16 13:09:16 UTC
Yesterday I upgraded the quota package on RHEL 5.4 by recompiling quota-3.16-7.fc10.src.rpm (I think FC10 is the release closest to RHEL 5.4, rather than FC11 or FC12). Indeed, in that case "it's
not a supported RHEL5 configuration" ...

Anyway, the changelog in the package source shows this:

* Thu Oct 30 2008 Ondrej Vasik <ovasik> 1:3.16-6
- fix implementation of ext4 support
 (by Mingming Cao, #469127) 

which seems interesting in my case! However, if I understand correctly, my problem here is not with the quota package but with kernel-2.6.18-164.el5, which doesn't manage quota correctly.

OK, I will probably move back to ext3 ( :-( ). Until then, could you let me know whether these messages that keep coming on the console

 Message from syslogd@ at Wed Sep 16 15:04:52 2009 ...
gizeh kernel: mpage_da_map_blocks block allocation failed for inode 3541321 at logical offset 0 with max blocks 10 with error -122
Message from syslogd@ at Wed Sep 16 15:04:52 2009 ...
gizeh kernel: This should not happen.!! Data will be lost

concern everyone, or only people over quota? And how can I tell, from the error above for example, which filesystem "inode 3541321" is on?

regards .

Comment 9 Eric Sandeen 2009-09-16 15:09:23 UTC
(In reply to comment #8)

...

> OK, I will probably move back to ext3 ( :-( ). Until then, could you let me
> know whether these messages that keep coming on the console
> 
>  Message from syslogd@ at Wed Sep 16 15:04:52 2009 ...
> gizeh kernel: mpage_da_map_blocks block allocation failed for inode 3541321 at
> logical offset 0 with max blocks 10 with error -122
> Message from syslogd@ at Wed Sep 16 15:04:52 2009 ...
> gizeh kernel: This should not happen.!! Data will be lost
> 
> concern everyone, or only people over quota?

Likely only people over quota, as they should be the only ones hitting the EDQUOT error, and thus the denied writes.

> And how can I tell, from the error
> above for example, which filesystem "inode 3541321" is on?

find /filesystem -inum 3541321
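
The same lookup can be scripted, which helps when many inodes need resolving. Below is a rough Python sketch of the find -inum idea. The function name find_by_inode is hypothetical, and unlike plain find -inum it deliberately stays on the starting filesystem, because inode numbers are only unique within one filesystem.

```python
import os
import tempfile


def find_by_inode(root, inode):
    """Yield paths under root with the given inode number, on root's filesystem."""
    root_dev = os.lstat(root).st_dev

    def on_root_fs(path):
        try:
            return os.lstat(path).st_dev == root_dev
        except OSError:
            return False  # entry vanished or is unreadable

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that live on other filesystems.
        dirnames[:] = [d for d in dirnames if on_root_fs(os.path.join(dirpath, d))]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_ino == inode and st.st_dev == root_dev:
                yield path


# Small demonstration: create a file, then find it again by inode number.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "maildirsize")
    open(target, "w").close()
    matches = list(find_by_inode(tmp, os.lstat(target).st_ino))
print(matches == [target])
```

If a file has several hard links, every link is reported, as with find.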

-Eric

Comment 10 procaccia 2009-09-17 07:25:59 UTC

> Likely only people over quota, as they should be the only ones hitting the
> EDQUOT error, and thus the denied writes.

Yes, my first guess and a test confirmed that it happens for users who are over quota:

gizeh kernel: mpage_da_map_blocks block allocation failed for inode 3412191 at ...

$ find /disk07 -inum 3412191
/disk07/msc2007/user37933/.mozilla/firefox/hc5ntvtg.default/tabsaver.lst.new

and indeed, that user is over quota:
[root@gizeh /disk07/msc2007]
$ quota -s user37933
Disk quotas for user user37933 (uid 37933):
    Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/VolGroup02S2IA-LVVG02Users07
                  536M*   489M    538M    none   14264   50000   55000 
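
A quick way to check many users is to read the usage line of this output programmatically. The Python sketch below assumes the column layout shown above and relies on the quota tool marking usage above the soft limit with a trailing asterisk. The function name parse_quota_usage is hypothetical, and the sketch expects the data line only, not the wrapped device line before it.

```python
def parse_quota_usage(data_line):
    """Parse the block columns of one `quota -s` data line."""
    fields = data_line.split()
    if len(fields) < 3:
        raise ValueError("expected at least blocks, quota and limit columns")
    blocks, soft, hard = fields[:3]
    return {
        "blocks": blocks.rstrip("*"),
        "soft_limit": soft,
        "hard_limit": hard,
        # A trailing "*" on the usage column marks usage above the soft limit.
        "over_soft_limit": blocks.endswith("*"),
    }


sample_line = "                  536M*   489M    538M    none   14264   50000   55000 "
usage = parse_quota_usage(sample_line)
print(usage)
```

Sizes are kept as the strings the tool printed, so no unit conversion is assumed.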

Then, among the hundred or so inodes affected by this kind of error message every day, I also found some whose owners were apparently not over quota :-( :

Sep 16 18:06:45 gizeh kernel: mpage_da_map_blocks block allocation failed for inode 39419 at logical offset 0 with max blocks 2 with error -122 

[root@gizeh /disk19/artemis]
$ find /disk19 -inum 39419
/disk19/artemis/user40029/3DOD_DATA/abom/ABOM.mc.mp4

[root@gizeh /disk19/artemis]
$ quota -s user40029
Disk quotas for user user40029 (uid 40029): none 

In that case, the user doesn't have a quota at all!
I am confused ... I am not sure whether I am really losing data.

Quota is essential to us, but data integrity is more essential, so I could turn quotas off for a while if support in RHEL arrives within a few months; but if it is not scheduled for a year or so, I should probably move back to ext3.
Any idea of a likely support date, so that I can choose between turning quota off and going back to ext3?

regards .

Comment 11 Ric Wheeler 2009-09-17 11:42:34 UTC
Hi Jehan,

The best advice is to never use "tech preview" code in production deployments. It is a preview feature only and this is our mechanism for letting you evaluate it early.

You should move your ext4 production servers back to ext3 until we provide full support in a release. The timing of our releases and when the feature will migrate from tech preview to full support is not set in stone - it depends on how well the tech preview goes, other release priorities, etc. Unfortunately, this means that we cannot provide you a hard and fast promise for full support in a specific RHEL family.  You can contact your Red Hat account team to weigh in on ext4, we certainly do like to hear from customers about how things are going.

Best regards,

Ric

Comment 12 Eric Sandeen 2011-03-14 18:23:11 UTC
Now that RHEL5.6 has updated ext4 code, and ext4 is out of tech preview, this should be resolved - I did a lot of work with respect to delalloc vs. quotas.  I'm going to close this bug, but if you find that the problem persists in your case, please feel free to contact Red Hat support and we'll revisit the issue.

Thanks,
-Eric

