Bug 1278992 - ceph-osd aborts during 'XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)' when directory block size of 64k used [NEEDINFO]
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.1
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.4
Assigned To: fs-maint
QA Contact: Zorro Lang
Depends On:
Blocks: 1203710 1298243 1313485 1438583 1445812 1295577
Reported: 2015-11-06 18:47 EST by Kyle Squizzato
Modified: 2017-08-02 14:56 EDT (History)

Type: Bug
jkachuck: needinfo? (fs-maint)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1597523 None None None Never

Description Kyle Squizzato 2015-11-06 18:47:21 EST
Description of problem:
ceph-osd daemons begin to suicide during XFS memory allocation deadlocks.  The following messages are printed to /var/log/messages: 

XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)  
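The mode value in this message is the GFP flag mask handed to the page allocator. As a rough sketch, it can be decoded as below. The bit values are an assumption taken from include/linux/gfp.h in 3.10-era kernels and should be checked against the exact kernel source in use; decode_gfp is a hypothetical helper name, not a kernel or XFS tool.

```python
# Assumed flag bits, as defined in include/linux/gfp.h of 3.10-era kernels.
# Verify against the running kernel's source before relying on them.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x2000: "__GFP_MEMALLOC",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
}


def decode_gfp(mode):
    """Return the names of the known flag bits set in mode, lowest bit first."""
    names = []
    leftover = mode
    for bit in sorted(GFP_BITS):
        if mode & bit:
            names.append(GFP_BITS[bit])
            leftover &= ~bit
    if leftover:
        # Bits outside the assumed table are reported in hex, not guessed.
        names.append(hex(leftover))
    return names


print(" | ".join(decode_gfp(0x8250)))
```

Under these assumed values, 0x8250 has __GFP_WAIT, __GFP_IO, __GFP_NOWARN and __GFP_ZERO set and __GFP_FS clear: a sleeping, zeroing allocation that is not allowed to recurse into the filesystem for reclaim.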

This appears to occur when a directory block size of 64k is used: 

 -n size=65536

Version-Release number of selected component (if applicable):
3.10.0-123.20.1.el7.x86_64 

How reproducible:
No reliable reproducer is known; however, the issue appears to occur when Ceph OSDs are under heavy load in a Ceph (Firefly) cluster.


Actual results:
XFS deadlocks and the ceph-osd daemons suicide. 


Expected results:
No XFS deadlocks or ceph-osd suicides.
Comment 2 Brian Foster 2015-11-07 10:03:00 EST
Just as a quick first step experiment, I formatted an '-n size=64k' fs and ran a quick file creation/deletion loop with a debug printk() in xlog_cil_insert_format_items() to dump the size of any >PAGE_SIZE allocation requests. I very quickly see allocs up to around 64k, some even larger:

...
xlog_cil_insert_format_items(243): buf_size 64984 (nbytes 64880 niovecs 3)
xlog_cil_insert_format_items(243): buf_size 65112 (nbytes 65008 niovecs 3)
xlog_cil_insert_format_items(243): buf_size 65368 (nbytes 65264 niovecs 3)
xlog_cil_insert_format_items(243): buf_size 65496 (nbytes 65392 niovecs 3)
xlog_cil_insert_format_items(243): buf_size 65728 (nbytes 65640 niovecs 2)
...

From that perspective, it doesn't seem that surprising to see allocation failures from kmem_zalloc() calls here if we assume memory fragmentation is an eventuality. Further, we're in KM_NOFS context which I assume precludes things like writeback, etc., but even if we weren't, those are still order 4 or larger sized requests.
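To make the "order 4 or larger" remark concrete, here is a small sketch of the arithmetic. It assumes 4 KiB pages (PAGE_SIZE = 4096, as on x86_64) and that each request is satisfied by a physically contiguous power-of-two run of pages; alloc_order is a hypothetical helper written for illustration, not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def alloc_order(size, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= size."""
    pages = -(-size // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


# buf_size values taken from the debug output above
for buf_size in (64984, 65112, 65368, 65496, 65728):
    print(buf_size, "-> order", alloc_order(buf_size))
```

With these assumptions the first four sizes need 16 contiguous pages (order 4), and 65728 spills just past 64 KiB, so it needs an order 5 (32-page) region. Runs of that length become hard to find once memory is fragmented.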

My first question is, without having yet dug into the core context for these allocation sizes, is there any reason for not using something like kmem_zalloc_large() here (assuming we preserve the KM_SLEEP behavior)?
Comment 3 Dave Chinner 2015-11-09 05:41:18 EST
Why is the filesystem configured to use 64k directory block sizes? Are they putting millions of files in a single directory? If not, then just use the default directory block size and the problem goes away....

-Dave.
Comment 14 Eric Sandeen 2016-06-30 12:27:11 EDT
This is a known issue w/ 64k dirs, and there is no current solution, though workarounds exist (i.e. don't mkfs w/ that option).

For now moving to 7.4, though AFAIK there has been no upstream activity on this either, so 7.4 is not necessarily likely, either.
