Ideally we'd like to avoid GFP_NOFAIL completely, but it's not obvious how. We should at least allow lists of pages to be used here rather than a contiguous allocation.

------------[ cut here ]------------
WARNING: at mm/page_alloc.c:1331 get_page_from_freelist+0x8a1/0x910()
Hardware name: PowerEdge R710
Modules linked in: gfs2 ebtable_nat ebtables x_tables bridge stp dlm af_packet ]
Pid: 3632, comm: bonnie++ Not tainted 2.6.39-rc4+ #214
Call Trace:
 [<ffffffff8108fada>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff8108fb25>] warn_slowpath_null+0x15/0x20
 [<ffffffff8112cd51>] get_page_from_freelist+0x8a1/0x910
 [<ffffffff81682af9>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff8112d0de>] __alloc_pages_nodemask+0x13e/0x980
 [<ffffffff8116d8ea>] kmem_getpages+0x5a/0x170
 [<ffffffff8116f4b7>] cache_grow+0x2e7/0x310
 [<ffffffff8116f731>] cache_alloc_refill+0x251/0x290
 [<ffffffff81170569>] __kmalloc+0x239/0x280
 [<ffffffffa0229d73>] ? gfs2_rlist_alloc+0x23/0x80 [gfs2]
 [<ffffffffa0229d73>] gfs2_rlist_alloc+0x23/0x80 [gfs2]
 [<ffffffffa0203e7a>] do_strip+0x20a/0x490 [gfs2]
 [<ffffffffa0218ca0>] ? gfs2_meta_read+0xd0/0x160 [gfs2]
 [<ffffffff810b253a>] ? wake_up_bit+0x2a/0x40
 [<ffffffffa02041c7>] recursive_scan.clone.23+0xc7/0x1d0 [gfs2]
 [<ffffffffa0204223>] recursive_scan.clone.23+0x123/0x1d0 [gfs2]
 [<ffffffff811707ad>] ? kmem_cache_alloc_trace+0x1fd/0x230
 [<ffffffffa02043d8>] trunc_dealloc+0x108/0x150 [gfs2]
 [<ffffffff810b2590>] ? autoremove_wake_function+0x40/0x40
 [<ffffffffa0210652>] ? gfs2_glock_wait+0x42/0x50 [gfs2]
 [<ffffffffa0212070>] ? gfs2_glock_nq+0x320/0x480 [gfs2]
 [<ffffffff810b2590>] ? autoremove_wake_function+0x40/0x40
 [<ffffffffa0205feb>] gfs2_file_dealloc+0xb/0x10 [gfs2]
 [<ffffffffa022a85d>] gfs2_evict_inode+0x22d/0x510 [gfs2]
 [<ffffffffa022a715>] ? gfs2_evict_inode+0xe5/0x510 [gfs2]
 [<ffffffff8119e6e1>] evict+0x81/0x180
 [<ffffffff8119e944>] iput+0x104/0x1f0
 [<ffffffff8119302c>] do_unlinkat+0x10c/0x1b0
 [<ffffffff810ee432>] ? audit_syscall_entry+0x1c2/0x1f0
 [<ffffffff813d1dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81194651>] sys_unlink+0x11/0x20
 [<ffffffff81686612>] system_call_fastpath+0x16/0x1b
---[ end trace 2035b90cd25b602e ]---

page_alloc.c line 1331 reads:

	if (unlikely(gfp_flags & __GFP_NOFAIL)) {
		/*
		 * __GFP_NOFAIL is not to be used in new code.
		 *
		 * All __GFP_NOFAIL callers should be fixed so that they
		 * properly detect and handle allocation failures.
		 *
		 * We most definitely don't want callers attempting to
		 * allocate greater than order-1 page units with
		 * __GFP_NOFAIL.
		 */
		WARN_ON_ONCE(order > 1);
	}
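To make the order > 1 check concrete, here is a minimal userspace sketch of the arithmetic behind it. It assumes 4 KiB pages; PAGE_SIZE_ASSUMED, order_for_size() and would_warn_nofail() are hypothetical names for illustration, not kernel functions. In the real trace the order is chosen by the slab allocator per cache, so this only shows the underlying size-to-order relationship, not the exact order that kmem_getpages() requested.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on typical x86_64 configurations. */
#define PAGE_SIZE_ASSUMED 4096u

/*
 * Smallest order such that (PAGE_SIZE_ASSUMED << order) >= size.
 * This is the same idea as the kernel's get_order(), written out
 * as a plain loop for illustration.
 */
static unsigned order_for_size(size_t size)
{
	unsigned order = 0;
	size_t span = PAGE_SIZE_ASSUMED;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/*
 * Hypothetical helper: nonzero if a contiguous __GFP_NOFAIL request
 * of this size would trip WARN_ON_ONCE(order > 1).
 */
static int would_warn_nofail(size_t size)
{
	return order_for_size(size) > 1;
}
```

Under these assumptions anything larger than two pages (8 KiB) needs an order-2 or higher allocation, which is why a long contiguous array is a problem while a list of single pages would not be.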
One solution to this is to get rid of the rlist code entirely. This should be possible if we can remove the three users of this code. These are:

 o Deallocation of dir leaf blocks
 o Deallocation of indirect block tree
 o Deallocation of indirect xattr tree blocks

It should be possible to redesign the code to work in the opposite way to the allocation code, so that it locks only a single rgrp at a time and thus resolves the locking order issue.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could also affect bugs from before the Fedora 19 development cycle. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and the reason for this action are here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20. If you experience different issues, please open a new bug report for those.
Indirect block deallocations no longer use the rlist code. That leaves two callers: gfs2_dir_exhash_dealloc() and ea_dealloc_indirect().

The directory code can be easily updated to be more efficient, although it is very likely to be dealing with many scattered blocks. The algorithm should be something along the lines of:

 1. Find the first leaf block to deallocate
 2. Lock that rgrp
 3. See how many other leaf blocks are in the same rgrp, by (a) scanning down the chain until we hit a block in another rgrp and (b) repeating that for every hash chain
 4. Deallocate the leaf blocks in question
 5. Update the hash chain headers
 6. Repeat until all hash chains are empty

That way each transaction (one per rgrp) ensures that everything remains consistent while the hash chains are deallocated. The iteration means that we no longer need to use the rlist code.

A similar approach could be used for the EA blocks too, and then the rlist code can be removed entirely.
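The steps above can be sketched as a small userspace model. This is not GFS2 code: struct chain, rgrp_of(), dealloc_chains() and the fixed RGRP_BLOCKS mapping from block number to rgrp are all assumptions made for illustration, and the rgrp locking and journal transaction are represented only by comments and a pass counter.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a fixed number of blocks per resource group. */
#define RGRP_BLOCKS 100u

/* Hypothetical model of one hash chain: leaf block numbers in chain order. */
struct chain {
	const unsigned *blocks;
	size_t len;
	size_t head;	/* index of the first leaf not yet deallocated */
};

static unsigned rgrp_of(unsigned block)
{
	return block / RGRP_BLOCKS;
}

/*
 * Deallocate every leaf in every chain while holding only one rgrp
 * at a time. Returns the number of passes (one rgrp lock and one
 * transaction each) and stores the number of freed leaves in *freed.
 */
static size_t dealloc_chains(struct chain *chains, size_t nchains,
			     size_t *freed)
{
	size_t passes = 0;

	*freed = 0;
	for (;;) {
		size_t i, before = *freed;
		unsigned rgrp;

		/* Step 1: find the first leaf block still to deallocate. */
		for (i = 0; i < nchains; i++)
			if (chains[i].head < chains[i].len)
				break;
		if (i == nchains)
			return passes;	/* Step 6: all chains are empty. */

		/* Step 2: "lock" the rgrp holding that leaf. */
		rgrp = rgrp_of(chains[i].blocks[chains[i].head]);

		/* Steps 3-5: for every chain, free the run of leaves at its
		 * head that lies in this rgrp, then advance the chain header. */
		for (i = 0; i < nchains; i++) {
			struct chain *c = &chains[i];

			while (c->head < c->len &&
			       rgrp_of(c->blocks[c->head]) == rgrp) {
				c->head++;
				(*freed)++;
			}
		}

		/* The leaf found in step 1 is always freed, so we terminate. */
		assert(*freed > before);
		passes++;	/* the rgrp would be unlocked here */
	}
}
```

Because each pass only consumes the run at the head of each chain, an rgrp can be locked more than once when blocks are scattered, which is the cost of never holding two rgrps at the same time.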