Bug 245264
| Summary: | kernel: gfs_tool: page allocation failure. order:4, mode:0xd0 | | |
| --- | --- | --- | --- |
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Anand <anandab> |
| Component: | GFS-kernel | Assignee: | Robert Peterson <rpeterso> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 4 | CC: | bmarzins, cfeist, edamato, rpeterso, swhiteho |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | GFS-kernel-2.6.9-86.1.el4 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 479421 (view as bug list) | Environment: | |
| Last Closed: | 2011-02-16 16:34:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 479421 | | |
Description
Anand 2007-06-22 00:17:10 UTC
It looks like you're hitting a page allocation failure in the kernel. You might be running out of memory. A GFS tunable, `lockdump_size`, determines the minimum amount of kernel memory requested with each `gfs_tool getXXX` command. The default is 32 pages (131072 bytes). I don't know if this is set very high on your system. Run `gfs_tool gettune /gfs/mount/point | grep lockdump_size` to see the current setting. You can set it to a lower value with something like `gfs_tool settune /gfs/mount/point lockdump_size 16384`. See if that helps.

Oops. I'm sorry, I was wrong about the minimum amount of kernel memory requested. The default is 32 pages, but it's _not_ the minimum. If you do a `gfs_tool counters`, you only request 4096 bytes, so your machine must really be out of memory. I don't think changing `lockdump_size` is going to help you.

The output of the `gfs_tool` command is:

```
gfs_tool gettune /mts/dbc2 | grep lockdump
lockdump_size = 131072
```

I also have my `drop_count` set to 0. The machine has 16 GB of memory; does the error have something to do with a large `drop_count` setting?

The memory is *very* fragmented based on the memory output. An order:4 request needs 2^4 = 16 physically contiguous pages, i.e. 64 KB, so any kmalloc of 64 KB or larger will most likely fail, even though the system still has a large amount of memory inside the smaller slab cache buckets:

```
Normal: 23724*4kB 12497*8kB 1235*16kB 75*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 217032kB
```

This is a known Linux VMM design issue. For newer 2.6-based kernels there is a command that allows you to purge the slab cache; for RHEL 4 (2.6.9 base) we're out of luck here. Hopefully an "umount" followed by a "remount" can alleviate the symptoms, but there is no guarantee. If this is a repeated GFS issue, the GFS RHEL 4.5-base RPM has a glock-trimming patch that allows you to trim a percentage of glocks (which subsequently returns GFS slab cache memory back to the central pool). Give it a try to see whether it helps:

```
shell> gfs_tool settune /mnt/gfs1 glock_purge 30
```

(This will purge 30% of the glocks back to the central pool on a five-second interval.)

Ok... take a look at bug 229461. Without this fix, you're still looking at allocating 64K of contiguous kernel memory, which might not be available in all cases. There are still places in the gfs_tool code that request 64K for various ioctl calls. I'm going to take a look at those and commit fixes.

Wendy: I am using dlm, not glock. But your analysis was great and it seems to fit the problems we are seeing. Abhijith: will you be providing instructions for getting the updates?

Fixing gfs_tool to use a smaller buffer (though it then takes longer to complete) is a good thing to do. However, when a system has memory fragmented like that, general performance will be bad. VM tuning knowledge and tools are a must for system administrators. In simple words, once the gfs_tool lock dump (which takes a smaller buffer) is fixed, your problem will shift to other parts of the system. A certain level of VM tuning is required to avoid getting the system into this state.

The GFS "glock" component calls DLM to carry out inter-node locking. There is a one-to-one correspondence between glocks and DLM locks, so when glocks accumulate to an unacceptable level, so do DLM locks. Actually, it is interesting to read the glock and dlm comments :) .. Are you aware that GFS actually uses "glock" to do its locking (and glock then calls DLM)?

The simple fix for this looks like using vmalloc rather than kmalloc. Abhi, was this fixed in the end? If so, please close this bug. Abhi, please can you turn this allocation into a vmalloc, and do the same for 5.4 and upstream too?
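A minimal sketch of the kind of change being requested here, not the actual RHEL 4 patch: the function names `alloc_dump_buffer` and `free_dump_buffer` are hypothetical. The point is that `vmalloc()` only needs virtually contiguous pages, so a 64 KB request no longer depends on the buddy allocator finding an order-4 block of physically contiguous memory.

```c
#include <linux/vmalloc.h>

/*
 * Hypothetical illustration of the kmalloc -> vmalloc swap for the
 * lock-dump buffer.  An order-4 kmalloc needs 16 physically
 * contiguous pages and fails on fragmented memory; vmalloc stitches
 * together whatever individual free pages are available.
 */
static char *alloc_dump_buffer(size_t size)
{
	/* Before: kmalloc(size, GFP_KERNEL), which fails when no
	 * order-4 block is free, as seen in this bug. */
	return vmalloc(size);
}

static void free_dump_buffer(char *buf)
{
	vfree(buf);	/* vmalloc'd memory must be freed with vfree() */
}
```

The trade-off is that vmalloc'd memory is not physically contiguous (so it cannot be used for DMA) and accesses go through an extra page-table mapping, but for a debugging buffer filled once per ioctl call that cost is negligible.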
It's a simple fix and the current code is too ghastly for words.

Abhi, what is the current status of this bug?

I haven't gotten around to doing the vmalloc changes for this. I will get to it as soon as I'm done with 496716.

Abhi, is this done yet?

Pushed to the RHEL4 git branch.

Verified that the patch for this bugzilla is included in GFS-kernel-2.6.9-87.3.el4.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0276.html