Bug 229601

Summary: gfs_tool fails to report counters
Product: Red Hat Enterprise Linux 5
Reporter: Robert Peterson <rpeterso>
Component: gfs-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: GFS Bugs <gfs-bugs>
Severity: low
Priority: medium
Version: 5.0
CC: kanderso
Hardware: i686
OS: Linux
Fixed In Version: RHBA-2007-0576
Doc Type: Bug Fix
Last Closed: 2007-11-07 17:57:54 UTC
Bug Depends On: 229461

Description Robert Peterson 2007-02-21 23:44:37 UTC
+++ This bug was initially created as a clone of Bug #229461 +++
Crosswrite to RHEL5.

Description of problem:
RH4ES, kernel 2.6.9-42.0.3.ELsmp.  Two-node cluster with common storage.
System was left running under heavy load overnight.  GFS lock count was
100000..300000, load average 2..40.  A morning check revealed that gfs_tool
failed to report counters on one node; gfs_tool worked on the second node.
The GFS mount itself kept working.  Load was dropped to nil, and after several
hours (4) gfs_tool started reporting counters again without a reboot or remount.

gfs_tool counters reported 
[root@p20-is03 etc]# gfs_tool counters /usr/db/mail/
gfs_tool: can't get counters: Cannot allocate memory

See additional information for gfs_tool error message and dmesg printout.


Version-Release number of selected component (if applicable):
GFS-6.1.5-0

How reproducible:
Didn't try


Steps to Reproduce:
1. Two node setup
2. Generate heavy load with lock changing between 100000..300000
3. Leave running

Actual Results:


Expected Results:


Additional info:
[root@p20-is03 etc]# gfs_tool counters /usr/db/mail/
gfs_tool: can't get counters: Cannot allocate memory
[root@p20-is03 etc]# uptime
 06:14:20 up 15:52,  1 user,  load average: 37.53, 52.79, 58.09
[root@p20-is03 etc]# gfs_tool counters /usr/db/mail/
gfs_tool: can't get counters: Cannot allocate memory
[root@p20-is03 etc]# ps -ef | grep
[root@p20-is03 etc]# date
Wed Feb 21 06:14:29 UTC 2007
[root@p20-is03 etc]# free
             total       used       free     shared    buffers     cached
Mem:       4025548    1892064    2133484          0       2264    1316976
-/+ buffers/cache:     572824    3452724
Swap:      2040212          0    2040212
[root@p20-is03 etc]# date
Wed Feb 21 06:14:39 UTC 2007
[root@p20-is03 etc]#

dmesg showed:

mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
gfs_tool: page allocation failure. order:4, mode:0xd0
 [<c0144257>] __alloc_pages+0x28b/0x29d
 [<c0144281>] __get_free_pages+0x18/0x24
 [<c0146d5c>] kmem_getpages+0x1c/0xbb
 [<c01478aa>] cache_grow+0xab/0x138
 [<c0147a9c>] cache_alloc_refill+0x165/0x19d
 [<c0147e70>] __kmalloc+0x76/0x88
 [<f8bf734c>] gi_skeleton+0x4c/0xd3 [gfs]
 [<f8bf7dbb>] gi_get_counters+0x0/0xb72 [gfs]
 [<f8bfb15d>] gfs_ioctl_i+0x1b4/0x507 [gfs]
 [<c015a300>] sys_chown+0x10/0x3c
 [<f8c06e34>] gfs_ioctl+0x75/0x7f [gfs]
 [<c016add6>] sys_ioctl+0x227/0x269
 [<c02d47cb>] syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16

Free pages:     2079780kB (2064832kB HighMem)
Active:212965 inactive:156261 dirty:11699 writeback:0 unstable:0 free:519945
slab:112464 mapped:29893 pagetables:905
DMA free:12548kB min:16kB low:32kB high:48kB active:0kB inactive:0kB
present:16384kB pages_scanned:12315 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:2400kB min:928kB low:1856kB high:2784kB active:16kB
inactive:404408kB present:901120kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:2064832kB min:512kB low:1024kB high:1536kB active:851844kB
inactive:220636kB present:3145664kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 3*4kB 3*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB
2*4096kB = 12548kB
Normal: 144*4kB 104*8kB 62*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 2400kB
HighMem: 11738*4kB 14885*8kB 16677*16kB 3915*32kB 776*64kB 179*128kB 40*256kB
15*512kB 3*1024kB 0*2048kB 345*4096kB = 2064832kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
0 bounce buffer pages
Free swap:       2040212kB
1015792 pages of RAM
786416 pages of HIGHMEM
9405 reserved pages
357352 pages shared
0 pages swap cached
gfs_tool: page allocation failure. order:4, mode:0xd0
 [<c0144257>] __alloc_pages+0x28b/0x29d
 [<c0144281>] __get_free_pages+0x18/0x24
 [<c0146d5c>] kmem_getpages+0x1c/0xbb
 [<c01478aa>] cache_grow+0xab/0x138
 [<c0147a9c>] cache_alloc_refill+0x165/0x19d
 [<c0147e70>] __kmalloc+0x76/0x88
 [<f8bf734c>] gi_skeleton+0x4c/0xd3 [gfs]
 [<f8bf7dbb>] gi_get_counters+0x0/0xb72 [gfs]
 [<f8bfb15d>] gfs_ioctl_i+0x1b4/0x507 [gfs]
 [<c015a300>] sys_chown+0x10/0x3c
 [<f8c06e34>] gfs_ioctl+0x75/0x7f [gfs]
 [<c016add6>] sys_ioctl+0x227/0x269
 [<c02d47cb>] syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16

Free pages:     2118108kB (2098880kB HighMem)
Active:208441 inactive:151319 dirty:11702 writeback:0 unstable:0 free:529527
slab:112390 mapped:29893 pagetables:905
DMA free:12548kB min:16kB low:32kB high:48kB active:0kB inactive:0kB
present:16384kB pages_scanned:12321 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:6680kB min:928kB low:1856kB high:2784kB active:28kB
inactive:400532kB present:901120kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:2098880kB min:512kB low:1024kB high:1536kB active:833736kB
inactive:204744kB present:3145664kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 3*4kB 3*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB
2*4096kB = 12548kB
Normal: 1200*4kB 111*8kB 62*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 6680kB
HighMem: 18992*4kB 15514*8kB 16677*16kB 3915*32kB 776*64kB 179*128kB 40*256kB
15*512kB 3*1024kB 0*2048kB 345*4096kB = 2098880kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
0 bounce buffer pages
Free swap:       2040212kB
1015792 pages of RAM
786416 pages of HIGHMEM
9405 reserved pages
347901 pages shared
0 pages swap cached

-- Additional comment from rpeterso on 2007-02-21 09:34 EST --
I'll work on this one.


-- Additional comment from rpeterso on 2007-02-21 11:33 EST --
Created an attachment (id=148501)
RHEL4 patch to fix the problem

Well, gfs_tool was requesting a 64KB chunk of memory for parsing
the counters.  That got sent up to the kernel level, which requested
it in contiguous kernel memory.  When the system is under heavy
stress, the kernel memory can be tight, and so in some circumstances,
there wasn't a 64K chunk of contiguous memory to spare.
In reality, the whole thing only needs a little over 1K to do this work,
so 64K was about a 50X overkill.
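
To illustrate the contiguous-memory point, here is a rough sketch that
reads the per-zone free lists from the first dmesg report above and checks
whether any free block of at least 64kB exists. The helper names
(free_blocks, can_satisfy) are made up for this example, and the check is
a simplification: it only looks at the free lists as printed and does not
model the kernel allocator's zone fallback or watermark logic.

```python
import re

def free_blocks(buddy_line):
    """Map block size in kB -> number of free blocks, from 'N*SkB' entries."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", buddy_line)}

def can_satisfy(buddy_line, request_kb):
    """True if at least one free block of request_kb or larger is listed."""
    return any(count > 0 and size >= request_kb
               for size, count in free_blocks(buddy_line).items())

# Zone lines copied from the first dmesg report in this bug.
NORMAL = ("Normal: 144*4kB 104*8kB 62*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 2400kB")
HIGHMEM = ("HighMem: 11738*4kB 14885*8kB 16677*16kB 3915*32kB 776*64kB "
           "179*128kB 40*256kB 15*512kB 3*1024kB 0*2048kB 345*4096kB "
           "= 2064832kB")

if __name__ == "__main__":
    for name, line in (("Normal", NORMAL), ("HighMem", HIGHMEM)):
        print(name, "64kB block free:", can_satisfy(line, 64),
              "| 4kB block free:", can_satisfy(line, 4))
```

In that report the Normal zone lists no free block larger than 16kB, while
single 4kB pages are still available there. That is consistent with the
64K request failing even though "free" showed about 2GB unused.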

This patch backs the memory requirement down to 4K, which is one page
on most systems.  It should therefore always work unless the system is
completely out of memory.
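
As a quick sanity check on the sizes, the sketch below computes the page
allocation order (the power-of-two number of contiguous pages) needed for a
given request. It assumes 4kB pages, which matches the smallest block size
in the dmesg free lists; alloc_order is a name invented for this example,
not a kernel or gfs function.

```python
def alloc_order(size_bytes, page_size=4096):
    """Smallest order such that (2 ** order) pages hold size_bytes.

    page_size=4096 is an assumption matching the i686 system in this report.
    """
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order

if __name__ == "__main__":
    print("64K request -> order", alloc_order(64 * 1024))
    print("4K request  -> order", alloc_order(4 * 1024))
```

Under that assumption the old 64K request works out to order 4 (16
contiguous pages), matching the "order:4" in the dmesg failure, while the
4K request is order 0, a single page.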

Comment 2 Robert Peterson 2007-02-22 00:07:28 UTC
Fix tested on latest RHEL5 build on trin-10 and committed to CVS at
RHEL5 and HEAD branches.


Comment 3 Kiersten (Kerri) Anderson 2007-04-23 17:39:03 UTC
Fixing product name. Cluster Suite components were integrated into Enterprise
Linux for version 5.0.

Comment 6 errata-xmlrpc 2007-11-07 17:57:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0576.html