Bug 838195 - glusterfs page allocation failure
Summary: glusterfs page allocation failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra Bhat
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks: 840811
 
Reported: 2012-07-07 02:55 UTC by Harshavardhana
Modified: 2018-11-28 19:59 UTC
CC: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 840811 (view as bug list)
Environment:
Last Closed: 2012-07-27 15:53:12 UTC
Embargoed:


Attachments (Terms of Use)
Kernel messages - page allocation failure (55.40 KB, text/plain)
2012-07-07 02:58 UTC, Harshavardhana
no flags Details
r1g02 (55.40 KB, text/plain)
2012-07-07 06:12 UTC, Harshavardhana
no flags Details
kernel_trace.tar.gz (3.60 KB, application/x-gzip)
2012-07-09 22:36 UTC, Justin Bautista
no flags Details
program to reproduce page allocation failure (877 bytes, text/x-csrc)
2012-07-10 07:28 UTC, Raghavendra Bhat
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:1120 0 normal SHIPPED_LIVE Red Hat Storage 2.0 bug fix update 2012-07-27 19:52:24 UTC

Description Harshavardhana 2012-07-07 02:55:25 UTC
Description of problem:

glusterfsd: page allocation failure. order:4, mode:0xd0
Pid: 7046, comm: glusterfsd Not tainted 2.6.32-220.17.1.el6.x86_64 #1
Call Trace:
 [<ffffffff8112415f>] ? __alloc_pages_nodemask+0x77f/0x940
 [<ffffffff8115e102>] ? kmem_getpages+0x62/0x170
 [<ffffffff8115ed1a>] ? fallback_alloc+0x1ba/0x270
 [<ffffffff8115e76f>] ? cache_grow+0x2cf/0x320
 [<ffffffff8115ea99>] ? ____cache_alloc_node+0x99/0x160
 [<ffffffff8119b15d>] ? getxattr+0x14d/0x170
 [<ffffffff8115f6c9>] ? __kmalloc+0x189/0x220
 [<ffffffff8119b15d>] ? getxattr+0x14d/0x170
 [<ffffffff81183465>] ? putname+0x35/0x50
 [<ffffffff81187042>] ? user_path_at+0x62/0xa0
 [<ffffffff81195b00>] ? mntput_no_expire+0x30/0x110
 [<ffffffff8119b1d5>] ? sys_lgetxattr+0x55/0x80
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Mem-Info:
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 u

Version-Release number of selected component (if applicable):

3.3.0qa45

Comment 1 Harshavardhana 2012-07-07 02:58:19 UTC
Created attachment 596708 [details]
Kernel messages - page allocation failure

Comment 3 Vijay Bellur 2012-07-07 05:51:44 UTC
Can we get a sosreport and gluster logs from the node where this was seen?

Comment 4 Harshavardhana 2012-07-07 05:53:50 UTC
Available in the attachments here - https://bugzilla.redhat.com/show_bug.cgi?id=838199

Comment 5 Harshavardhana 2012-07-07 06:12:13 UTC
Created attachment 596731 [details]
r1g02

Comment 6 Justin Bautista 2012-07-09 22:36:16 UTC
Created attachment 597164 [details]
kernel_trace.tar.gz

Uploading kernel trace for 'D' state process

Comment 7 Raghavendra Bhat 2012-07-10 07:27:06 UTC
I was able to reproduce the issue with the attached program. It is related to bug 838204.

The problem is caused by the following.

In posix_getxattr we call getxattr and store its return value in the variable size, which is of type size_t. Even though getxattr returns -1 on failure (when that particular xattr is not present), the unsigned size variable ends up holding a huge value. Later we call getxattr again with that size as an argument, which results in a huge memory request inside the kernel.

size = sys_lgetxattr (real_path, key, NULL, 0);
if (size <= 0) {
        op_errno = errno;
        if ((op_errno == ENOTSUP) || (op_errno == ENOSYS)) {
                GF_LOG_OCCASIONALLY (gf_posix_xattr_enotsup_log,
                                     this->name, GF_LOG_WARNING,
                                     "Extended attributes not "
                                     "supported (try remounting"
                                     " brick with 'user_xattr' "
                                     "flag)");
        } else {
                gf_log (this->name, GF_LOG_ERROR,
                        "getxattr failed on %s: %s (%s)",
                        real_path, key, strerror (op_errno));
        }

        goto done;
}
value = GF_CALLOC (size + 1, sizeof(char), gf_posix_mt_char);
if (!value) {
        op_ret = -1;
        goto out;
}
op_ret = sys_lgetxattr (real_path, key, value, size);
if (op_ret == -1) {
        op_errno = errno;
        goto out;
}
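The signedness pitfall can be shown in a few lines. This is a hypothetical stand-alone sketch, not glusterfs code: a -1 failure return stored in an unsigned size_t makes the "size <= 0" error check dead code, so the huge wrapped-around value flows into the allocation path.

```c
#include <stddef.h>

/* Buggy pattern: the -1 failure return is stored in an unsigned
 * size_t, so the "size <= 0" error check can never fire. */
static int error_caught_unsigned(long ret) {
    size_t size = (size_t) ret;   /* -1 wraps to SIZE_MAX */
    return size <= 0;             /* false for SIZE_MAX: error missed */
}

/* Fixed pattern: a signed variable (as ssize_t is) keeps -1. */
static int error_caught_signed(long ret) {
    long size = ret;
    return size <= 0;             /* true for -1: error caught */
}
```

With ret = -1, error_caught_unsigned() returns 0 (failure not detected) while error_caught_signed() returns 1, which is exactly why the second getxattr call in the snippet above ends up asking the kernel for a SIZE_MAX-sized buffer.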


[ 7062.322895] a.out: page allocation failure: order:4, mode:0xc0d0
[ 7062.322898] Pid: 25965, comm: a.out Not tainted 3.4.4-3.fc17.x86_64 #1
[ 7062.322899] Call Trace:
[ 7062.322904]  [<ffffffff81125d9b>] warn_alloc_failed+0xeb/0x150
[ 7062.322908]  [<ffffffff815ebda8>] ? __alloc_pages_direct_compact+0x17f/0x191
[ 7062.322910]  [<ffffffff81129855>] __alloc_pages_nodemask+0x665/0x950
[ 7062.322928]  [<ffffffffa04c9de0>] ? xfs_attr_shortform_getvalue+0x90/0x130 [xfs]
[ 7062.322931]  [<ffffffff81161ca0>] alloc_pages_current+0xb0/0x120
[ 7062.322933]  [<ffffffff81125404>] __get_free_pages+0x14/0x50
[ 7062.322934]  [<ffffffff8116c1f9>] kmalloc_order_trace+0x39/0xb0
[ 7062.322937]  [<ffffffff8118d5c5>] ? complete_walk+0x75/0xf0
[ 7062.322939]  [<ffffffff8116c62c>] __kmalloc+0x16c/0x180
[ 7062.322941]  [<ffffffff811a2518>] getxattr+0x88/0x180
[ 7062.322943]  [<ffffffff8116cc00>] ? kmem_cache_free+0x20/0x100
[ 7062.322944]  [<ffffffff8118be03>] ? putname+0x33/0x50
[ 7062.322946]  [<ffffffff811908f8>] ? user_path_at_empty+0x68/0xa0
[ 7062.322948]  [<ffffffff8114c547>] ? do_brk+0x267/0x390
[ 7062.322951]  [<ffffffff8119e11c>] ? mntput_no_expire+0x2c/0x120
[ 7062.322952]  [<ffffffff811a3121>] sys_getxattr+0x51/0x70
[ 7062.322955]  [<ffffffff815fc8a9>] system_call_fastpath+0x16/0x1b
[ 7062.322956] Mem-Info:
[ 7062.322957] Node 0 DMA per-cpu:
[ 7062.322958] CPU    0: hi:    0, btch:   1 usd:   0
[ 7062.322959] CPU    1: hi:    0, btch:   1 usd:   0
[ 7062.322960] CPU    2: hi:    0, btch:   1 usd:   0
[ 7062.322961] CPU    3: hi:    0, btch:   1 usd:   0
[ 7062.322962] Node 0 DMA32 per-cpu:
[ 7062.322963] CPU    0: hi:  186, btch:  31 usd:   0
[ 7062.322964] CPU    1: hi:  186, btch:  31 usd:   0
[ 7062.322964] CPU    2: hi:  186, btch:  31 usd:   0
[ 7062.322965] CPU    3: hi:  186, btch:  31 usd:   0
[ 7062.322966] Node 0 Normal per-cpu:
[ 7062.322967] CPU    0: hi:  186, btch:  31 usd:   0
[ 7062.322968] CPU    1: hi:  186, btch:  31 usd:   0
[ 7062.322969] CPU    2: hi:  186, btch:  31 usd:   0
[ 7062.322970] CPU    3: hi:  186, btch:  31 usd:   0
[ 7062.322972] active_anon:1606565 inactive_anon:290683 isolated_anon:0
[ 7062.322972]  active_file:8362 inactive_file:11629 isolated_file:0
[ 7062.322973]  unevictable:0 dirty:33 writeback:15741 unstable:0
[ 7062.322973]  free:25310 slab_reclaimable:18021 slab_unreclaimable:13081
[ 7062.322974]  mapped:9767 shmem:34982 pagetables:13854 bounce:0
[ 7062.322975] Node 0 DMA free:15904kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 7062.322980] lowmem_reserve[]: 0 3413 7924 7924
[ 7062.322982] Node 0 DMA32 free:47056kB min:29056kB low:36320kB high:43584kB active_anon:2747596kB inactive_anon:549724kB active_file:20288kB inactive_file:30644kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3495808kB mlocked:0kB dirty:52kB writeback:21192kB mapped:26836kB shmem:45020kB slab_reclaimable:47840kB slab_unreclaimable:13428kB kernel_stack:1104kB pagetables:20856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 7062.322987] lowmem_reserve[]: 0 0 4510 4510
[ 7062.322989] Node 0 Normal free:38280kB min:38392kB low:47988kB high:57588kB active_anon:3678664kB inactive_anon:613008kB active_file:13160kB inactive_file:15872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4618656kB mlocked:0kB dirty:80kB writeback:41772kB mapped:12232kB shmem:94908kB slab_reclaimable:24244kB slab_unreclaimable:38896kB kernel_stack:1928kB pagetables:34560kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 7062.322994] lowmem_reserve[]: 0 0 0 0
[ 7062.322995] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
[ 7062.323000] Node 0 DMA32: 410*4kB 588*8kB 345*16kB 583*32kB 205*64kB 29*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47352kB
[ 7062.323004] Node 0 Normal: 37*4kB 68*8kB 209*16kB 197*32kB 101*64kB 36*128kB 20*256kB 16*512kB 4*1024kB 0*2048kB 0*4096kB = 38820kB
[ 7062.323009] 78688 total pagecache pages
[ 7062.323010] 23533 pages in swap cache
[ 7062.323011] Swap cache stats: add 969539, delete 946006, find 6752/8570
[ 7062.323011] Free swap  = 4377892kB
[ 7062.323012] Total swap = 8191996kB
[ 7062.334198] 2090480 pages RAM
[ 7062.334199] 75678 pages reserved
[ 7062.334200] 97488 pages shared
[ 7062.334201] 1939058 pages non-shared

Comment 8 Raghavendra Bhat 2012-07-10 07:28:13 UTC
Created attachment 597227 [details]
program to reproduce page allocation failure

Comment 9 Raghavendra Bhat 2012-07-10 11:46:37 UTC
I have sent a patch for the issue:
http://review.gluster.com/#change,3640
It is for the master branch. Once it gets accepted, I will backport it to the release-3.3 branch.

Comment 10 Harshavardhana 2012-07-10 22:18:49 UTC
My question here is: if getxattr fails, what is the expected experience on the Object Storage side? Will the client observe a 404 during that call as well?

Comment 11 Vijay Bellur 2012-07-12 05:26:59 UTC
CHANGE: http://review.gluster.com/3640 (storage/posix: handle getxattr failures gracefully) merged in master by Anand Avati (avati)

Comment 12 Christopher J Suleski 2012-07-13 20:30:25 UTC
We're getting near the end of North America hours on Friday.

Do we think that we'll have this patch available sometime later today or over the weekend? I just want to set expectations appropriately for the customer and GSS's FTS engagement.

If you think the patched rpm will be available soon GSS will monitor the case and BZ over the weekend. 

Otherwise, we will not actively monitor the BZ for an update over the weekend.

Thanks!
Chris

Comment 13 Vijay Bellur 2012-07-17 15:26:23 UTC
CHANGE: http://review.gluster.com/3673 (storage/posix: use ssize_t variable to get the return value of getxattr) merged in master by Anand Avati (avati)

Comment 14 Vijay Bellur 2012-07-17 20:59:56 UTC
CHANGE: http://review.gluster.com/3681 (storage/posix: use the size returned by getxattr to allocate memory) merged in master by Anand Avati (avati)
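The two follow-up patches (use an ssize_t variable for the getxattr return value, and allocate based on the size getxattr actually reports) combine into a pattern like the sketch below. This is a self-contained illustration with a hypothetical fake_lgetxattr stand-in, not the actual glusterfs code:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for sys_lgetxattr() so the sketch is
 * self-contained: only the key "user.demo" exists, value "hello". */
static ssize_t fake_lgetxattr(const char *key, char *buf, size_t len) {
    const char *val = "hello";
    size_t n = strlen(val);
    if (strcmp(key, "user.demo") != 0)
        return -1;                  /* xattr missing */
    if (len == 0)
        return (ssize_t) n;        /* size probe */
    if (len < n)
        return -1;                  /* buffer too small */
    memcpy(buf, val, n);
    return (ssize_t) n;
}

/* Corrected pattern: a signed size, checked before allocating,
 * and the reported size used for the second call. */
static char *get_xattr_value(const char *key) {
    ssize_t size = fake_lgetxattr(key, NULL, 0);
    if (size <= 0)
        return NULL;                /* failure caught; no huge calloc */

    char *value = calloc((size_t) size + 1, sizeof(char));
    if (!value)
        return NULL;

    if (fake_lgetxattr(key, value, (size_t) size) == -1) {
        free(value);
        return NULL;
    }
    return value;                   /* NUL-terminated via calloc */
}
```

Calling get_xattr_value("user.demo") returns a heap string "hello" (caller frees it), and a missing key returns NULL instead of triggering an order-4 allocation in the kernel.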

Comment 15 Vijay Bellur 2012-07-18 08:10:52 UTC
CHANGE: http://review.gluster.com/3658 (storage/posix: handle getxattr failures gracefully) merged in release-3.3 by Vijay Bellur (vbellur)

Comment 16 Saurabh 2012-07-24 05:36:13 UTC
I tried to find similar test case after applying the patch and the same call trace for page allocation failure has not happened, though bugs 842206 and 842230 related to page allocation issue, but these bugs were found only on one node of the cluster.

Comment 17 Harshavardhana 2012-07-24 08:20:20 UTC
(In reply to comment #16)
> I tried to find similar test case after applying the patch and the same call
> trace for page allocation failure has not happened, though bugs 842206 and
> 842230 related to page allocation issue, but these bugs were found only on
> one node of the cluster.

This whole paragraph is confusing. Can you explain it as individual remarks?

Comment 18 Saurabh 2012-07-24 08:43:25 UTC
Harsha,

  I should have put it properly earlier; rephrasing:

"I tried to execute a similar test case after applying the patch, and the same call trace for the page allocation failure mentioned in this bug (838195) has not happened again. Bugs 842206 and 842230, also related to page allocation, were filed separately because their call traces were different.

I mention those bugs for information, as both issues were found while verifying the fix for this bug (838195).

The above two issues were seen only on one node of the cluster. The RHS cluster consists of VMs, and the load on that particular node was higher than on the other nodes."

Comment 19 Junaid 2012-07-25 10:32:47 UTC
The issue seems to be caused by low memory (RAM) on the VM. The VM is not able to withstand the load applied to it, hence the errors.

Comment 21 errata-xmlrpc 2012-07-27 15:53:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1120.html

