763012 – (GLUSTER-1280) gf_mem_set_acct_info goes into spinlock busyloop, never returns

Summary: gf_mem_set_acct_info goes into spinlock busyloop, never returns

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	GLUSTER-1280
Product:	GlusterFS
Classification:	Community
Component:	nfs
Sub Component:
Version:	3.1-alpha
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Shehjar Tikoo
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-08-04 13:09 UTC by Shehjar Tikoo
Modified:	2015-12-01 16:45 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:
Regression:	RTP
Mount Type:	nfs
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
NFS volume file (382 bytes, application/octet-stream) 2010-08-04 10:45 UTC, Shehjar Tikoo
CPU profile, open using kcachegrind (200.17 KB, application/octet-stream) 2010-08-04 10:46 UTC, Shehjar Tikoo

Description Shehjar Tikoo 2010-08-04 10:45:18 UTC

Created attachment 268
NFS volume file

Comment 1 Shehjar Tikoo 2010-08-04 10:46:01 UTC

Created attachment 269
CPU profile, open using kcachegrind

Comment 2 Shehjar Tikoo 2010-08-04 10:46:16 UTC

This is seen on qa3.

Comment 3 Shehjar Tikoo 2010-08-04 13:09:31 UTC

When a simple posix volume is exported through NFS, running showmount -e localhost on the machine that is both the NFS client and the NFS server sends glusterfsd into a CPU-bound busy loop on a spinlock, hanging the whole process.

The lock hung on is in: gf_mem_set_acct_info

Comment 4 Shehjar Tikoo 2010-08-05 04:42:37 UTC

Downgrading to Normal because the work-around is to disable memory-accounting by:

export GLUSTERFS_DISABLE_MEM_ACCT=1

And then running glusterfsd.

Comment 5 Shehjar Tikoo 2010-08-12 08:48:38 UTC

This is probably caused by a bug in the memory accounting init phase in the NFS translator.

Comment 6 Shehjar Tikoo 2010-08-31 09:02:01 UTC

The bug occurs because the nfs-rpc layer does not set THIS before entering the NFS translator, leaving it pointing to some other translator. As a result, the translator accessed while allocating memory in mount3.c, for mnt3-export, is not NFS but something else. The memory type used as the index is therefore invalid for that translator, and the locking code ends up operating on a garbage lock structure.
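The failure mode described in this comment can be sketched with a simplified model. This is not the GlusterFS code: the struct layouts, the field names, acct_alloc() and the lock_ready flag (which stands in for an initialised spinlock) are all hypothetical, and THIS is modelled as a plain global rather than whatever per-thread mechanism the real code uses. The sketch only illustrates why a memory type that is valid for the NFS translator is out of range when looked up through a different translator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified accounting record (not the GlusterFS layout). */
struct mem_rec {
    int      lock_ready;  /* stands in for an initialised spinlock */
    size_t   size;        /* bytes currently accounted to this type */
    uint32_t num_allocs;  /* number of allocations of this type */
};

/* Hypothetical translator: one accounting record per memory type. */
struct xlator {
    const char     *name;
    uint32_t        num_types;
    struct mem_rec *rec;
};

/* Assumption: a plain global models the "current translator". */
static struct xlator *THIS;

/*
 * Account an allocation of `size` bytes of memory type `type` against
 * whatever translator THIS points to. Returns 0 on success and -1 if
 * the type is not valid for that translator.
 */
int acct_alloc(uint32_t type, size_t size)
{
    struct xlator *xl = THIS;

    /* Without this check, rec[type] would be read past the end of the
     * array when THIS is the wrong translator, and its lock would be
     * whatever bytes happen to be there. */
    if (xl == NULL || xl->rec == NULL || type >= xl->num_types)
        return -1;
    if (!xl->rec[type].lock_ready)
        return -1;

    xl->rec[type].size += size;
    xl->rec[type].num_allocs++;
    return 0;
}
```

In this model, an NFS-specific type index such as 5 is rejected while THIS points at a translator with only two types, and is accepted once THIS points at a translator that declares at least six types.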

Comment 7 Vijay Bellur 2010-08-31 11:44:59 UTC

PATCH: http://patches.gluster.com/patch/4426 in master (nfsrpc: Introduce THIS-setting support to fix mem-accounting)

Comment 8 Vijay Bellur 2010-08-31 11:45:03 UTC

PATCH: http://patches.gluster.com/patch/4423 in master (nfs: Set actorxl to enable setting THIS to nfsx)
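The idea named in the two patch titles, an RPC program carrying a reference to its owning translator so that THIS can be set before an actor runs, might look roughly like the sketch below. Only the name actorxl comes from the patch title; the struct layouts, dispatch(), export_actor_stub() and the save-and-restore of THIS are assumptions made for illustration, and THIS is again modelled as a plain global.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal translator. */
struct xlator {
    const char *name;
};

/* Assumption: a plain global models the "current translator". */
static struct xlator *THIS;

typedef int (*actor_fn)(void *req);

/* Hypothetical RPC program descriptor. */
struct rpc_program {
    const char    *progname;
    struct xlator *actorxl;  /* translator that owns the actors */
    actor_fn       actor;
};

/* Stub actor: succeeds only when it runs with THIS pointing at "nfs". */
int export_actor_stub(void *req)
{
    (void)req;
    return (THIS != NULL && strcmp(THIS->name, "nfs") == 0) ? 0 : -1;
}

/*
 * Run the program's actor with THIS set to the program's actorxl,
 * then put the previous value back (restoring is an assumption of
 * this sketch).
 */
int dispatch(struct rpc_program *prog, void *req)
{
    struct xlator *saved = THIS;
    int ret;

    if (prog == NULL || prog->actor == NULL)
        return -1;
    if (prog->actorxl != NULL)
        THIS = prog->actorxl;

    ret = prog->actor(req);

    THIS = saved;
    return ret;
}
```

With actorxl pointing at the NFS translator, the stub actor sees the right THIS even if the caller's THIS was some other translator. With actorxl left NULL, the actor runs against the stale THIS, which corresponds to the situation before the fix.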

Comment 9 Shehjar Tikoo 2010-09-01 02:54:14 UTC

Regression test:

1. Export a simple posix volume through gnfs.

2. Start the nfs server.

3. At the nfs client:

$ showmount -e <server>

This will hang the nfs server in the mnt3-export code path. Verify this through the log.

Next, confirm that the work-around avoids the hang:

4. Kill the nfs server daemon.

5. Run the following command:
export GLUSTERFS_DISABLE_MEM_ACCT=1

6. Start gnfs again.

7. Run the following command on the client:

$ showmount -e <server>

This time, the nfs server daemon will not hang. showmount will report the output correctly.

If step 3 still hangs with the patches here applied, there is a regression.
