Bug 763012 (GLUSTER-1280)

Summary: gf_mem_set_acct_info goes into spinlock busyloop, never returns
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: nfs
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 3.1-alpha
CC: gluster-bugs
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Attachments:
NFS volume file (flags: none)
CPU profile, open using kcachegrind (flags: none)

Description Shehjar Tikoo 2010-08-04 06:45:18 EDT
Created attachment 268 [details]
Patched glxMesa Spec for the new CVS sources
Comment 1 Shehjar Tikoo 2010-08-04 06:46:01 EDT
Created attachment 269 [details]
lvs.cf file
Comment 2 Shehjar Tikoo 2010-08-04 06:46:16 EDT
This is seen on qa3.
Comment 3 Shehjar Tikoo 2010-08-04 09:09:31 EDT
When exporting a simple posix volume through NFS, running showmount -e localhost on the NFS client/NFS server makes glusterfsd busy-loop on a spinlock at high CPU usage, hanging the whole process.

The lock hung on is in: gf_mem_set_acct_info
Comment 4 Shehjar Tikoo 2010-08-05 00:42:37 EDT
Downgrading to Normal because there is a workaround: disable memory accounting with:

export GLUSTERFS_DISABLE_MEM_ACCT=1

Then run glusterfsd.
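
How glusterfsd consumes the variable is not shown in this report. As a rough sketch only, assuming the daemon checks at startup whether the variable is present at all (the helper name mem_acct_enabled is hypothetical and is not the GlusterFS API):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not the GlusterFS API.
 * Assumption: the mere presence of GLUSTERFS_DISABLE_MEM_ACCT in the
 * environment turns memory accounting off, whatever its value.
 * Returns 1 when accounting should stay enabled, 0 otherwise. */
static int mem_acct_enabled(void)
{
    return getenv("GLUSTERFS_DISABLE_MEM_ACCT") == NULL;
}
```

With accounting off, the allocation path would never touch the per-translator accounting lock, which is why the workaround avoids the hang.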
Comment 5 Shehjar Tikoo 2010-08-12 04:48:38 EDT
This is probably caused by a bug in the memory-accounting init phase of the NFS translator.
Comment 6 Shehjar Tikoo 2010-08-31 05:02:01 EDT
The bug occurs because the nfs-rpc layer does not set THIS before entering the NFS translator, so THIS is left pointing to some other translator. When mount3.c allocates memory for mnt3-export, the accounting code therefore accesses that other translator instead of NFS. The memory type used as the index is invalid for that translator, so the locking code picks up a garbage lock structure and spins on it.
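
The mismatch can be modelled in a few lines. This is a simplified sketch, not the GlusterFS source: struct xl_model, struct acct_rec, this_xl and acct_lookup are illustrative names, and the bounds check exists only to make the mismatch visible instead of reading past the end of another translator's table.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; none of these are the GlusterFS definitions. */
struct acct_rec {
    int lock_initialised;   /* 1 once the per-type lock has been set up */
    size_t bytes;           /* bytes currently accounted to this type */
};

struct xl_model {
    const char *name;
    size_t num_types;       /* memory types registered by this translator */
    struct acct_rec *rec;   /* array of num_types accounting records */
};

/* Models the global "current translator" pointer (THIS in the report). */
static struct xl_model *this_xl;

/* Look up the accounting record for `type` in the current translator.
 * Returns NULL when `type` is not valid for that translator, which is
 * the situation described above: an NFS memory type looked up while
 * THIS still points at a different translator. */
static struct acct_rec *acct_lookup(size_t type)
{
    if (this_xl == NULL || type >= this_xl->num_types)
        return NULL;
    return &this_xl->rec[type];
}
```

In this model, looking up NFS type 4 while this_xl points at a translator with only two types returns NULL. Without such a guard, the lookup would land on unrelated memory and treat it as a lock.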
Comment 7 Vijay Bellur 2010-08-31 07:44:59 EDT
PATCH: http://patches.gluster.com/patch/4426 in master (nfsrpc: Introduce THIS-setting support to fix mem-accounting)
Comment 8 Vijay Bellur 2010-08-31 07:45:03 EDT
PATCH: http://patches.gluster.com/patch/4423 in master (nfs: Set actorxl to enable setting THIS to nfsx)
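
The patch contents are not reproduced in this report. Going only by the two titles, the idea can be sketched as follows: the RPC program carries a pointer to the translator that owns its actors, and the dispatcher points THIS at it around the actor call. All type and function names here are hypothetical, and restoring the previous value afterwards is a choice made in this sketch, not something the report confirms.

```c
#include <assert.h>
#include <stddef.h>

struct xl_model { const char *name; };

/* Models the global "current translator" pointer (THIS in the report). */
static struct xl_model *this_xl;

typedef int (*actor_fn)(void *req);

/* Hypothetical program descriptor: the owning translator (named after
 * "actorxl" in the patch title) travels with the actor. */
struct rpc_program_model {
    struct xl_model *actorxl;
    actor_fn actor;
};

/* Run the actor with THIS pointing at the owning translator, then put
 * the previous value back so the caller's context is unchanged. */
static int call_actor(struct rpc_program_model *prog, void *req)
{
    struct xl_model *saved = this_xl;
    int ret;

    if (prog->actorxl != NULL)
        this_xl = prog->actorxl;
    ret = prog->actor(req);
    this_xl = saved;
    return ret;
}

/* Example actor: stores the translator it observed into *req. */
static int record_this_actor(void *req)
{
    *(struct xl_model **)req = this_xl;
    return 0;
}
```

Any allocation made inside the actor then sees the NFS translator as the current one, so its memory types index the correct accounting table.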
Comment 9 Shehjar Tikoo 2010-08-31 22:54:14 EDT
Regression test:

1. Export a simple posix volume through gnfs.

2. Start the nfs server.

3. At the nfs client:

$ showmount -e <server>

Without the fix, this hangs the nfs server in the mnt3-export code path. Verify this through the log.

Next, confirm that the workaround avoids the hang:

4. Kill the nfs server daemon.

5. Run the following command:
export GLUSTERFS_DISABLE_MEM_ACCT=1

6. Start gnfs again.

7. Run the following command on the client:

$ showmount -e <server>

This time, the nfs server daemon will not hang. showmount will report the output correctly.

If step 3 still hangs with the patches applied, there is a regression.
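
The client-side check in steps 3 and 7 can be automated with a small timeout wrapper. This is a minimal sketch, assuming a hung server shows up as showmount failing or not returning in time. The helper run_with_timeout is made up for this example, and the host name in the usage note is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run argv[0] with its arguments and wait up to timeout_sec seconds.
 * Returns the child's exit status (127 if exec failed), -1 if the
 * timeout expired (the child is killed), or -2 on fork/wait errors
 * or if the child was terminated by a signal. */
static int run_with_timeout(char *const argv[], int timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -2;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached when exec fails */
    }

    struct timespec tick = {0, 100 * 1000 * 1000};  /* poll every 100 ms */
    for (int waited = 0; waited < timeout_sec * 10; waited++) {
        int status;
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -2;
        if (done < 0)
            return -2;
        nanosleep(&tick, NULL);
    }

    kill(pid, SIGKILL);             /* still running: treat as hung */
    waitpid(pid, NULL, 0);          /* reap the killed child */
    return -1;
}
```

A caller would build an argument vector such as {"showmount", "-e", "server.example", NULL} and treat any return value other than 0 as a failed step.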