Bug 763012 - (GLUSTER-1280) gf_mem_set_acct_info goes into spinlock busyloop, never returns
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.1-alpha
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: Shehjar Tikoo
Reported: 2010-08-04 09:09 EDT by Shehjar Tikoo
Modified: 2015-12-01 11:45 EST (History)
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs


Attachments
NFS volume file (382 bytes, application/octet-stream)
2010-08-04 06:45 EDT, Shehjar Tikoo
CPU profile, open using kcachegrind (200.17 KB, application/octet-stream)
2010-08-04 06:46 EDT, Shehjar Tikoo

Description Shehjar Tikoo 2010-08-04 06:45:18 EDT
Created attachment 268 [details]
Patched glxMesa Spec for the new CVS sources
Comment 1 Shehjar Tikoo 2010-08-04 06:46:01 EDT
Created attachment 269 [details]
lvs.cf file
Comment 2 Shehjar Tikoo 2010-08-04 06:46:16 EDT
This is seen on qa3.
Comment 3 Shehjar Tikoo 2010-08-04 09:09:31 EDT
When exporting a simple posix volume through NFS, running showmount -e localhost on the NFS client/NFS server sends glusterfsd into a CPU-bound busy loop on a spinlock, hanging the whole process.

The lock hung on is in: gf_mem_set_acct_info
Comment 4 Shehjar Tikoo 2010-08-05 00:42:37 EDT
Downgrading to Normal because a work-around exists: disable memory accounting by setting

export GLUSTERFS_DISABLE_MEM_ACCT=1

and then running glusterfsd.
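
For illustration only, here is a minimal sketch of how a daemon could honour such an environment switch at startup. It is not the actual glusterfsd code: the function names `mem_acct_enabled_from` and `mem_acct_enabled` are hypothetical, and the sketch assumes that the mere presence of the variable (whatever its value) turns accounting off.

```c
#include <stdlib.h>

/* Hypothetical helper: decide whether memory accounting stays on,
 * given the value of the environment variable (NULL if unset).
 * Assumption: any value, even an empty one, disables accounting. */
static int mem_acct_enabled_from(const char *value)
{
    return value == NULL;
}

/* Hypothetical wrapper that reads the variable named in this report. */
static int mem_acct_enabled(void)
{
    return mem_acct_enabled_from(getenv("GLUSTERFS_DISABLE_MEM_ACCT"));
}
```

With accounting off, an allocation path built this way would skip the per-type lock entirely, which is why the work-around avoids the hang rather than fixing its cause.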
Comment 5 Shehjar Tikoo 2010-08-12 04:48:38 EDT
This is probably caused by a bug in the memory-accounting init phase of the NFS translator.
Comment 6 Shehjar Tikoo 2010-08-31 05:02:01 EDT
The bug occurs because before entering the NFS translator, the nfs-rpc layer does not set THIS, leaving it pointing to some other translator. The locking code then sees a garbage lock structure: the memory type used as the index is invalid for that translator, because the translator which gets accessed while allocating memory in mount3.c, for mnt3-export, is not NFS but something else.
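
The failure mode described above can be modelled in a few lines. This is a simplified sketch, not the GlusterFS source: `struct xlator`, `current_xl` (standing in for THIS), `account_alloc`, `call_in_xlator`, the table sizes, and the type number 5 are all assumptions made for illustration. The real code indexed the accounting table without a range check, so a stale THIS led it to treat unrelated memory as a lock; the sketch adds a range check so the mismatch shows up as an error instead of undefined behaviour.

```c
#include <stddef.h>

/* Simplified per-type accounting record (hypothetical layout). */
struct acct_rec {
    int    lock;        /* placeholder for the per-type spinlock */
    size_t total_bytes; /* bytes accounted under this memory type */
};

/* Simplified stand-in for a translator; not the real xlator_t. */
struct xlator {
    const char      *name;
    size_t           num_types; /* number of valid memory types */
    struct acct_rec *recs;      /* one record per memory type */
};

/* Per-thread "current translator", playing the role of THIS. */
static _Thread_local struct xlator *current_xl;

/* Account `size` bytes of `type` against the current translator.
 * Returns 0 on success, -1 if `type` is not valid for it. */
static int account_alloc(size_t type, size_t size)
{
    struct xlator *xl = current_xl;
    if (xl == NULL || type >= xl->num_types)
        return -1; /* the unchecked version would touch a garbage lock */
    xl->recs[type].total_bytes += size;
    return 0;
}

/* Run `fn` with the current translator set to `actor`, then restore
 * the previous value. This mirrors the idea of the fix: the rpc layer
 * sets THIS to the NFS translator before calling into it. */
static int call_in_xlator(struct xlator *actor, int (*fn)(void))
{
    struct xlator *saved = current_xl;
    current_xl = actor;
    int ret = fn();
    current_xl = saved;
    return ret;
}

/* Two toy translators with differently sized type tables. */
static struct acct_rec posix_recs[3];
static struct acct_rec nfs_recs[8];
static struct xlator posix_xl = { "posix", 3, posix_recs };
static struct xlator nfs_xl   = { "nfs",   8, nfs_recs };

/* A memory type that only exists in the NFS translator's table. */
enum { MNT3_EXPORT_TYPE = 5 };

/* Toy handler that allocates 64 bytes of an NFS-only memory type. */
static int mnt3_export_handler(void)
{
    return account_alloc(MNT3_EXPORT_TYPE, 64);
}
```

If `current_xl` is left pointing at `posix_xl`, calling `mnt3_export_handler()` directly fails because type 5 does not exist in that table. Calling it through `call_in_xlator(&nfs_xl, mnt3_export_handler)` succeeds and leaves the caller's translator pointer unchanged afterwards.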
Comment 7 Vijay Bellur 2010-08-31 07:44:59 EDT
PATCH: http://patches.gluster.com/patch/4426 in master (nfsrpc: Introduce THIS-setting support to fix mem-accounting)
Comment 8 Vijay Bellur 2010-08-31 07:45:03 EDT
PATCH: http://patches.gluster.com/patch/4423 in master (nfs: Set actorxl to enable setting THIS to nfsx)
Comment 9 Shehjar Tikoo 2010-08-31 22:54:14 EDT
Regression test:

1. Export a simple posix volume through gnfs.

2. Start the nfs server.

3. At the nfs client:

$ showmount -e <server>

Without the fix, this will hang the nfs server in the mnt3-export code path. Verify this through the log.

Next step.

4. Kill the nfs server daemon.

5. Run the following command:
export GLUSTERFS_DISABLE_MEM_ACCT=1

6. Start gnfs again.

7. Run the following command on the client:

$ showmount -e <server>

This time, the nfs server daemon will not hang. showmount will report the output correctly.

If step 3 still hangs with the patches here applied, there is a regression.
