Bug 763012 (GLUSTER-1280) - gf_mem_set_acct_info goes into spinlock busyloop, never returns
Summary: gf_mem_set_acct_info goes into spinlock busyloop, never returns
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1280
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.1-alpha
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-08-04 13:09 UTC by Shehjar Tikoo
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
NFS volume file (382 bytes, application/octet-stream)
2010-08-04 10:45 UTC, Shehjar Tikoo
CPU profile, open using kcachegrind (200.17 KB, application/octet-stream)
2010-08-04 10:46 UTC, Shehjar Tikoo

Description Shehjar Tikoo 2010-08-04 10:45:18 UTC
Created attachment 268: NFS volume file

Comment 1 Shehjar Tikoo 2010-08-04 10:46:01 UTC
Created attachment 269: CPU profile, open using kcachegrind

Comment 2 Shehjar Tikoo 2010-08-04 10:46:16 UTC
This is seen on qa3.

Comment 3 Shehjar Tikoo 2010-08-04 13:09:31 UTC
When exporting a simple posix volume through NFS, running showmount -e localhost on the NFS client/NFS server sends glusterfsd into a busy loop on a spinlock, hanging the whole process.

The lock hung on is in: gf_mem_set_acct_info

Comment 4 Shehjar Tikoo 2010-08-05 04:42:37 UTC
Downgrading to Normal because there is a workaround: disable memory accounting before starting glusterfsd by setting:

export GLUSTERFS_DISABLE_MEM_ACCT=1

Then run glusterfsd.

Comment 5 Shehjar Tikoo 2010-08-12 08:48:38 UTC
This is probably caused by a bug in the memory-accounting init phase in the NFS translator.

Comment 6 Shehjar Tikoo 2010-08-31 09:02:01 UTC
The bug occurs because the nfs-rpc layer does not set THIS before entering the NFS translator, leaving it pointing to some other translator. When memory is allocated in mount3.c for mnt3-export, the translator that gets accessed is therefore not NFS but something else, so the memory type used as an index is invalid for that translator and the locking code sees a garbage lock structure.
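
To illustrate the failure mode described above, here is a minimal, simplified model in C. It is a sketch under stated assumptions, not the GlusterFS source: struct mem_rec, struct xlator, this_xl, acct_alloc() and the type counts are all hypothetical stand-ins. The explicit range check exists only to make the problem observable; the comment above suggests that without such a guard the out-of-range record (and the lock inside it) is whatever memory happens to be there.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins; not the real GlusterFS definitions. */
struct mem_rec {
    size_t bytes;               /* bytes accounted to this memory type */
    /* in the scenario from the bug, a per-record lock would also live here */
};

struct xlator {
    const char     *name;
    unsigned        num_types;  /* memory types registered by this translator */
    struct mem_rec *rec;        /* one accounting record per registered type */
};

/* Stand-in for THIS: the translator the current thread is executing in. */
static struct xlator *this_xl;

static int xl_init(struct xlator *xl, const char *name, unsigned num_types)
{
    xl->name = name;
    xl->num_types = num_types;
    xl->rec = calloc(num_types, sizeof(*xl->rec));
    return xl->rec ? 0 : -1;
}

static void xl_fini(struct xlator *xl)
{
    free(xl->rec);
    xl->rec = NULL;
}

/*
 * Account `size` bytes of memory type `type` against THIS.
 * Returns 0 on success, -1 if `type` is not a valid index for THIS.
 * The range check is added for this illustration only.
 */
static int acct_alloc(size_t size, unsigned type)
{
    if (this_xl == NULL || type >= this_xl->num_types)
        return -1;              /* rec[type] would be out of bounds */
    this_xl->rec[type].bytes += size;
    return 0;
}

/* Hypothetical type counts and a hypothetical NFS-specific memory type. */
enum { OTHER_NUM_TYPES = 8, NFS_NUM_TYPES = 32, NFS_MT_EXPORT = 20 };

/*
 * Simulate the mnt3-export allocation with THIS pointing either at the
 * NFS translator (this_is_nfs != 0) or at some other translator.
 * Returns the result of acct_alloc(), or -1 if setup fails.
 */
int simulate_export_alloc(int this_is_nfs)
{
    struct xlator other, nfs;
    int ret;

    if (xl_init(&other, "other", OTHER_NUM_TYPES) != 0)
        return -1;
    if (xl_init(&nfs, "nfs", NFS_NUM_TYPES) != 0) {
        xl_fini(&other);
        return -1;
    }

    this_xl = this_is_nfs ? &nfs : &other;
    ret = acct_alloc(128, NFS_MT_EXPORT);

    this_xl = NULL;
    xl_fini(&nfs);
    xl_fini(&other);
    return ret;
}
```

In this model, simulate_export_alloc(1) succeeds because the NFS-specific type is a valid index for the NFS translator, while simulate_export_alloc(0) fails because the same index lies beyond the records of the other translator.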

Comment 7 Vijay Bellur 2010-08-31 11:44:59 UTC
PATCH: http://patches.gluster.com/patch/4426 in master (nfsrpc: Introduce THIS-setting support to fix mem-accounting)

Comment 8 Vijay Bellur 2010-08-31 11:45:03 UTC
PATCH: http://patches.gluster.com/patch/4423 in master (nfs: Set actorxl to enable setting THIS to nfsx)
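
The general pattern suggested by the two patch titles can be sketched as follows. This is an assumption-laden illustration, not the content of the patches: only the name actorxl comes from the patch title, while struct rpc_program, dispatch(), this_xl and the choice to restore the previous value afterwards are hypothetical.

```c
#include <stddef.h>

/* Hypothetical minimal translator type for this sketch. */
struct xlator {
    const char *name;
};

/* Stand-in for THIS: the translator the current thread is executing in. */
static struct xlator *this_xl;

typedef int (*actor_fn)(void *req);

/* Hypothetical RPC program record; only the name actorxl is from the patch title. */
struct rpc_program {
    struct xlator *actorxl;     /* translator the actor should run as */
    actor_fn       actor;       /* request handler */
};

/*
 * Call the program's actor with THIS set to its actorxl (if any),
 * then restore the previous THIS. Restoring is an assumption of this sketch.
 */
int dispatch(struct rpc_program *prog, void *req)
{
    struct xlator *saved = this_xl;
    int ret;

    if (prog->actorxl != NULL)
        this_xl = prog->actorxl;
    ret = prog->actor(req);
    this_xl = saved;
    return ret;
}

/* Example actor: returns 0 if THIS is the translator passed in req, else -1. */
int actor_checks_this(void *req)
{
    return this_xl == (struct xlator *)req ? 0 : -1;
}
```

With actorxl set to the NFS translator, an allocation made inside the actor would be accounted against that translator; with actorxl left unset, the actor inherits whatever THIS the caller had, which is the situation described in comment 6.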

Comment 9 Shehjar Tikoo 2010-09-01 02:54:14 UTC
Regression test:

1. Export a simple posix volume through gnfs.

2. Start the nfs server.

3. At the nfs client:

$ showmount -e <server>

Without the fix, this will hang the nfs server in the mnt3-export code path. Verify this through the log.

To confirm that memory accounting is the cause, continue:

4. Kill the nfs server daemon.

5. Run the following command:
export GLUSTERFS_DISABLE_MEM_ACCT=1

6. Start gnfs again.

7. Run the following command on the client:

$ showmount -e <server>

This time, the nfs server daemon will not hang. showmount will report the output correctly.

If step 3 still hangs with the patches here applied, there is a regression.

