Bug 842955

Summary: "gluster volume status inode" command blocks glusterd and glusterfsd
Product: [Community] GlusterFS
Reporter: Joe Julian <joe>
Component: core
Assignee: Kaushal <kaushal>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: high
Version: 3.3.0
CC: amarts, gluster-bugs
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:56:33 UTC
Type: Bug
Bug Blocks: 853211

Description Joe Julian 2012-07-25 05:12:53 UTC
Description of problem:
I have a volume with about 10 clients and about 500 open fds. I ran "gluster volume status home inode" and all my clients hit ping-timeout. I had to kill glusterfsd for the bricks associated with that volume and glusterd on all my servers and restart glusterd to get my volume back.

Version-Release number of selected component (if applicable):
3.3.0

How reproducible:
Couldn't risk trying it more than once. This was on production servers.

Steps to Reproduce:
1. Have a busy volume
2. gluster volume status $VOL inode
  
Actual results:
All the clients hit ping-timeout. All other cli commands hang after that.

Additional info:
Had to kill the bricks and glusterd and restart glusterd (which restarted the bricks) on all the servers to get the volume back.

Comment 1 Amar Tumballi 2012-07-25 05:51:20 UTC
Kaushal,

I guess this is the same issue you figured out earlier: dict serialize and unserialize taking a lot of time.

The timeout happens because these brick-ops from glusterd to glusterfsd are handled in the main thread itself, which blocks further network requests (mostly from glusterfs) from reaching glusterfsd. We need to create a thread to handle these brick-ops, so that other calls can pass through to the xlators below.
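
The blocking described above can be illustrated with a small simulation. This is only a sketch in Python, not GlusterFS code: `slow_brick_op`, `serve`, the request names and the 0.2 s duration are all hypothetical stand-ins, and a plain thread is used here in place of whatever mechanism the real fix uses. It shows that a slow operation run inline on the request loop delays every request queued behind it, while handing it to a worker lets the loop answer pings right away.

```python
import threading
import time


def slow_brick_op(duration):
    """Hypothetical stand-in for an expensive brick-op (e.g. serializing a large dict)."""
    time.sleep(duration)


def serve(requests, offload, op_duration=0.2):
    """Process requests in order on a single simulated 'main' loop.

    Returns a dict mapping each ping name to the delay in seconds between
    the start of the loop and the moment that ping was answered.
    """
    start = time.monotonic()
    ping_latency = {}
    workers = []
    for kind, name in requests:
        if kind == "brick-op":
            if offload:
                # Hand the slow work to a worker so the loop stays responsive.
                worker = threading.Thread(target=slow_brick_op, args=(op_duration,))
                worker.start()
                workers.append(worker)
            else:
                # Run inline: nothing behind this request is served until it ends.
                slow_brick_op(op_duration)
        else:
            # A ping is "answered" as soon as the loop reaches it.
            ping_latency[name] = time.monotonic() - start
    for worker in workers:
        worker.join()
    return ping_latency


requests = [
    ("brick-op", "status-inode"),
    ("ping", "client-1"),
    ("ping", "client-2"),
]
blocked = serve(requests, offload=False)
offloaded = serve(requests, offload=True)
print("inline brick-op, ping delays:   ", blocked)
print("offloaded brick-op, ping delays:", offloaded)
```

In the inline case each ping waits for the whole brick-op, which is the pattern that would let a long enough operation push clients past their ping-timeout.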

Comment 2 Amar Tumballi 2012-10-23 10:45:32 UTC
http://review.gluster.org/4096 should fix the issue.

Comment 3 Vijay Bellur 2012-11-19 08:35:19 UTC
CHANGE: http://review.gluster.org/4096 (glusterfsd-mgmt: make brick-ops work in synctask) merged in master by Vijay Bellur (vbellur)