Red Hat Bugzilla – Bug 842955
"gluster volume status inode" command blocks glusterd and glusterfsd
Last modified: 2013-07-24 13:56:33 EDT
Description of problem:
I have a volume with about 10 clients and about 500 open fds. I ran "gluster volume status home inode" and all my clients hit ping-timeout. I had to kill glusterfsd for the bricks associated with that volume and glusterd on all my servers and restart glusterd to get my volume back.
Version-Release number of selected component (if applicable):
Couldn't risk triggering it more than once. This was on production servers.
Steps to Reproduce:
1. Have a busy volume
2. gluster volume status $VOL inode
All the clients hit ping-timeout. All other cli commands hang after that.
Had to kill the bricks and glusterd and restart glusterd (which restarted the bricks) on all the servers to get the volume back.
I guess this is the same issue you figured out earlier... dict serialize and unserialize taking a lot of time.
The reason for the timeout is that these brick-ops sent from glusterd to glusterfsd are handled in the main (event-loop) thread itself, hence blocking further network requests (mostly from glusterfs clients) from reaching glusterfsd. We need to create a thread to handle these brick-ops, thus allowing other calls to pass through to the xlators below.
http://review.gluster.org/4096 should fix the issue.
CHANGE: http://review.gluster.org/4096 (glusterfsd-mgmt: make brick-ops work in synctask) merged in master by Vijay Bellur (email@example.com)