Red Hat Bugzilla – Bug 1024316
glusterd crash seen after rebalance/remove-brick operations
Last modified: 2015-11-03 18:05:55 EST
Description of problem:
glusterd crash seen in glusterfs-22.214.171.124.1u2rhs-1.el6rhs.x86_64
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Please find the core dump and gluster log files in the following link
I am moving the severity and priority to Urgent, because this bug which used to be seen once in a while, now seen very often and that's a huge risk.
If there are new crashes, please provide their sos-reports. It'll help in faster resolution.
Root cause analysis:
Important observations that lead to the resolution.
1) Backtrace has a rebalance notification call stack of a volume that has been deleted.
2) Glusterd logs indicate that there are two (or more) connections to the rebalance process. This we know from the following,
[2013-12-03 09:48:40.120897] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:40.120974] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:43.121833] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:43.121884] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:46.122443] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:46.122514] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
We see that there are two connections that are attempting to reconnect every 3 secs, in tandem.
3) The following logs assert there is more than on rpc object (read unix domain socket connection) to the same (dead) rebalance process,
====> [2013-12-03 09:48:36.495445] W [socket.c:522:__socket_rwv] 0-management: readv on /var/lib/glusterd/vols/vol1/rebalance/b2461aa6-24bb-4d70-b43b-d6a73ab84698.sock failed (No data available)
====> [2013-12-03 09:48:36.506183] W [socket.c:522:__socket_rwv] 0-management: readv on /var/lib/glusterd/vols/vol1/rebalance/b2461aa6-24bb-4d70-b43b-d6a73ab84698.sock failed (No data available)
[2013-12-03 09:48:36.506238] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=1
[2013-12-03 09:48:36.506254] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=1
The arrow mark highlight the evidence we have.
Now for the root cause,
Every time a peer (re)joins the cluster, each member in the cluster, for every new (re)joinee (probably due to network partition or node reboot), checks if it needs to restart daemons for all the started volumes in the cluster.
Rebalance is once such daemon/process that each restarts conditionally. But, the corresponding management connection to rebalance process was being (re)created unconditionally, 'leaking' previously created rpc objects. Now these 'ghost' rpc objects go into reconnect loop, once every 3 seconds. With rebalance process stopped/dead. and the volume itself stopped and deleted, they have a stale reference to the deleted volume. Any access to this reference is a potential segmentation fault (SIGSEGV).
Finally, justice is delivered, glusterd segfaults, and thus we see the crash.
Following are the steps which were done before the crash, not very sure whether will be able to reproduce issue with the following steps.
1) Had glusterfs-server-126.96.36.199.1u2rhs-1.el6rhs.x86_64 and stopped all the volumes present to upgrade it to glusterfs-server-188.8.131.52.1u2rhs-1.el6rhs.x86_64.
2) updated to glusterfs-server-184.108.40.206.1u2rhs-1.el6rhs.x86_64.
3) started all the volumes which were present.
4) Now ran rebalance on the volume and rebalance failed because glusterd was down in one of the node.
5) Restarted glusterd , and saw the crash.
As KP is not around, I took the setup from Ramakasturi and adding the required info from the setup.
1. Crash report from glusterd contains this -
[2013-12-09 12:37:49.575786] I [glusterd-handler.c:3059:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.37.182 (0), ret: 0
[2013-12-09 12:37:49.581673] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: c52e3602-683a-4ca5-a325-96fe55059693, host: 10.70.37.120
[2013-12-09 12:37:49.581725] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: 9d6c28cf-f915-43ba-9687-41d3eab0b2a7, host: 10.70.37.177
[2013-12-09 12:37:49.581741] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: 3717f3d0-1ff9-4353-9025-15248bc0cf90, host: 10.70.37.182
[2013-12-09 12:37:49.586202] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 9d6c28cf-f915-43ba-9687-41d3eab0b2a7
[2013-12-09 12:37:50.023703] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: c52e3602-683a-4ca5-a325-96fe55059693
[2013-12-09 12:37:50.023907] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: c52e3602-683a-4ca5-a325-96fe55059693
[2013-12-09 12:37:50.128810] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 3717f3d0-1ff9-4353-9025-15248bc0cf90
[2013-12-09 12:37:50.979593] I [glusterd-handshake.c:364:__server_event_notify] 0-: received defrag status updated
[2013-12-09 12:37:50.985510] W [socket.c:522:__socket_rwv] 0-management: readv on /var/lib/glusterd/vols/vol_dis/rebalance/b773cf81-230e-4440-b45d-e363aff74f12.sock failed (No data ava
signal received: 11
time of crash: 2013-12-09 12:37:50configuration details:
package-string: glusterfs 220.127.116.11.1u2rhs
2. Crash has happened only on one node, 10.70.37.109
3. Timestamp of the crash - time of crash: 2013-12-09 12:37:50
Requesting blocker flag for Corbett.
glusterd crash was seen even after the issue RCA'ed as per comment 6 and fixing it.
[There was subsequent crash even after the fix as per comment 9 & comment 10 ]
RCA and fix for this, would greatly help to VERIFY this bug with more cases.
The crash was observed when steps mentioned in comment#9 were carried out. This is different from the steps that are mentioned in the description of the bug.
Root cause analysis for the crash, whose backtrace is attached in comment#10 is as follows,
When a glusterd (re)joins a cluster, it receives information from each peer about volumes in the cluster. The volume information provided by one or more peers may differ (eg. multiple node failures at different points in time). When this happens, glusterd will 'destroy' the old volume information (volinfo) object and create a new volinfo object with the more recent information. Along with this, it also 'reaps' daemons/processes running on behalf of the older volinfo instance. Rebalance process is one among them.
For every rebalance process, glusterd has a unix domain socket connection with it. This connection object (aka rpc object) has a reference to the volinfo object. When the rebalance connection object lives longer than the (old) volinfo object, it could be referring to free'd memory. This is the reason why we see the crash.
The fix does two things,
- Makes volinfo object ref-counted.
- Rebalance connection object takes a ref' on volinfo before connecting to the rebalance process.
I have edited the doc text, please review and sign off its technical accuracy.
The edits look good to me.
As per discussion with krishnan simplified steps to reproduce the problem is
1. created a distributed-replicate volume
2. Run rebalance and remove-bricks on the volume
3. stop the volume and delete the volume
4 Run some gluster commands
No crash in glusterd
1. created a distributed volume using 2 node cluster
2. add-brick and ran rebalance on the volume
3. bring down one of the node
4. while a node is down run volume set command from another node
5. after node comes back run some gluster commands
No crash in glusterd
verified on 18.104.22.168rhs-2.el6rhs.x86_64
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.