Bug 1024316

Summary: glusterd crash seen after rebalance/remove-brick operations
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: glusterfs
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: shylesh <shmohan>
Severity: high
Priority: high
Version: 2.1
CC: dpati, dtsang, grajaiya, kaushal, kparthas, mmahoney, nsathyan, pprakash, psriniva, sasundar, sdharane, ssampat, vagarwal, vbellur
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0.52rhs-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, when remove-brick or rebalance operations were performed on a volume and the volume was subsequently stopped and deleted, glusterd would crash when any gluster CLI command was executed. With this update, glusterd does not crash when the same sequence of commands is executed on a volume.
Last Closed: 2014-02-25 07:57:22 UTC
Type: Bug

Description RamaKasturi 2013-10-29 11:35:37 UTC
Description of problem:
glusterd crash seen in glusterfs-3.4.0.35.1u2rhs-1.el6rhs.x86_64

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.35.1u2rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 RamaKasturi 2013-10-29 12:14:50 UTC
Please find the core dump and gluster log files at the following link:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1024316/

Comment 3 Dusmant 2013-11-22 05:19:54 UTC
I am moving the severity and priority to urgent because this bug, which used to be seen only once in a while, is now seen very often, and that is a significant risk.

Comment 4 Kaushal 2013-11-22 07:01:46 UTC
If there are new crashes, please provide their sos-reports. It'll help in faster resolution.

Comment 6 krishnan parthasarathi 2013-12-04 10:20:40 UTC
Root cause analysis:
--------------------
Important observations that led to the resolution.
---------------------------------------------------
1) The backtrace shows a rebalance notification call stack for a volume that has been deleted.

2) The glusterd logs indicate that there are two (or more) connections to the rebalance process. We know this from the following:

<log snip>

[2013-12-03 09:48:40.120897] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:40.120974] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:43.121833] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:43.121884] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:46.122443] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 09:48:46.122514] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now

</log snip>
We see that there are two connections attempting to reconnect every 3 seconds, in tandem.

3) The following logs show that there is more than one rpc object (i.e., unix domain socket connection) to the same (dead) rebalance process:

<log snip>

====> [2013-12-03 09:48:36.495445] W [socket.c:522:__socket_rwv] 0-management: readv on /var/lib/glusterd/vols/vol1/rebalance/b2461aa6-24bb-4d70-b43b-d6a73ab84698.sock failed (No data available)
====> [2013-12-03 09:48:36.506183] W [socket.c:522:__socket_rwv] 0-management: readv on /var/lib/glusterd/vols/vol1/rebalance/b2461aa6-24bb-4d70-b43b-d6a73ab84698.sock failed (No data available)
[2013-12-03 09:48:36.506238] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=1
[2013-12-03 09:48:36.506254] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=1

</log snip>

The arrow marks highlight the evidence we have.

Now for the root cause:
Every time a peer (re)joins the cluster (for example, after a network partition or a node reboot), each member of the cluster checks whether it needs to restart the daemons for all started volumes in the cluster.

Rebalance is one such daemon/process that each glusterd restarts conditionally. However, the corresponding management connection to the rebalance process was being (re)created unconditionally, 'leaking' the previously created rpc objects. These 'ghost' rpc objects go into a reconnect loop, once every 3 seconds. With the rebalance process stopped/dead, and the volume itself stopped and deleted, they hold a stale reference to the deleted volume. Any access through that reference is a potential segmentation fault (SIGSEGV); eventually glusterd segfaults, and that is the crash we see. A minimal sketch of this pattern follows.
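
To make the failure mode concrete, here is a minimal, self-contained C sketch of the pattern described above. All names (struct volinfo, struct rebal_conn, connect_unconditionally, connect_if_needed) are hypothetical and are not the actual glusterd code; the sketch only shows how unconditionally recreating the connection object leaves a leaked "ghost" object holding a dangling volinfo pointer, and how a simple reuse check avoids that.

/*
 * Illustrative sketch only -- hypothetical names, not the glusterd sources.
 */
#include <stdio.h>
#include <stdlib.h>

struct volinfo {
    char name[64];
};

struct rebal_conn {              /* stands in for the rpc/connection object */
    struct volinfo *vol;         /* back-reference used by reconnect callbacks */
};

/* Buggy pattern: always allocate a new connection object and forget the old
 * one. The forgotten object keeps reconnecting every 3 seconds and still
 * points at 'vol' even after the volume has been deleted. */
static struct rebal_conn *
connect_unconditionally(struct rebal_conn *old, struct volinfo *vol)
{
    (void)old;                   /* previous connection object is leaked here */
    struct rebal_conn *c = calloc(1, sizeof(*c));
    c->vol = vol;
    return c;
}

/* Guarded pattern: reuse the existing connection object if one is present,
 * so at most one object ever refers to the volume. */
static struct rebal_conn *
connect_if_needed(struct rebal_conn *old, struct volinfo *vol)
{
    if (old)
        return old;
    struct rebal_conn *c = calloc(1, sizeof(*c));
    c->vol = vol;
    return c;
}

int
main(void)
{
    struct volinfo *vol = calloc(1, sizeof(*vol));
    snprintf(vol->name, sizeof(vol->name), "vol1");

    /* A peer rejoins twice: the buggy path now has two live objects. */
    struct rebal_conn *ghost = connect_unconditionally(NULL, vol);
    struct rebal_conn *live  = connect_unconditionally(ghost, vol);

    /* The volume is stopped and deleted: vol is freed, but 'ghost' still
     * carries the pointer. A later reconnect notification that dereferences
     * ghost->vol would be a use-after-free (SIGSEGV). */
    free(vol);
    printf("ghost %p still holds dangling volinfo pointer %p\n",
           (void *)ghost, (void *)ghost->vol);
    free(ghost);
    free(live);

    /* With the guarded version, a second "rejoin" reuses the same object. */
    struct volinfo *vol2 = calloc(1, sizeof(*vol2));
    struct rebal_conn *c1 = connect_if_needed(NULL, vol2);
    struct rebal_conn *c2 = connect_if_needed(c1, vol2);
    printf("guarded: connection object reused? %s\n", c1 == c2 ? "yes" : "no");
    free(c1);
    free(vol2);
    return 0;
}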

Comment 9 RamaKasturi 2013-12-09 07:39:46 UTC
The following are the steps that were performed before the crash; I am not sure whether the issue can be reproduced with these steps.

1) Had glusterfs-server-3.4.0.44.1u2rhs-1.el6rhs.x86_64 installed and stopped all the volumes present, in order to upgrade to glusterfs-server-3.4.0.47.1u2rhs-1.el6rhs.x86_64.

2) Updated to glusterfs-server-3.4.0.47.1u2rhs-1.el6rhs.x86_64.

3) Started all the volumes that were present.

4) Ran rebalance on the volume; rebalance failed because glusterd was down on one of the nodes.

5) Restarted glusterd and saw the crash.

Comment 10 SATHEESARAN 2013-12-09 10:53:44 UTC
As KP is not around, I took the setup from RamaKasturi and am adding the required information from it.

1. The crash report from glusterd contains this:

[2013-12-09 12:37:49.575786] I [glusterd-handler.c:3059:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.37.182 (0), ret: 0
[2013-12-09 12:37:49.581673] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: c52e3602-683a-4ca5-a325-96fe55059693, host: 10.70.37.120
[2013-12-09 12:37:49.581725] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: 9d6c28cf-f915-43ba-9687-41d3eab0b2a7, host: 10.70.37.177
[2013-12-09 12:37:49.581741] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: 3717f3d0-1ff9-4353-9025-15248bc0cf90, host: 10.70.37.182
[2013-12-09 12:37:49.586202] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 9d6c28cf-f915-43ba-9687-41d3eab0b2a7
[2013-12-09 12:37:50.023703] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: c52e3602-683a-4ca5-a325-96fe55059693
[2013-12-09 12:37:50.023907] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: c52e3602-683a-4ca5-a325-96fe55059693
[2013-12-09 12:37:50.128810] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 3717f3d0-1ff9-4353-9025-15248bc0cf90
[2013-12-09 12:37:50.979593] I [glusterd-handshake.c:364:__server_event_notify] 0-: received defrag status updated
[2013-12-09 12:37:50.985510] W [socket.c:522:__socket_rwv] 0-management: readv on /var/lib/glusterd/vols/vol_dis/rebalance/b773cf81-230e-4440-b45d-e363aff74f12.sock failed (No data available)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-12-09 12:37:50
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.47.1u2rhs
/lib64/libc.so.6[0x37c3632960]
/usr/lib64/glusterfs/3.4.0.47.1u2rhs/xlator/mgmt/glusterd.so(__glusterd_defrag_notify+0x2d0)[0x7fac90b89be0]
/usr/lib64/glusterfs/3.4.0.47.1u2rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fac90b3a210]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a8)[0x7fac9436d5d8]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7fac94368df8]
/usr/lib64/glusterfs/3.4.0.47.1u2rhs/rpc-transport/socket.so(+0xa641)[0x7fac8f3e1641]
/usr/lib64/libglusterfs.so.0(+0x62387)[0x7fac945d8387]
/usr/sbin/glusterd(main+0x6c7)[0x4069d7]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x37c361ecdd]
/usr/sbin/glusterd[0x404619]
---------

2. The crash happened on only one node, 10.70.37.109.
3. Timestamp of the crash: 2013-12-09 12:37:50

Comment 11 Gowrishankar Rajaiyan 2013-12-11 06:46:12 UTC
Requesting blocker flag for Corbett.

Comment 12 SATHEESARAN 2013-12-20 04:36:20 UTC
Krishnan,

A glusterd crash was seen even after the issue was root-caused as per comment 6 and fixed.

[There was a subsequent crash even after the fix, as per comment 9 and comment 10.]

An RCA and fix for this would greatly help to VERIFY this bug with more cases.

Comment 13 krishnan parthasarathi 2013-12-20 10:06:06 UTC
The crash was observed when the steps mentioned in comment 9 were carried out. These steps are different from those mentioned in the description of the bug.
The root cause analysis for the crash whose backtrace is attached in comment 10 is as follows:

When a glusterd (re)joins a cluster, it receives information from each peer about the volumes in the cluster. The volume information provided by one or more peers may differ (e.g., due to multiple node failures at different points in time). When this happens, glusterd 'destroys' the old volume information (volinfo) object and creates a new volinfo object with the more recent information. Along with this, it also 'reaps' the daemons/processes running on behalf of the older volinfo instance. The rebalance process is one of them.

For every rebalance process, glusterd maintains a unix domain socket connection to it. This connection object (aka rpc object) holds a reference to the volinfo object. When the rebalance connection object outlives the (old) volinfo object, it may end up referring to freed memory. This is why we see the crash.

The fix does two things (a minimal sketch of the pattern follows the list):
- Makes the volinfo object ref-counted.
- The rebalance connection object takes a reference on the volinfo before connecting to the rebalance process.
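
Below is a minimal sketch of the ref-counting pattern described above, again with hypothetical names (volinfo_new/volinfo_ref/volinfo_unref, rebal_conn_new) rather than the actual patched glusterd code. The point it illustrates is that the volinfo is freed only after the last holder, including the rebalance connection, drops its reference, so a late rebalance notification can no longer touch freed memory.

/*
 * Illustrative sketch only -- hypothetical names, not the actual patch.
 */
#include <stdio.h>
#include <stdlib.h>

struct volinfo {
    int  refcnt;
    char name[64];
};

static struct volinfo *
volinfo_new(const char *name)
{
    struct volinfo *v = calloc(1, sizeof(*v));
    v->refcnt = 1;                       /* creator holds the first reference */
    snprintf(v->name, sizeof(v->name), "%s", name);
    return v;
}

static struct volinfo *
volinfo_ref(struct volinfo *v)
{
    v->refcnt++;
    return v;
}

static void
volinfo_unref(struct volinfo *v)
{
    if (--v->refcnt == 0) {
        printf("freeing volinfo %s\n", v->name);
        free(v);
    }
}

struct rebal_conn {
    struct volinfo *vol;                 /* the connection owns a reference */
};

static struct rebal_conn *
rebal_conn_new(struct volinfo *v)
{
    struct rebal_conn *c = calloc(1, sizeof(*c));
    c->vol = volinfo_ref(v);             /* take a ref before "connecting" */
    return c;
}

static void
rebal_conn_destroy(struct rebal_conn *c)
{
    volinfo_unref(c->vol);               /* drop the ref on teardown */
    free(c);
}

int
main(void)
{
    struct volinfo    *vol  = volinfo_new("vol1");
    struct rebal_conn *conn = rebal_conn_new(vol);

    /* "volume stop; volume delete" drops the creator's reference, but the
     * volinfo stays alive because the connection still holds one ... */
    volinfo_unref(vol);

    /* ... so a late rebalance notification can still read it safely. */
    printf("notify for volume %s\n", conn->vol->name);

    /* Tearing down the connection drops the last reference and frees it. */
    rebal_conn_destroy(conn);
    return 0;
}

In the actual fix, the references are taken and dropped by glusterd's connection and volume lifecycle code; see the merged patches below.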

Patches merged:
https://code.engineering.redhat.com/gerrit/#/c/17726/
https://code.engineering.redhat.com/gerrit/#/c/17503/
https://code.engineering.redhat.com/gerrit/#/c/17504/

Comment 14 Pavithra 2013-12-31 04:58:26 UTC
I have edited the doc text; please review and sign off on its technical accuracy.

Comment 15 krishnan parthasarathi 2014-01-03 10:48:26 UTC
Pavithra,
The edits look good to me.

Comment 16 shylesh 2014-01-09 11:32:44 UTC
As per the discussion with Krishnan, the simplified steps to reproduce the problem are:

Scenario 1
----------
1. Created a distributed-replicate volume
2. Ran rebalance and remove-brick on the volume
3. Stopped the volume and deleted the volume
4. Ran some gluster commands

Result:
-------
No crash in glusterd

Scenario 2
----------
1. Created a distributed volume on a 2-node cluster
2. Added a brick and ran rebalance on the volume
3. Brought down one of the nodes
4. While the node was down, ran a volume set command from the other node
5. After the node came back up, ran some gluster commands

Result:
-------
No crash in glusterd


Verified on 3.4.0.54rhs-2.el6rhs.x86_64

Comment 18 errata-xmlrpc 2014-02-25 07:57:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html