Bug 1480501

Summary: heketi crashed when concurrent operations were performed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: krishnaram Karthick <kramdoss>
Component: heketi
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED ERRATA
QA Contact: krishnaram Karthick <kramdoss>
Severity: high
Docs Contact:
Priority: unspecified
Version: cns-3.6
CC: asriram, hchiramm, mliyazud, pprakash, rhs-bugs, rtalur, srmukher, sselvan, storage-qa-internal
Target Milestone: ---
Target Release: CNS 3.6
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: heketi-5.0.0-10.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, performing concurrent operations that referred to the same Gluster node crashed Heketi. With this fix, Heketi no longer crashes when multiple operations refer to the same Gluster node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-11 07:09:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1445448
Attachments:
Description Flags
heketi_logs none

Description krishnaram Karthick 2017-08-11 09:27:17 UTC
Description of problem:

The following two operations were run in parallel against heketi:
1) a series of 'heketi volume delete <>' commands
2) 'heketi device disable' on the device from which the volumes were being deleted

Both commands failed with errors, and the heketi service appears to have crashed.

[kubeexec] DEBUG 2017/08/11 08:58:46 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-49.lab.eng.blr.redhat.com Pod: glusterfs-t4gvj Command: lvremove -f vg_c0be1577809232e2c2a5e557ece2b050/tp_1806abf11f460f1329e545134355fcea
Result:   Logical volume "brick_1806abf11f460f1329e545134355fcea" successfully removed
  Logical volume "tp_1806abf11f460f1329e545134355fcea" successfully removed
2017/08/11 08:58:58 http: multiple response.WriteHeader calls
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14dfdf7]
goroutine 2454 [running]:
github.com/heketi/heketi/apps/glusterfs.(*NodeEntry).SetState(0x0, 0xc42021a5a0, 0x2304520, 0xc4204060b0, 0x22fe560, 0xc42033ae80, 0xc420200268, 0x7, 0x48feb2, 0x598d71d2)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/node_entry.go:301 +0x57
github.com/heketi/heketi/apps/glusterfs.(*App).NodeSetState.func2(0xed11f68d2, 0x588fcb2, 0x235cee0, 0xc420210f90)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/app_node.go:360 +0x80
github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc.func1(0xc42066cb80, 0xc4204c24b0)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:128 +0xf4
created by github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:138 +0x60
[root@dhcp47-10 ~]# oc logs heketi-1-g1dcn 
Heketi 5.0.0
[kubeexec] WARNING 2017/08/11 08:59:23 Rebalance on volume expansion has been enabled.  This is an EXPERIMENTAL feature
[heketi] INFO 2017/08/11 08:59:23 Loaded kubernetes executor
[heketi] INFO 2017/08/11 08:59:23 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2017/08/11 08:59:23 Block: New Block Hosting Volume size 500 GB
[heketi] INFO 2017/08/11 08:59:23 Loaded simple allocator
[heketi] INFO 2017/08/11 08:59:23 GlusterFS Application Loaded
Listening on port 8080
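
The first argument in the `(*NodeEntry).SetState(0x0, ...)` frame of the trace above is the method receiver, which suggests the method was called on a nil `*NodeEntry`. The following is a minimal sketch of that failure mode and of one possible guard. It is not heketi's code and not the change made in the linked fix: the `NodeEntry` fields, `lookup`, `setStateChecked`, `setStateUnchecked`, and `ErrNotFound` are all hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeEntry is a hypothetical stand-in for heketi's type of the same name;
// only the single field needed for this illustration is modeled.
type NodeEntry struct {
	State string
}

// SetState writes a field through the receiver, so it panics with
// "invalid memory address or nil pointer dereference" when n is nil.
func (n *NodeEntry) SetState(s string) {
	n.State = s
}

// ErrNotFound is a hypothetical error returned for a missing node.
var ErrNotFound = errors.New("node not found")

// lookup is a hypothetical store access. It returns nil when id is absent,
// for example because a concurrent operation changed the store.
func lookup(store map[string]*NodeEntry, id string) *NodeEntry {
	return store[id]
}

// setStateUnchecked calls the method on whatever lookup returned and
// reports whether that call panicked.
func setStateUnchecked(store map[string]*NodeEntry, id, s string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	lookup(store, id).SetState(s)
	return false
}

// setStateChecked guards the call so that a missing node becomes an
// error returned to the caller instead of a crash.
func setStateChecked(store map[string]*NodeEntry, id, s string) error {
	n := lookup(store, id)
	if n == nil {
		return ErrNotFound
	}
	n.SetState(s)
	return nil
}

func main() {
	store := map[string]*NodeEntry{"node-a": {State: "online"}}
	fmt.Println("unchecked call panicked:", setStateUnchecked(store, "missing", "offline"))
	fmt.Println("checked call error:", setStateChecked(store, "missing", "offline"))
	fmt.Println("checked call on existing node:", setStateChecked(store, "node-a", "offline"))
}
```

In a real server the unchecked panic occurs inside a goroutine with no `recover`, which terminates the whole process; that would be consistent with the fresh startup banner in the `oc logs` output above.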


Version-Release number of selected component (if applicable):
heketi-client-5.0.0-7.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create 200 volumes on the same device.
2. Once all volumes are created, delete all of them:
# heketi-cli volume list | awk '{print $1}' | cut -c 4- >> list
# while read id; do heketi-cli volume delete "$id"; done < list
3. From a different window, disable the device on which the volumes were created.

Actual results:
Both operations errored out.
Window-1:
=========
Volume 255d8fa3b1fa90e543a420b9a0a0626a deleted
Volume 2bee7175cc381e2d95a4834b66ba10b6 deleted
Volume 2c9faefa80170f05d4f4850578236180 deleted
Volume 2d35619c6bef760a336b957ef182bdad deleted
Volume 2d58b9eb77425d86e9c220a6a5ef389b deleted
Error:
Error:
Error:
Error:

Window-2:
==========
Error:

Expected results:
Both operations should complete without errors.

Additional info:

Comment 2 krishnaram Karthick 2017-08-11 09:30:07 UTC
Created attachment 1312046
heketi_logs

Comment 3 Raghavendra Talur 2017-08-22 20:52:34 UTC
https://github.com/heketi/heketi/pull/839

Comment 7 krishnaram Karthick 2017-09-14 06:21:08 UTC
Verified in build cns-deploy-5.0.0-34.el7rhgs.x86_64.

heketi volume create, delete, and device remove operations were run concurrently, and no crashes were seen.

Moving the bug to verified.

Comment 9 Raghavendra Talur 2017-10-04 15:46:56 UTC
Doc text looks good to me.

Comment 10 errata-xmlrpc 2017-10-11 07:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879