Bug 1480501 - heketi crashed when concurrent operations were performed
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: cns-3.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.6
Assigned To: Raghavendra Talur
QA Contact: krishnaram Karthick
Docs Contact:
Depends On:
Blocks: 1445448
Reported: 2017-08-11 05:27 EDT by krishnaram Karthick
Modified: 2017-10-11 03:09 EDT
CC List: 8 users

See Also:
Fixed In Version: heketi-5.0.0-10.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, performing concurrent operations that referred to the same Gluster node crashed Heketi. With this fix, Heketi no longer crashes when multiple operations referring to the same Gluster node are performed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-11 03:09:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
heketi_logs (36.04 KB, text/plain)
2017-08-11 05:30 EDT, krishnaram Karthick


External Trackers
Tracker: Red Hat Product Errata
ID: RHEA-2017:2879
Priority: normal
Status: SHIPPED_LIVE
Summary: heketi bug fix and enhancement update
Last Updated: 2017-10-11 07:07:06 EDT

Description krishnaram Karthick 2017-08-11 05:27:17 EDT
Description of problem:

The following two operations were run against heketi in parallel:
1) a series of 'heketi-cli volume delete <>' commands
2) 'heketi-cli device disable' on the device whose volumes were being deleted

Both commands failed, and the heketi service crashed with the following panic:

[kubeexec] DEBUG 2017/08/11 08:58:46 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-49.lab.eng.blr.redhat.com Pod: glusterfs-t4gvj Command: lvremove -f vg_c0be1577809232e2c2a5e557ece2b050/tp_1806abf11f460f1329e545134355fcea
Result:   Logical volume "brick_1806abf11f460f1329e545134355fcea" successfully removed
  Logical volume "tp_1806abf11f460f1329e545134355fcea" successfully removed
2017/08/11 08:58:58 http: multiple response.WriteHeader calls
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14dfdf7]
goroutine 2454 [running]:
github.com/heketi/heketi/apps/glusterfs.(*NodeEntry).SetState(0x0, 0xc42021a5a0, 0x2304520, 0xc4204060b0, 0x22fe560, 0xc42033ae80, 0xc420200268, 0x7, 0x48feb2, 0x598d71d2)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/node_entry.go:301 +0x57
github.com/heketi/heketi/apps/glusterfs.(*App).NodeSetState.func2(0xed11f68d2, 0x588fcb2, 0x235cee0, 0xc420210f90)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/app_node.go:360 +0x80
github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc.func1(0xc42066cb80, 0xc4204c24b0)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:128 +0xf4
created by github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:138 +0x60
[root@dhcp47-10 ~]# oc logs heketi-1-g1dcn 
Heketi 5.0.0
[kubeexec] WARNING 2017/08/11 08:59:23 Rebalance on volume expansion has been enabled.  This is an EXPERIMENTAL feature
[heketi] INFO 2017/08/11 08:59:23 Loaded kubernetes executor
[heketi] INFO 2017/08/11 08:59:23 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2017/08/11 08:59:23 Block: New Block Hosting Volume size 500 GB
[heketi] INFO 2017/08/11 08:59:23 Loaded simple allocator
[heketi] INFO 2017/08/11 08:59:23 GlusterFS Application Loaded
Listening on port 8080


Version-Release number of selected component (if applicable):
heketi-client-5.0.0-7.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create 200 volumes on the same device.
2. Once all volumes are created, delete all of them:
# heketi-cli volume list | awk '{print $1}' | cut -c 4- >> list
# while read -r id; do heketi-cli volume delete "$id"; done < list
3. While the deletions are running, disable the device on which the volumes were created from a different window.

Actual results:
Both operations failed with errors:
Window-1:
=========
Volume 255d8fa3b1fa90e543a420b9a0a0626a deleted
Volume 2bee7175cc381e2d95a4834b66ba10b6 deleted
Volume 2c9faefa80170f05d4f4850578236180 deleted
Volume 2d35619c6bef760a336b957ef182bdad deleted
Volume 2d58b9eb77425d86e9c220a6a5ef389b deleted
Error:
Error:
Error:
Error:

Window-2:
==========
Error:

Expected results:
Both operations should complete successfully.

Additional info:
Comment 2 krishnaram Karthick 2017-08-11 05:30 EDT
Created attachment 1312046
heketi_logs
Comment 3 Raghavendra Talur 2017-08-22 16:52:34 EDT
https://github.com/heketi/heketi/pull/839
Comment 7 krishnaram Karthick 2017-09-14 02:21:08 EDT
Verified in build cns-deploy-5.0.0-34.el7rhgs.x86_64.

heketi volume create, volume delete, and device remove operations were run concurrently, and no crashes were seen.

Moving the bug to verified.
Comment 9 Raghavendra Talur 2017-10-04 11:46:56 EDT
Doc text looks good to me.
Comment 10 errata-xmlrpc 2017-10-11 03:09:46 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879
