Bug 1480501 - heketi crashed when concurrent operations were performed
Summary: heketi crashed when concurrent operations were performed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.6
Assignee: Raghavendra Talur
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1445448
 
Reported: 2017-08-11 09:27 UTC by krishnaram Karthick
Modified: 2019-01-12 13:37 UTC
CC: 9 users

Fixed In Version: heketi-5.0.0-10.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, performing concurrent operations that referred to the same Gluster node could crash Heketi. With this fix, Heketi no longer crashes when multiple operations referring to the same Gluster node are performed.
Clone Of:
Environment:
Last Closed: 2017-10-11 07:09:46 UTC
Embargoed:


Attachments (Terms of Use)
heketi_logs (36.04 KB, text/plain), attached 2017-08-11 09:30 UTC by krishnaram Karthick


Links
Red Hat Product Errata RHEA-2017:2879 (normal, SHIPPED_LIVE): heketi bug fix and enhancement update, last updated 2017-10-11 11:07:06 UTC

Description krishnaram Karthick 2017-08-11 09:27:17 UTC
Description of problem:

The following two parallel operations were performed on heketi:
1) a series of 'heketi device delete <>' commands
2) 'heketi device disable' on the device whose volumes were being deleted

Both commands errored out, and the heketi service appears to have crashed.

[kubeexec] DEBUG 2017/08/11 08:58:46 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-49.lab.eng.blr.redhat.com Pod: glusterfs-t4gvj Command: lvremove -f vg_c0be1577809232e2c2a5e557ece2b050/tp_1806abf11f460f1329e545134355fcea
Result:   Logical volume "brick_1806abf11f460f1329e545134355fcea" successfully removed
  Logical volume "tp_1806abf11f460f1329e545134355fcea" successfully removed
2017/08/11 08:58:58 http: multiple response.WriteHeader calls
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14dfdf7]
goroutine 2454 [running]:
github.com/heketi/heketi/apps/glusterfs.(*NodeEntry).SetState(0x0, 0xc42021a5a0, 0x2304520, 0xc4204060b0, 0x22fe560, 0xc42033ae80, 0xc420200268, 0x7, 0x48feb2, 0x598d71d2)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/node_entry.go:301 +0x57
github.com/heketi/heketi/apps/glusterfs.(*App).NodeSetState.func2(0xed11f68d2, 0x588fcb2, 0x235cee0, 0xc420210f90)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/app_node.go:360 +0x80
github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc.func1(0xc42066cb80, 0xc4204c24b0)
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:128 +0xf4
created by github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc
        /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:138 +0x60
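The panic above is a method call on a nil *NodeEntry: a concurrent delete can remove the node entry between lookup and use, and an unchecked lookup then hands SetState a nil receiver. The following minimal Go sketch (all types and names here are hypothetical stand-ins, not heketi's actual code) reproduces the pattern and shows the checked-lookup fix:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// NodeEntry is a hypothetical stand-in for heketi's node entry type.
type NodeEntry struct {
	State string
}

// SetState dereferences the receiver, so calling it on a nil
// *NodeEntry produces exactly the SIGSEGV seen in the stack trace.
func (n *NodeEntry) SetState(s string) {
	n.State = s
}

// store simulates the shared registry of node entries.
type store struct {
	mu    sync.Mutex
	nodes map[string]*NodeEntry
}

// loadUnchecked mimics a lookup that silently returns nil when a
// concurrent operation has already deleted the entry.
func (s *store) loadUnchecked(id string) *NodeEntry {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.nodes[id] // nil if a concurrent delete removed it
}

// loadChecked is the safer pattern: report the missing entry as an
// error instead of handing back a nil pointer for later dereference.
func (s *store) loadChecked(id string) (*NodeEntry, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.nodes[id]
	if !ok {
		return nil, errors.New("node not found")
	}
	return n, nil
}

func main() {
	s := &store{nodes: map[string]*NodeEntry{}} // entry already deleted

	// Fixed path: the caller sees an error and can fail the request cleanly.
	if _, err := s.loadChecked("abc"); err != nil {
		fmt.Println("error:", err)
	}

	// Buggy path: calling SetState on this nil entry would panic,
	// taking the whole process down, as happened here.
	n := s.loadUnchecked("abc")
	fmt.Println("entry is nil:", n == nil)
}
```

The fix merged for this bug is more involved than this sketch, but the failure class is the same: a state-change handler must treat "entry no longer exists" as an error path, not assume the pointer it loaded is valid.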
[root@dhcp47-10 ~]# oc logs heketi-1-g1dcn 
Heketi 5.0.0
[kubeexec] WARNING 2017/08/11 08:59:23 Rebalance on volume expansion has been enabled.  This is an EXPERIMENTAL feature
[heketi] INFO 2017/08/11 08:59:23 Loaded kubernetes executor
[heketi] INFO 2017/08/11 08:59:23 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2017/08/11 08:59:23 Block: New Block Hosting Volume size 500 GB
[heketi] INFO 2017/08/11 08:59:23 Loaded simple allocator
[heketi] INFO 2017/08/11 08:59:23 GlusterFS Application Loaded
Listening on port 8080


Version-Release number of selected component (if applicable):
heketi-client-5.0.0-7.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create 200 volumes on the same device.
2. Once all volumes are created, delete them all:
# heketi-cli volume list | awk '{print $1}' | cut -c 4- >> list
# while read id; do heketi-cli volume delete $id; done < list
3. From a different window, disable the device on which the volumes were created.
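Assuming the race is simply delete-vs-disable on shared entries, the two windows above can be simulated with goroutines. This Go sketch (names are illustrative, not heketi's API) shows why every lookup must be re-checked under the lock before use:

```go
package main

import (
	"fmt"
	"sync"
)

// simulate runs the two "windows" from the reproduction steps against a
// shared device table: one goroutine deletes every entry while another
// concurrently tries to disable them. It returns the number of entries
// left at the end.
func simulate(n int) int {
	var mu sync.Mutex
	devices := make(map[string]string, n)
	for i := 0; i < n; i++ {
		devices[fmt.Sprintf("dev%03d", i)] = "online"
	}

	var wg sync.WaitGroup
	wg.Add(2)

	// Window 1: the while-read loop, deleting every device id.
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			mu.Lock()
			delete(devices, fmt.Sprintf("dev%03d", i))
			mu.Unlock()
		}
	}()

	// Window 2: disable the same devices. An entry may already be
	// gone, so its existence is re-checked under the lock before the
	// write; using the entry without that check is the class of bug
	// that crashed heketi here.
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			id := fmt.Sprintf("dev%03d", i)
			mu.Lock()
			if _, ok := devices[id]; ok {
				devices[id] = "offline"
			}
			mu.Unlock()
		}
	}()

	wg.Wait()
	return len(devices)
}

func main() {
	// Every id is deleted exactly once and the checked disable never
	// re-creates a deleted entry, so the result is deterministic.
	fmt.Println("remaining devices:", simulate(200))
}
```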

Actual results:
Both operations errored out:
window-1:
=========
Volume 255d8fa3b1fa90e543a420b9a0a0626a deleted
Volume 2bee7175cc381e2d95a4834b66ba10b6 deleted
Volume 2c9faefa80170f05d4f4850578236180 deleted
Volume 2d35619c6bef760a336b957ef182bdad deleted
Volume 2d58b9eb77425d86e9c220a6a5ef389b deleted
Error:
Error:
Error:
Error:

Window-2:
==========
Error:

Expected results:
Both operations should complete successfully.

Additional info:

Comment 2 krishnaram Karthick 2017-08-11 09:30:07 UTC
Created attachment 1312046 [details]
heketi_logs

Comment 3 Raghavendra Talur 2017-08-22 20:52:34 UTC
https://github.com/heketi/heketi/pull/839

Comment 7 krishnaram Karthick 2017-09-14 06:21:08 UTC
Verified in build cns-deploy-5.0.0-34.el7rhgs.x86_64.

heketi volume create, volume delete, and device remove operations were run concurrently, and no crashes were seen.

Moving the bug to verified.

Comment 9 Raghavendra Talur 2017-10-04 15:46:56 UTC
doc text looks good to me

Comment 10 errata-xmlrpc 2017-10-11 07:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879

