Bug 1596035
Summary: | On a 4 node setup heketi block volume creation fails when a node is powered off | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | vinutha <vinug>
Component: | heketi | Assignee: | John Mulligan <jmulligan>
Status: | CLOSED ERRATA | QA Contact: | vinutha <vinug>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | cns-3.10 | CC: | hchiramm, jmulligan, pprakash, rhs-bugs, rtalur, sankarshan, sselvan, storage-qa-internal, vinug
Target Milestone: | --- | |
Target Release: | CNS 3.10 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-09-12 09:23:45 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1568862 | |
Description vinutha 2018-06-28 06:23:13 UTC
As expected, it is not possible to log in to the pod hosted on the powered-off node:

```
# oc rsh glusterfs-storage-2x6qg
Error from server: error dialing backend: dial tcp 10.70.46.29:10250: getsockopt: no route to host
```

Block volume creation error messages:

```
# for i in {1..20} ; do heketi-cli blockvolume create --name=b-vol-$i --size=5 ; sleep 10; done
Error: Unable to execute command on glusterfs-storage-fwh6t:
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
```

--- heketi log snip ---------------------

```
[kubeexec] DEBUG 2018/06/28 04:40:48 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-210.lab.eng.blr.redhat.com Pod: glusterfs-storage-rqcpl Command: gluster --mode=script volume stop vol_5d6a205b63f2455ba8a825d96102020a force Result: volume stop: vol_5d6a205b63f2455ba8a825d96102020a: success
[heketi] WARNING 2018/06/28 04:40:49 failed to delete volume 5d6a205b63f2455ba8a825d96102020a via dhcp46-210.lab.eng.blr.redhat.com: Unable to delete volume vol_5d6a205b63f2455ba8a825d96102020a: Unable to execute command on glusterfs-storage-rqcpl: volume delete: vol_5d6a205b63f2455ba8a825d96102020a: failed: Some of the peers are down
[kubeexec] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume delete vol_5d6a205b63f2455ba8a825d96102020a] on glusterfs-storage-rqcpl: Err[command terminated with exit code 1]: Stdout []: Stderr [volume delete: vol_5d6a205b63f2455ba8a825d96102020a: failed: Some of the peers are down ]
[cmdexec] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:153: Unable to delete volume vol_5d6a205b63f2455ba8a825d96102020a: Unable to execute command on glusterfs-storage-rqcpl: volume delete: vol_5d6a205b63f2455ba8a825d96102020a: failed: Some of the peers are down
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:372: failed to delete volume in cleanup: no hosts available (4 total)
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:873: Error on create volume rollback: failed to clean up volume: 5d6a205b63f2455ba8a825d96102020a
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1183: Create Block Volume Rollback error: failed to clean up volume: 5d6a205b63f2455ba8a825d96102020a
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Create Block Volume Failed: Unable to execute command on glusterfs-storage-fwh6t:
[asynchttp] INFO 2018/06/28 04:40:49 asynchttp.go:292: Completed job bb4a4d0a2034567ed4d452e0c3aed545 in 25.416868303s
[negroni] Started GET /queue/bb4a4d0a2034567ed4d452e0c3aed545
[negroni] Completed 500 Internal Server Error in 403.83µs
[negroni] Started POST /blockvolumes
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #1
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #1
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #2
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #1
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #2
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #3
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #4
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #5
[heketi] ERROR 2018/06/28 04:40:59 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1167: Create Block Volume Build Failed: No space
[negroni] Completed 500 Internal Server Error in 28.972259ms
```

----------------------

The test case for the title "On a 4 node setup heketi block volume creation fails when a node is powered off" should be:

1. Create a 4 node cluster.
2. Bring down any one of the 4 nodes.
3. Create a blockvolume with HA count 3 (note: if you are using heketi-cli you must specify it explicitly).
4. The blockvolume should get created.

Corner case, not part of the test case: if a node goes down *during* the blockvolume create operation, heketi won't pick other nodes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686
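The key detail in step 3 is that the HA count is not inferred: it has to be supplied in the block volume create request, whether through heketi-cli or the REST API. The sketch below builds such a request body. It is only illustrative: the `hacount` JSON field name and the `POST /blockvolumes` endpoint reflect my understanding of heketi's REST API rather than anything stated in this report, and `b-vol-1` is a made-up volume name.

```python
import json


def block_volume_request(name, size_gb, ha):
    """Build a JSON body for heketi's block volume create call
    (POST /blockvolumes -- endpoint and field names assumed).

    `ha` is the number of nodes that should host the block volume;
    on a 4-node cluster with one node powered off, ha=3 can still
    be satisfied by the remaining nodes.
    """
    return json.dumps({"name": name, "size": size_gb, "hacount": ha})


# Roughly equivalent CLI invocation (the HA count must be given
# explicitly when using heketi-cli, per the test case above):
#   heketi-cli blockvolume create --name=b-vol-1 --size=5 --ha=3
body = block_volume_request("b-vol-1", 5, ha=3)
print(body)
```

The point of the sketch is the contrast with the reproduction loop in the description, which ran `heketi-cli blockvolume create` without an HA option and therefore exercised the default rather than the three-of-four placement the test case intends.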