Description of problem:
On a 4-node setup, heketi block volume creation fails when one node is powered off.

Version-Release number of selected component (if applicable):

# rpm -qa | grep openshift
atomic-openshift-clients-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-roles-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-docker-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-docs-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
openshift-ansible-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-hyperkube-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
atomic-openshift-node-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-playbooks-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch

# oc rsh glusterfs-storage-77pm2
sh-4.2# rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64

# oc rsh heketi-storage-1-bccs6
sh-4.2# rpm -qa | grep heketi
python-heketi-7.0.0-1.el7rhgs.x86_64
heketi-client-7.0.0-1.el7rhgs.x86_64
heketi-7.0.0-1.el7rhgs.x86_64

How reproducible:
2 out of 2 attempts

Steps to Reproduce:
1. Create a 4-node CNS setup using ansible with ha count=3.
2. Create 20 file volumes of 10GB each.
3. Power off 1 node manually.
4. Create 20 blockvolumes of size 5GB using heketi.
   # for i in {1..20} ; do heketi-cli blockvolume create --name=b-vol-$i --size=5 ; sleep 10; done
5. Blockvolume creation fails with the error message "Failed to allocate new block volume: No space" even though there is sufficient space on the 3 nodes that are up.

-- snip of heketi topology before block volume creation

    Node Id: b2a1923cf742cedfa9ea81ea78e304ed
    State: online
    Cluster Id: 14321f429bcd4c1c6e77017fd714ebe8
    Zone: 1
    Management Hostnames: dhcp47-70.lab.eng.blr.redhat.com
    Storage Hostnames: 10.70.47.70
    Devices:
        Id:543b45dce8a977adfd6000c0eedd4cd4 Name:/dev/sdd State:online Size (GiB):199 Used (GiB):90 Free (GiB):109
            Bricks:
                Id:12737b9e396ff18f196869d210ce1e92 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_543b45dce8a977adfd6000c0eedd4cd4/brick_12737b9e396ff18f196869d210ce1e92/brick
                Id:1c8e02a6d92a37e652cbd5d18140af50 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_543b45dce8a977adfd6000c0eedd4cd4/brick_1c8e02a6d92a37e652cbd5d18140af50/brick
                Id:a2f7169567380ab6fdaa56073547dee6 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_543b45dce8a977adfd6000c0eedd4cd4/brick_a2f7169567380ab6fdaa56073547dee6/brick
                Id:b2dcb6da44c6e31f553e6f64f4e9b627 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_543b45dce8a977adfd6000c0eedd4cd4/brick_b2dcb6da44c6e31f553e6f64f4e9b627/brick
                Id:c409fde9644556c3f1149c12fd697fda Size (GiB):10 Path: /var/lib/heketi/mounts/vg_543b45dce8a977adfd6000c0eedd4cd4/brick_c409fde9644556c3f1149c12fd697fda/brick
                Id:eba5e57293b817a54b2f0e236f95df76 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_543b45dce8a977adfd6000c0eedd4cd4/brick_eba5e57293b817a54b2f0e236f95df76/brick
        Id:b87af79398381da12d08232eda69c722 Name:/dev/sde State:online Size (GiB):199 Used (GiB):42 Free (GiB):157
            Bricks:
                Id:2d5c878c07be7b52491099cff96910bc Size (GiB):10 Path: /var/lib/heketi/mounts/vg_b87af79398381da12d08232eda69c722/brick_2d5c878c07be7b52491099cff96910bc/brick
                Id:400b7b1f147f2cce6f702f0e37e49a07 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_b87af79398381da12d08232eda69c722/brick_400b7b1f147f2cce6f702f0e37e49a07/brick
                Id:5087442f15656c1b6c4ff0cc81c5cfd7 Size (GiB):2 Path: /var/lib/heketi/mounts/vg_b87af79398381da12d08232eda69c722/brick_5087442f15656c1b6c4ff0cc81c5cfd7/brick
                Id:aef10fc9ec18aca6e1f276ba977a72f8 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_b87af79398381da12d08232eda69c722/brick_aef10fc9ec18aca6e1f276ba977a72f8/brick

    Node Id: fb1bbfb8c67284475bc21bb279007e8c
    State: online
    Cluster Id: 14321f429bcd4c1c6e77017fd714ebe8
    Zone: 1
    Management Hostnames: dhcp46-210.lab.eng.blr.redhat.com
    Storage Hostnames: 10.70.46.210
    Devices:
        Id:adc7f09c5f19dd31c57ef72b3f2cb64e Name:/dev/sde State:online Size (GiB):199 Used (GiB):110 Free (GiB):89
            Bricks:
                Id:4df3403eb063f838027c4c7b0423250b Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_4df3403eb063f838027c4c7b0423250b/brick
                Id:67e2e56375e36deb99e15dc08eecce56 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_67e2e56375e36deb99e15dc08eecce56/brick
                Id:aedfe6854af18bb22fb86ea07c2a1465 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_aedfe6854af18bb22fb86ea07c2a1465/brick
                Id:b5c57402d83c5c92eeae822633e245ae Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_b5c57402d83c5c92eeae822633e245ae/brick
                Id:bb8c0f58429a0bf0ee0ce6c80475e8f9 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_bb8c0f58429a0bf0ee0ce6c80475e8f9/brick
                Id:bdbfa8fe8e94583e0087bff49a012170 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_bdbfa8fe8e94583e0087bff49a012170/brick
                Id:c41287d78b90060e63f2c7dd8934072d Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_c41287d78b90060e63f2c7dd8934072d/brick
                Id:c716f837c0e94ee690f83cf61adee31b Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_c716f837c0e94ee690f83cf61adee31b/brick
                Id:ecc661dd93b2453a95a6ae6b48c2ea8a Size (GiB):10 Path: /var/lib/heketi/mounts/vg_adc7f09c5f19dd31c57ef72b3f2cb64e/brick_ecc661dd93b2453a95a6ae6b48c2ea8a/brick
        Id:ecd528618df514d2b1061c62afd86999 Name:/dev/sdd State:online Size (GiB):199 Used (GiB):62 Free (GiB):137
            Bricks:
                Id:1b50e4eb496c7c2b325ff4b11708198f Size (GiB):10 Path: /var/lib/heketi/mounts/vg_ecd528618df514d2b1061c62afd86999/brick_1b50e4eb496c7c2b325ff4b11708198f/brick
                Id:3e1256bd97344681519613e92546a779 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_ecd528618df514d2b1061c62afd86999/brick_3e1256bd97344681519613e92546a779/brick
                Id:7b642187486250dca487691e0bef5e81 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_ecd528618df514d2b1061c62afd86999/brick_7b642187486250dca487691e0bef5e81/brick
                Id:80c00b04b09f566c65ebcb6f975c4fa2 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_ecd528618df514d2b1061c62afd86999/brick_80c00b04b09f566c65ebcb6f975c4fa2/brick
                Id:9947ee0999f188218a42f588c1677cc3 Size (GiB):10 Path: /var/lib/heketi/mounts/vg_ecd528618df514d2b1061c62afd86999/brick_9947ee0999f188218a42f588c1677cc3/brick
                Id:9db57e4184a91f004d608f492b94433d Size (GiB):2 Path: /var/lib/heketi/mounts/vg_ecd528618df514d2b1061c62afd86999/brick_9db57e4184a91f004d608f492b94433d/brick

-------------snip ------------

Actual results:
Blockvolume creation fails with the error message "Failed to allocate new block volume: No space" even though there is sufficient space on the 3 nodes that are up.

Expected results:
Blockvolume creation should succeed since 3 nodes are up.

Additional info:
Logs attached.
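
To make the "sufficient space" observation easier to check, the free space reported in the topology output can be totaled per node. The following is a minimal sketch, not part of the original report: the helper name free_per_node is hypothetical, and it only sums the raw "Free (GiB)" values per "Node Id". It does not model how heketi actually allocates bricks or block-hosting volumes. The sample input is the device data from the two nodes shown in the snip.

```shell
#!/usr/bin/env bash
# free_per_node (hypothetical helper): read heketi topology text on stdin and
# print "<node id> <total free GiB>" for every node, sorted by node id.
free_per_node() {
  # Tokenize first, so the input may be line-oriented or flattened.
  grep -oE 'Node Id: [0-9a-f]+|Free \(GiB\):[0-9]+' |
    awk '
      /^Node Id:/ { node = $3; next }                 # remember the current node
      { sub(/^Free \(GiB\):/, ""); free[node] += $0 } # add this device to its node
      END { for (n in free) print n, free[n] }
    ' | sort
}

# Demo on the device lines from the topology snip in this report.
free_per_node <<'EOF'
Node Id: b2a1923cf742cedfa9ea81ea78e304ed
Id:543b45dce8a977adfd6000c0eedd4cd4 Name:/dev/sdd State:online Size (GiB):199 Used (GiB):90 Free (GiB):109
Id:b87af79398381da12d08232eda69c722 Name:/dev/sde State:online Size (GiB):199 Used (GiB):42 Free (GiB):157
Node Id: fb1bbfb8c67284475bc21bb279007e8c
Id:adc7f09c5f19dd31c57ef72b3f2cb64e Name:/dev/sde State:online Size (GiB):199 Used (GiB):110 Free (GiB):89
Id:ecd528618df514d2b1061c62afd86999 Name:/dev/sdd State:online Size (GiB):199 Used (GiB):62 Free (GiB):137
EOF
# prints:
# b2a1923cf742cedfa9ea81ea78e304ed 266
# fb1bbfb8c67284475bc21bb279007e8c 226
```

On a live setup the same helper could be fed directly, for example with `heketi-cli topology info | free_per_node`. Note that the snip covers only two of the nodes, so the totals above say nothing about the remaining ones.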
As expected, it is not possible to log in to the pod hosted on the powered-off node:

# oc rsh glusterfs-storage-2x6qg
Error from server: error dialing backend: dial tcp 10.70.46.29:10250: getsockopt: no route to host

-- process error messages

# for i in {1..20} ; do heketi-cli blockvolume create --name=b-vol-$i --size=5 ; sleep 10; done
Error: Unable to execute command on glusterfs-storage-fwh6t:
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space
Error: Failed to allocate new block volume: No space

--- heketi log snip ---------------------
[kubeexec] DEBUG 2018/06/28 04:40:48 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-210.lab.eng.blr.redhat.com Pod: glusterfs-storage-rqcpl Command: gluster --mode=script volume stop vol_5d6a205b63f2455ba8a825d96102020a force
Result: volume stop: vol_5d6a205b63f2455ba8a825d96102020a: success
[heketi] WARNING 2018/06/28 04:40:49 failed to delete volume 5d6a205b63f2455ba8a825d96102020a via dhcp46-210.lab.eng.blr.redhat.com: Unable to delete volume vol_5d6a205b63f2455ba8a825d96102020a: Unable to execute command on glusterfs-storage-rqcpl: volume delete: vol_5d6a205b63f2455ba8a825d96102020a: failed: Some of the peers are down
[kubeexec] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume delete vol_5d6a205b63f2455ba8a825d96102020a] on glusterfs-storage-rqcpl: Err[command terminated with exit code 1]: Stdout []: Stderr [volume delete: vol_5d6a205b63f2455ba8a825d96102020a: failed: Some of the peers are down ]
[cmdexec] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:153: Unable to delete volume vol_5d6a205b63f2455ba8a825d96102020a: Unable to execute command on glusterfs-storage-rqcpl: volume delete: vol_5d6a205b63f2455ba8a825d96102020a: failed: Some of the peers are down
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:372: failed to delete volume in cleanup: no hosts available (4 total)
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:873: Error on create volume rollback: failed to clean up volume: 5d6a205b63f2455ba8a825d96102020a
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1183: Create Block Volume Rollback error: failed to clean up volume: 5d6a205b63f2455ba8a825d96102020a
[heketi] ERROR 2018/06/28 04:40:49 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Create Block Volume Failed: Unable to execute command on glusterfs-storage-fwh6t:
[asynchttp] INFO 2018/06/28 04:40:49 asynchttp.go:292: Completed job bb4a4d0a2034567ed4d452e0c3aed545 in 25.416868303s
[negroni] Started GET /queue/bb4a4d0a2034567ed4d452e0c3aed545
[negroni] Completed 500 Internal Server Error in 403.83µs
[negroni] Started POST /blockvolumes
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #1
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #1
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #2
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #0
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #1
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #2
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #3
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #4
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #5
[heketi] ERROR 2018/06/28 04:40:59 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1167: Create Block Volume Build Failed: No space
[negroni] Completed 500 Internal Server Error in 28.972259ms
----------------------
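
When reading a longer heketi log, it can help to reduce it to the WARNING and ERROR entries, which is where the failed rollback and the "No space" build failure show up above. This is a small sketch under the assumption that the log is available with one entry per line and that entries start with "[component] LEVEL" as in the snip; the helper name heketi_problems is hypothetical.

```shell
#!/usr/bin/env bash
# heketi_problems (hypothetical helper): read a heketi log on stdin and keep
# only the entries whose level is WARNING or ERROR.
heketi_problems() {
  grep -E '^\[[a-z]+\] (WARNING|ERROR) '
}

# Demo on three entries copied from the log snip in this report.
heketi_problems <<'EOF'
[heketi] INFO 2018/06/28 04:40:59 Allocating brick set #5
[heketi] ERROR 2018/06/28 04:40:59 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1167: Create Block Volume Build Failed: No space
[negroni] Completed 500 Internal Server Error in 28.972259ms
EOF
# Only the "[heketi] ERROR ..." entry is printed.
```

Against the heketi pod from this report, the input could come from `oc logs heketi-storage-1-bccs6 | heketi_problems`. The [negroni] request lines carry no level field and are therefore dropped by this filter.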
The test case for the title "On a 4 node setup heketi block volume creation fails when a node is powered off" should be:

1. Create a 4-node cluster.
2. Bring down any one of the 4 nodes.
3. Create a blockvolume with ha count 3 (remember that if you are using heketi-cli, you must specify it explicitly).
4. The blockvolume should get created.

Corner case, not part of the test case: if a node goes down *during* the blockvolume create operation, heketi won't pick other nodes.
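
Steps 3 and 4 of this test case could be scripted roughly as follows. This is only a sketch: it assumes that heketi-cli accepts the HA count through the `--ha` option of `blockvolume create`, it reuses the b-vol-$i names and the 10-second pause from the original reproducer, and the helper names run and create_blockvolumes are hypothetical. Powering off the node (step 2) is left as a manual step. The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them for real.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): echo the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# create_blockvolumes COUNT SIZE_GB HA (hypothetical helper):
# create COUNT blockvolumes with an explicit HA count and report failures.
create_blockvolumes() {
  bv_count=$1; bv_size=$2; bv_ha=$3
  bv_failed=0
  bv_i=1
  while [ "$bv_i" -le "$bv_count" ]; do
    # --ha is passed explicitly, as the test case requires with heketi-cli.
    run heketi-cli blockvolume create --name="b-vol-$bv_i" --size="$bv_size" --ha="$bv_ha" ||
      bv_failed=$((bv_failed + 1))
    # Pause between requests only when really talking to heketi.
    if [ "$DRY_RUN" != "1" ]; then sleep 10; fi
    bv_i=$((bv_i + 1))
  done
  echo "failed: $bv_failed of $bv_count"
  [ "$bv_failed" -eq 0 ]
}

# Dry-run demo: two 5GB blockvolumes with ha count 3.
create_blockvolumes 2 5 3
```

With one of the four nodes down, the test passes if `DRY_RUN=0 create_blockvolumes 20 5 3` reports no failures. The function returns a non-zero status otherwise, so it can be used directly in an automated check.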
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2686