Description of problem:
Heketi block volume creation in a loop fails during a parallel node reboot.

Version-Release number of selected component (if applicable):

# rpm -qa | grep openshift
atomic-openshift-clients-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-roles-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-docker-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-docs-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
openshift-ansible-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-hyperkube-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
atomic-openshift-node-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-playbooks-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch

-- Heketi version in heketi pod
# rpm -qa | grep heketi
python-heketi-7.0.0-1.el7rhgs.x86_64
heketi-client-7.0.0-1.el7rhgs.x86_64
heketi-7.0.0-1.el7rhgs.x86_64

-- Gluster version in gluster pod
# rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64

How reproducible:
1:1

Steps to Reproduce:
1. Create a 4-node CNS 3.10 setup.
2. Initiate 30 heketi blockvolume creations of 10 GB each (see the sketch after this report).
3. Reboot 1 node.
4. Observe that heketi blockvolume creation fails during the reboot and resumes only after the node is back up, even though there is sufficient space on the other 3 nodes for the blockvolumes to be created.
5. Initiate another reboot of the same node; the same behaviour is observed.

Actual results:
Heketi block volume creation failed during the reboot and resumed after the node came back up.

Expected results:
Heketi block volume creation should proceed unaffected during the reboot of a single node, since the other 3 nodes are still up.

Additional info:
Logs from before initiating volume creation and after are attached.
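For reference, a minimal sketch of the kind of client-side loop used in step 2 (the exact script is not part of this report; it assumes heketi-cli is already pointed at the heketi service via HEKETI_CLI_SERVER and has admin credentials, and the count/size simply mirror the values above):

-- Hypothetical reproduction loop (sketch only, not the original script)
# for i in $(seq 1 30); do heketi-cli blockvolume create --size=10 || echo "blockvolume creation $i failed"; done
# heketi-cli blockvolume list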
Does it happen always? Even without the loop, does creating a single block volume fail when 1 out of 4 nodes is down?
(And could it be in any way related to bug 1595531?)
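A minimal sketch of the single-creation check being asked about here, assuming heketi-cli access from a master node; the glusterfs namespace is a placeholder and may differ on this setup:

-- Sketch only: confirm which node/pod is down, then attempt one creation while it is down
# oc get pods -o wide -n glusterfs
# heketi-cli node list
# heketi-cli blockvolume create --size=10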