[Tracker-RHGS-BZ#1519105] [Tracker-RHGS-BZ#1631664] If the replace-brick command fails during device remove, all retries of device remove on that device return an 'Id not found' error
Description of problem:
On introducing a flaky network environment in an OCP+CNS 3.10 setup with 4 nodes, 20 file volumes of 1 GB each, and 9 block hosting volumes of 100 GB each, device removal failed because a brick could not be replaced. A retry of device remove on the same device then failed with the error 'Id not found'.
Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift
atomic-openshift-clients-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-roles-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-docker-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-docs-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
openshift-ansible-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-hyperkube-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
atomic-openshift-node-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-playbooks-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
# oc rsh heketi-storage-1-fn45d
sh-4.2# rpm -qa | grep heketi
python-heketi-7.0.0-2.el7rhgs.x86_64
heketi-client-7.0.0-2.el7rhgs.x86_64
heketi-7.0.0-2.el7rhgs.x86_64
# oc rsh glusterfs-storage-n9dbf
sh-4.2# rpm -qa | grep gluster
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64
How reproducible:
1/1
Steps to Reproduce:
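1. On an OCP+CNS 3.10 setup with 4 nodes, have 20 file volumes of 1 GB and 9 block hosting volumes of 100 GB.
2. Introduce a flaky network environment and run heketi-cli device remove on a device; the replace-brick operation fails.
3. Run heketi-cli device remove again on the same device.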
Actual results:
# heketi-cli device remove 49520972775494db8d05b556c16be1f3
Error: Failed to remove device, error: Unable to replace brick 10.70.47.70:/var/lib/heketi/mounts/vg_49520972775494db8d05b556c16be1f3/brick_601f8a3d425a47029588c8e4ab193d6a/brick with 10.70.46.124:/var/lib/heketi/mounts/vg_bf7c3091e08ca24ac91b37e25ab37336/brick_f74727daddd1932fb5dc42754c9dd423/brick for volume vol_4fbe50f549a36b7fb162297f3e15260f
# heketi-cli device remove 49520972775494db8d05b556c16be1f3
Error: Failed to remove device, error: Id not found
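For triage, the IDs embedded in the first error can be extracted and looked up in the heketi db dump to see which of them the retry no longer finds. The snippet below is only a minimal sketch: `parse_replace_brick_error` is a hypothetical helper (not part of heketi), and it assumes the brick paths follow the `vg_<device id>/brick_<brick id>/brick` layout seen in the error above.

```python
import re

# Matches the replace-brick failure text from the first attempt.
# Assumption: the message always has the "<host>:<path> with <host>:<path>
# for volume <name>" shape shown in this report.
REPLACE_BRICK_ERROR = re.compile(
    r"Unable to replace brick (?P<src_host>[^\s:]+):(?P<src_path>\S+) "
    r"with (?P<dst_host>[^\s:]+):(?P<dst_path>\S+) "
    r"for volume (?P<volume>\S+)"
)

# Assumption: brick paths end in vg_<device id>/brick_<brick id>/brick.
PATH_IDS = re.compile(
    r"/vg_(?P<device>[0-9a-f]+)/brick_(?P<brick>[0-9a-f]+)/brick$"
)


def _describe(host, path):
    """Return host, path, and the device/brick IDs found in the path."""
    ids = PATH_IDS.search(path)
    return {
        "host": host,
        "path": path,
        "device": ids.group("device") if ids else None,
        "brick": ids.group("brick") if ids else None,
    }


def parse_replace_brick_error(message):
    """Parse a replace-brick failure; return None for any other message."""
    m = REPLACE_BRICK_ERROR.search(message)
    if m is None:
        return None
    return {
        "volume": m.group("volume"),
        "source": _describe(m.group("src_host"), m.group("src_path")),
        "destination": _describe(m.group("dst_host"), m.group("dst_path")),
    }


if __name__ == "__main__":
    first_error = (
        "Error: Failed to remove device, error: Unable to replace brick "
        "10.70.47.70:/var/lib/heketi/mounts/"
        "vg_49520972775494db8d05b556c16be1f3/"
        "brick_601f8a3d425a47029588c8e4ab193d6a/brick with "
        "10.70.46.124:/var/lib/heketi/mounts/"
        "vg_bf7c3091e08ca24ac91b37e25ab37336/"
        "brick_f74727daddd1932fb5dc42754c9dd423/brick "
        "for volume vol_4fbe50f549a36b7fb162297f3e15260f"
    )
    print(parse_replace_brick_error(first_error))
```

The source device ID, source and destination brick IDs, and the volume name returned here are the candidates to search for in the db dump; the second error does not say which of them is missing.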
Expected results:
Device removal should be successful
Additional info:
heketi logs, topology info, journalctl logs, and a heketi db dump will be attached.