Description of problem:
Every volume should be present in all 3 places (OpenShift PVs, heketi, and gluster). However, some volumes are missing from one or more of them.

Version-Release number of selected component (if applicable):
OCP 3.6/CNS 3.6

How reproducible:
Uncertain, but this appears to be a known issue that has been reported in various forms.

Steps to Reproduce:
1. oc describe pv
2. heketi-cli topology info
3. gluster volume list
4. Compare the 3 lists

Actual results:
Some volumes are present in some lists but absent from others.

Expected results:
All volumes are present in all lists.

Additional info:
Initial workaround:
1) Delete the PVs that don't have volumes in heketi.
2) Delete the heketi volumes that don't have PVs.
3) Delete the gluster volumes that don't have heketi volumes.
4) Shut down the glusterd pods.
5) On each GlusterFS node, for volumes that don't seem to have valid bricks, delete the volume directories.
6) Start up glusterd.
Additional information:
1) OCP chronology:
   a) The OCP cluster has been running for about 230 days.
   b) The OCP cluster was initially 3.4/NFS.
   c) The OCP cluster was upgraded to 3.5/NFS.
   d) The OCP cluster was switched from 3.5/NFS to 3.5/CNS 3.5.
   e) The OCP cluster was upgraded to 3.6/CNS 3.5.
   f) The OCP cluster was upgraded to its current state: 3.6/CNS 3.6.
   g) The OCP cluster has all GA patches applied, except python-requests (due to another bz).
2) OCP Persistent Volumes:
   a) There are currently 13 PVs.
   b) The number varies from about 10 to 40, depending on the mix.
   c) The mix comes from CI/CD Jenkins jobs and ad hoc requests from other projects.
3) Topology:
   a) 1 heketi instance (pod)
   b) 3 glusterfs instances (pods) in a single pool using local storage
   c) 1 gluster-s3-dc instance (pod) [defined but not currently used]
   d) 1 glusterblock-provisioner-dc instance (pod) [defined but not currently used]
Correction to Steps to Reproduce: replace step 4 with:
4. lvs
5. Compare the 4 lists
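The comparison step can be partly scripted. The following is a minimal POSIX shell sketch, not a vetted tool. It assumes each list has already been saved to a text file with one volume name per line, and that names match across sources (heketi's default `vol_<id>` naming). The function name `missing_from` and the file names are hypothetical. It compares names only: the `lvs` list uses brick and thin-pool names (`brick_<id>`, `tp_<id>`), which would first have to be mapped to volumes, for example via `heketi-cli topology info`.

```sh
# Hypothetical helper: print names that appear in file $1 but not in file $2.
# Assumes one volume name per line; blank lines in $1 are dropped.
#   -F: treat patterns as fixed strings, -x: match whole lines only,
#   -v: invert (keep non-matching lines), -f: read patterns from $2
missing_from() {
  grep -vxF -f "$2" "$1" | grep -v '^[[:space:]]*$' | sort -u
}

# Example usage, assuming the lists were saved beforehand (file names are
# illustrative only):
#   oc get pv -o jsonpath='{range .items[*]}{.spec.glusterfs.path}{"\n"}{end}' > pv.txt
#   gluster volume list > gluster.txt     # run inside a glusterfs pod
#   missing_from pv.txt gluster.txt       # PVs with no gluster volume
#   missing_from gluster.txt pv.txt       # gluster volumes with no PV
```

Not every mismatch is necessarily an orphan: in a typical CNS deployment, heketi's own `heketidbstorage` volume may legitimately exist in gluster without a matching PV.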
Initial workaround (additional step):
7) Delete any gluster-related logical volumes (brick_* and tp_*).

For production environments, carefully test these commands in another environment first and contact GSS. This workaround has *not* been fully vetted.
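Before deleting anything in step 7, it may help to list the candidate logical volumes first. This is a minimal sketch, assuming heketi's naming (`brick_<id>` for bricks and `tp_<id>` for thin pools, as in the lvremove example in this report) and `lvs --noheadings -o vg_name,lv_name` output on stdin. The function name `list_gluster_lvs` is hypothetical, and the helper only prints names; it removes nothing.

```sh
# Hypothetical helper: read "VG LV" pairs on stdin, as printed by
#   lvs --noheadings -o vg_name,lv_name
# and print VG/LV for heketi-style gluster LVs (names starting with
# "brick_" or "tp_"). Listing only; nothing is deleted.
list_gluster_lvs() {
  awk '$2 ~ /^(brick|tp)_/ { print $1 "/" $2 }'
}

# Example usage (run inside a glusterfs pod or on the node):
#   lvs --noheadings -o vg_name,lv_name | list_gluster_lvs
```

Each listed LV would still need to be checked against the volume lists before removal, since LVs backing valid volumes match the same pattern.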
(In reply to Thom Carlin from comment #0)
> Initial workaround:
> 1) delete the PVs that don't have volumes in heketi
> 2) delete the heketi volumes that don't have PVs
> 3) delete the gluster volumes that don't have heketi volumes
> 4) Shutdown glusterd pods
> 5) on each GlusterFS node, for volumes that don't seem to have valid bricks:
>    delete the volume directories
> 6) Startup glusterd

I also found it necessary to clean up (remove) logical volumes that had no associated gluster volumes. I took these steps from the heketi logs:

# umount /var/lib/heketi/mounts/vg_f067a6d1192e10332ef54923357f5d31/brick_9dff504c8f34d706c1a718f7b3f768da
# lvremove -f vg_f067a6d1192e10332ef54923357f5d31/tp_9dff504c8f34d706c1a718f7b3f768da
# sed -i.save "/brick_9dff504c8f34d706c1a718f7b3f768da/d" /var/lib/heketi/fstab

The commands above were executed inside the CNS pods (e.g. oc rsh glusterfs-storage-ABCXYZ).
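Because these three commands repeat for every orphaned brick with only the VG name and brick id changing, a dry-run generator can reduce copy/paste mistakes. This is a minimal sketch under the assumption that the layout matches the example above (`/var/lib/heketi/mounts/<vg>/brick_<id>`, thin pool `tp_<id>`, entries in `/var/lib/heketi/fstab`). The function name `print_brick_cleanup` is hypothetical, and it only prints the commands for review; it does not execute them.

```sh
# Hypothetical dry-run helper: print (do NOT run) the cleanup commands for
# one brick. $1 = VG name (vg_<id>), $2 = brick id (the hex part shared by
# brick_<id> and tp_<id>). Path layout assumed from the heketi log example.
print_brick_cleanup() {
  vg=$1
  id=$2
  echo "umount /var/lib/heketi/mounts/$vg/brick_$id"
  echo "lvremove -f $vg/tp_$id"
  echo "sed -i.save \"/brick_$id/d\" /var/lib/heketi/fstab"
}

# Example usage:
#   print_brick_cleanup vg_f067a6d1192e10332ef54923357f5d31 9dff504c8f34d706c1a718f7b3f768da
```

Printing rather than executing keeps a human review step in place, which matters here since the workaround has not been fully vetted.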