Description of problem:

In a CNS 4-node setup, a core file was generated on one of the gluster pods while creating and deleting PVs in a loop along with running "gluster volume heal" on all gluster pods.

Version-Release number of selected component (if applicable):

# rpm -qa | grep openshift
openshift-ansible-roles-3.9.31-1.git.34.154617d.el7.noarch
atomic-openshift-excluder-3.9.31-1.git.0.ef9737b.el7.noarch
atomic-openshift-master-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-sdn-ovs-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-3.9.31-1.git.0.ef9737b.el7.x86_64
openshift-ansible-docs-3.9.31-1.git.34.154617d.el7.noarch
openshift-ansible-playbooks-3.9.31-1.git.34.154617d.el7.noarch
atomic-openshift-docker-excluder-3.9.31-1.git.0.ef9737b.el7.noarch
atomic-openshift-node-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-clients-3.9.31-1.git.0.ef9737b.el7.x86_64
openshift-ansible-3.9.31-1.git.34.154617d.el7.noarch

# oc rsh glusterfs-storage-mrfh4
sh-4.2# rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
gluster-block-0.2.1-14.1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-cli-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-debuginfo-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64

# oc rsh heketi-storage-1-55bw4
sh-4.2# rpm -qa | grep heketi
python-heketi-6.0.0-7.4.el7rhgs.x86_64
heketi-client-6.0.0-7.4.el7rhgs.x86_64
heketi-6.0.0-7.4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
CNS 4-node setup, each node having a 1TB device, CPU = 32 (4 cores), Memory = 72GB.
1. Created 100 1GB mongodb pods and ran IO (using dd).
2. Upgraded the system from the 3.9 live build to the experian hotfix build.
3. Waited until all 4 gluster pods had spun up and were in 1/1 Running state; all mongodb pods were also in Running state.
4. Initiated creation and deletion of 200 PVs along with running "gluster volume heal" on all 4 gluster pods.

---------- creation and deletion of PVs ----------
while true
do
  for i in {101..300}
  do
    ./pvc_create.sh c$i 1
    sleep 30
  done
  sleep 40
  for i in {101..300}
  do
    oc delete pvc c$i
    sleep 20
  done
done
---------- PV creation/deletion ----------

pvc_create.sh is a local helper script (presumably taking a PVC name and size); an example of what such a helper might look like is shown after this report.

Running gluster volume heal:

while true; do for i in $(gluster v list | grep vol); do gluster v heal $i; sleep 2; done; done

5. A core is generated on a gluster pod.

Actual results:
A core file is generated on one of the gluster pods, and 2 gluster pods are in 0/1 state.

Expected results:
No core files should be generated and all gluster pods should be in 1/1 Running state.

Additional info:
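The pvc_create.sh helper referenced in step 4 is not attached to this report. Below is a minimal sketch of what such a script might look like, assuming it takes a PVC name and a size in GiB and relies on a heketi-backed "glusterfs-storage" StorageClass (both are assumptions, not confirmed by this report):

---------- example pvc_create.sh (sketch only) ----------
#!/bin/bash
# Sketch of an assumed helper that creates a dynamically provisioned PVC.
# $1 = PVC name, $2 = requested size in GiB; the StorageClass name is assumed.
NAME=$1
SIZE=$2

cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${NAME}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${SIZE}Gi
  storageClassName: glusterfs-storage
EOF
---------- example pvc_create.sh (sketch only) ----------

With the default Delete reclaim policy, the subsequent "oc delete pvc c$i" in the loop removes the bound PV and its backing gluster volume via heketi, which is what exercises the volume teardown path concurrently with the heal loop.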
Considering that the issue involves the 'fini' path and the CTR xlator (used by tiering), we should consider removing the CTR xlator from the volgen in CNS builds altogether. That should fix it, and for backward compatibility in RHGS (for the 1-2% of customers using it), we can consider making it an option. Mohit's patch in this regard should help: https://review.gluster.org/#/c/20501/
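For reference, one possible way to inspect and turn off the CTR translator per volume until such a volgen change lands, assuming this build exposes it through the features.ctr-enabled volume option (an assumption, not verified against this hotfix build):

# Sketch only: show and disable the CTR xlator via its volume option
# on every volume; option name assumed to be features.ctr-enabled.
for vol in $(gluster volume list); do
    gluster volume get "$vol" features.ctr-enabled    # current state
    gluster volume set "$vol" features.ctr-enabled off
done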
(In reply to comment #5) Karthick has changed the component to CNS as per comment #7; hence clearing the needinfo.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3257