Description of problem:
When upgrading a setup from OCP 3.11 + OCS 3.11 to OCP 3.11 + OCS 3.11.1 where the number of volumes is greater than 850, the gluster pod has been stuck in the 0/1 state for more than 4 hours. Digging deeper, we found that pvscan hangs because the number of LVs on the node is greater than 1000.

Version-Release number of selected component (if applicable):
oc v3.11.69
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://master.refarch311.ocsqeblr.com:443
openshift v3.11.69
kubernetes v1.11.0+d4cacc0

sh-4.2# rpm -qa | grep glusterfs
glusterfs-libs-3.12.2-32.el7rhgs.x86_64
glusterfs-3.12.2-32.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-32.el7rhgs.x86_64
glusterfs-server-3.12.2-32.el7rhgs.x86_64
glusterfs-api-3.12.2-32.el7rhgs.x86_64
glusterfs-cli-3.12.2-32.el7rhgs.x86_64
glusterfs-fuse-3.12.2-32.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-32.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up OCP 3.11 + OCS 3.11 in an AWS environment.
2. Create 850 file volumes and 50 block volumes.
3. Create 100 cirros pods attached to the file and block volumes.
4. Upgrade the setup to the latest version, OCS 3.11.1.

Actual results:
The gluster pod stays in the 0/1 state for more than four hours, and pvscan hangs because more than 1000 LVs are present on the node.

Expected results:
The gluster pod should not remain in the 0/1 state, and pvscan should complete successfully.

Additional info:
Below is the workaround we followed to get the pod up and running:
1. Remove the glusterfs=storage-host label from the node.
2. Reboot the node (stop and start the AWS instance).
3. Relabel the node with glusterfs=storage-host.
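For reference, a rough sketch of the workaround as oc commands; <node> is a placeholder for the affected node name, and the exact sequence should be adapted to your environment:

# Remove the storage-host label so the glusterfs pod is unscheduled from the node
oc label node <node> glusterfs-

# Reboot the node (or stop/start the AWS instance), then watch for it to become Ready again
oc get node <node> -w

# Re-apply the label so the glusterfs pod is scheduled back onto the node
oc label node <node> glusterfs=storage-host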
Status?