Description of problem:
++++++++++++++++++++++++
We had an OCP 3.10 + OCS 3.10 setup with gluster bits 3.12.2-15 and gluster-block version gluster-block-0.2.1-24.el7rhgs.x86_64. The setup had logging pods configured; the metrics pods could not come up.

Created around 50 block PVCs in two loops and then attached them to app pods.

Loop #1: PVCs 101..130, app pods bk-101 to bk-130.
Result: all pods were in running state.

Loop #2: PVCs 131..150, app pods bk-131 to bk-150.
Result:
++++++++++++
1. None of the new pods came up, and oc describe pod printed the following error message. iscsiadm logins were successful on the initiator nodes, though.
============================================================
Events:
  Type     Reason                  Age                From                                        Message
  ----     ------                  ----               ----                                        -------
  Normal   Scheduled               7m                 default-scheduler                           Successfully assigned bk142-1-fz5gx to dhcp46-65.lab.eng.blr.redhat.com
  Normal   SuccessfulAttachVolume  7m                 attachdetach-controller                     AttachVolume.Attach succeeded for volume "pvc-ae117351-a45f-11e8-92b7-005056a52fd4"
  Warning  FailedMount             6m (x3 over 6m)    kubelet, dhcp46-65.lab.eng.blr.redhat.com   MountVolume.MountDevice failed for volume "pvc-ae117351-a45f-11e8-92b7-005056a52fd4" : exit status 1
  Warning  FailedMount             1m (x3 over 5m)    kubelet, dhcp46-65.lab.eng.blr.redhat.com   Unable to mount volumes for pod "bk142-1-fz5gx_glusterfs(cd399d0f-a460-11e8-92b7-005056a52fd4)": timeout expired waiting for volumes to attach or mount for pod "glusterfs"/"bk142-1-fz5gx". list of unmounted volumes=[foo-vol]. list of unattached volumes=[foo-vol default-token-8fbpq]

2. Two existing RUNNING pods started going into CrashLoopBackOff state with the following error message:
========================================================
oc describe pod bk124-1-zmq99
+++++++++++++++++++++++++++++++
Events:
  Type     Reason                  Age                From                                         Message
  ----     ------                  ----               ----                                         -------
  Normal   Scheduled               1h                 default-scheduler                            Successfully assigned bk124-1-zmq99 to dhcp46-181.lab.eng.blr.redhat.com
  Normal   SuccessfulAttachVolume  1h                 attachdetach-controller                      AttachVolume.Attach succeeded for volume "pvc-11a2ccad-a439-11e8-b3b0-005056a52fd4"
  Normal   Started                 1h                 kubelet, dhcp46-181.lab.eng.blr.redhat.com   Started container
  Warning  Unhealthy               16m                kubelet, dhcp46-181.lab.eng.blr.redhat.com   Liveness probe failed: /dev/sdr on /mnt type xfs (rw,seclabel,relatime,attr2,inode64,noquota) sh: can't create /mnt/random-data.log: Input/output error
  Normal   Pulled                  14m (x5 over 1h)   kubelet, dhcp46-181.lab.eng.blr.redhat.com   Container image "cirros" already present on machine
  Normal   Created                 14m (x5 over 1h)   kubelet, dhcp46-181.lab.eng.blr.redhat.com   Created container
  Warning  Failed                  14m (x4 over 16m)  kubelet, dhcp46-181.lab.eng.blr.redhat.com   Error: failed to start container "foo": Error response from daemon: error setting label on mount source '/var/lib/origin/openshift.local.volumes/pods/dde3d3d9-a458-11e8-92b7-005056a52fd4/volumes/kubernetes.io~iscsi/pvc-11a2ccad-a439-11e8-b3b0-005056a52fd4': SELinux relabeling of /var/lib/origin/openshift.local.volumes/pods/dde3d3d9-a458-11e8-92b7-005056a52fd4/volumes/kubernetes.io~iscsi/pvc-11a2ccad-a439-11e8-b3b0-005056a52fd4 is not allowed: "input/output error"
  Warning  BackOff                 1m (x57 over 16m)  kubelet, dhcp46-181.lab.eng.blr.redhat.com   Back-off restarting failed container

oc describe pod bk129-1-qzcs5
++++++++++++++++++++++++++++++++++
  Message:   error setting label on mount source '/var/lib/origin/openshift.local.volumes/pods/f0f77021-a458-11e8-92b7-005056a52fd4/volumes/kubernetes.io~iscsi/pvc-213178f1-a439-11e8-b3b0-005056a52fd4': SELinux relabeling of /var/lib/origin/openshift.local.volumes/pods/f0f77021-a458-11e8-92b7-005056a52fd4/volumes/kubernetes.io~iscsi/pvc-213178f1-a439-11e8-b3b0-005056a52fd4 is not allowed: "input/output error"
  Exit Code: 128

Some info about the setup:
============================
1. It was seen that the brick for heketidbstorage was NOT ONLINE for the gluster pod on 10.70.46.150.
2. Also, on 10.70.46.150, the 2 block-hosting vols had 2 separate PIDs and the brick for vol_9f93ae4c845f3910f5d1558cc5ae9f0a was NOT ONLINE.
(We shall be raising a separate bug for the above two issues.)
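For reference, a minimal sketch of the kind of check behind points 1 and 2 above, assuming the state was read with gluster volume status from inside a gluster pod. The pod name is reused from the version output below for illustration; the affected pod on 10.70.46.150 may be a different one:

# Brick status for the heketi DB volume; the "Online" column should show Y for every brick,
# N indicates the brick process is down:
oc rsh glusterfs-storage-q22cl gluster volume status heketidbstorage

# Same check for the block-hosting volume, which also shows the brick PIDs:
oc rsh glusterfs-storage-q22cl gluster volume status vol_9f93ae4c845f3910f5d1558cc5ae9f0a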
Version-Release number of selected component (if applicable):
++++++++++++++++++++++++
[root@dhcp46-137 ~]# oc version
oc v3.10.14
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp46-137.lab.eng.blr.redhat.com:8443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8
[root@dhcp46-137 ~]#

Gluster 3.4.0
==============
[root@dhcp46-137 ~]# oc rsh glusterfs-storage-q22cl rpm -qa|grep gluster
glusterfs-client-xlators-3.12.2-15.el7rhgs.x86_64
glusterfs-cli-3.12.2-15.el7rhgs.x86_64
python2-gluster-3.12.2-15.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-15.el7rhgs.x86_64
glusterfs-libs-3.12.2-15.el7rhgs.x86_64
glusterfs-3.12.2-15.el7rhgs.x86_64
glusterfs-api-3.12.2-15.el7rhgs.x86_64
glusterfs-fuse-3.12.2-15.el7rhgs.x86_64
glusterfs-server-3.12.2-15.el7rhgs.x86_64
gluster-block-0.2.1-24.el7rhgs.x86_64
[root@dhcp46-137 ~]#

[root@dhcp46-137 ~]# oc rsh heketi-storage-1-px7jd rpm -qa|grep heketi
python-heketi-7.0.0-6.el7rhgs.x86_64
heketi-7.0.0-6.el7rhgs.x86_64
heketi-client-7.0.0-6.el7rhgs.x86_64
[root@dhcp46-137 ~]#

gluster client version
=========================
[root@dhcp46-65 ~]# rpm -qa|grep gluster
glusterfs-libs-3.12.2-15.el7.x86_64
glusterfs-3.12.2-15.el7.x86_64
glusterfs-fuse-3.12.2-15.el7.x86_64
glusterfs-client-xlators-3.12.2-15.el7.x86_64
[root@dhcp46-65 ~]#

How reproducible:
++++++++++++++++++++++++
The issue was seen on one setup. The setup is kept in the same condition.

Steps to Reproduce:
++++++++++++++++++++++++
1. Create an OCP + OCS 3.10 setup.
2. Upgrade the docker version to 1.13.1.74 and also update the gluster client packages. The pods will be restarted as docker is upgraded.
3. Once the setup is up, create block PVCs and then bind them to app pods (a sketch of the creation loop is shown after the Expected results section below).
4. Check the pod status and the gluster volume status.

Actual results:
++++++++++++++++++++++++
The new pods did not come up, and two existing running pods went into CrashLoopBackOff state.

Expected results:
++++++++++++++++++++++++
The new pods should have got created successfully and the old ones should keep running.
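As referenced in step 3, the block PVCs were created in loops of roughly the following form. This is only a sketch: the storage class name "glusterfs-storage-block", the claim naming, and the 1Gi size are assumptions for illustration and were not taken from the setup above. Each claim was then referenced from an app pod named bk-<number>, as described in the problem report.

# Hypothetical PVC creation loop for the second batch (131..150):
for i in $(seq 131 150); do
  cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bk-claim-${i}
spec:
  storageClassName: glusterfs-storage-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
done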
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0285