Description  Mangirdas Judeikis
2018-03-09 09:38:24 UTC
Description of problem:
Docker 1.13 is available in the OCP channels and breaks 3.7 clusters, CNS specifically.
Version-Release number of selected component (if applicable):
How reproducible:
Build a new OCP 3.7 cluster with the following repos enabled:
- rhel-7-server-rpms
- rhel-7-server-extras-rpms
- rhel-7-server-ose-3.7-rpms
- rhel-7-fast-datapath-rpms
- rh-gluster-3-client-for-rhel-7-server-rpms
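For reference, the repo list above would typically be enabled with subscription-manager. A minimal sketch (the dry-run echo loop is illustrative; run the printed commands on a registered RHEL host):

```shell
#!/bin/sh
# The repos from the report. Printing the subscription-manager commands
# first (dry run) so the list can be reviewed before actually running them.
repos="rhel-7-server-rpms
rhel-7-server-extras-rpms
rhel-7-server-ose-3.7-rpms
rhel-7-fast-datapath-rpms
rh-gluster-3-client-for-rhel-7-server-rpms"

for r in $repos; do
  echo "subscription-manager repos --enable=$r"
done
```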
Gluster (CNS on OCP) is broken:
[ec2-user@ip-10-105-0-90 ~]$ oc get pods
NAME                      READY     STATUS             RESTARTS   AGE
glusterfs-storage-lbcbj   0/1       CrashLoopBackOff   6          6m
glusterfs-storage-nfvxt   0/1       CrashLoopBackOff   6          6m
glusterfs-storage-r4w5k   0/1       CrashLoopBackOff   6          6m
[ec2-user@ip-10-105-0-90 ~]$ oc logs -f glusterfs-storage-lbcbj
env variable is set. Update in gluster-blockd.service
Couldn't find an alternative telinit implementation to spawn.
On our working system (provisioned yesterday) we have:
Server:
Version: 1.12.6
API version: 1.24
Package version: docker-1.12.6-71.git3e8e77d.el7.x86_64
Go version: go1.8.3
Git commit: 3e8e77d/1.12.6
Built: Wed Dec 13 12:18:58 2017
OS/Arch: linux/amd64
On our broken systems (provisioned today) we have:
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: <unknown>
Go version: go1.8.3
Git commit: 774336d/1.13.1
Built: Tue Feb 20 13:46:34 2018
OS/Arch: linux/amd64
Experimental: false
Didn't test anything else as we needed to roll the cluster back.
yum search --showduplicates output:
https://gist.github.com/mjudeikis/418a09a357ce5fd3d9886ee42fbfbded
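To quickly tell which nodes yum has already moved to the broken package, the installed docker version can be checked against the 1.13 series. A hedged sketch; `is_affected` is a hypothetical helper, not anything from the report or from OCP tooling:

```shell
#!/bin/sh
# Hypothetical helper: classify a docker version string as affected (1.13.x,
# the series that breaks CNS on OCP 3.7) or ok (anything else, e.g. 1.12.6).
is_affected() {
  case "$1" in
    1.13|1.13.*) echo "affected" ;;
    *)           echo "ok" ;;
  esac
}

# Feed it the installed package version; empty (docker not installed) maps to "ok".
is_affected "$(rpm -q --qf '%{VERSION}' docker 2>/dev/null)"
```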
Comment 2  Mangirdas Judeikis
2018-03-09 10:06:28 UTC
This is just one confirmed bug; we didn't have a chance to test all other OpenShift objects. It needs validation before this is treated as "only a Gluster bug".
Comment 6  Billy Holmes
This also affects RHEL Atomic Host 7.4.5 and higher. If you use any of the newer AWS Atomic Host images released this year, you run into the same issue.
(In reply to Billy Holmes from comment #6)
> This also affects RHEL Atomic Host 7.4.5 and higher. If you use any of the
> newer AWS Atomic Host images released this year, you run into the same issue.
I don't think we can do anything from the CNS side to resolve this issue. The feasible solution is already documented in the KCS article mentioned in the BZ comments. I am closing this Bugzilla for now. Please feel free to reopen it if anything is left to be addressed.
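The KCS article referenced above is the authoritative procedure. As a rough sketch of the usual shape of such a workaround (pinning docker at the known-good 1.12.6 so a later `yum update` cannot pull 1.13 onto 3.7 nodes), the helper below only prints candidate commands; package names and the versionlock approach are assumptions to verify against the article:

```shell
#!/bin/sh
# Hypothetical helper: list the commands one might run on an affected node.
# Illustrative only; confirm the exact procedure in the KCS article first.
pin_docker_cmds() {
  echo "yum downgrade docker-1.12.6"         # roll the node back to 1.12.6
  echo "yum install yum-plugin-versionlock"  # if the plugin is not present
  echo "yum versionlock add docker"          # lock docker at the installed version
}

pin_docker_cmds
```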