Bug 1489358
Summary: | pods failed to start while using cri-o-centos image | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gan Huang <ghuang> |
Component: | Containers | Assignee: | Giuseppe Scrivano <gscrivan> |
Status: | CLOSED UPSTREAM | QA Contact: | DeShuai Ma <dma> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.7.0 | CC: | aos-bugs, dcbw, dma, ghuang, gpei, gscrivan, jeder, jokerman, mmccomas, mpatel |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-09-20 05:48:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Gan, does it work now for you? Can I close this BZ?

Waiting for https://github.com/openshift/openshift-ansible/pull/5354 to be merged; otherwise we have no way to specify the upstream cri-o-centos image.

@Gan, I've also tagged docker.io/gscrivano/cri-o to be the same as docker.io/gscrivano/cri-o-centos. Does that help?

Thanks Giuseppe, now I'm able to continue the testing with the centos image :)

Unfortunately it seems that the cri-o service is not working well with OpenShift. After the installation, all pods were in ContainerCreating status:

    # oc get po
    NAME                        READY     STATUS              RESTARTS   AGE
    docker-registry-1-deploy    0/1       ContainerCreating   0          23m
    registry-console-1-deploy   0/1       ContainerCreating   0          23m
    router-1-deploy             0/1       ContainerCreating   0          24m

    # oc describe po docker-registry-1-deploy
    <--snip-->
    Events:
      FirstSeen  LastSeen  Count  From  SubObjectPath  Type  Reason  Message
      ---------  --------  -----  ----  -------------  --------  ------  -------
      26m  26m  1  default-scheduler  Normal  Scheduled  Successfully assigned docker-registry-1-deploy to qe-master-registry-router-nfs-etcd-1.0913-i4r.qe.rhcloud.com
      26m  26m  1  kubelet, qe-master-registry-router-nfs-etcd-1.0913-i4r.qe.rhcloud.com  Normal  SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "deployer-token-489x3"
      26m  1m  113  kubelet, qe-master-registry-router-nfs-etcd-1.0913-i4r.qe.rhcloud.com  Warning  FailedSync  Error syncing pod

The logs for the atomic-openshift-node and cri-o services will be attached.

This might be related to https://github.com/projectatomic/atomic-system-containers/pull/113

I have already created new builds of the images including that change. After that, I was able to deploy a cluster that uses docker.io/gscrivano/cri-o-centos

Thanks, the issue is gone when using the new build. But I seem to be hitting another issue, confirming... Will paste the test result.

Installation succeeded.
But S2I build failed:

Tested version:

    # cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 7.4 (Maipo)
    # uname -r
    3.10.0-693.2.1.el7.x86_64
    # atomic images list
      REPOSITORY                  TAG      IMAGE ID       CREATED            VIRTUAL SIZE   TYPE
    > docker.io/gscrivano/cri-o   latest   62986b1ae28c   2017-09-14 06:01   447.2 MB       ostree
    # openshift version
    openshift v3.7.0-0.126.1
    kubernetes v1.7.0+80709908fd
    etcd 3.2.1

1) Router and Registry are running after the installation

    # oc get po
    NAME                       READY     STATUS    RESTARTS   AGE
    docker-registry-1-qqtln    1/1       Running   0          5h
    registry-console-1-9h432   1/1       Running   0          5h
    router-1-7k09x             1/1       Running   0          5h

2) S2I build failed:

    # oc get po -n install-test
    NAME                             READY     STATUS       RESTARTS   AGE
    cakephp-mysql-example-1-build    0/1       Init:Error   0          5h
    mongodb-1-deploy                 0/1       Error        0          5h
    mysql-1-deploy                   0/1       Error        0          5h
    nodejs-mongodb-example-1-build   0/1       Init:Error   0          5h

    # oc describe po cakephp-mysql-example-1-build -n install-test
    <--snip-->
    Init Containers:
      git-clone:
        Image ID: 51fb50a7b319edfeda417db03c602c3ee9279e652e8e12b3c40b5438f5a6b042
        Port: <none>
        Command:
          openshift-git-clone
        Args:
          --loglevel=0
        State: Terminated
          Reason: Error
          Message: Cloning "https://github.com/openshift/cakephp-ex.git" ...
    error: fatal: unable to access 'https://github.com/openshift/cakephp-ex.git/': Could not resolve host: github.com; Unknown error
    <--snip-->

3) Unable to run "oc rsh ${pod}"

    # oc rsh router-1-7k09x
    Error from server: error dialing backend: dial tcp 192.168.2.9:10010: getsockopt: no route to host

Marking it TestBlocker temporarily as it's blocking QE from testing cri-o-centos on OpenShift. Please let me know if any logs are needed, and feel free to reassign to the proper component.

Can you quickly try the following?

    # ping -c 1 github.com (from the host)
    # runc exec cri-o ping -c 1 github.com

What is the output of the two commands?

    # ping -c 1 github.com
    PING github.com (192.30.253.113) 56(84) bytes of data.
    64 bytes from lb-192-30-253-113-iad.github.com (192.30.253.113): icmp_seq=1 ttl=55 time=70.7 ms
    --- github.com ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 70.708/70.708/70.708/0.000 ms

    # runc exec cri-o ping -c 1 github.com
    PING github.com (192.30.253.113) 56(84) bytes of data.
    64 bytes from lb-192-30-253-113-iad.github.com (192.30.253.113): icmp_seq=1 ttl=55 time=70.4 ms
    --- github.com ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 70.495/70.495/70.495/0.000 ms

It looks like an SELinux issue, probably caused by NFS; in fact, if I "setenforce 0" I get a bit further.

I could not get the application deployed on your cluster; it seems there are some networking issues that prevent pulling from github.com. I've tried locally and it works fine for me (I then hit https://github.com/openshift/origin/issues/16349). Could you verify that the same configuration you have used works fine without crio?

I've just tried this on the all-in-one VM you have left running:

    # oadm policy add-cluster-role-to-user cluster-admin system:serviceaccount:default:default
    # setenforce 0
    # oc new-app https://github.com/giuseppe/hello-openshift-plus.git

and it was deployed correctly. When I try some of the OpenShift examples I get this error:

    # oc logs bc/cakephp-ex
    Cloning "https://github.com/openshift/cakephp-ex.git" ...
    Commit: 7969534afdf9490ca79e37e672f0b9c81887ec28 (Merge pull request #81 from bparees/readiness)
    Author: Ben Parees <bparees.github.com>
    Date: Mon Sep 11 01:15:51 2017 -0400
    ERROR: Error writing header for "scripts": io: read/write on closed pipe
    ERROR: Error writing tar: io: read/write on closed pipe
    error: build error: Error response from daemon: {"message":"No such container: crio"}

On the networking side... The CRIO RPM installs CNI network configs in /etc/cni/net.d, but its CNI implementation only uses the first one found in the directory just like Kubernetes does.
That's a limitation of Kubernetes at this point, and something we want to remove from Kube once the multi-network stuff lands. The pattern that almost all complex CNI plugins for kube use is to write out a config to /etc/cni/net.d when they are ready, which openshift-sdn does. But CRIO doesn't care, since it sees 100-crio-bridge.conf first and uses that. So the end result is that you've asked OpenShift to use the openshift-sdn network plugin, but underneath, CRIO isn't using the openshift-sdn network plugin but its default bridge config instead. So yeah, clearly your networking isn't going to work.

One suggestion: when openshift-sdn is selected in ansible, run "rm -rf /etc/cni/net.d/100-crio-bridge.conf /etc/cni/net.d/200-loopback.conf" as part of the ansible playbook for openshift-sdn or something like that.

With the new image for the system container, I see this error:

    # oc new-app https://github.com/openshift/cakephp-ex.git
    # oc logs -f bc/cakephp-ex
    ERROR: Error writing header for "scripts": io: read/write on closed pipe
    ERROR: Error writing tar: io: read/write on closed pipe
    error: build error: Error response from daemon: {"message":"No such container: crio"}

I've also tried with cri-o directly on the host but I see the same error. Mrunal, should we file this separately?

This looks like it is build related and we are working with Ben Parees to fix that. We should track it separately and close it once we finish the build integration work.

Thanks for the explanation. @Gan, are you fine to close this?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
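
For readers following the networking comment above, here is a minimal sketch of the "first config found wins" behaviour and of the effect of the suggested cleanup. It is an illustration only, not CRI-O's or Kubernetes' actual code. It assumes config files are considered in lexical filename order, and the file name `80-openshift-sdn.conf` is a hypothetical stand-in for whatever openshift-sdn writes out; only `100-crio-bridge.conf` and `200-loopback.conf` come from the thread.

```python
import os
import tempfile

# Assumption: files with these suffixes count as CNI configs.
CONF_SUFFIXES = (".conf", ".conflist", ".json")


def first_cni_config(conf_dir):
    """Return the file a 'first match wins' loader would pick, or None.

    Assumption: candidates are ordered lexically by file name, so
    "100-..." sorts before "80-..." because '1' < '8'.
    """
    names = sorted(n for n in os.listdir(conf_dir) if n.endswith(CONF_SUFFIXES))
    return names[0] if names else None


with tempfile.TemporaryDirectory() as conf_dir:
    # Stand-in for /etc/cni/net.d after CRI-O and the SDN plugin both wrote configs.
    # "80-openshift-sdn.conf" is a hypothetical name used for illustration.
    for name in ("100-crio-bridge.conf", "200-loopback.conf", "80-openshift-sdn.conf"):
        open(os.path.join(conf_dir, name), "w").close()

    before = first_cni_config(conf_dir)

    # The cleanup suggested in the comment: drop the CRI-O default configs.
    for name in ("100-crio-bridge.conf", "200-loopback.conf"):
        os.remove(os.path.join(conf_dir, name))

    after = first_cni_config(conf_dir)

print("selected before cleanup:", before)
print("selected after cleanup: ", after)
```

Under these assumptions the bridge config is selected until the two default files are removed, which matches the symptom described in the comment.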
Description of problem:
Set "openshift_use_crio=True" to install OpenShift with cri-o. Installation failed at "Start the CRI-O service" during the installation.

Version-Release number of the following components:
openshift-ansible-3.7.0-0.125.0.git.0.91043b6.el7.noarch.rpm

    # atomic images list
      REPOSITORY                         TAG      IMAGE ID       CREATED            VIRTUAL SIZE   TYPE
    > docker.io/gscrivano/cri-o-centos   latest   217633c9f629   2017-09-07 01:26   374.33 MB      ostree

RHEL-7.4

How reproducible:
always

Steps to Reproduce:
1. Set "openshift_use_crio=True"
2. Trigger the installation
3.

Actual results:

    TASK [docker : Start the CRI-O service] ****************************************
    Thursday 07 September 2017 06:17:35 +0000 (0:00:00.446) 0:02:35.209 ****
    fatal: [host-8-241-61.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
        "changed": false,
        "failed": true
    }

    MSG:

    Unable to start service cri-o: Job for cri-o.service failed because the control process exited with error code. See "systemctl status cri-o.service" and "journalctl -xe" for details.

Expected results:
No errors

Additional info:

    #journalctl -u cri-o
    Sep 07 02:34:25 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service: control process exited, code=exited status=1
    Sep 07 02:34:25 qe-ghuang-master-etcd-nfs-1 systemd[1]: Failed to start crio daemon.
    Sep 07 02:34:25 qe-ghuang-master-etcd-nfs-1 systemd[1]: Unit cri-o.service entered failed state.
    Sep 07 02:34:25 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service failed.
    Sep 07 02:34:26 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service holdoff time over, scheduling restart.
    Sep 07 02:34:26 qe-ghuang-master-etcd-nfs-1 systemd[1]: Starting crio daemon...
    Sep 07 02:34:26 qe-ghuang-master-etcd-nfs-1 runc[17681]: invalid --runtime value "stat /usr/bin/runc: no such file or directory"
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service: main process exited, code=exited, status=1/FAILURE
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 runc[17706]: container "cri-o" does not exist
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service: control process exited, code=exited status=1
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: Failed to start crio daemon.
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: Unit cri-o.service entered failed state.
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service failed.
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service holdoff time over, scheduling restart.
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: start request repeated too quickly for cri-o.service
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: Failed to start crio daemon.
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: Unit cri-o.service entered failed state.
    Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service failed.
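
Most of the journal output above is systemd restart bookkeeping; the lines that say what went wrong come from the `runc` process. As a rough triage aid, the sketch below filters a few of the quoted lines down to the non-systemd messages. The regular expression and the helper name `non_systemd_messages` are assumptions made for this illustration and only target the short syslog-style format shown in this report.

```python
import re

# Assumption: lines look like "Mon DD HH:MM:SS host ident[pid]: message".
JOURNAL_LINE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) (\S+) ([^\[\s:]+)\[(\d+)\]: (.*)$"
)


def non_systemd_messages(lines):
    """Return (ident, message) pairs for entries not emitted by systemd itself."""
    found = []
    for line in lines:
        match = JOURNAL_LINE.match(line.strip())
        if match and match.group(3) != "systemd":
            found.append((match.group(3), match.group(5)))
    return found


# A few entries copied from the journalctl output in this report.
sample = [
    "Sep 07 02:34:26 qe-ghuang-master-etcd-nfs-1 systemd[1]: Starting crio daemon...",
    'Sep 07 02:34:26 qe-ghuang-master-etcd-nfs-1 runc[17681]: invalid --runtime value "stat /usr/bin/runc: no such file or directory"',
    "Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service: main process exited, code=exited, status=1/FAILURE",
    'Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 runc[17706]: container "cri-o" does not exist',
    "Sep 07 02:34:27 qe-ghuang-master-etcd-nfs-1 systemd[1]: cri-o.service failed.",
]

found = non_systemd_messages(sample)
for ident, message in found:
    print(f"{ident}: {message}")
```

On this sample the filter leaves the two `runc` messages, the first of which reports that `/usr/bin/runc` could not be found.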