Bug 1611830
| Summary: | Regression: Installation fails with CRI-O for various components | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wolfgang Kulhanek <wkulhane> |
| Component: | Containers | Assignee: | Giuseppe Scrivano <gscrivan> |
| Status: | CLOSED WORKSFORME | QA Contact: | DeShuai Ma <dma> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | amurdaca, aos-bugs, chezhang, jokerman, mmccomas, mpatel, wkulhane, zitang |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-21 14:52:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
How are these pods started? Are they automatically launched by the installer?

Yes. They are launched by the installer (when logging and ASB are enabled).

Can you share the playbook or the settings to enable these for the install? I found the one for ASB: ansible_service_broker_install: true. What is the one for logging?

openshift_logging_install_logging=true

Can you try disabling SELinux? That error shows up when SELinux complains about something most of the time (https://github.com/kubernetes-incubator/cri-o/issues/528).

Hm. This is an old Bugzilla. This worked fine with 3.9(.31/.33). It doesn't work with 3.10.14. I can *try* to turn SELinux off for testing, but I really don't want to for production environments. Will do so when I have an hour or two to test, probably tomorrow.

I did try an install with SELinux set to permissive. The logging-es pod came up, but the asb deploy still failed.

What error did the asb deploy run into? Could you get the logs? Also, could you gather the SELinux AVCs?

ausearch -m avc -ts recent

There are no logs since the deploy never succeeds.
Events are like this (I did an oc rollout latest asb to fix it; those are the asb-2 events):
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 asb-1-deploy.1548ad8100be7548 Pod Normal Scheduled default-scheduler Successfully assigned asb-1-deploy to infranode1.rhte-cloud2.internal
1h 1h 1 asb.1548ad80ffddb340 DeploymentConfig Normal DeploymentCreated deploymentconfig-controller Created new replication controller "asb-1" for version 1
1h 1h 1 asb-1-bgrw9.1548ad8174cb3063 Pod Normal Scheduled default-scheduler Successfully assigned asb-1-bgrw9 to infranode1.rhte-cloud2.internal
1h 1h 1 asb-1.1548ad817498cbab ReplicationController Normal SuccessfulCreate replication-controller Created pod: asb-1-bgrw9
1h 1h 1 asb-1-deploy.1548ad8169e7cb1c Pod spec.containers{deployment} Normal Started kubelet, infranode1.rhte-cloud2.internal Started container
1h 1h 1 asb-1-deploy.1548ad8168699ece Pod spec.containers{deployment} Normal Created kubelet, infranode1.rhte-cloud2.internal Created container
1h 1h 1 asb-1-deploy.1548ad815f1ba8ad Pod spec.containers{deployment} Normal Pulled kubelet, infranode1.rhte-cloud2.internal Container image "registry.access.redhat.com/openshift3/ose-deployer:v3.10.14" already present on machine
1h 1h 1 asb-1-bgrw9.1548ad81e4c4627c Pod spec.containers{asb} Normal Pulling kubelet, infranode1.rhte-cloud2.internal pulling image "registry.access.redhat.com/openshift3/ose-ansible-service-broker:v3.10"
1h 1h 1 asb-1-bgrw9.1548ad8d5815f5f4 Pod spec.containers{asb} Normal Pulled kubelet, infranode1.rhte-cloud2.internal Successfully pulled image "registry.access.redhat.com/openshift3/ose-ansible-service-broker:v3.10"
1h 1h 1 asb-1-bgrw9.1548ad8e5bdb1ed6 Pod spec.containers{asb} Normal Pulled kubelet, infranode1.rhte-cloud2.internal Container image "registry.access.redhat.com/openshift3/ose-ansible-service-broker:v3.10" already present on machine
1h 1h 2 asb-1-bgrw9.1548ad8d696467ca Pod spec.containers{asb} Normal Started kubelet, infranode1.rhte-cloud2.internal Started container
1h 1h 2 asb-1-bgrw9.1548ad8d67fc1b5f Pod spec.containers{asb} Normal Created kubelet, infranode1.rhte-cloud2.internal Created container
1h 1h 1 asb-1.1548ad9201e264d7 ReplicationController Normal SuccessfulDelete replication-controller Deleted pod: asb-1-bgrw9
1h 1h 1 asb-1-bgrw9.1548ad920a2ae6c7 Pod spec.containers{asb} Normal Killing kubelet, infranode1.rhte-cloud2.internal Killing container with id cri-o://asb:Need to kill Pod
1h 1h 1 asb.1548ad92017beea1 DeploymentConfig Normal ReplicationControllerScaled deploymentconfig-controller Scaled replication controller "asb-1" from 1 to 0
1h 1h 1 asb-2-deploy.1548adb1198eca60 Pod Normal Scheduled default-scheduler Successfully assigned asb-2-deploy to infranode1.rhte-cloud2.internal
1h 1h 1 asb.1548adb118bebbaf DeploymentConfig Normal DeploymentCreated deploymentconfig-controller Created new replication controller "asb-2" for version 2
1h 1h 1 asb-2-deploy.1548adb1d3e4848c Pod spec.containers{deployment} Normal Pulled kubelet, infranode1.rhte-cloud2.internal Container image "registry.access.redhat.com/openshift3/ose-deployer:v3.10.14" already present on machine
1h 1h 1 asb-2-deploy.1548adb1dc4a78d5 Pod spec.containers{deployment} Normal Created kubelet, infranode1.rhte-cloud2.internal Created container
1h 1h 1 asb-2-deploy.1548adb1dd6536ae Pod spec.containers{deployment} Normal Started kubelet, infranode1.rhte-cloud2.internal Started container
1h 1h 1 asb-2.1548adb1e7e515ab ReplicationController Normal SuccessfulCreate replication-controller Created pod: asb-2-55qj4
1h 1h 1 asb-2-55qj4.1548adb1e8009562 Pod Normal Scheduled default-scheduler Successfully assigned asb-2-55qj4 to infranode1.rhte-cloud2.internal
1h 1h 1 asb-2-55qj4.1548adb24e9faff7 Pod spec.containers{asb} Normal Started kubelet, infranode1.rhte-cloud2.internal Started container
1h 1h 1 asb-2-55qj4.1548adb24d4919f6 Pod spec.containers{asb} Normal Created kubelet, infranode1.rhte-cloud2.internal Created container
1h 1h 1 asb-2-55qj4.1548adb24453b5e6 Pod spec.containers{asb} Normal Pulled kubelet, infranode1.rhte-cloud2.internal Container image "registry.access.redhat.com/openshift3/ose-ansible-service-broker:v3.10" already present on machine
1h 1h 1 asb-2-deploy.1548adb5f5c18b4e Pod spec.containers{deployment} Normal Killing kubelet, infranode1.rhte-cloud2.internal Killing container with id cri-o://deployment:Need to kill Pod
ausearch -m avc -ts recent
returns (infranode1 is the one that's supposed to run asb)
[root@infranode1 ~]# ausearch -m avc -ts recent
<no matches>
could you please share the inventory file used for the installation?

Created attachment 1474715 [details]
Ansible hosts file as requested
I cannot reproduce this problem. I've tried on a fresh RHEL 7.5 system:

-bash-4.2# oc get --all-namespaces=true pods
NAMESPACE NAME READY STATUS RESTARTS AGE
default docker-registry-1-726hk 1/1 Running 2 2h
default registry-console-1-l6njp 1/1 Running 2 2h
default router-1-js98m 1/1 Running 2 2h
kube-service-catalog apiserver-fpcrm 1/1 Running 2 2h
kube-service-catalog controller-manager-jjfw9 1/1 Running 2 2h
kube-system master-api-rhel7 1/1 Running 2 2h
kube-system master-controllers-rhel7 1/1 Running 2 2h
kube-system master-etcd-rhel7 1/1 Running 2 2h
openshift-ansible-service-broker asb-1-fww6m 1/1 Running 3 2h
openshift-logging logging-curator-1-h7szp 1/1 Running 2 2h
openshift-logging logging-es-data-master-zavcw4k3-4-bq4kq 2/2 Running 0 1m
openshift-logging logging-fluentd-mw9fn 1/1 Running 2 2h
openshift-logging logging-kibana-1-c5g8d 2/2 Running 4 2h
openshift-node sync-975nf 1/1 Running 2 2h
openshift-sdn ovs-m9t4g 1/1 Running 2 2h
openshift-sdn sdn-9qrth 1/1 Running 2 2h
openshift-template-service-broker apiserver-h5l5m 1/1 Running 3 2h
openshift-web-console webconsole-7f944b7c85-fzfw7 1/1 Running 4 2h

I am using cri-o-1.10.6-1.rhaos3.10.git56d7d9a.el7.x86_64 and atomic-openshift-3.10.27-1.git.0.1695df4.el7.x86_64. I am running with SELinux enforcing. Looks like the problem is somewhere else. If you have a cluster with the problem, I can try to log in there and investigate the issue.

Have not gotten any update on this, so closing for now. Please reopen if the issue still persists.
Description of problem:
With the 3.10(.14 and .21) installer the rollout of various pods fails. Affected pods are:
- asb (Ansible Service Broker)
- kibana
- Elasticsearch

Version-Release number of selected component (if applicable):
OCP 3.10.14 with 3.10.21 installer

How reproducible:
Every time

Steps to Reproduce:
1. Install OCP with CRI-O enabled
2. Rollout of the above pods fails
3. Subsequent (manual) rollouts succeed

Actual results:
oc get pod --all-namespaces
openshift-ansible-service-broker asb-1-deploy 0/1 Error 0 20m
openshift-logging logging-es-data-master-fx351ghs-1-deploy 0/1 Error 0 22m
openshift-logging logging-kibana-1-deploy 0/1 Error 0 23m

It is always these three pods. Seeing this event:

19m 19m 1 logging-es-data-master-fx351ghs-1-8cp9w.154729ce51f5385d Pod spec.containers{proxy} Warning Failed kubelet, infranode1.wk310g.internal Error: container create failed: container_linux.go:341: creating new parent process caused "container_linux.go:1713: running lstat on namespace path \"/proc/0/ns/ipc\" caused \"lstat /proc/0/ns/ipc: no such file or directory\""

The exact same configuration works without fault when using Docker as the runtime. This also happens with or without OCS.

Expected results:

Additional info:
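For context on the failure mode: the event shows the runtime trying to join a sandbox namespace via a /proc/<pid>/ns/... path built with PID 0, which can never name a live process. The following is an illustrative sketch only (Python, though CRI-O itself is written in Go; the ns_path helper is hypothetical, not a CRI-O function) showing why an lstat on such a path must fail with "no such file or directory" — consistent with the runtime having recorded a zero PID for a dead or never-started infra container:

```python
import os

def ns_path(pid, ns="ipc"):
    # Linux exposes a process's namespaces as /proc/<pid>/ns/<name>;
    # a container runtime joins a sandbox namespace by opening this path
    # for the sandbox (infra) process's PID. (Illustrative helper only.)
    return f"/proc/{pid}/ns/{ns}"

# PID 0 has no /proc entry, so lstat fails with ENOENT -- the same
# "lstat /proc/0/ns/ipc: no such file or directory" seen in the pod event.
try:
    os.lstat(ns_path(0))
except FileNotFoundError as err:
    print("lstat failed:", err)
```

This points at the infra-container PID being lost or never set, rather than at an SELinux denial, which matches the empty ausearch output above.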