Description of problem:
As title: master-restart fails on CRI-O because it only supports docker commands.

openshift v3.10.0-0.29.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Install ocp 3.10 with below config:
OCP_3.10_Integrated Automaton Testing Round 1+ Sanity Testing_RPM_RHEL-7.5.0_crio-1.x_OVS-2.9_Overlay2_openshift-sdn_Multitenant_OpenStack_SAML_Ceph RBD_Ceph RBD_Prometheus_Haproxy_iptables_Enabled service catalog HA
ansible-2.4.4.0-1.el7ae.noarch
openshift v3.10.0-0.29.0
cri-o-1.10.0-1.beta.1.gitc956614.el7.x86_64
openvswitch-2.9.0-15.el7fdp.x86_64
etcd-3.2.15-2.el7.x86_64

2. # master-restart api
+ [[ -z api ]]
+ types=("atomic-openshift" "origin")
+ for type in '"${types[@]}"'
+ systemctl cat atomic-openshift-master-api.service
+ for type in '"${types[@]}"'
+ systemctl cat origin-master-api.service
++ docker ps -l -q --filter label=io.kubernetes.container.name=api
+ child_container=
++ docker ps -l -q --filter label=openshift.io/component=api --filter label=io.kubernetes.container.name=POD
+ container=
+ [[ -z '' ]]
+ echo 'Component api is already stopped'
Component api is already stopped
+ exit 0

Actual results:
As title

Expected results:
master-restart of the master services should support CRI-O.
After looking at the details, I think this is going to be a Containers issue since 1) it requires crictl, which is under Containers, and 2) there are no kube commands here; it is simply killing the container from underneath the kubelet, assuming the kubelet will restart it.
Lokesh, we are shipping crictl now correct?
relaying comment from Justin Pierce on IRC: ---- If crictl is part of cri-tools, it shipped in 3.9 (cri-tools-1.0.0-2.alpha.0.git653cc8c.el7) . A new version will ship soon for 3.9: cri-tools-1.0.0-3.git8e6013a.el7 . 3.10 is not yet GA. ---- Does that help?
Before the bug is fixed, we can do this manually with crictl:

export RUNTIME_ENDPOINT=/var/run/crio/crio.sock
export IMAGE_ENDPOINT=${RUNTIME_ENDPOINT}
crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm api

# crictl ps|grep -w api
90a971c50b676 registry.reg-aws.openshift.com:443/openshift3/ose-control-plane@sha256:3804c81531b9de6b67330e9d9357d67875bbd0b33b72e9a4ff118b616bd87089 6 hours ago CONTAINER_RUNNING api 0
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm 90a971c50b676

# crictl ps|grep -w controllers
59c8e66d17f3d registry.reg-aws.openshift.com:443/openshift3/ose-control-plane@sha256:3804c81531b9de6b67330e9d9357d67875bbd0b33b72e9a4ff118b616bd87089 6 hours ago CONTAINER_RUNNING controllers 0
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm 59c8e66d17f3d
59c8e66d17f3d
Manual workaround for QE testing in a cri-o env:

export RUNTIME_ENDPOINT=/var/run/crio/crio.sock
export IMAGE_ENDPOINT=${RUNTIME_ENDPOINT}
# crictl ps|grep -w api
# crictl ps|grep -w controllers
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm ${api_container_id}
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm ${controllers_container_id}
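The manual steps above could be wrapped in a small helper. This is only a sketch, not the script shipped by openshift-ansible: the function names `container_id_from_ps` and `restart_master_component` are hypothetical, it assumes crictl can already reach CRI-O (through the exported endpoint variables or a crictl config file), and it assumes `grep -w` on the component name is specific enough, as in the commands above.

```shell
#!/bin/bash
# Hypothetical helper (not part of openshift-ansible): read `crictl ps`
# output on stdin and print the ID (first column) of the first line that
# contains the given component name as a whole word.
container_id_from_ps() {
  local name="$1"
  grep -w -- "$name" | awk '{print $1}' | head -n 1
}

# Hypothetical wrapper: remove the named master container so that the
# kubelet recreates it. Assumes crictl is already configured for CRI-O.
restart_master_component() {
  local name="$1" id
  id="$(crictl ps | container_id_from_ps "$name")"
  if [[ -z "$id" ]]; then
    echo "Component $name is already stopped" >&2
    return 0
  fi
  crictl rm "$id"
}
```

Usage would be `restart_master_component api` or `restart_master_component controllers`. Newer crictl releases may refuse to remove a running container without stopping it first, so treat the final `crictl rm` as matching the crictl version used in this bug.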
master-logs doesn't work with CRI-O either. /usr/local/bin/master-logs also uses docker commands:

[root@ip-172-18-13-30 node]# master-logs controllers controllers
Component controllers is stopped or not running
Since there's a workaround for this, I'd wait for crictl to show up for 3.10; meanwhile we'll work on modifying the script to use crictl. Which script is that?
(In reply to Antonio Murdaca from comment #8)
> Since there's a workaround for this I'd wait for crictl to show up for 3.10
> meanwhile we'll work to modify the script to use crictl? Which script is
> that?

It's in https://github.com/openshift/openshift-ansible/tree/master/roles/openshift_control_plane/files/scripts/docker
*** Bug 1574660 has been marked as a duplicate of this bug. ***
(In reply to Antonio Murdaca from comment #8)
> Since there's a workaround for this I'd wait for crictl to show up for 3.10
> meanwhile we'll work to modify the script to use crictl? Which script is
> that?

It's in 3.10 OCP repos already, just need to ensure that it's being installed when necessary.
(In reply to Scott Dodson from comment #11)
> (In reply to Antonio Murdaca from comment #8)
> > Since there's a workaround for this I'd wait for crictl to show up for 3.10
> > meanwhile we'll work to modify the script to use crictl? Which script is
> > that?
>
> It's in 3.10 OCP repos already, just need to ensure that it's being
> installed when necessary.

Alright, so we need to make sure it's installed with CRI-O (or in the installer) and we need to modify the scripts to use crictl as well. I'll work on porting the scripts to crictl.
I've opened https://github.com/openshift/openshift-ansible/pull/8457 Scott, can you take a look at that? Also if someone can try the scripts on a live system, that would help as well. We need to make sure to install crictl at installation time (or as a Requires of CRI-O). We also need to ship an /etc/crictl.yaml when installing cri-o (I believe we already do that) so that crictl needs no additional flags/interactions to talk to cri-o.
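For illustration, a minimal /etc/crictl.yaml could be generated as sketched below. This is an assumption about what such a file might contain, not the file actually packaged with cri-o: the helper name `write_crictl_config` is hypothetical, and the endpoints use the full unix:// URL form, which crictl recommends over a bare socket path.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal crictl config to the given path
# (default /etc/crictl.yaml), pointing both endpoints at the CRI-O socket.
write_crictl_config() {
  local path="${1:-/etc/crictl.yaml}"
  local socket="${2:-unix:///var/run/crio/crio.sock}"
  cat > "$path" <<EOF
runtime-endpoint: ${socket}
image-endpoint: ${socket}
EOF
}
```

With such a file in place, a plain `crictl ps` should talk to CRI-O without the --runtime-endpoint and --image-endpoint flags.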
*** Bug 1587860 has been marked as a duplicate of this bug. ***
Please note that crictl isn't available on Atomic Host yet. At the least, the PR is going to block all upgrades against Atomic Host.
(In reply to Gan Huang from comment #15)
> Please note that crictl isn't available on Atomic Host yet.

Atomic isn't supporting CRI-O rpm installation afaict and we're still supporting the docker scripts for docker based deployments

> At least the pr is going to block all the upgrades against Atomic Host.

Moving to modified for qe to test it out as the PR from Scott is merged
https://github.com/openshift/openshift-ansible/pull/8661 is the PR that included these changes
Checked on v3.10.0-0.67.0 with
openshift-ansible-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm
openshift-ansible-docs-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm
openshift-ansible-playbooks-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm
openshift-ansible-roles-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm

And the issue is fixed now.

// master-restart command
# master-restart api
W0614 06:27:51.675813 26468 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
# master-restart controllers
W0614 06:28:42.743307 27403 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
# master-restart etcd
W0614 06:29:34.421796 28257 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
# crictl pods
W0614 06:30:07.656050 28775 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
POD ID          CREATED              STATE           NAME                                                  NAMESPACE     ATTEMPT
5092b42f2c4dd   21 seconds ago       SANDBOX_READY   master-etcd-qe-wjiang-310-crio-master-etcd-1          kube-system   1
1bf2105ffca9f   About a minute ago   SANDBOX_READY   master-controllers-qe-wjiang-310-crio-master-etcd-1   kube-system   1
ab0c4ea0cbca0   2 minutes ago        SANDBOX_READY   master-api-qe-wjiang-310-crio-master-etcd-1           kube-system   2

// master-logs command
# master-logs api api 2>&1| tail -n 3
ERROR: logging before flag.Parse: I0614 10:32:32.667050 1 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/configmaps/openshift-master-controllers" satisfied by prefix /api/
ERROR: logging before flag.Parse: I0614 10:32:32.667073 1 handler.go:149] kube-apiserver: PUT "/api/v1/namespaces/kube-system/configmaps/openshift-master-controllers" satisfied by gorestful with webservice /api/v1
ERROR: logging before flag.Parse: I0614 10:32:32.670376 1 wrap.go:42] PUT /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers: (3.731105ms) 200 [[openshift/v1.10.0+b81c8f8 (linux/amd64) kubernetes/b81c8f8] 10.240.0.50:39892]
# master-logs controllers controllers 2>&1| tail -n 3
ERROR: logging before flag.Parse: I0614 10:32:41.795895 1 graph_builder.go:603] GraphBuilder process object: v1/ConfigMap, namespace kube-system, name kube-scheduler, uid 0bd8dfb6-6f85-11e8-8175-42010af00032, event type update
ERROR: logging before flag.Parse: I0614 10:32:41.796240 1 leaderelection.go:199] successfully renewed lease kube-system/kube-scheduler
ERROR: logging before flag.Parse: I0614 10:32:41.969651 1 garbagecollector.go:185] no resource updates from discovery, skipping garbage collector sync
# master-logs etcd etcd 2>&1| tail -n 3
2018-06-14 10:30:30.542813 I | embed: ready to serve client requests
2018-06-14 10:30:30.543167 I | embed: serving client requests on 10.240.0.50:2379
WARNING: 2018/06/14 10:30:30 Failed to dial 10.240.0.50:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

// master-exec command
# master-exec api api date
W0614 06:25:14.594749 24843 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Thu Jun 14 10:25:14 UTC 2018
# master-exec controllers controllers date
W0614 06:25:25.889748 24991 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Thu Jun 14 10:25:25 UTC 2018
# master-exec etcd etcd date
W0614 06:25:40.124766 25161 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Thu Jun 14 10:25:40 UTC 2018