Bug 1572440 - 'master-restart' and 'master-logs' don't work with CRI-O

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	3.10.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Antonio Murdaca
QA Contact:	ge liu
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1574660 1587860
Depends On:
Blocks:

Reported:	2018-04-27 02:39 UTC by ge liu
Modified:	2018-07-23 13:11 UTC
CC List:	18 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-07-10 17:28:59 UTC
Target Upstream Version:
Embargoed:

Description ge liu 2018-04-27 02:39:30 UTC

Description of problem:

As the title says: master-restart does not work on a CRI-O host because the script only supports docker commands.

openshift v3.10.0-0.29.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 3.10 with the configuration below:

OCP_3.10_Integrated Automaton Testing Round 1+ Sanity Testing_RPM_RHEL-7.5.0_crio-1.x_OVS-2.9_Overlay2_openshift-sdn_Multitenant_OpenStack_SAML_Ceph RBD_Ceph RBD_Prometheus_Haproxy_iptables_Enabled service catalog HA
ansible-2.4.4.0-1.el7ae.noarch
openshift v3.10.0-0.29.0
cri-o-1.10.0-1.beta.1.gitc956614.el7.x86_64
openvswitch-2.9.0-15.el7fdp.x86_64
etcd-3.2.15-2.el7.x86_64

2. # master-restart api
+ [[ -z api ]]
+ types=("atomic-openshift" "origin")
+ for type in '"${types[@]}"'
+ systemctl cat atomic-openshift-master-api.service
+ for type in '"${types[@]}"'
+ systemctl cat origin-master-api.service
++ docker ps -l -q --filter label=io.kubernetes.container.name=api
+ child_container=
++ docker ps -l -q --filter label=openshift.io/component=api --filter label=io.kubernetes.container.name=POD
+ container=
+ [[ -z '' ]]
+ echo 'Component api is already stopped'
Component api is already stopped
+ exit 0


Actual results:
As the title says: master-restart reports "Component api is already stopped" and restarts nothing.
Expected results:
master-restart should support CRI-O.
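
The trace in the steps to reproduce shows both `docker ps` lookups returning nothing on a CRI-O host, so the script concludes that the component is already stopped. The following is a minimal sketch of one way a script could pick its CLI, assuming that the presence of the CRI-O socket is a good enough signal. The function name `pick_runtime_cli` and the fallback policy are hypothetical and are not taken from the actual fix; the default socket path is the one used later in this report.

```shell
#!/bin/bash
# Hypothetical helper: choose the container CLI for a master-* script.
# Assumption: a socket at the given path means the node runs CRI-O;
# anything else falls back to docker.
pick_runtime_cli() {
  local sock="${1:-/var/run/crio/crio.sock}"
  if [ -S "$sock" ]; then
    echo crictl
  else
    echo docker
  fi
}
```

A caller could then branch on `$(pick_runtime_cli)` before listing containers, instead of unconditionally calling docker.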

Comment 2 Seth Jennings 2018-05-02 16:43:06 UTC

After looking at the details, I think this is going to be a Containers issue, since 1) it requires crictl, which is under Containers, and 2) there are no kube commands here; the script simply kills the container from underneath the kubelet, assuming the kubelet will restart it.

Comment 3 Daniel Walsh 2018-05-03 19:53:37 UTC

Lokesh, we are shipping crictl now, correct?

Comment 4 Lokesh Mandvekar 2018-05-03 20:21:02 UTC

Relaying a comment from Justin Pierce on IRC:
----
If crictl is part of cri-tools, it shipped in 3.9 (cri-tools-1.0.0-2.alpha.0.git653cc8c.el7). A new version will ship soon for 3.9: cri-tools-1.0.0-3.git8e6013a.el7. 3.10 is not yet GA.
----

Does that help?

Comment 5 DeShuai Ma 2018-05-11 07:11:36 UTC

Until the bug is fixed, we can do this manually with crictl:


export RUNTIME_ENDPOINT=/var/run/crio/crio.sock
export IMAGE_ENDPOINT=${RUNTIME_ENDPOINT}
crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm api

# crictl ps|grep -w api
90a971c50b676       registry.reg-aws.openshift.com:443/openshift3/ose-control-plane@sha256:3804c81531b9de6b67330e9d9357d67875bbd0b33b72e9a4ff118b616bd87089             6 hours ago         CONTAINER_RUNNING   api                       0
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm 90a971c50b676

# crictl ps|grep -w controllers
59c8e66d17f3d       registry.reg-aws.openshift.com:443/openshift3/ose-control-plane@sha256:3804c81531b9de6b67330e9d9357d67875bbd0b33b72e9a4ff118b616bd87089             6 hours ago         CONTAINER_RUNNING   controllers               0
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm 59c8e66d17f3d
59c8e66d17f3d

Comment 6 DeShuai Ma 2018-05-11 07:18:40 UTC

Manual workaround for QE testing in a CRI-O environment:

export RUNTIME_ENDPOINT=/var/run/crio/crio.sock
export IMAGE_ENDPOINT=${RUNTIME_ENDPOINT}

# crictl ps|grep -w api
# crictl ps|grep -w controllers
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm ${api_container_id}
# crictl --image-endpoint=${IMAGE_ENDPOINT} --runtime-endpoint=${RUNTIME_ENDPOINT} rm ${controllers_container_id}
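
The manual steps above (list containers, pick the ID by name, remove it) can be sketched as a small helper. This is an illustration only: `container_id_by_name` is a hypothetical name, and it assumes the `crictl ps` table layout quoted in comment 5, where NAME is the second-to-last column.

```shell
#!/bin/bash
# Hypothetical helper: read `crictl ps` table output on stdin and print
# the ID (first column) of every container whose NAME column matches
# exactly. Assumes NAME is the second-to-last column.
container_id_by_name() {
  awk -v name="$1" '$(NF-1) == name { print $1 }'
}

# Example use on a live CRI-O host (not executed here), mirroring the
# flags used in the workaround:
#   id=$(crictl ps | container_id_by_name api)
#   [ -n "$id" ] && crictl --image-endpoint="${IMAGE_ENDPOINT}" \
#       --runtime-endpoint="${RUNTIME_ENDPOINT}" rm "$id"
```

Matching on the whole column rather than using `grep -w` avoids picking up lines where the name only appears inside the image reference.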

Comment 7 DeShuai Ma 2018-05-21 03:22:14 UTC

master-logs doesn't work with CRI-O either.
/usr/local/bin/master-logs also uses the docker command:

[root@ip-172-18-13-30 node]# master-logs controllers controllers
Component controllers is stopped or not running

Comment 8 Antonio Murdaca 2018-05-21 09:33:48 UTC

Since there's a workaround for this, I'd wait for crictl to show up for 3.10; meanwhile, we'll work on modifying the script to use crictl. Which script is that?

Comment 9 DeShuai Ma 2018-05-21 09:38:09 UTC

(In reply to Antonio Murdaca from comment #8)
> Since there's a workaround for this I'd wait for crictl to show up for 3.10
> meanwhile we'll work to modify the script to use crictl? Which script is
> that?

It's in https://github.com/openshift/openshift-ansible/tree/master/roles/openshift_control_plane/files/scripts/docker

Comment 10 Antonio Murdaca 2018-05-21 09:38:51 UTC

*** Bug 1574660 has been marked as a duplicate of this bug. ***

Comment 11 Scott Dodson 2018-05-21 12:47:16 UTC

(In reply to Antonio Murdaca from comment #8)
> Since there's a workaround for this I'd wait for crictl to show up for 3.10
> meanwhile we'll work to modify the script to use crictl? Which script is
> that?

It's in the 3.10 OCP repos already; we just need to ensure that it's installed when necessary.

Comment 12 Antonio Murdaca 2018-05-21 12:58:11 UTC

(In reply to Scott Dodson from comment #11)
> (In reply to Antonio Murdaca from comment #8)
> > Since there's a workaround for this I'd wait for crictl to show up for 3.10
> > meanwhile we'll work to modify the script to use crictl? Which script is
> > that?
> 
> It's in 3.10 OCP repos already, just need to ensure that it's being
> installed when necessary.

Alright, so we need to make sure it's installed with CRI-O (or in the installer), and we need to modify the scripts to use crictl as well. I'll work on porting the scripts to crictl.

Comment 13 Antonio Murdaca 2018-05-21 13:46:32 UTC

I've opened https://github.com/openshift/openshift-ansible/pull/8457

Scott, can you take a look at that? Also if someone can try the scripts on a live system, that would help as well.

We need to make sure to install crictl at installation time (or as a Requires of CRI-O). We also need to ship an /etc/crictl.yaml when installing cri-o (I believe we already do that) so that crictl needs no additional flags/interactions to talk to cri-o.

Comment 14 Scott Dodson 2018-06-06 18:33:19 UTC

*** Bug 1587860 has been marked as a duplicate of this bug. ***

Comment 15 Gan Huang 2018-06-07 06:52:46 UTC

Please note that crictl isn't available on Atomic Host yet.

At the very least, the PR is going to block all upgrades on Atomic Host.

Comment 16 Antonio Murdaca 2018-06-12 15:13:49 UTC

(In reply to Gan Huang from comment #15)
> Please note that crictl isn't available on Atomic Host yet.

Atomic doesn't support CRI-O RPM installation as far as I can tell, and we still support the docker scripts for docker-based deployments.

> 
> At least the pr is going to block all the upgrades against Atomic Host.

Moving to MODIFIED for QE to test it out, as the PR from Scott is merged.

Comment 17 Scott Dodson 2018-06-12 15:15:34 UTC

https://github.com/openshift/openshift-ansible/pull/8661 is the PR that included these changes

Comment 19 weiwei jiang 2018-06-14 10:34:41 UTC

Checked on v3.10.0-0.67.0 with:
openshift-ansible-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm
openshift-ansible-docs-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm
openshift-ansible-playbooks-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm
openshift-ansible-roles-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch.rpm

And the issue is fixed now.


// master-restart command

# master-restart api
W0614 06:27:51.675813   26468 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
# master-restart controllers
W0614 06:28:42.743307   27403 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
# master-restart etcd
W0614 06:29:34.421796   28257 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".

# crictl pods
W0614 06:30:07.656050   28775 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
POD ID              CREATED              STATE               NAME                                                  NAMESPACE                           ATTEMPT
5092b42f2c4dd       21 seconds ago       SANDBOX_READY       master-etcd-qe-wjiang-310-crio-master-etcd-1          kube-system                         1
1bf2105ffca9f       About a minute ago   SANDBOX_READY       master-controllers-qe-wjiang-310-crio-master-etcd-1   kube-system                         1
ab0c4ea0cbca0       2 minutes ago        SANDBOX_READY       master-api-qe-wjiang-310-crio-master-etcd-1           kube-system                         2


// master-logs command

# master-logs api api 2>&1| tail -n 3
ERROR: logging before flag.Parse: I0614 10:32:32.667050       1 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/configmaps/openshift-master-controllers" satisfied by prefix /api/
ERROR: logging before flag.Parse: I0614 10:32:32.667073       1 handler.go:149] kube-apiserver: PUT "/api/v1/namespaces/kube-system/configmaps/openshift-master-controllers" satisfied by gorestful with webservice /api/v1
ERROR: logging before flag.Parse: I0614 10:32:32.670376       1 wrap.go:42] PUT /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers: (3.731105ms) 200 [[openshift/v1.10.0+b81c8f8 (linux/amd64) kubernetes/b81c8f8] 10.240.0.50:39892]
# master-logs controllers controllers 2>&1| tail -n 3
ERROR: logging before flag.Parse: I0614 10:32:41.795895       1 graph_builder.go:603] GraphBuilder process object: v1/ConfigMap, namespace kube-system, name kube-scheduler, uid 0bd8dfb6-6f85-11e8-8175-42010af00032, event type update
ERROR: logging before flag.Parse: I0614 10:32:41.796240       1 leaderelection.go:199] successfully renewed lease kube-system/kube-scheduler
ERROR: logging before flag.Parse: I0614 10:32:41.969651       1 garbagecollector.go:185] no resource updates from discovery, skipping garbage collector sync
# master-logs etcd etcd 2>&1| tail -n 3
2018-06-14 10:30:30.542813 I | embed: ready to serve client requests
2018-06-14 10:30:30.543167 I | embed: serving client requests on 10.240.0.50:2379
WARNING: 2018/06/14 10:30:30 Failed to dial 10.240.0.50:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.


// master-exec command

# master-exec api api date 
W0614 06:25:14.594749   24843 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Thu Jun 14 10:25:14 UTC 2018
# master-exec controllers controllers date
W0614 06:25:25.889748   24991 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Thu Jun 14 10:25:25 UTC 2018
# master-exec etcd etcd date
W0614 06:25:40.124766   25161 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Thu Jun 14 10:25:40 UTC 2018
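
The master-restart, crictl and master-exec invocations above all print the same deprecation warning, because the endpoint is configured as a bare socket path rather than a full URL. A minimal sketch of normalizing the value before handing it to crictl follows; `normalize_endpoint` is a hypothetical helper, not part of the shipped scripts, and it assumes that any absolute path refers to a Unix socket.

```shell
#!/bin/bash
# Hypothetical helper: convert a bare socket path into the full URL
# form requested by the warning. Values that already carry a scheme,
# and anything that is not an absolute path, are passed through as-is.
normalize_endpoint() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;
    /*)    printf 'unix://%s\n' "$1" ;;
    *)     printf '%s\n' "$1" ;;
  esac
}
```

For example, `normalize_endpoint /var/run/crio/crio.sock` yields the `unix:///var/run/crio/crio.sock` form quoted in the warning.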
