Bug 1666198 - [CRIO] crioctl inconsistent number of pods
Summary: [CRIO] crioctl inconsistent number of pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Mrunal Patel
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks: 1186913
 
Reported: 2019-01-15 07:57 UTC by Franck Grosjean
Modified: 2019-08-08 02:59 UTC
CC: 14 users

Fixed In Version: cri-o-1.9.16-2.git858756d.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-05 06:58:57 UTC
Target Upstream Version:
Embargoed:




Links
Github kubernetes-sigs cri-o pull 2077 (closed): Increase max grpc message size (last updated 2021-02-16 08:49:05 UTC)
Red Hat Product Errata RHBA-2019:1642 (last updated 2019-07-05 06:59:13 UTC)

Description Franck Grosjean 2019-01-15 07:57:42 UTC
Description of problem:
The pods on a node reported by CRI-O do not match the ones reported by the OpenShift API, which lists far fewer.

oc adm manage-node node01 --list-pods  | wc -l
Listing matched pods on node node01 : 27

oc adm manage-node node01 --list-pods  | grep -i run | wc -l
Listing matched pods on node node01: 23

# crictl pods |  grep -i ready | wc -l
Listing matched pods on node node01: 428

Version-Release number of selected component (if applicable):
- OpenShift 3.9.51 with CRI-O 1.9.14:
    atomic-openshift-3.9.51-1.git.0.dc3a40b.el7.x86_64
    cri-o-1.9.14-1.git4e220eb.el7.x86_64

How reproducible:
Install OpenShift 3.9 with CRI-O and the garbage collector enabled.

Actual results:
A huge number of pods are seen by CRI-O on the node but not by OpenShift.

Expected results:
The pod counts reported by CRI-O and OpenShift should match.

Additional info:
none

Comment 5 Mrunal Patel 2019-01-20 03:39:19 UTC
Hi, 
I am working on a fix for the grpc message size issue. I think that may be causing cascading issues.

Thanks!
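
While the fix is pending, a quick way to tell whether a node is hitting the default 4 MiB gRPC message limit (a minimal sketch, assuming the standard gRPC-go error text; the node unit is atomic-openshift-node on OCP, origin-node on OKD):

# Hedged check: large ListPodSandbox/ListContainers replies that exceed the
# default 4 MiB gRPC limit show up as "received message larger than max"
journalctl -u atomic-openshift-node --since "1 hour ago" \
  | grep -i "received message larger than max" \
  || echo "no gRPC message-size errors in the last hour"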

Comment 16 Samuel 2019-03-01 07:59:54 UTC
Hi,

I see in the comments for that PR some mention of not backporting this fix to cri-o 1.9.

Is that really wise? Is there any rationale?

CRI-O has been sold as GA since OCP 3.9.
Right now, we have several production clusters suffering from this issue.


Regards.

Comment 17 Mrunal Patel 2019-03-01 18:19:37 UTC
Samuel, I ported this to cri-o 1.9.15.

Comment 22 weiwei jiang 2019-03-19 04:54:46 UTC
Checked with 
[root@qe-wjiang39-crio-merrn-1 ~]# crictl version 
Version:  0.1.0
RuntimeName:  cri-o
RuntimeVersion:  1.9.15
RuntimeApiVersion:  v1alpha1

[root@qe-wjiang39-crio-merrn-1 ~]# oc version 
oc v3.9.73
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-wjiang39-crio-merrn-1:8443
openshift v3.9.73
kubernetes v1.9.1+a0ce1bc657

[root@qe-wjiang39-crio-merrn-1 ~]# oc get nodes  -o wide 
NAME                       STATUS    ROLES            AGE       VERSION             EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION               CONTAINER-RUNTIME
qe-wjiang39-crio-merrn-1   Ready     compute,master   47m       v1.9.1+a0ce1bc657   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.10.1.el7.x86_64   cri-o://1.9.15


And I still hit this inconsistency.

[root@qe-wjiang39-crio-merrn-1 ~]# crictl pods |wc -l 
1663
[root@qe-wjiang39-crio-merrn-1 ~]# crictl ps 
CONTAINER ID        IMAGE                                                                                                                                                 CREATED             STATE               NAME                     ATTEMPT
2e97bdf6a147e       docker-registry.default.svc:5000/install-test/nodejs-mongodb-example@sha256:73146816e4d2e02162ac45a4b4329d9317cbea0c99674b082671b1a9c708b833          2 hours ago         CONTAINER_RUNNING   nodejs-mongodb-example   0
ece2ae5a966be       brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mongodb-32-rhel7@sha256:4d5660778ccce1233145830928e289b77d9681ad9ec96b0e279747b7dbc5e5c5   3 hours ago         CONTAINER_RUNNING   mongodb                  0
e1807c726636d       registry.reg-aws.openshift.com:443/openshift3/ose-template-service-broker@sha256:6d9d08631c558be88807113e4f7d6f2b5083fa72f472c2f4a6e871486f677c43     3 hours ago         CONTAINER_RUNNING   c                        0
10f36cdc1f6f2       registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:7b7507fc3896aff506adf54e0b068debcb88b1af8efa0620f9b383d561a48478             3 hours ago         CONTAINER_RUNNING   controller-manager       0
b428b1532af8c       registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:7b7507fc3896aff506adf54e0b068debcb88b1af8efa0620f9b383d561a48478             3 hours ago         CONTAINER_RUNNING   apiserver                0
cae04244189d2       registry.reg-aws.openshift.com:443/openshift3/ose-web-console@sha256:3d65332942ad073a6c7f5a24761bcbd479651df20147cd7153331fda6cd42246                 3 hours ago         CONTAINER_RUNNING   webconsole               0
e5933071d128a       registry.reg-aws.openshift.com:443/openshift3/registry-console@sha256:af0f6f86d0e0ac546222c86666aa0de7d78c727b2aaf4847d8f99d9cf5fb24b8                3 hours ago         CONTAINER_RUNNING   registry-console         0
b63f87edfb2d0       registry.reg-aws.openshift.com:443/openshift3/ose@sha256:6ea849633fb1358ff726f57c41f89767c49632f5c60a6779db9f8dc51a5d1b86                             3 hours ago         CONTAINER_RUNNING   dockergc                 0
740bf027c8da6       registry.reg-aws.openshift.com:443/openshift3/ose-docker-registry@sha256:e1b2e8b15bd110ac0a86f0a906b32611d9430c303d01d1c2e5b83d2db674f316             3 hours ago         CONTAINER_RUNNING   registry                 0
691d111ebbcff       registry.reg-aws.openshift.com:443/openshift3/ose-haproxy-router@sha256:b91f686500222eeb9d244a561a608077c6ee513a2502ca915ae338dedbe37517              3 hours ago         CONTAINER_RUNNING   router                   0
[root@qe-wjiang39-crio-merrn-1 ~]# oc adm manage-node `hostname` --list-pods 

Listing matched pods on node: qe-wjiang39-crio-merrn-1

NAMESPACE                           NAME                             READY     STATUS      RESTARTS   AGE
default                             docker-registry-1-fcrrj          1/1       Running     0          2h
default                             dockergc-dqjh6                   1/1       Running     0          2h
default                             registry-console-1-nwm8z         1/1       Running     0          2h
default                             router-1-4kpn7                   1/1       Running     0          2h
install-test                        mongodb-1-hmmhw                  1/1       Running     0          2h
install-test                        nodejs-mongodb-example-1-build   0/1       Completed   0          2h
install-test                        nodejs-mongodb-example-1-jbs6p   1/1       Running     0          2h
kube-service-catalog                apiserver-n9wpj                  1/1       Running     0          2h
kube-service-catalog                controller-manager-tz4xh         1/1       Running     0          2h
openshift-ansible-service-broker    asb-1-deploy                     0/1       Error       0          2h
openshift-ansible-service-broker    asb-etcd-1-deploy                0/1       Error       0          2h
openshift-template-service-broker   apiserver-pvmlq                  1/1       Running     0          2h
openshift-web-console               webconsole-5449c78b44-hz24z      1/1       Running     0          2h


[root@qe-wjiang39-crio-merrn-1 ~]# crictl logs $(crictl ps --label io.kubernetes.container.name=dockergc --quiet) 2>&1| tail -n 10
I0319 04:48:50.143501       1 dockergc.go:150] gathering disk usage data
I0319 04:48:50.148970       1 dockergc.go:159] usage is under high threshold (15180MB < 36276MB)
I0319 04:49:50.149305       1 dockergc.go:150] gathering disk usage data
I0319 04:49:50.154950       1 dockergc.go:159] usage is under high threshold (15180MB < 36276MB)
I0319 04:50:50.155239       1 dockergc.go:150] gathering disk usage data
I0319 04:50:50.157362       1 dockergc.go:159] usage is under high threshold (15183MB < 36276MB)
I0319 04:51:50.157533       1 dockergc.go:150] gathering disk usage data
I0319 04:51:50.159449       1 dockergc.go:159] usage is under high threshold (15170MB < 36276MB)
I0319 04:52:50.159778       1 dockergc.go:150] gathering disk usage data
I0319 04:52:50.161582       1 dockergc.go:159] usage is under high threshold (15180MB < 36276MB)


So how to deal with this?

Comment 23 Samuel 2019-03-19 09:49:48 UTC
FYI, customer is letting me know they've been discussing that case over there: https://access.redhat.com/support/cases/#/case/02335236

They haven't tried upgrading to cri-o 1.9.15 yet. I probably won't be able to convince them to go further right now, considering #22.

AFAIU, the large number of rogue pods on their clusters could be somewhat related to OpenShift users having a few CronJobs.
I can confirm this on my own OKD 3.11 / cri-o 1.11.7-1 setup: I have several pods related to my LDAP group sync job, all of which either show as Completed or no longer show up at all from oc's point of view, yet are still listed by crictl pods:

POD ID              CREATED             STATE               NAME                                                              NAMESPACE                    ATTEMPT
f648111d1084a       5 minutes ago       NotReady            sync-ldap-groups-1552986000-z44z5                                 utgb-infra                   0
d5f41239b8364       About an hour ago   NotReady            sync-ldap-groups-1552982400-4qt8x                                 utgb-infra                   0
b90619921f8f8       2 hours ago         NotReady            sync-ldap-groups-1552978800-hbrfn                                 utgb-infra                   0
ccfe2a5ba8a09       3 hours ago         NotReady            sync-ldap-groups-1552975200-6qjbn                                 utgb-infra                   0
d5e1870f066a4       4 hours ago         NotReady            sync-ldap-groups-1552971600-6x4qf                                 utgb-infra                   0


Then again, CronJob is just a way to reproduce that issue.
I can find inconsistent pods such as the following as well:

POD ID              CREATED             STATE               NAME                                                              NAMESPACE                    ATTEMPT
0774b4ef6a9a4       6 hours ago         NotReady            logging-curator-1552966200-hzlc6                                  openshift-logging            0
b4423ddd6c52f       14 hours ago        NotReady            console-5677c7c58d-cnwvd                                          openshift-console            0
11510d0c64944       14 hours ago        NotReady            webconsole-7df4f9f689-cfth6                                       openshift-web-console        0
eadd65a873e8b       14 hours ago        NotReady            node-exporter-6wkq7                                               openshift-monitoring         4
1da299d964fce       14 hours ago        NotReady            console-5677c7c58d-k9kz2                                          openshift-console            4
1d722435acd74       14 hours ago        NotReady            logging-fluentd-9sm7w                                             openshift-logging            1

As far as I can see, that "STATE" column properly identifies rogue Pods.
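
Counting the stuck sandboxes via that column is then a one-liner (a sketch; depending on the crictl build the STATE value prints as NotReady or SANDBOX_NOTREADY, so a case-insensitive match covers both):

# Hedged sketch: count pod sandboxes CRI-O still holds that are no longer ready
crictl pods | grep -ci 'notready'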

Considering I have no more than 85 pods and 11 nodes (roughly 10x below the volume of pods per node seen with Orange), I'm not convinced it has to do with some grpc buffer overflow.
Still, any theory would be welcome at that point.

Comment 24 Mrunal Patel 2019-03-19 18:28:04 UTC
There are 3 things to check here:
1. What are the settings for kubelet GC for containers -
 max-pods
 maximum-dead-containers
 maximum-dead-containers-per-container

2. Can we enable cri-o debug and see if we are getting any remove pod sandbox calls from the kubelet for the pods that are piling up in NOT_READY status?
If so is there any failure being logged? 


3. Is kubelet issuing remove pod sandbox calls to CRI-O for these pods? If so, are there any failures being logged?

My suspicion is that either kubelet isn't sending remove calls to CRI-O or CRI-O is failing to remove.
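
A minimal way to run checks 2 and 3 together, assuming CRI-O debug logging is enabled (comment 26 discusses switching it on) and the service unit is named crio:

# Hedged sketch: watch for RemovePodSandbox traffic from the kubelet and for
# any errors CRI-O logs while sandboxes pile up in NOT_READY
journalctl -u crio -f | grep -iE 'RemovePodSandbox|level=error'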

Comment 25 Mrunal Patel 2019-03-19 18:36:01 UTC
Meanwhile, I am backporting another patch which could also be at play here.

Comment 26 Samuel 2019-03-21 14:56:30 UTC
1.

Kubelet settings, as per customers' openshift_node_groups definition:


        - key: kubeletArguments.max-pods
          value: ['40']
[...]
        - key: kubeletArguments.maximum-dead-containers
          value: ['20']
        - key: kubeletArguments.maximum-dead-containers-per-container
          value: ['2']


2.

I could enable debug logging on my own setup. Probably not on customers' clusters though.
Setting log_level = "debug" in /etc/crio/crio.conf (a sketch is at the end of this comment)? Or do you need me to do something else?


3.

I should point out that in the last couple of days, I've been trying out a new crio package (see https://bugzilla.redhat.com/show_bug.cgi?id=1690259), which is already spamming my logs (20x my average /var/log/messages size).
Anything you want me to grep?
I've got a pile of errors that are probably unrelated to that specific case.
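
For reference, a minimal sketch of enabling that debug logging, assuming this CRI-O version honours a log_level key in /etc/crio/crio.conf (adjust the key or section if the installed build differs):

# Hedged sketch: switch CRI-O to debug logging and restart the daemon
sed -i 's/^#\?log_level = .*/log_level = "debug"/' /etc/crio/crio.conf
systemctl restart crio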

Comment 27 Mrunal Patel 2019-03-27 03:58:52 UTC
We just merged https://github.com/kubernetes-sigs/cri-o/pull/2142 with some fixes and will release a new version of CRI-O with it.

Comment 30 Samuel 2019-04-02 13:54:55 UTC
Regarding crio debugs, today, I have the following Pod:

# crictl pods
[...]
b1c11205ffd96       2 hours ago         NotReady            sync-ldap-groups-1554206400-2zrxj                                 utgb-infra             0
[...]


Grepping the logs for that ID, I find the following:


Apr  2 14:00:06 master1 dockerd-current: time="2019-04-02T14:00:06.014449108+02:00" level=error msg="Handler for GET /containers/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739/json returned error: No such container: b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739"
Apr  2 14:00:06 master1 systemd: Started libcontainer container b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739. 
Apr  2 14:00:06 master1 crio: time="2019-04-02 14:00:06.085805036+02:00" level=debug msg="Received container pid: 84280"
Apr  2 14:00:06 master1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr  2 14:00:06 master1 kernel: IPv6: ADDRCONF(NETDEV_UP): vetheda14c4f: link is not ready
Apr  2 14:00:06 master1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vetheda14c4f: link becomes ready
Apr  2 14:00:06 master1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr  2 14:00:06 master1 kernel: device vetheda14c4f entered promiscuous mode
Apr  2 14:00:06 master1 kernel: type=1700 audit(1554206406.285:7568): dev=vetheda14c4f prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Apr  2 14:00:06 master1 kernel: type=1300 audit(1554206406.285:7568): arch=c000003e syscall=46 success=yes exit=64 a0=31 a1=7ffe18da98a0 a2=0 a3=40 items=0 ppid=9434 pid=9435 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ovs-vswitchd" exe="/usr/sbin/ovs-vswitchd" subj=system_u:system_r:spc_t:s0 key=(null)
Apr  2 14:00:06 master1 kernel: type=1327 audit(1554206406.285:7568): proctitle=6F76732D767377697463686400756E69783A2F7661722F72756E2F6F70656E767377697463682F64622E736F636B002D76636F6E736F6C653A656D6572002D767379736C6F673A657272002D7666696C653A696E666F002D2D6D6C6F636B616C6C002D2D6E6F2D6368646972002D2D6C6F672D66696C653D2F7661722F6C6F67
Apr  2 14:00:06 master1 crio: time="2019-04-02 14:00:06.203241853+02:00" level=info msg="Got pod network {Name:sync-ldap-groups-1554206400-2zrxj Namespace:utgb-infra ID:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739 NetNS:/proc/84280/ns/net PortMappings:[]}"
Apr  2 14:00:06 master1 crio: time="2019-04-02 14:00:06.203342486+02:00" level=info msg="About to add CNI network cni-loopback (type=loopback)"
Apr  2 14:00:06 master1 crio: time="2019-04-02 14:00:06.214424393+02:00" level=info msg="Got pod network {Name:sync-ldap-groups-1554206400-2zrxj Namespace:utgb-infra ID:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739 NetNS:/proc/84280/ns/net PortMappings:[]}"


[...]

Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.365754646+02:00" level=debug msg="RunPodSandboxResponse: &RunPodSandboxResponse{PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,}"
Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.366745976+02:00" level=debug msg="PodSandboxStatusRequest &PodSandboxStatusRequest{PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Verbose:false,}"
Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.366904278+02:00" level=debug msg="PodSandboxStatusResponse: &PodSandboxStatusResponse{Status:&PodSandboxStatus{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,},State:SANDBOX_READY,CreatedAt:1554206406080244558,Network:&PodSandboxNetworkStatus{Ip:10.129.1.22,},Linux:&LinuxPodSandboxStatus{Namespaces:&Namespace{Options:&NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,},},},Labels:map[string]string{controller-uid: da4d6e97-553e-11e9-b3a1-52540016f6b8,io.kubernetes.container.name: POD,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,job-name: sync-ldap-groups-1554206400,},Annotations:map[string]string{kubernetes.io/config.seen: 2019-04-02T14:00:05.301532436+02:00,kubernetes.io/config.source: api,openshift.io/scc: restricted,},},Info:map[string]string{},}"
Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.368479200+02:00" level=debug msg="ImageStatusRequest: &ImageStatusRequest{Image:&ImageSpec{Image:docker.io/openshift/jenkins-slave-base-centos7:v3.11,},Verbose:false,}"

[...]

Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.474804782+02:00" level=debug msg="CreateContainerRequest &CreateContainerRequest{PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Config:&ContainerConfig{Metadata:&ContainerMetadata{Name:sync-ldap-groups,Attempt:0,},Image:&ImageSpec{Image:38f3ae11b29463cf07734ca48a056c9794daa488003dee30c64c61fab384f452,},Command:[/bin/bash -c oc adm groups sync --sync-config=/etc/config/ldap-group-sync.yaml --confirm || :  [.....]

Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.485898024+02:00" level=debug msg="pod container state &{State:{Version:1.0.0 ID:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739 Status:running Pid:84280 Bundle:/run/containers/storage/overlay-containers/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739/userdata Annotations:map[io.kubernetes.cri-o.KubeName:sync-ldap-groups-1554206400-2zrxj io.kubernetes.cri-o.NamespaceOptions:{"pid":1} io.kubernetes.cri-o.PrivilegedRuntime:false io.kubernetes.cri-o.ShmPath:/var/run/containers/storage/overlay-containers/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739/userdata/shm io.kubernetes.cri-o.ContainerID:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739 io.kubernetes.cri-o.ContainerName:k8s_POD_sync-ldap-groups-1554206400-2zrxj_utgb-infra_da63293d-553e-11e9-b3a1-52540016f6b8_0 io.kubernetes.cri-o.Created:2019-04-02T14:00:05.782991996+02:00 io.kubernetes.cri-o.HostnamePath:/var/run/containers/storage/overlay-containers/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739/userdata/hostname io.kubernetes.cri-o.Metadata:{"name":"sync-ldap-groups-1554206400-2zrxj","uid":"da63293d-553e-11e9-b3a1-52540016f6b8","namespace":"utgb-infra"} controller-uid:da4d6e97-553e-11e9-b3a1-52540016f6b8 io.kubernetes.container.name:POD io.kubernetes.cri-o.PortMappings:null io.kubernetes.cri-o.ResolvPath:/var/run/containers/storage/overlay-containers/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739/userdata/resolv.conf kubernetes.io/config.source:api io.kubernetes.cri-o.ContainerType:sandbox io.kubernetes.cri-o.Labels:{"controller-uid":"da4d6e97-553e-11e9-b3a1-52540016f6b8","io.kubernetes.container.name":"POD","io.kubernetes.pod.name":"sync-ldap-groups-1554206400-2zrxj","io.kubernetes.pod.namespace":"utgb-infra","io.kubernetes.pod.uid":"da63293d-553e-11e9-b3a1-52540016f6b8","job-name":"sync-ldap-groups-1554206400"} io.kubernetes.pod.name:sync-ldap-groups-1554206400-2zrxj io.kubernetes.pod.uid:da63293d-553e-11e9-b3a1-52540016f6b8 io.kubernetes.cri-o.HostNetwork:false io.kubernetes.cri-o.LogPath:/var/log/pods/da63293d-553e-11e9-b3a1-52540016f6b8/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739.log io.kubernetes.cri-o.MountPoint:/var/lib/containers/storage/overlay2/445d60551167ed6831f9611c83571be738049266634a70f1b6a404ea845597ef/merged kubernetes.io/config.seen:2019-04-02T14:00:05.301532436+02:00 io.kubernetes.cri-o.HostName:sync-ldap-groups-1554206400-2zrxj io.kubernetes.cri-o.Name:k8s_sync-ldap-groups-1554206400-2zrxj_utgb-infra_da63293d-553e-11e9-b3a1-52540016f6b8_0 io.kubernetes.cri-o.Namespace:utgb-infra io.kubernetes.cri-o.SandboxID:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739 io.kubernetes.cri-o.SeccompProfilePath: io.kubernetes.cri-o.TrustedSandbox:true job-name:sync-ldap-groups-1554206400 io.kubernetes.cri-o.Annotations:{"kubernetes.io/config.seen":"2019-04-02T14:00:05.301532436+02:00","kubernetes.io/config.source":"api","openshift.io/scc":"restricted"} io.kubernetes.cri-o.CgroupParent:kubepods-besteffort-podda63293d_553e_11e9_b3a1_52540016f6b8.slice io.kubernetes.pod.namespace:utgb-infra openshift.io/scc:restricted]} Created:2019-04-02 12:00:06.080244558 +0000 UTC Started:2019-04-02 14:00:06.115322034 +0200 CEST m=+74219.543908190 Finished:0001-01-01 00:00:00 +0000 UTC ExitCode:0 OOMKilled:false Error:}"

Apr  2 14:00:07 master1 crio: time="2019-04-02 14:00:07.610339786+02:00" level=debug msg="ListPodSandboxResponse &ListPodSandboxResponse{Items:[&PodSandbox{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,},State:SANDBOX_READY,CreatedAt:1554206406080244558,Labels:map[string]string{controller-uid: da4d6e97-553e-11e9-b3a1-52540016f6b8,io.kubernetes.container.name: POD,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj [....]

Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.057265025+02:00" level=debug msg="ListContainersResponse: &ListContainersResponse{Containers:[&Container{Id:85630c75b695557a44cca9e0f379f6af72e3355d0ef50017db9867430a65cd74,PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&ContainerMetadata{Name:sync-ldap-groups,Attempt:0,},Image:&ImageSpec{Image:38f3ae11b29463cf07734ca48a056c9794daa488003dee30c64c61fab384f452,},ImageRef:docker.io/openshift/jenkins-slave-base-centos7@sha256:93d974db4cc92a72d63767f8fe7d28d145bd124493b449e82177476a2fa783c9,State:CONTAINER_RUNNING,CreatedAt:1554206407937015579,Labels:map[string]string{io.kubernetes.container.name: sync-ldap-groups,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj, [...]

Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.077226094+02:00" level=debug msg="ListPodSandboxResponse &ListPodSandboxResponse{Items:[&PodSandbox{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,},State:SANDBOX_READY,CreatedAt:1554206406080244558,Labels:map[string]string{controller-uid: da4d6e97-553e-11e9-b3a1-52540016f6b8,io.kubernetes.container.name: POD,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,job-name: sync-ldap-groups-1554206400,},Annotations:map[string]string{kubernetes.io/config.seen: 2019-04-02T14:00:05.301532436+02:00,kubernetes.io/config.source: api,openshift.io/scc: restricted,},}],}"
Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.077979786+02:00" level=debug msg="PodSandboxStatusRequest &PodSandboxStatusRequest{PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Verbose:false,}"
Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.078091575+02:00" level=debug msg="PodSandboxStatusResponse: &PodSandboxStatusResponse{Status:&PodSandboxStatus{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,},State:SANDBOX_READY,CreatedAt:1554206406080244558,Network:&PodSandboxNetworkStatus{Ip:10.129.1.22,},Linux:&LinuxPodSandboxStatus{Namespaces:&Namespace{Options:&NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,},},},Labels:map[string]string{controller-uid: da4d6e97-553e-11e9-b3a1-52540016f6b8,io.kubernetes.container.name: POD,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,job-name: sync-ldap-groups-1554206400,},Annotations:map[string]string{kubernetes.io/config.seen: 2019-04-02T14:00:05.301532436+02:00,kubernetes.io/config.source: api,openshift.io/scc: restricted,},},Info:map[string]string{},}"
Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.078635162+02:00" level=debug msg="ListContainersRequest &ListContainersRequest{Filter:&ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,},},}"
Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.078675202+02:00" level=debug msg="no filters were applied, returning full container list"
Apr  2 14:00:08 master1 crio: time="2019-04-02 14:00:08.078776284+02:00" level=debug msg="ListContainersResponse: &ListContainersResponse{Containers:[&Container{Id:85630c75b695557a44cca9e0f379f6af72e3355d0ef50017db9867430a65cd74,PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&ContainerMetadata{Name:sync-ldap-groups,Attempt:0,},Image:&ImageSpec{Image:38f3ae11b29463cf07734ca48a056c9794daa488003dee30c64c61fab384f452,},ImageRef:docker.io/openshift/jenkins-slave-base-centos7@sha256:93d974db4cc92a72d63767f8fe7d28d145bd124493b449e82177476a2fa783c9,State:CONTAINER_RUNNING,CreatedAt:1554206407937015579,Labels:map[string]string{io.kubernetes.container.name: sync-ldap-groups,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,},Annotations:map[string]string{io.kubernetes.container.hash: cd18d43f,io.kubernetes.container.restartCount: 0,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 30,},}],}"

[...]

Apr  2 14:00:08 master1 origin-node: I0402 14:00:08.083679    9780 kubelet.go:1865] SyncLoop (PLEG): "sync-ldap-groups-1554206400-2zrxj_utgb-infra(da63293d-553e-11e9-b3a1-52540016f6b8)", event: &pleg.PodLifecycleEvent{ID:"da63293d-553e-11e9-b3a1-52540016f6b8", Type:"ContainerStarted", Data:"b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739"}

[...]

Apr  2 14:00:09 master1 crio: time="2019-04-02 14:00:09.101349968+02:00" level=debug msg="ListPodSandboxResponse &ListPodSandboxResponse{Items:[&PodSandbox{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,} [...]

[...]

Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.229781423+02:00" level=info msg="Got pod network {Name:sync-ldap-groups-1554206400-2zrxj Namespace:utgb-infra ID:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739 NetNS:/proc/84280/ns/net PortMappings:[]}"

[...]

Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.419417902+02:00" level=debug msg="event: "/var/run/crio/exits/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739.ZKTOZZ": RENAME"
Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.419507866+02:00" level=debug msg="event: "/var/run/crio/exits/b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739": CREATE"
Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.419542244+02:00" level=debug msg="container or sandbox exited: b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739"
Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.419575049+02:00" level=debug msg="sandbox exited and found: b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739"
Apr  2 14:00:11 master1 origin-node: I0402 14:00:11.418929    9780 reconciler.go:301] Volume detached for volume "ldap-ca-cert" (UniqueName: "kubernetes.io/configmap/da63293d-553e-11e9-b3a1-52540016f6b8-ldap-ca-cert") on node "master1.friends.intra.example.com" DevicePath ""
Apr  2 14:00:11 master1 origin-node: I0402 14:00:11.418977    9780 reconciler.go:301] Volume detached for volume "ldap-sync-volume" (UniqueName: "kubernetes.io/configmap/da63293d-553e-11e9-b3a1-52540016f6b8-ldap-sync-volume") on node "master1.friends.intra.example.com" DevicePath ""
Apr  2 14:00:11 master1 origin-node: I0402 14:00:11.419009    9780 reconciler.go:301] Volume detached for volume "ldap-group-syncer-token-sn67w" (UniqueName: "kubernetes.io/secret/da63293d-553e-11e9-b3a1-52540016f6b8-ldap-group-syncer-token-sn67w") on node "master1.friends.intra.example.com" DevicePath ""
Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.588246944+02:00" level=debug msg="ListPodSandboxRequest &ListPodSandboxRequest{Filter:&PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},},}"

[...]

Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.618018786+02:00" level=debug msg="unmounted container "b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739""
Apr  2 14:00:11 master1 crio: time="2019-04-02 14:00:11.618053734+02:00" level=debug msg="StopPodSandboxResponse b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739: &StopPodSandboxResponse{}"
Apr  2 14:00:11 master1 systemd: Removed slice libcontainer container kubepods-besteffort-podda63293d_553e_11e9_b3a1_52540016f6b8.slice.
Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.239509223+02:00" level=debug msg="ListPodSandboxRequest &ListPodSandboxRequest{Filter:nil,}"

[...]

Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.240581861+02:00" level=debug msg="ListPodSandboxResponse &ListPodSandboxResponse{Items:[&PodSandbox{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,} [...]

[...]

Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.243374065+02:00" level=debug msg="ListContainersResponse: &ListContainersResponse{Containers:[&Container{Id:85630c75b695557a44cca9e0f379f6af72e3355d0ef50017db9867430a65cd74,PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&ContainerMetadata{Name:sync-ldap-groups,Attempt:0,} [...]

[...]

Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.253872302+02:00" level=debug msg="ListPodSandboxResponse &ListPodSandboxResponse{Items:[&PodSandbox{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1554206406080244558,Labels:map[string]string{controller-uid: da4d6e97-553e-11e9-b3a1-52540016f6b8,io.kubernetes.container.name: POD,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,job-name: sync-ldap-groups-1554206400,},Annotations:map[string]string{kubernetes.io/config.seen: 2019-04-02T14:00:05.301532436+02:00,kubernetes.io/config.source: api,openshift.io/scc: restricted,},}],}"
Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.254529872+02:00" level=debug msg="PodSandboxStatusRequest &PodSandboxStatusRequest{PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Verbose:false,}"
Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.254662182+02:00" level=debug msg="PodSandboxStatusResponse: &PodSandboxStatusResponse{Status:&PodSandboxStatus{Id:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&PodSandboxMetadata{Name:sync-ldap-groups-1554206400-2zrxj,Uid:da63293d-553e-11e9-b3a1-52540016f6b8,Namespace:utgb-infra,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1554206406080244558,Network:&PodSandboxNetworkStatus{Ip:10.129.1.22,},Linux:&LinuxPodSandboxStatus{Namespaces:&Namespace{Options:&NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,},},},Labels:map[string]string{controller-uid: da4d6e97-553e-11e9-b3a1-52540016f6b8,io.kubernetes.container.name: POD,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,job-name: sync-ldap-groups-1554206400,},Annotations:map[string]string{kubernetes.io/config.seen: 2019-04-02T14:00:05.301532436+02:00,kubernetes.io/config.source: api,openshift.io/scc: restricted,},},Info:map[string]string{},}"
Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.255311625+02:00" level=debug msg="ListContainersRequest &ListContainersRequest{Filter:&ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,},},}"
Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.255360093+02:00" level=debug msg="no filters were applied, returning full container list"
Apr  2 14:00:12 master1 crio: time="2019-04-02 14:00:12.255472103+02:00" level=debug msg="ListContainersResponse: &ListContainersResponse{Containers:[&Container{Id:85630c75b695557a44cca9e0f379f6af72e3355d0ef50017db9867430a65cd74,PodSandboxId:b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739,Metadata:&ContainerMetadata{Name:sync-ldap-groups,Attempt:0,},Image:&ImageSpec{Image:38f3ae11b29463cf07734ca48a056c9794daa488003dee30c64c61fab384f452,},ImageRef:docker.io/openshift/jenkins-slave-base-centos7@sha256:93d974db4cc92a72d63767f8fe7d28d145bd124493b449e82177476a2fa783c9,State:CONTAINER_EXITED,CreatedAt:1554206407937015579,Labels:map[string]string{io.kubernetes.container.name: sync-ldap-groups,io.kubernetes.pod.name: sync-ldap-groups-1554206400-2zrxj,io.kubernetes.pod.namespace: utgb-infra,io.kubernetes.pod.uid: da63293d-553e-11e9-b3a1-52540016f6b8,},Annotations:map[string]string{io.kubernetes.container.hash: cd18d43f,io.kubernetes.container.restartCount: 0,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 30,},}],}"

[...]

Apr  2 14:00:12 master1 origin-node: I0402 14:00:12.261481    9780 kubelet.go:1865] SyncLoop (PLEG): "sync-ldap-groups-1554206400-2zrxj_utgb-infra(da63293d-553e-11e9-b3a1-52540016f6b8)", event: &pleg.PodLifecycleEvent{ID:"da63293d-553e-11e9-b3a1-52540016f6b8", Type:"ContainerDied", Data:"b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739"}
Apr  2 14:00:12 master1 origin-node: W0402 14:00:12.261632    9780 pod_container_deletor.go:75] Container "b1c11205ffd9672b31c567433afff847cb815e80943c8106958b7985d6719739" not found in pod's containers

[...]


Not sure there's anything relevant in there, should I grep for something specific?

Regards.

Comment 31 weiwei jiang 2019-04-03 04:55:28 UTC
(In reply to Mrunal Patel from comment #24)
> There are 3 things to check here:
> 1. What are the settings for kubelet GC for containers -
>  max-pods
>  maximum-dead-containers
>  maximum-dead-containers-per-container
> 
> 2. Can we enable cri-o debug and see if we are getting any remove pod
> sandbox calls from the kubelet for the pods that are piling up in NOT_READY
> status?
> If so is there any failure being logged? 
> 
> 
> 3. Is kubelet issuing remove pod sandbox calls to CRI-O for these pods? If
> so, are there any failures being logged?
> 
> My suspicion is that either kubelet isn't sending remove calls to CRI-O or
> CRI-O is failing to remove.


Checked and still got the same result as https://bugzilla.redhat.com/show_bug.cgi?id=1666198#c22

[root@qe-wjiang39crio-master-etcd-1 ~]# oc version 
oc v3.9.74
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-wjiang39crio-master-etcd-1:8443
openshift v3.9.74
kubernetes v1.9.1+a0ce1bc657

[root@qe-wjiang39crio-master-etcd-1 ~]# oc get nodes -o wide 
NAME                                     STATUS    ROLES     AGE       VERSION             EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION               CONTAINER-RUNTIME
qe-wjiang39crio-master-etcd-1            Ready     master    2h        v1.9.1+a0ce1bc657   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.10.1.el7.x86_64   cri-o://1.9.16
qe-wjiang39crio-node-registry-router-1   Ready     compute   2h        v1.9.1+a0ce1bc657   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.10.1.el7.x86_64   cri-o://1.9.16


[root@qe-wjiang39crio-master-etcd-1 ~]# cat /etc/origin/node/node-config.yaml
kubeletArguments:
  max-pods:
  - '40'
  maximum-dead-containers:
  - '20'
  maximum-dead-containers-per-container:
  - '2'


And in my scenario, all the pods are in SANDBOX_READY status but actually have no running containers.

[root@qe-wjiang39crio-master-etcd-1 ~]# oc adm manage-node qe-wjiang39crio-master-etcd-1 --list-pods

Listing matched pods on node: qe-wjiang39crio-master-etcd-1

NAMESPACE                           NAME                          READY     STATUS    RESTARTS   AGE
default                             dockergc-wxjgd                1/1       Running   0          2h
default                             h-1-4nfnl                     1/1       Running   0          1h
default                             h-1-8cgn6                     1/1       Running   0          1h
default                             h-1-cqqwf                     1/1       Running   0          1h
default                             h-1-lhhmd                     1/1       Running   0          1h
default                             h-1-ztpbn                     1/1       Running   0          1h
default                             registry-console-1-bxp4t      1/1       Running   0          2h
kube-service-catalog                apiserver-qmwm6               1/1       Running   0          2h
kube-service-catalog                controller-manager-drl9d      1/1       Running   0          2h
openshift-ansible-service-broker    asb-etcd-1-9ctf5              1/1       Running   0          2h
openshift-template-service-broker   apiserver-qw9kr               1/1       Running   0          2h
openshift-web-console               webconsole-5d8cf98965-ptl57   1/1       Running   0          2h


[root@qe-wjiang39crio-master-etcd-1 ~]# crictl pods |grep -i -v notready| grep -i ready | wc -l 
195


And there is no RemovePodSandboxRequest in the crio logs:
[root@qe-wjiang39crio-master-etcd-1 ~]# timeout  300 journalctl -f -u crio|grep -i -E "removepodsandboxrequest"
Terminated
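
To narrow down which side is at fault, one rough cross-check (a sketch only; the awk field positions assume the column layouts shown above) is to diff the pod names CRI-O still holds against what the API server lists for the node:

# Pod names CRI-O still has a sandbox for
crictl pods | awk 'NR>1 {print $(NF-2)}' | sort -u > /tmp/crio-pods.txt
# Pod names the API server still knows about on this node
oc adm manage-node "$(hostname)" --list-pods | awk 'NF>=4 && $1!="NAMESPACE" && !/Listing/ {print $2}' | sort -u > /tmp/api-pods.txt
# Names only in the first file are sandboxes the kubelet never removed
comm -23 /tmp/crio-pods.txt /tmp/api-pods.txt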

Comment 34 Urvashi Mohnani 2019-05-08 21:30:12 UTC
I am unable to reproduce this locally. Can I please get access to the cluster?

Comment 38 Urvashi Mohnani 2019-06-11 15:19:19 UTC
@wjiang, would it be possible to get access to the cluster to further debug the issue here? I have been unable to reproduce it on my end.

Comment 40 Urvashi Mohnani 2019-06-21 17:24:51 UTC
Fix is in https://github.com/cri-o/cri-o/pull/2516. Will cut a new release and create a new build once it is in.

Comment 41 Urvashi Mohnani 2019-06-24 18:49:03 UTC
The fix has been merged and cri-o 1.9.16 has been cut. Waiting on a new build for it now.

Comment 44 weiwei jiang 2019-06-28 06:05:34 UTC
Checked with cri-o-1.9.16-2.git858756d.el7.x86_64; this issue is now fixed.


[root@qe-wjiangmaster-etcd-1 ~]# rpm -qa|grep -i cri-o
cri-o-1.9.16-2.git858756d.el7.x86_64
[root@qe-wjiangmaster-etcd-1 ~]# oc get nodes -o wide 
NAME                              STATUS    ROLES     AGE       VERSION             EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION               CONTAINER-RUNTIME
qe-wjiangmaster-etcd-1            Ready     master    1h        v1.9.1+a0ce1bc657   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.21.3.el7.x86_64   cri-o://1.9.17-dev
qe-wjiangnode-registry-router-1   Ready     compute   1h        v1.9.1+a0ce1bc657   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.21.3.el7.x86_64   cri-o://1.9.17-dev


[root@qe-wjiangmaster-etcd-1 ~]# cat test.sh
#!/usr/bin/env bash

set -x
# Create a deployment config, then repeatedly scale it up and down to churn
# pod sandboxes and exercise kubelet/CRI-O sandbox removal.
oc run h --image=openshift/hello-openshift --replicas=1
sleep 10
for i in {1..30}; do
	oc scale dc/h --replicas=100
	sleep 5
	oc scale dc/h --replicas=1
	sleep 5
done

After running the test script and waiting no more than 30s:
[root@qe-wjiangmaster-etcd-1 ~]# crictl pods 
PODSANDBOX ID       CREATED             STATE               NAME                        NAMESPACE                           ATTEMPT
8bc39b6473d39       12 minutes ago      SANDBOX_READY       h-2-86m95                   default                             0
1e7594f335269       2 hours ago         SANDBOX_READY       asb-etcd-1-s546g            openshift-ansible-service-broker    0
67ff290ead86e       2 hours ago         SANDBOX_READY       apiserver-gzm6j             openshift-template-service-broker   0
63c9648a804ed       2 hours ago         SANDBOX_READY       controller-manager-7l8rv    kube-service-catalog                0
4f7dbadff702c       2 hours ago         SANDBOX_READY       apiserver-2k4qp             kube-service-catalog                0
2309adaf5d76e       2 hours ago         SANDBOX_READY       registry-console-1-p5cqb    default                             0
699da02fa2a5c       2 hours ago         SANDBOX_READY       dockergc-6pr5j              default                             0
9c83cc70d22ac       2 hours ago         SANDBOX_READY       webconsole-7bf77b6f-26fgm   openshift-web-console               0
[root@qe-wjiangmaster-etcd-1 ~]# oc adm manage-node `hostname` --list-pods -o wide 

Listing matched pods on node: qe-wjiangmaster-etcd-1

NAMESPACE                           NAME                        READY     STATUS    RESTARTS   AGE       IP            NODE
default                             dockergc-6pr5j              1/1       Running   0          1h        10.128.0.4    qe-wjiangmaster-etcd-1
default                             h-2-86m95                   1/1       Running   0          12m       10.128.0.18   qe-wjiangmaster-etcd-1
default                             registry-console-1-p5cqb    1/1       Running   0          1h        10.128.0.5    qe-wjiangmaster-etcd-1
kube-service-catalog                apiserver-2k4qp             1/1       Running   0          1h        10.128.0.6    qe-wjiangmaster-etcd-1
kube-service-catalog                controller-manager-7l8rv    1/1       Running   0          1h        10.128.0.7    qe-wjiangmaster-etcd-1
openshift-ansible-service-broker    asb-etcd-1-s546g            1/1       Running   0          1h        10.128.0.10   qe-wjiangmaster-etcd-1
openshift-template-service-broker   apiserver-gzm6j             1/1       Running   0          1h        10.128.0.9    qe-wjiangmaster-etcd-1
openshift-web-console               webconsole-7bf77b6f-26fgm   1/1       Running   0          1h        10.128.0.2    qe-wjiangmaster-etcd-1

Comment 46 errata-xmlrpc 2019-07-05 06:58:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1642

