Bug 1984094 - performance issues due to lost node, pods taking too long to relaunch
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.7
Hardware: x86_64
OS: Linux
Target Release: 4.7.z
Assignee: Oleg Bulatov
QA Contact: XiuJuan Wang
Depends On: 1972565
Reported: 2021-07-20 16:39 UTC by OpenShift BugZilla Robot
Modified: 2022-10-12 02:33 UTC

Last Closed: 2021-08-03 17:56:24 UTC
System                  ID                                 Private  Priority  Status  Summary                                                                        Last Updated
Github                  openshift image-registry pull 288  0        None      open    [release-4.7] Bug 1984094: use apimachinery with HTTP/2 health checks enabled  2021-07-22 11:01:45 UTC
Red Hat Product Errata  RHBA-2021:2903                     0        None      None    None                                                                           2021-08-03 17:56:49 UTC

Comment 3 XiuJuan Wang 2021-07-28 11:18:49 UTC
Validated on 4.7.0-0.nightly-2021-07-24-034734 aws cluster.
Added a toleration to the image registry so that its two pods were scheduled onto 1 master and 1 worker (the cluster has 3 masters and 3 workers).
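
For reference, one way to add such a toleration is a merge patch on the image registry operator config. The sketch below is an assumption, not the command used in this validation: the comment does not say which toleration was applied, so the taint key and effect shown (the usual master taint on 4.7) are illustrative. It only builds and prints the `oc patch` command; it does not touch a cluster.

```python
import json
import shlex

# Assumption: masters carry the node-role.kubernetes.io/master:NoSchedule
# taint. The bug comment does not state which toleration was actually used.
toleration = {
    "key": "node-role.kubernetes.io/master",
    "operator": "Exists",
    "effect": "NoSchedule",
}


def registry_toleration_patch(tolerations):
    """Return a JSON merge patch that sets spec.tolerations."""
    return json.dumps({"spec": {"tolerations": tolerations}})


patch = registry_toleration_patch([toleration])

# Print the command instead of running it, so the sketch has no side effects.
command = (
    "oc patch configs.imageregistry.operator.openshift.io cluster "
    "--type merge -p " + shlex.quote(patch)
)
print(command)
```

After applying a patch like this, `oc get pods -o wide -n openshift-image-registry` shows which nodes the registry pods landed on.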

Stopped that master and worker, then checked all the cluster operators.
The image registry reported Progressing within 30s of openshift-apiserver reporting the lost connection, and its pods were rescheduled successfully after about 5 minutes.
Images could be pushed to and pulled from the internal registry once it was back.
$ oc get pods
NAME                                  READY   STATUS      RESTARTS   AGE
postgresql-1-deploy                   0/1     Completed   0          11m
postgresql-1-j7d6s                    1/1     Running     0          11m
rails-postgresql-example-1-build      0/1     Completed   0          11m
rails-postgresql-example-1-deploy     0/1     Completed   0          9m36s
rails-postgresql-example-1-gjxl8      1/1     Running     0          8m59s
rails-postgresql-example-1-hook-pre   0/1     Completed   0          9m31s

$ oc get co image-registry -o yaml
  - lastTransitionTime: "2021-07-28T10:59:51Z"
    message: |-
      Available: The deployment does not have available replicas
      ImagePrunerAvailable: Pruner CronJob has been created
    reason: NoReplicasAvailable
    status: "False"
    type: Available
  - lastTransitionTime: "2021-07-28T10:59:46Z"
    message: 'Progressing: The deployment has not completed'
    reason: DeploymentNotCompleted
    status: "True"
    type: Progressing

$ oc get co image-registry -o yaml
  - lastTransitionTime: "2021-07-28T11:05:04Z"
    message: |-
      Available: The registry is ready
      ImagePrunerAvailable: Pruner CronJob has been created
    reason: Ready
    status: "True"
    type: Available
  - lastTransitionTime: "2021-07-28T11:05:04Z"
    message: 'Progressing: The registry is ready'
    reason: Ready
    status: "False"
    type: Progressing
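
The recovery time can be read off the lastTransitionTime values in the two excerpts above. A minimal sketch of the arithmetic, using only the timestamps shown there (the variable names are illustrative):

```python
from datetime import datetime


def parse_ts(value):
    """Parse the UTC timestamps used in lastTransitionTime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


# Timestamps copied from the two condition excerpts.
progressing_started = parse_ts("2021-07-28T10:59:46Z")  # Progressing=True
unavailable_since = parse_ts("2021-07-28T10:59:51Z")    # Available=False
ready_again = parse_ts("2021-07-28T11:05:04Z")          # Available=True

rollout_duration = ready_again - progressing_started
outage_duration = ready_again - unavailable_since

print(rollout_duration)  # 0:05:18
print(outage_duration)   # 0:05:13
```

Both intervals are a little over 5 minutes, which matches the rescheduling time noted in the comment.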

$ oc get co 
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-07-24-034734   True        False         True       14m
baremetal                                  4.7.0-0.nightly-2021-07-24-034734   True        False         False      162m
cloud-credential                           4.7.0-0.nightly-2021-07-24-034734   True        False         False      169m
cluster-autoscaler                         4.7.0-0.nightly-2021-07-24-034734   True        False         False      162m
config-operator                            4.7.0-0.nightly-2021-07-24-034734   True        False         False      162m
console                                    4.7.0-0.nightly-2021-07-24-034734   True        False         False      147m
csi-snapshot-controller                    4.7.0-0.nightly-2021-07-24-034734   True        False         False      157m
dns                                        4.7.0-0.nightly-2021-07-24-034734   True        False         True       156m
etcd                                       4.7.0-0.nightly-2021-07-24-034734   True        False         True       161m
image-registry                             4.7.0-0.nightly-2021-07-24-034734   True        False         False      10m
ingress                                    4.7.0-0.nightly-2021-07-24-034734   True        False         False      152m
insights                                   4.7.0-0.nightly-2021-07-24-034734   True        False         False      156m
kube-apiserver                             4.7.0-0.nightly-2021-07-24-034734   True        False         True       160m
kube-controller-manager                    4.7.0-0.nightly-2021-07-24-034734   True        False         True       160m
kube-scheduler                             4.7.0-0.nightly-2021-07-24-034734   True        False         True       160m
kube-storage-version-migrator              4.7.0-0.nightly-2021-07-24-034734   True        False         False      151m
machine-api                                4.7.0-0.nightly-2021-07-24-034734   True        False         False      157m
machine-approver                           4.7.0-0.nightly-2021-07-24-034734   True        False         False      162m
machine-config                             4.7.0-0.nightly-2021-07-24-034734   False       False         True       3m15s
marketplace                                4.7.0-0.nightly-2021-07-24-034734   True        False         False      161m
monitoring                                 4.7.0-0.nightly-2021-07-24-034734   False       True          True       7m49s
network                                    4.7.0-0.nightly-2021-07-24-034734   True        True          True       162m
node-tuning                                4.7.0-0.nightly-2021-07-24-034734   True        False         False      161m
openshift-apiserver                        4.7.0-0.nightly-2021-07-24-034734   True        False         True       15m
openshift-controller-manager               4.7.0-0.nightly-2021-07-24-034734   True        False         False      154m
openshift-samples                          4.7.0-0.nightly-2021-07-24-034734   True        False         False      155m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-07-24-034734   True        False         False      161m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-07-24-034734   True        False         False      161m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-07-24-034734   True        False         False      15m
service-ca                                 4.7.0-0.nightly-2021-07-24-034734   True        False         False      162m
storage                                    4.7.0-0.nightly-2021-07-24-034734   True        True          False      162m
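
With this many operators, it can help to filter the listing down to those that are not fully healthy. The helper below is a hypothetical sketch, not part of `oc`: it assumes the whitespace-separated column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE) and treats Available=True, Progressing=False, Degraded=False as healthy. The sample rows are copied from the output above.

```python
def unhealthy_operators(table_text):
    """Map operator name -> (available, progressing, degraded) for every
    row that is not Available=True, Progressing=False, Degraded=False."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    result = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete operator row
        name, _version, available, progressing, degraded = fields[:5]
        state = (available, progressing, degraded)
        if state != ("True", "False", "False"):
            result[name] = state
    return result


sample = """\
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
image-registry   4.7.0-0.nightly-2021-07-24-034734   True        False         False      10m
machine-config   4.7.0-0.nightly-2021-07-24-034734   False       False         True       3m15s
monitoring       4.7.0-0.nightly-2021-07-24-034734   False       True          True       7m49s
"""

result = unhealthy_operators(sample)
for name, state in result.items():
    print(name, state)
```

On the sample rows, image-registry is filtered out as healthy, while machine-config and monitoring are reported.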

$ oc get pods -o wide -n openshift-image-registry
NAME                                               READY   STATUS        RESTARTS   AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-679db64c8c-kn647   1/1     Running       0          171m    ip-10-0-138-176.us-east-2.compute.internal   <none>           <none>
image-registry-75fb9858bd-4l7p6                    1/1     Running       0          11m    ip-10-0-165-131.us-east-2.compute.internal   <none>           <none>
image-registry-75fb9858bd-568kj                    1/1     Terminating   0          122m    ip-10-0-178-99.us-east-2.compute.internal    <none>           <none>
image-registry-75fb9858bd-ks9jv                    1/1     Terminating   0          122m    ip-10-0-138-38.us-east-2.compute.internal    <none>           <none>
image-registry-75fb9858bd-l99hf                    1/1     Running       0          11m    ip-10-0-215-198.us-east-2.compute.internal   <none>           <none>

$ oc get node
NAME                                         STATUS     ROLES    AGE    VERSION
ip-10-0-138-176.us-east-2.compute.internal   Ready      master   165m   v1.20.0+558d959
ip-10-0-138-38.us-east-2.compute.internal    NotReady   worker   154m   v1.20.0+558d959
ip-10-0-165-131.us-east-2.compute.internal   Ready      worker   155m   v1.20.0+558d959
ip-10-0-178-99.us-east-2.compute.internal    NotReady   master   165m   v1.20.0+558d959
ip-10-0-207-49.us-east-2.compute.internal    Ready      worker   155m   v1.20.0+558d959
ip-10-0-215-198.us-east-2.compute.internal   Ready      master   165m   v1.20.0+558d959

