Bug 2066560 - two router pods are stuck in ContainerCreating status when the ingress-operator is patched with custom error code pages directly
Summary: two router pods are stuck in ContainerCreating status when the ingress-operator is patched with custom error code pages directly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Grant Spence
QA Contact: Shudi Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-22 04:33 UTC by Shudi Li
Modified: 2023-01-17 19:55 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if a `configmap` that the router deployment depends on was not created, the router deployment did not progress and the stalled rollout was not reported. With this update, the cluster Operator reports `ingress` as `Progressing=True` while the default ingress controller deployment is rolling out, so users can debug ingress controller issues by using the command `oc get co`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2066560[*BZ#2066560*])
Clone Of:
Environment:
Last Closed: 2023-01-17 19:47:48 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 769 0 None Merged Bug 2066560: Make ingress clusteroperator progressing=true when router deployment is rolling out 2022-09-22 22:04:05 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:55:25 UTC

Description Shudi Li 2022-03-22 04:33:45 UTC
Description of problem: If the ingresscontroller is patched to reference a custom error code pages configmap without that configmap first being created in the openshift-config namespace, the two new router pods stay in ContainerCreating status indefinitely.


OpenShift release version: 4.11.0-0.nightly-2022-03-20-160505 (4.9.0-0.nightly-2022-03-21-144414 has the same issue)


Cluster Platform:


How reproducible:


Steps to Reproduce (in detail):
1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-20-160505   True        False         149m    Cluster version is 4.11.0-0.nightly-2022-03-20-160505
% 

2.
% oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
% 
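To confirm the patch landed before looking at the pods, the referenced configmap name can be read back with a jsonpath query (a minimal sketch; it should print the name set above):

% oc -n openshift-ingress-operator get ingresscontroller/default -o jsonpath='{.spec.httpErrorCodePages.name}'
my-custom-error-code-pages
%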

3.
% oc  -n openshift-ingress  get pods
NAME                              READY   STATUS              RESTARTS   AGE
router-default-687b85889f-brxjf   0/1     ContainerCreating   0          54m
router-default-687b85889f-qdxrc   0/1     ContainerCreating   0          54m
router-default-84c787c96b-rsmvf   1/1     Running             0          58m
%
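The new pods are stuck because the router deployment mounts the referenced configmap as a volume, and the kubelet cannot set up that volume until the configmap exists. A hedged way to confirm (the pod name is taken from the output above; the exact event text varies by version, but a FailedMount event for the error-pages volume would be expected):

% oc -n openshift-ingress describe pod router-default-687b85889f-brxjf
% oc -n openshift-ingress get events --field-selector reason=FailedMount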

Actual results:
The rollout never completes: the two new router pods stay in ContainerCreating status.

Expected results:
Two new router pods are created and the two old router pods are deleted

Impact of the problem:


Additional info:




Comment 4 Miciah Dashiel Butler Masters 2022-03-24 21:22:03 UTC
If the configmap doesn't exist, then the deployment's pods won't start until the configmap is created, which is the intended behavior.  In your example, the deployment already had pods, so the deployment controller will leave those pods running until the new pods become ready.  I think this is the appropriate behavior.  What might need to be improved is the status reporting: Instead of reporting Progressing=False, the operator could report Progressing=True while the deployment is in the middle of rolling out the new pods.
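For completeness, the rollout unblocks once the referenced configmap is created. A sketch following the documented custom-error-pages procedure (it assumes local files error-page-503.http and error-page-404.http containing the custom pages; these file names become the configmap keys the router expects):

% oc -n openshift-config create configmap my-custom-error-code-pages \
    --from-file=error-page-503.http \
    --from-file=error-page-404.http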

Comment 8 Shudi Li 2022-09-19 02:55:59 UTC
Verified with 4.12.0-0.nightly-2022-09-18-141547: Progressing=True is reported while the new pods are rolling out.
1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-09-18-141547   True        False         46m     Cluster version is 4.12.0-0.nightly-2022-09-18-141547
%

2.
% oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
%

3.
 % oc -n openshift-ingress get pods
NAME                               READY   STATUS              RESTARTS   AGE
router-default-5c4c557b74-rczcq    1/1     Running             0          39m
router-default-6f6d9f454f-cfspr    0/1     ContainerCreating   0          96s
router-default-6f6d9f454f-rbvlg    0/1     ContainerCreating   0          96s
router-ocp50074-65c58cfc69-dccwv   1/1     Running             0          33m
% 

4.
% oc -n openshift-ingress-operator get ingresscontroller/default -o yaml | grep Progressing -B2
  - lastTransitionTime: "2022-09-19T01:41:41Z"
    message: LoadBalancer is not progressing
    reason: LoadBalancerNotProgressing
    status: "False"
    type: LoadBalancerProgressing
--
      One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
      )
    reason: IngressControllerProgressing
    status: "True"
    type: Progressing
%

5.
% oc -n openshift-ingress get deployment/router-default  -o yaml | grep Progressing -B4
    lastUpdateTime: "2022-09-19T02:47:55Z"
    message: ReplicaSet "router-default-6f6d9f454f" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing

%
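With the fix, the rollout state also surfaces at the cluster level, so the stalled deployment is visible in oc get co output. A hedged one-liner to read just the Progressing condition of the ingress cluster Operator (standard jsonpath; output omitted):

% oc get co ingress
% oc get co ingress -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}'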

Comment 11 errata-xmlrpc 2023-01-17 19:47:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
