Bug 2066560

Summary:	Two router pods stay in ContainerCreating status when the ingress controller is patched directly with custom error code pages
Product:	OpenShift Container Platform	Reporter:	Shudi Li <shudili>
Component:	Networking	Assignee:	Grant Spence <gspence>
Networking sub component:	router	QA Contact:	Shudi Li <shudili>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	amcdermo, aos-bugs, gspence, hongli, jaldinge, mmasters
Version:	4.11
Target Milestone:	---
Target Release:	4.12.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Previously, if a `configmap` that the router deployment depends on was not created, the router deployment did not progress. With this update, the cluster Operator reports `ingress progressing=true` if the default ingress controller deployment is progressing. As a result, users can debug issues with the ingress controller by using the command `oc get co`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2066560[*BZ#2066560*])	Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-01-17 19:47:48 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Shudi Li 2022-03-22 04:33:45 UTC

Description of problem: If the ingress controller is patched directly, without first preparing the custom error pages (that is, without creating a custom error page configmap in the openshift-config namespace), two router pods remain in ContainerCreating status indefinitely.


OpenShift release version: 4.11.0-0.nightly-2022-03-20-160505 (4.9.0-0.nightly-2022-03-21-144414 has the same issue)


Cluster Platform:


How reproducible:


Steps to Reproduce (in detail):
1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-20-160505   True        False         149m    Cluster version is 4.11.0-0.nightly-2022-03-20-160505
% 

2.
% oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
% 

3.
% oc  -n openshift-ingress  get pods
NAME                              READY   STATUS              RESTARTS   AGE
router-default-687b85889f-brxjf   0/1     ContainerCreating   0          54m
router-default-687b85889f-qdxrc   0/1     ContainerCreating   0          54m
router-default-84c787c96b-rsmvf   1/1     Running             0          58m
%

Actual results:
The two new router pods never finish being created; they remain in ContainerCreating status.

Expected results:
Two new router pods are created, and the two old router pods are deleted.
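
For reference, the preparation step that this reproduction skips can be sketched as follows. The configmap name matches the one used in the patch in step 2. The file names `error-page-503.http` and `error-page-404.http` follow the convention in the OpenShift documentation and are assumed to exist in the current directory. The wrapper function name is hypothetical.

```shell
# Hypothetical wrapper: create the configmap that
# spec.httpErrorCodePages.name refers to, in the openshift-config namespace.
# Assumes error-page-503.http and error-page-404.http exist in the
# current directory and contain the custom HTTP responses to serve.
create_error_pages_configmap() {
  name="${1:-my-custom-error-code-pages}"
  oc -n openshift-config create configmap "$name" \
    --from-file=error-page-503.http \
    --from-file=error-page-404.http
}
```

Calling `create_error_pages_configmap` before running the patch should let the new router pods mount the configmap and start.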

Impact of the problem:


Additional info:




Comment 4 Miciah Dashiel Butler Masters 2022-03-24 21:22:03 UTC

If the configmap doesn't exist, then the deployment's pods won't start until the configmap is created, which is the intended behavior.  In your example, the deployment already had pods, so the deployment controller will leave those pods running until the new pods become ready.  I think this is the appropriate behavior.  What might need to be improved is the status reporting: Instead of reporting Progressing=False, the operator could report Progressing=True while the deployment is in the middle of rolling out the new pods.
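
The status check suggested here can be illustrated with a small shell sketch. It is an assumption about how a rollout can be classified from a deployment's replica counts, modeled on the kind of checks that `oc rollout status` performs. It is not the operator's actual code, and the function name `rollout_state` and its argument order are hypothetical.

```shell
# Hypothetical helper: classify a deployment rollout from four replica counts.
#   $1 desired   (.spec.replicas)
#   $2 updated   (.status.updatedReplicas)
#   $3 total     (.status.replicas)
#   $4 available (.status.availableReplicas)
rollout_state() {
  desired=$1 updated=$2 total=$3 available=$4
  if [ "$updated" -lt "$desired" ]; then
    echo progressing   # new pods still have to be created
  elif [ "$total" -gt "$updated" ]; then
    echo progressing   # old pods are pending termination
  elif [ "$available" -lt "$updated" ]; then
    echo progressing   # new pods exist but are not yet available
  else
    echo complete
  fi
}
```

With the pod counts from the original report, and assuming 2 desired replicas (2 updated, 3 total, 1 available), `rollout_state 2 2 3 1` prints `progressing`, because an old pod is still running while the new pods wait for the configmap.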

Comment 8 Shudi Li 2022-09-19 02:55:59 UTC

Verified with 4.12.0-0.nightly-2022-09-18-141547: the ingress controller reports Progressing=True while the router pods are rolling out.
1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-09-18-141547   True        False         46m     Cluster version is 4.12.0-0.nightly-2022-09-18-141547
%

2.
% oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
%

3.
% oc -n openshift-ingress get pods
NAME                               READY   STATUS              RESTARTS   AGE
router-default-5c4c557b74-rczcq    1/1     Running             0          39m
router-default-6f6d9f454f-cfspr    0/1     ContainerCreating   0          96s
router-default-6f6d9f454f-rbvlg    0/1     ContainerCreating   0          96s
router-ocp50074-65c58cfc69-dccwv   1/1     Running             0          33m
% 

4.
% oc -n openshift-ingress-operator get ingresscontroller/default -o yaml | grep Progressing -B2
  - lastTransitionTime: "2022-09-19T01:41:41Z"
    message: LoadBalancer is not progressing
    reason: LoadBalancerNotProgressing
    status: "False"
    type: LoadBalancerProgressing
--
      One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
      )
    reason: IngressControllerProgressing
    status: "True"
    type: Progressing
%

5.
% oc -n openshift-ingress get deployment/router-default  -o yaml | grep Progressing -B4
    lastUpdateTime: "2022-09-19T02:47:55Z"
    message: ReplicaSet "router-default-6f6d9f454f" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing

%
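
The `grep Progressing -B2` filter in step 4 also matches the `LoadBalancerProgressing` condition. A narrower query, sketched here with hypothetical helper function names, uses a JSONPath filter to select only the condition whose type is exactly `Progressing`. The second helper runs the same query against the `ingress` cluster operator, which is what `oc get co` summarizes.

```shell
# Hypothetical helper: print the status (True/False/Unknown) of the
# condition whose type is exactly "Progressing" on the default
# ingress controller.
ingresscontroller_progressing() {
  oc -n openshift-ingress-operator get ingresscontroller/default \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}'
}

# Hypothetical helper: the same query against the ingress cluster operator.
clusteroperator_progressing() {
  oc get clusteroperator/ingress \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}'
}
```

While the rollout is blocked on the missing configmap, both helpers would be expected to print `True` on a cluster that includes this fix.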

Comment 11 errata-xmlrpc 2023-01-17 19:47:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
