Bug 1686204

Summary: Restoring the default ingress controller requires operator restart
Product: OpenShift Container Platform
Component: Networking
Sub component: router
Reporter: Dan Mace <dmace>
Assignee: Dan Mace <dmace>
QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: aos-bugs, dhansen
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:45:17 UTC
Type: Bug

Description Dan Mace 2019-03-06 22:34:20 UTC
Description of problem:

Restoring the default ingress controller after deletion requires restarting the ingress operator. The ingress operator should recreate the default without user intervention or operator downtime.
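On affected builds, a workaround is to bounce the operator so it re-runs the startup logic that recreates the default clusteringress. A minimal sketch, assuming the openshift-ingress-operator namespace contains only the operator's pods:

# Restart the ingress operator by deleting its pod(s); the replacement pod
# recreates the default clusteringress on startup.
$ oc -n openshift-ingress-operator delete pods --all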

Version-Release number of selected component (if applicable):


How reproducible:

oc delete -n openshift-ingress-operator clusteringresses/default


Actual results:

clusteringresses/default is only recreated if the ingress operator is restarted.
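One way to observe this is to watch the resource after deleting it; on affected builds the list stays empty until the operator is restarted (the resource and namespace names are the ones from the reproduction step above):

# Watch for the default clusteringress to be recreated.
$ oc get clusteringresses -n openshift-ingress-operator --watch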

Expected results:

The default ingress controller should be automatically recreated.

Additional info:

Comment 1 Daneyon Hansen 2019-03-07 00:39:14 UTC
I was able to delete and recreate the default ingress controller (aka clusteringress) without restarting the cluster-ingress-operator:

$ oc delete clusteringress/default -n openshift-ingress-operator
clusteringress.ingress.openshift.io "default" deleted

$ oc get clusteringresses -n openshift-ingress-operator
No resources found.

$ oc get deploy -n openshift-ingress
No resources found.

$ oc get svc -n openshift-ingress
No resources found.
 
$ oc create -f assets/defaults/cluster-ingress.yaml 
clusteringress.ingress.openshift.io/default created

$ oc get clusteringresses -n openshift-ingress-operator
NAME      AGE
default   56s

$ oc get deploy -n openshift-ingress
NAME             READY     UP-TO-DATE   AVAILABLE   AGE
router-default   2/2       2            2           61s

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                      AGE
router-default            LoadBalancer   172.30.197.85    aa41d9d6c406e11e9bd5a0e63434e58b-1280454278.us-east-1.elb.amazonaws.com   80:30794/TCP,443:32084/TCP   65s
router-internal-default   ClusterIP      172.30.139.207   <none>                                                                    80/TCP,443/TCP,1936/TCP      65s
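For reference, the assets/defaults/cluster-ingress.yaml used above is roughly equivalent to creating a minimal ClusterIngress and letting the operator fill in defaults. This is only a sketch: the apiVersion (v1alpha1) and the empty spec are assumptions and may differ between builds.

# Recreate the default clusteringress from an inline minimal manifest.
# apiVersion v1alpha1 and the empty spec are assumptions.
$ oc create -f - <<EOF
apiVersion: ingress.openshift.io/v1alpha1
kind: ClusterIngress
metadata:
  name: default
  namespace: openshift-ingress-operator
spec: {}
EOF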

I verified before/after ingress connectivity by accessing the web console using the hostname from the console route. Keep in mind that propagating DNS names from authoritative name servers to resolvers can take several minutes:

$ dig console-openshift-console.apps.danehans.devcluster.openshift.com

; <<>> DiG 9.10.6 <<>> console-openshift-console.apps.danehans.devcluster.openshift.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12580
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;console-openshift-console.apps.danehans.devcluster.openshift.com. IN A

;; ANSWER SECTION:
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 34.199.157.5
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 18.214.218.55

;; Query time: 42 msec
;; SERVER: 10.192.20.245#53(10.192.20.245)
;; WHEN: Wed Mar 06 19:26:40 EST 2019
;; MSG SIZE  rcvd: 125
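A quick connectivity check against the console route resolved above can be done with curl (-k skips certificate verification; the hostname is the one from the dig output, and the exact response will vary):

# Expect an HTTP status line (e.g. 200 or a redirect) once DNS has propagated.
$ curl -kI https://console-openshift-console.apps.danehans.devcluster.openshift.com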


For the "The ingress operator should recreate the default without user intervention" part of the bug, it sounds like this ingress controller should be named "mandatory" or "required" instead of default.

Comment 4 Hongan Li 2019-03-21 06:10:18 UTC
Verified with 4.0.0-0.nightly-2019-03-20-153904; the issue has been fixed. The ingresscontroller/default is recreated automatically after it is deleted.


$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cf558bd7f-hj5cm   1/1     Running   0          4h48m
router-default-7cf558bd7f-r55hj   1/1     Running   0          4h48m
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
router-default            LoadBalancer   172.30.110.219   a74971e0e4b7511e9a8eb0a21cd44590-1114946490.ap-northeast-1.elb.amazonaws.com   80:32276/TCP,443:31791/TCP   4h50m
router-internal-default   ClusterIP      172.30.180.114   <none>                                                                         80/TCP,443/TCP,1936/TCP      4h50m


$ oc delete -n openshift-ingress-operator ingresscontroller/default


$ oc get pod -n openshift-ingress
NAME                              READY   STATUS              RESTARTS   AGE
router-default-7cf558bd7f-rj46k   0/1     ContainerCreating   0          2s
router-default-7cf558bd7f-sp8fh   0/1     ContainerCreating   0          2s
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
router-default            LoadBalancer   172.30.129.155   a4123fbf24b9e11e980c9069c412f6e1-1870353468.ap-northeast-1.elb.amazonaws.com   80:31146/TCP,443:31551/TCP   65s
router-internal-default   ClusterIP      172.30.145.212   <none>                                                                         80/TCP,443/TCP,1936/TCP      65s

As mentioned in Comment 1, because the load balancer changed and DNS propagation takes some time, routes cannot be accessed until the new record has propagated.
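To tell when the new load balancer's DNS record has propagated, the route hostname can be polled until it resolves to the new ELB addresses; the hostname below is an example placeholder, not taken from this cluster:

# Poll every 10 seconds until the route hostname resolves to the new ELB.
# Replace the hostname with a route from your cluster.
$ watch -n 10 dig +short console-openshift-console.apps.example.devcluster.openshift.com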

Comment 6 errata-xmlrpc 2019-06-04 10:45:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758