Description of problem:
Restoring the default ingress controller after deletion requires restarting the ingress operator. The ingress operator should recreate the default without user intervention or operator downtime.

Version-Release number of selected component (if applicable):

How reproducible:
oc delete -n openshift-ingress-operator clusteringresses/default

Actual results:
clusteringresses/default is only recreated if the ingress operator is restarted.

Expected results:
The default ingress controller should be automatically recreated.

Additional info:
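The fix presumably amounts to the operator reconciling the default object back into existence whenever it goes missing. A minimal, illustrative Go sketch of that idea follows, assuming a controller-runtime client and using an unstructured object to avoid depending on generated API types; the function name and the empty spec are assumptions for illustration, not the operator's actual code.

package ingress

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ingressControllerGVK identifies the ingress controller custom resource
// (named ClusterIngress in early builds, IngressController later).
var ingressControllerGVK = schema.GroupVersionKind{
	Group:   "operator.openshift.io",
	Version: "v1",
	Kind:    "IngressController",
}

// ensureDefaultIngressController (hypothetical name) recreates the "default"
// object in openshift-ingress-operator whenever it is found to be missing.
// The spec is left empty in this sketch; the real operator would fill in
// cluster-derived defaults such as the ingress domain.
func ensureDefaultIngressController(ctx context.Context, cl client.Client) error {
	existing := &unstructured.Unstructured{}
	existing.SetGroupVersionKind(ingressControllerGVK)

	err := cl.Get(ctx, client.ObjectKey{
		Namespace: "openshift-ingress-operator",
		Name:      "default",
	}, existing)
	if err == nil {
		return nil // already present, nothing to do
	}
	if !apierrors.IsNotFound(err) {
		return err
	}

	desired := &unstructured.Unstructured{}
	desired.SetGroupVersionKind(ingressControllerGVK)
	desired.SetNamespace("openshift-ingress-operator")
	desired.SetName("default")
	return cl.Create(ctx, desired)
}

Calling such a helper from the operator's regular reconcile loop would recreate the default on the next sync after a deletion, without a restart.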
I was able to delete and recreate the default ingress controller (aka clusteringress) without restarting the cluster-ingress-operator:

$ oc delete clusteringress/default -n openshift-ingress-operator
clusteringress.ingress.openshift.io "default" deleted

$ oc get clusteringresses -n openshift-ingress-operator
No resources found.

$ oc get deploy -n openshift-ingress
No resources found.

$ oc get svc -n openshift-ingress
No resources found.

$ oc create -f assets/defaults/cluster-ingress.yaml
clusteringress.ingress.openshift.io/default created

$ oc get clusteringresses -n openshift-ingress-operator
NAME      AGE
default   56s

$ oc get deploy -n openshift-ingress
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
router-default   2/2     2            2           61s

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                PORT(S)                      AGE
router-default            LoadBalancer   172.30.197.85    aa41d9d6c406e11e9bd5a0e63434e58b-1280454278.us-east-1.elb.amazonaws.com   80:30794/TCP,443:32084/TCP   65s
router-internal-default   ClusterIP      172.30.139.207   <none>                                                                     80/TCP,443/TCP,1936/TCP      65s

I verified before/after ingress connectivity by accessing the web console using the hostname from the console route. Keep in mind that propagating DNS names from authoritative name servers to resolvers can take several minutes:

$ dig console-openshift-console.apps.danehans.devcluster.openshift.com

; <<>> DiG 9.10.6 <<>> console-openshift-console.apps.danehans.devcluster.openshift.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12580
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;console-openshift-console.apps.danehans.devcluster.openshift.com. IN A

;; ANSWER SECTION:
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 34.199.157.5
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 18.214.218.55

;; Query time: 42 msec
;; SERVER: 10.192.20.245#53(10.192.20.245)
;; WHEN: Wed Mar 06 19:26:40 EST 2019
;; MSG SIZE  rcvd: 125

For the "The ingress operator should recreate the default without user intervention" part of the bug, it sounds like this ingress controller should be named "mandatory" or "required" instead of "default".
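As a convenience for the DNS-propagation wait described above, here is a small illustrative Go snippet that polls the route hostname until resolvers return records. The hostname is the one from this comment (substitute your own cluster's route host) and the 15-second interval is an arbitrary choice.

package main

import (
	"fmt"
	"net"
	"time"
)

// Poll the console route hostname until the new load balancer records
// have propagated to the local resolver, mirroring the dig check above.
func main() {
	host := "console-openshift-console.apps.danehans.devcluster.openshift.com"
	for {
		addrs, err := net.LookupHost(host)
		if err == nil && len(addrs) > 0 {
			fmt.Printf("%s resolves to %v\n", host, addrs)
			return
		}
		fmt.Printf("waiting for DNS propagation: %v\n", err)
		time.Sleep(15 * time.Second)
	}
}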
Verified with 4.0.0-0.nightly-2019-03-20-153904; the issue has been fixed. The ingresscontroller/default is recreated automatically after deleting it.

$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cf558bd7f-hj5cm   1/1     Running   0          4h48m
router-default-7cf558bd7f-r55hj   1/1     Running   0          4h48m

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                      AGE
router-default            LoadBalancer   172.30.110.219   a74971e0e4b7511e9a8eb0a21cd44590-1114946490.ap-northeast-1.elb.amazonaws.com   80:32276/TCP,443:31791/TCP   4h50m
router-internal-default   ClusterIP      172.30.180.114   <none>                                                                          80/TCP,443/TCP,1936/TCP      4h50m

$ oc delete -n openshift-ingress-operator ingresscontroller/default

$ oc get pod -n openshift-ingress
NAME                              READY   STATUS              RESTARTS   AGE
router-default-7cf558bd7f-rj46k   0/1     ContainerCreating   0          2s
router-default-7cf558bd7f-sp8fh   0/1     ContainerCreating   0          2s

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                      AGE
router-default            LoadBalancer   172.30.129.155   a4123fbf24b9e11e980c9069c412f6e1-1870353468.ap-northeast-1.elb.amazonaws.com   80:31146/TCP,443:31551/TCP   65s
router-internal-default   ClusterIP      172.30.145.212   <none>                                                                          80/TCP,443/TCP,1936/TCP      65s

As mentioned in Comment 1, since the load balancer changed and DNS propagation needs some time, no route can be accessed during that window.
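For scripted verification of the same steps, a rough Go sketch that waits for the recreated router deployment to report available replicas could look like the following. It assumes a controller-runtime client and a reachable kubeconfig; the deployment name router-default and namespace openshift-ingress are taken from the output above.

package main

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

// Wait for the recreated router deployment to become available,
// mirroring the manual `oc get pod/svc -n openshift-ingress` checks.
func main() {
	cfg, err := config.GetConfig() // resolves KUBECONFIG or in-cluster config
	if err != nil {
		panic(err)
	}
	cl, err := client.New(cfg, client.Options{})
	if err != nil {
		panic(err)
	}

	key := client.ObjectKey{Namespace: "openshift-ingress", Name: "router-default"}
	for {
		d := &appsv1.Deployment{}
		err := cl.Get(context.TODO(), key, d)
		switch {
		case apierrors.IsNotFound(err):
			fmt.Println("router-default not recreated yet")
		case err != nil:
			panic(err)
		case d.Status.AvailableReplicas > 0:
			fmt.Printf("router-default available: %d replica(s)\n", d.Status.AvailableReplicas)
			return
		default:
			fmt.Println("router-default exists, waiting for available replicas")
		}
		time.Sleep(5 * time.Second)
	}
}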
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758