Description of problem:
When launching an OCP cluster on OpenStack with the installer, the bootstrap and api instances work well, but sometimes the master instances fail to fetch their Ignition config from the load balancer.

The temporary machine-config-server on the bootstrap node responds correctly from outside OpenStack:

[openshift@dhcp-140-70 ~]$ curl -k https://api.wjiang-ocp.shiftstack.com:22623/config/master -I
HTTP/2 200
content-type: application/json
content-length: 46313
date: Mon, 11 Mar 2019 07:15:23 GMT

Boot log of one master instance:
[ 801.234287] ignition[542]: GET https://api.wjiang-ocp.shiftstack.com:22623/config/master: attempt #27
[ 831.235304] ignition[542]: GET error: Get https://api.wjiang-ocp.shiftstack.com:22623/config/master: dial tcp 10.0.76.127:22623: i/o timeout

Version-Release number of the following components:
[openshift@dhcp-140-70 installer]$ bin/openshift-install version
bin/openshift-install unreleased-master-540-g12af0c9b8e6a090c041b19c2fb0c040188607bcb

How reproducible:
Sometimes

Steps to Reproduce:
1. Launch an OCP cluster with the installer
2. Check the boot log of bootstrap, api and masters
3.

Actual results:
Bootstrap and api work well for the Ignition service. Masters fail to fetch their bootstrap config from the temporary machine-config-server.

Expected results:
Masters should also be able to fetch their config.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
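For anyone reproducing this: the curl check above only proves the endpoint is reachable from outside OpenStack, while the failure is a connect timeout from inside the master's network. A minimal sketch of a probe that separates "timeout" from "refused", assuming Python is available on a host attached to the same network as the masters (the function name probe_tcp is hypothetical, not part of the installer):

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    A "timeout" result corresponds to the "i/o timeout" seen in the
    Ignition log: packets are dropped rather than rejected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the deadline (packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on the port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable network, and similar.
        return f"error: {exc}"
```

Calling probe_tcp("api.wjiang-ocp.shiftstack.com", 22623) from the master network and from outside, then comparing the two results, would show whether the problem is specific to the internal path.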
Are you still seeing this? I haven't seen issues related to masters not getting the ignition config in a while
(In reply to Flavio Percoco from comment #1)
> Are you still seeing this? I haven't seen issues related to masters not
> getting the ignition config in a while

Checked with

[openshift@dhcp-140-70 installer]$ bin/openshift-install version
bin/openshift-install unreleased-master-560-g974d9b0848866f03d4dd8c577d8b7ef28756a1d5-dirty
built from commit 974d9b0848866f03d4dd8c577d8b7ef28756a1d5

But unfortunately got this: https://bugzilla.redhat.com/show_bug.cgi?id=1687241#c2
Checked again and hit https://bugzilla.redhat.com/show_bug.cgi?id=1687241#c3

[openshift@dhcp-140-70 installer]$ bin/openshift-install version
bin/openshift-install unreleased-master-560-g974d9b0848866f03d4dd8c577d8b7ef28756a1d5
built from commit 974d9b0848866f03d4dd8c577d8b7ef28756a1d5

(openstack) server list --name wjiang
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
| ID                                   | Name                       | Status | Networks                                              | Image | Flavor         |
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
| 78cdfb63-cd5e-4fc2-8f0c-e14be3e6d91f | wjiang-ocp-fvkd5-master-1  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.11               | rhcos | ci.m1.medlarge |
| 984783ba-5303-4d35-a26a-fe7e9b784e3d | wjiang-ocp-fvkd5-master-2  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.5                | rhcos | ci.m1.medlarge |
| 0f4812e9-9f18-4336-b1c8-5a356a90a8e1 | wjiang-ocp-fvkd5-master-0  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.9                | rhcos | ci.m1.medlarge |
| e4347058-6889-4dc7-a5ad-d98115e468f4 | wjiang-ocp-fvkd5-api       | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.128.13, 10.0.77.71 | rhcos | ci.m1.medlarge |
| bfb430a2-a6b7-4899-9117-6bb3bca7a181 | wjiang-ocp-fvkd5-bootstrap | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.10               | rhcos | ci.m1.medlarge |
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
After I disabled trunk creation for the masters on the upshift OpenStack, everything works.

DEBUG OpenShift Installer unreleased-master-601-g1c1b2bb6f64b25c3eccacd07f031a3ec5b2ab29d-dirty
DEBUG Built from commit 1c1b2bb6f64b25c3eccacd07f031a3ec5b2ab29d
INFO Waiting up to 30m0s for the Kubernetes API at https://api.wjiang-ocp.shiftstack.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: dial tcp 10.0.76.214:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF
INFO API v1.12.4+8156b0c up
INFO Waiting up to 30m0s for the bootstrap-complete event...
DEBUG added kube-controller-manager.158f2bc94596490e: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f9e39ec3-4ee5-11e9-ad40-fa163ef33bb6 became leader
DEBUG added kube-scheduler.158f2bc97583d42a: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f5af79b9-4ee5-11e9-bb52-fa163ef33bb6 became leader
DEBUG modified kube-controller-manager.158f2bc94596490e: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f9e39ec3-4ee5-11e9-ad40-fa163ef33bb6 became leader
DEBUG modified kube-scheduler.158f2bc97583d42a: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f5af79b9-4ee5-11e9-bb52-fa163ef33bb6 became leader
DEBUG added kube-controller-manager.158f2c018fddb679: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_55e4225b-4ee6-11e9-8463-fa163ef33bb6 became leader
DEBUG added kube-scheduler.158f2c01dbc01f68: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_54a353be-4ee6-11e9-bcbf-fa163ef33bb6 became leader
DEBUG added openshift-master-controllers.158f2c027a6b2309: controller-manager-rbxq9 became leader
DEBUG added bootstrap-success: Required control plane pods have been created
DEBUG added openshift-master-controllers.158f2c11437d5e36: controller-manager-5lcpm became leader
DEBUG added bootstrap-complete: cluster bootstrapping has completed
INFO Destroying the bootstrap resources...
One workaround is to use service_port_ip for the api record even when lb_floating_ip is defined, so that communication within the cluster goes through the same network.

diff --git a/data/data/openstack/service/main.tf b/data/data/openstack/service/main.tf
index 534762e18..41a494ee1 100644
--- a/data/data/openstack/service/main.tf
+++ b/data/data/openstack/service/main.tf
@@ -200,7 +200,7 @@ $ORIGIN ${var.cluster_domain}.
 3600 ; minimum (1 hour)
 )
-${length(var.lb_floating_ip) == 0 ? "api IN A ${var.service_port_ip}" : "api IN A ${var.lb_floating_ip}"}
+api IN A ${var.service_port_ip}
 ${length(var.lb_floating_ip) == 0 ? "*.apps IN A ${var.service_port_ip}" : "*.apps IN A ${var.lb_floating_ip}"}
 bootstrap.${var.cluster_domain} IN A ${var.bootstrap_ip}
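To make the effect of the patch easier to see, here is a rough Python model of the record selection done by the Terraform ternaries. It is only a sketch: the function name zone_records and the internal_api switch are hypothetical, and the addresses in the usage note are placeholders, not values from this cluster.

```python
from typing import List


def zone_records(service_port_ip: str,
                 lb_floating_ip: str = "",
                 internal_api: bool = False) -> List[str]:
    """Return the "api" and "*.apps" A records for the cluster zone.

    internal_api=False models the original template: both records use
    the floating IP when one is set, else the service port IP.
    internal_api=True models the patched template: "api" always uses
    the service port IP, while "*.apps" is left unchanged.
    """
    # Same fallback as `length(var.lb_floating_ip) == 0 ? ... : ...`.
    public_ip = lb_floating_ip if lb_floating_ip else service_port_ip
    api_ip = service_port_ip if internal_api else public_ip
    return [f"api IN A {api_ip}", f"*.apps IN A {public_ip}"]
```

With placeholder addresses, zone_records("192.168.0.20", "203.0.113.7", internal_api=True) gives an api record on 192.168.0.20 but leaves *.apps on 203.0.113.7, so only the api name is moved onto the internal network by this patch.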
This also blocks all the routes. All routes point at the external IP of the load balancer, so the web console does not work, since it requires the authentication route.

[openshift@dhcp-140-70 installer]$ oc -n openshift-console logs console-d9d875c95-tww2b
2019/04/2 10:59:58 cmd/main: cookies are secure!
2019/04/2 11:00:03 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:18 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:33 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:48 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

[openshift@dhcp-140-70 installer]$ oc get pods -n openshift-console -o wide
NAME                         READY   STATUS    RESTARTS   AGE    IP            NODE                                                  NOMINATED NODE
console-d9d875c95-chq7f      0/1     Running   22         105m   10.129.0.28   wjiang-ocp-5hrhk-master-1.wjiang-ocp.shiftstack.com   <none>
console-d9d875c95-tww2b      0/1     Running   22         105m   10.128.0.22   wjiang-ocp-5hrhk-master-0.wjiang-ocp.shiftstack.com   <none>
downloads-77f7688f6c-pjrkp   1/1     Running   0          105m   10.128.0.21   wjiang-ocp-5hrhk-master-0.wjiang-ocp.shiftstack.com   <none>
downloads-77f7688f6c-txp92   1/1     Running   0          105m   10.130.0.20   wjiang-ocp-5hrhk-master-2.wjiang-ocp.shiftstack.com   <none>
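When re-checking a cluster for this symptom, the pods to look at are the ones that are Running but not ready, like the two console pods above. A small sketch that picks them out of saved `oc get pods` text output, assuming the default whitespace-separated column layout shown in this comment (the helper name unready_pods is hypothetical):

```python
from typing import List, Tuple


def unready_pods(table: str) -> List[Tuple[str, str, int]]:
    """Return (name, ready, restarts) for pods whose READY counts differ.

    Expects the plain-text table printed by `oc get pods`, with a
    header row followed by one pod per line.
    """
    rows = [line.split() for line in table.strip().splitlines()]
    result = []
    for cols in rows[1:]:  # skip the header row
        if len(cols) < 4:
            continue  # not a pod row
        name, ready, _status, restarts = cols[:4]
        have, want = ready.split("/")
        if have != want:
            result.append((name, ready, int(restarts)))
    return result
```

Fed the table from this comment, it would report console-d9d875c95-chq7f and console-d9d875c95-tww2b, each at 0/1 with 22 restarts.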
Checked with 4.2.0-0.nightly-2019-07-31-162901; this is no longer an issue.
Returning to QE to close out since the BZ has been validated to work on a nightly.
Verified on 4.2.0-0.nightly-2019-08-05-223032, looks like it's just an upshift OSP issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922