Bug 1687292 - [OSP] Sometimes masters fail to get ignition from load balancer vm and got error "dial tcp <LB ip>:22623: i/o timeout"
Summary: [OSP] Sometimes masters fail to get ignition from load balancer vm and got er...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Eric Duen
QA Contact: Tomas Sedovic
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-11 08:40 UTC by weiwei jiang
Modified: 2019-10-16 06:27 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
CLOSED / CURRENTRELEASE
Clone Of:
Environment:
Last Closed: 2019-10-16 06:27:41 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2922  0        None      None    None     2019-10-16 06:27:49 UTC

Description weiwei jiang 2019-03-11 08:40:15 UTC
Description of problem:
When launching an OCP cluster on OpenStack with the installer, the bootstrap and api instances come up fine, but the master instances sometimes fail to fetch their Ignition config through the load balancer.

The temporary machine-config-server on the bootstrap node responds correctly from outside OpenStack:
[openshift@dhcp-140-70 ~]$ curl -k  https://api.wjiang-ocp.shiftstack.com:22623/config/master -I
HTTP/2 200 
content-type: application/json
content-length: 46313
date: Mon, 11 Mar 2019 07:15:23 GMT

Boot log of one master instance:
[  801.234287] ignition[542]: GET https://api.wjiang-ocp.shiftstack.com:22623/config/master: attempt #27
[  831.235304] ignition[542]: GET error: Get https://api.wjiang-ocp.shiftstack.com:22623/config/master: dial tcp 10.0.76.127:22623: i/o timeout
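
The two observations above (HTTP 200 from outside OpenStack, "i/o timeout" from the master) can be told apart with a single TCP dial made from a host on the cluster network. The following is a minimal Python sketch for triage, not part of the installer or of Ignition; the helper name probe_tcp is made up for illustration.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Dial host:port once and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    A "timeout" suggests packets are being dropped on the path (the
    symptom in the boot log), while "refused" means the host was
    reached but nothing is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # Covers DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

For example, calling probe_tcp("api.wjiang-ocp.shiftstack.com", 22623) from a machine on the cluster network and again from outside OpenStack would show whether only the internal path to the load balancer address is affected.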



Version-Release number of the following components:
[openshift@dhcp-140-70 installer]$ bin/openshift-install version 
bin/openshift-install unreleased-master-540-g12af0c9b8e6a090c041b19c2fb0c040188607bcb

How reproducible:
Sometimes

Steps to Reproduce:
1. Launch an OCP cluster on OpenStack with the installer
2. Check the boot logs of the bootstrap, api and master instances

Actual results:
The bootstrap and api instances can reach the Ignition service.
The masters fail to fetch their config from the temporary machine-config-server.

Expected results:
The masters should also be able to fetch their Ignition config.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Flavio Percoco 2019-03-13 11:39:14 UTC
Are you still seeing this? I haven't seen issues related to masters not getting the ignition config in a while

Comment 2 weiwei jiang 2019-03-15 10:02:38 UTC
(In reply to Flavio Percoco from comment #1)
> Are you still seeing this? I haven't seen issues related to masters not
> getting the ignition config in a while

Checked with
[openshift@dhcp-140-70 installer]$ bin/openshift-install version 
bin/openshift-install unreleased-master-560-g974d9b0848866f03d4dd8c577d8b7ef28756a1d5-dirty
built from commit 974d9b0848866f03d4dd8c577d8b7ef28756a1d5

Unfortunately, I ran into https://bugzilla.redhat.com/show_bug.cgi?id=1687241#c2

Comment 3 weiwei jiang 2019-03-15 10:59:20 UTC
Checked again and hit https://bugzilla.redhat.com/show_bug.cgi?id=1687241#c3

[openshift@dhcp-140-70 installer]$ bin/openshift-install version 
bin/openshift-install unreleased-master-560-g974d9b0848866f03d4dd8c577d8b7ef28756a1d5
built from commit 974d9b0848866f03d4dd8c577d8b7ef28756a1d5

(openstack) server list --name wjiang
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
| ID                                   | Name                       | Status | Networks                                              | Image | Flavor         |
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
| 78cdfb63-cd5e-4fc2-8f0c-e14be3e6d91f | wjiang-ocp-fvkd5-master-1  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.11               | rhcos | ci.m1.medlarge |
| 984783ba-5303-4d35-a26a-fe7e9b784e3d | wjiang-ocp-fvkd5-master-2  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.5                | rhcos | ci.m1.medlarge |
| 0f4812e9-9f18-4336-b1c8-5a356a90a8e1 | wjiang-ocp-fvkd5-master-0  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.9                | rhcos | ci.m1.medlarge |
| e4347058-6889-4dc7-a5ad-d98115e468f4 | wjiang-ocp-fvkd5-api       | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.128.13, 10.0.77.71 | rhcos | ci.m1.medlarge |
| bfb430a2-a6b7-4899-9117-6bb3bca7a181 | wjiang-ocp-fvkd5-bootstrap | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.10               | rhcos | ci.m1.medlarge |
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+

Comment 4 weiwei jiang 2019-03-25 10:16:37 UTC
After I disabled trunk creation for the masters on the upshift OpenStack, everything works.

DEBUG OpenShift Installer unreleased-master-601-g1c1b2bb6f64b25c3eccacd07f031a3ec5b2ab29d-dirty                                                                                                                                                                                 
DEBUG Built from commit 1c1b2bb6f64b25c3eccacd07f031a3ec5b2ab29d                                                                                                                                                                                                                
INFO Waiting up to 30m0s for the Kubernetes API at https://api.wjiang-ocp.shiftstack.com:6443...                                                                                                                                                                                
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: dial tcp 10.0.76.214:6443: connect: connection refused                                                                                                          
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF                                                                                                                                                             
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF                                                                                                                                                             
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF                                                                                                                                                             
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource                                                                                                                                                                                    
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF
INFO API v1.12.4+8156b0c up
INFO Waiting up to 30m0s for the bootstrap-complete event...
DEBUG added kube-controller-manager.158f2bc94596490e: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f9e39ec3-4ee5-11e9-ad40-fa163ef33bb6 became leader                                                                                                                  
DEBUG added kube-scheduler.158f2bc97583d42a: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f5af79b9-4ee5-11e9-bb52-fa163ef33bb6 became leader                                                                                                                           
DEBUG modified kube-controller-manager.158f2bc94596490e: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f9e39ec3-4ee5-11e9-ad40-fa163ef33bb6 became leader                                                                                                               
DEBUG modified kube-scheduler.158f2bc97583d42a: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f5af79b9-4ee5-11e9-bb52-fa163ef33bb6 became leader                                                                                                                        
DEBUG added kube-controller-manager.158f2c018fddb679: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_55e4225b-4ee6-11e9-8463-fa163ef33bb6 became leader                                                                                                                  
DEBUG added kube-scheduler.158f2c01dbc01f68: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_54a353be-4ee6-11e9-bcbf-fa163ef33bb6 became leader                                                                                                                           
DEBUG added openshift-master-controllers.158f2c027a6b2309: controller-manager-rbxq9 became leader
DEBUG added bootstrap-success: Required control plane pods have been created
DEBUG added openshift-master-controllers.158f2c11437d5e36: controller-manager-5lcpm became leader
DEBUG added bootstrap-complete: cluster bootstrapping has completed
INFO Destroying the bootstrap resources...

Comment 5 weiwei jiang 2019-04-01 10:12:18 UTC
One workaround is to use service_port_ip for the api record even when lb_floating_ip is defined, so that communication within the cluster goes through the same network.

diff --git a/data/data/openstack/service/main.tf b/data/data/openstack/service/main.tf
index 534762e18..41a494ee1 100644
--- a/data/data/openstack/service/main.tf
+++ b/data/data/openstack/service/main.tf
@@ -200,7 +200,7 @@ $ORIGIN ${var.cluster_domain}.
                                 3600       ; minimum (1 hour)
                                 )
 
-${length(var.lb_floating_ip) == 0 ? "api  IN  A  ${var.service_port_ip}" : "api  IN  A  ${var.lb_floating_ip}"}
+api  IN  A  ${var.service_port_ip}
 ${length(var.lb_floating_ip) == 0 ? "*.apps  IN  A  ${var.service_port_ip}" : "*.apps  IN  A  ${var.lb_floating_ip}"}
 
 bootstrap.${var.cluster_domain}  IN  A  ${var.bootstrap_ip}
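
The effect of this change on the generated zone records can be modelled outside Terraform. The following is a rough Python sketch, assuming that an empty string means "no floating IP", as in the `length(...) == 0` test in the template. The function name zone_records and the workaround flag are hypothetical, and the sample addresses in the usage note are borrowed from the api row of the server list in comment 3 purely for illustration.

```python
from typing import List


def zone_records(service_port_ip: str, lb_floating_ip: str,
                 workaround: bool = False) -> List[str]:
    """Model the api and *.apps A records from the zone template.

    Without the workaround, both records point at the floating IP when
    one is set, and fall back to the service port IP otherwise.
    With the workaround, the api record always uses the service port IP,
    while the *.apps record keeps the original conditional.
    """
    external = lb_floating_ip if lb_floating_ip else service_port_ip
    api_target = service_port_ip if workaround else external
    return [
        f"api  IN  A  {api_target}",
        f"*.apps  IN  A  {external}",
    ]
```

With zone_records("192.168.128.13", "10.0.77.71", workaround=True), the api record resolves to the internal address while the *.apps record still resolves to the floating IP, which is consistent with the route problem reported in the next comment.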

Comment 6 weiwei jiang 2019-04-02 11:06:40 UTC
This also blocks all the routes.

All routes resolve to the external IP of the load balancer, so the web console does not work, since it depends on the authentication route.

[openshift@dhcp-140-70 installer]$ oc -n openshift-console logs console-d9d875c95-tww2b 
2019/04/2 10:59:58 cmd/main: cookies are secure!
2019/04/2 11:00:03 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:18 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:33 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:48 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
[openshift@dhcp-140-70 installer]$ oc get pods -n openshift-console -o wide 
NAME                         READY   STATUS    RESTARTS   AGE    IP            NODE                                                  NOMINATED NODE
console-d9d875c95-chq7f      0/1     Running   22         105m   10.129.0.28   wjiang-ocp-5hrhk-master-1.wjiang-ocp.shiftstack.com   <none>
console-d9d875c95-tww2b      0/1     Running   22         105m   10.128.0.22   wjiang-ocp-5hrhk-master-0.wjiang-ocp.shiftstack.com   <none>
downloads-77f7688f6c-pjrkp   1/1     Running   0          105m   10.128.0.21   wjiang-ocp-5hrhk-master-0.wjiang-ocp.shiftstack.com   <none>
downloads-77f7688f6c-txp92   1/1     Running   0          105m   10.130.0.20   wjiang-ocp-5hrhk-master-2.wjiang-ocp.shiftstack.com   <none>
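
One way to reason about which names keep traffic inside the cluster is to test the resolved address against the machine network. The following is a small Python sketch. The 192.168.0.0/16 CIDR used in the usage note is an assumption inferred from the addresses in the server list in comment 3, not a value confirmed in this report, and stays_on_cluster_network is a hypothetical helper.

```python
import ipaddress


def stays_on_cluster_network(ip: str, machine_cidr: str) -> bool:
    """Return True if ip falls inside the given machine network CIDR."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(machine_cidr)
```

Under that assumption, stays_on_cluster_network("192.168.0.11", "192.168.0.0/16") is True for a master address, while stays_on_cluster_network("10.0.77.71", "192.168.0.0/16") is False for the floating IP on the api instance, so a route hostname that resolves to the floating IP sends pod traffic out of the cluster network.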

Comment 10 weiwei jiang 2019-08-01 06:22:31 UTC
Checked with 4.2.0-0.nightly-2019-07-31-162901; this is no longer an issue.

Comment 11 Eric Duen 2019-08-05 18:58:00 UTC
Returning to QE to close out, since the BZ has been validated to work on a nightly.

Comment 12 weiwei jiang 2019-08-06 08:20:54 UTC
Verified on 4.2.0-0.nightly-2019-08-05-223032. It looks like this was just an upshift OSP issue.

Comment 13 errata-xmlrpc 2019-10-16 06:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

