Bug 1750606 - [vsphere][upi] Fail to setup the openshift cluster with OVN network type
Summary: [vsphere][upi] Fail to setup the openshift cluster with OVN network type
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Ricardo Carrillo Cruz
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1779005
Depends On: 1762614
Blocks:
 
Reported: 2019-09-10 03:15 UTC by zhaozhanqi
Modified: 2023-09-14 05:43 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-20 17:19:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
kube_apiserver_exited (1.22 MB, text/plain)
2019-09-16 09:47 UTC, Ricardo Carrillo Cruz
kube_apiserver_running (14.61 MB, text/plain)
2019-09-16 09:49 UTC, Ricardo Carrillo Cruz

Comment 4 Ricardo Carrillo Cruz 2019-09-12 10:53:11 UTC
kube-apiserver is unavailable:

[rcarrillo@edu-playground ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                           4.2.0-0.nightly-2019-09-12-010607   True        False         False      156m
dns                                        4.2.0-0.nightly-2019-09-12-010607   True        False         False      155m
insights                                   4.2.0-0.nightly-2019-09-12-010607   True        True          False      156m
kube-apiserver                             4.2.0-0.nightly-2019-09-12-010607   False       True          False      156m
kube-controller-manager                    4.2.0-0.nightly-2019-09-12-010607   False       True          False      156m
kube-scheduler                             4.2.0-0.nightly-2019-09-12-010607   False       True          False      156m
machine-api                                4.2.0-0.nightly-2019-09-12-010607   True        False         False      156m
machine-config                             4.2.0-0.nightly-2019-09-12-010607   True        False         False      155m
network                                                                        False       True          False      157m
openshift-apiserver                        4.2.0-0.nightly-2019-09-12-010607   Unknown     Unknown       False      156m
openshift-controller-manager                                                   False       True          False      156m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-12-010607   True        True          False      155m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-12-010607   True        True          False      155m
operator-lifecycle-manager-packageserver                                       False       True          False      155m
service-ca                                 4.2.0-0.nightly-2019-09-12-010607   True        True          False      156m

Getting logs for kube-apiserver shows:

[rcarrillo@edu-playground ~]$ oc -n openshift-kube-apiserver logs kube-apiserver-control-plane-0 -c kube-apiserver-2
...
...
I0912 10:50:17.116092       1 controller.go:107] OpenAPI AggregationController: Processing item v1.packages.operators.coreos.com
W0912 10:50:17.116167       1 handler_proxy.go:91] no RequestInfo found in the context
E0912 10:50:17.116203       1 controller.go:114] loading OpenAPI spec for "v1.packages.operators.coreos.com" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0912 10:50:17.116220       1 controller.go:127] OpenAPI AggregationController: action for item v1.packages.operators.coreos.com: Rate Limited Requeue.
E0912 10:50:17.586317       1 reflector.go:126] github.com/openshift/client-go/user/informers/externalversions/factory.go:101: Failed to list *v1.Group: the server could not find the requested resource (get groups.user.openshift.io)
E0912 10:50:17.710882       1 reflector.go:126] github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: Failed to list *v1.OAuthClient: the server could not find the requested resource (get oauthclients.oauth.openshift.io)
E0912 10:50:18.587510       1 reflector.go:126] github.com/openshift/client-go/user/informers/externalversions/factory.go:101: Failed to list *v1.Group: the server could not find the requested resource (get groups.user.openshift.io)
E0912 10:50:18.711939       1 reflector.go:126] github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: Failed to list *v1.OAuthClient: the server could not find the requested resource (get oauthclients.oauth.openshift.io)


It seems stuck trying to reach v1.packages.operators.coreos.com, which is an APIService from OLM:

[rcarrillo@edu-playground ~]$ oc describe apiservice v1.packages.operators.coreos.com
Name:         v1.packages.operators.coreos.com
Namespace:    
Labels:       olm.owner=packageserver
              olm.owner.kind=ClusterServiceVersion
              olm.owner.namespace=openshift-operator-lifecycle-manager
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2019-09-12T08:04:38Z
  Resource Version:    3856
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1.packages.operators.coreos.com
  UID:                 f7a8676c-d533-11e9-b25b-0050568b3a10
Spec:
  Ca Bundle:               LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJhRENDQVE2Z0F3SUJBZ0lJSVIyc1ZCbThNaWN3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNoTU4KVW1Wa0lFaGhkQ3dnU1c1akxqQWVGdzB4T1RBNU1USXdPREUwTVRCYUZ3MHlNVEE1TVRFd09ERTBNVEJhTUJneApGakFVQmdOVkJBb1REVkpsWkNCSVlYUXNJRWx1WXk0d1dUQVRCZ2NxaGtqT1BRSUJCZ2dxaGtqT1BRTUJCd05DCkFBU3E3eW5sOENXUDJGWW52QjJuWVgxVEx3QVBGcHdXQUt4blhoeG1zUUcvV0w3UUlld3BERzR3dTBCWEVDVGYKUTlrSlBYQWhFU1RWVXNmZkFORTh4YW8zbzBJd1FEQU9CZ05WSFE4QkFmOEVCQU1DQW9Rd0hRWURWUjBsQkJZdwpGQVlJS3dZQkJRVUhBd0lHQ0NzR0FRVUZCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3Q2dZSUtvWkl6ajBFCkF3SURTQUF3UlFJZ1B4NWpDSy9Hd0pCa2R1TXRmQTJmVG1DVVd3cDdhYzNyVFc5S0lrNFBreXNDSVFEVkczUWoKbXhNbEU2OFJybVV5UnlxYUppTWpyUzJncmhIRm9wbGxmZUgzaVE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  Group:                   packages.operators.coreos.com
  Group Priority Minimum:  2000
  Service:
    Name:            v1-packages-operators-coreos-com
    Namespace:       openshift-operator-lifecycle-manager
  Version:           v1
  Version Priority:  15
Status:
  Conditions:
    Last Transition Time:  2019-09-12T08:04:38Z
    Message:               endpoints for service/v1-packages-operators-coreos-com in "openshift-operator-lifecycle-manager" have no addresses
    Reason:                MissingEndpoints
    Status:                False
    Type:                  Available
Events:                    <none>

I'm waiting for the OLM folks to come online to get assistance and debug this further.
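
A quick way to confirm the MissingEndpoints condition from the other side (a sketch; the service name and namespace are taken from the describe output above):

[rcarrillo@edu-playground ~]$ oc -n openshift-operator-lifecycle-manager get endpoints v1-packages-operators-coreos-com
[rcarrillo@edu-playground ~]$ oc -n openshift-operator-lifecycle-manager get pods -o wide

If the endpoints object has no addresses, the packageserver pods backing the service are not Ready (or not scheduled at all), which would explain why the aggregated v1.packages.operators.coreos.com API keeps returning 503 in the kube-apiserver log.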

Comment 5 Ricardo Carrillo Cruz 2019-09-12 13:32:57 UTC
Network-wise, this is what I've seen:

[rcarrillo@edu-playground ~]$ oc get nodes                                                     
NAME              STATUS                        ROLES    AGE     VERSION 
compute-0         NotReady,SchedulingDisabled   worker   5h25m   v1.14.6+906c7c68c
control-plane-0   Ready,SchedulingDisabled      master   5h25m   v1.14.0+66908432a
[rcarrillo@edu-playground ~]$ oc describe node compute-0                                     
Name:               compute-0             
Roles:              worker                
Labels:             beta.kubernetes.io/arch=amd64                                    
                    beta.kubernetes.io/os=linux       
                    kubernetes.io/arch=amd64                                                   
                    kubernetes.io/hostname=compute-0                               
                    kubernetes.io/os=linux                                                     
                    node-role.kubernetes.io/worker=                                                                                                                                           
                    node.openshift.io/os_id=rhcos                                                                                                                                             
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-worker-b5b0f80279ccfe32ba7bb15f98da7c1c              
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-b5b0f80279ccfe32ba7bb15f98da7c1c              
                    machineconfiguration.openshift.io/state: Done                                                                                                                             
                    ovn_host_subnet: 10.129.0.0/23                                                                                                                                            
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 12 Sep 2019 08:02:38 +0000           
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true                     
Conditions:                              
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 12 Sep 2019 13:27:22 +0000   Thu, 12 Sep 2019 08:17:06 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 12 Sep 2019 13:27:22 +0000   Thu, 12 Sep 2019 08:17:06 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 12 Sep 2019 13:27:22 +0000   Thu, 12 Sep 2019 08:17:06 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 12 Sep 2019 13:27:22 +0000   Thu, 12 Sep 2019 08:17:06 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPlugin
NotReady message:Network plugin returns error: Missing CNI default network


The compute node reports a missing CNI default network.
SSHing to the node and inspecting container logs with crictl shows:

[core@compute-0 ~]$ sudo crictl ps
CONTAINER ID        IMAGE                                                              CREATED              STATE               NAME                    ATTEMPT             POD ID
db645356905bf       fee40f617519d527b967a7fdc511a2845cff5542b26187212f3f64d5531e8dcd   17 seconds ago       Running             ovn-node                39                  068960453cbd9
fa4f74237e250       fee40f617519d527b967a7fdc511a2845cff5542b26187212f3f64d5531e8dcd   17 seconds ago       Running             ovn-controller          39                  068960453cbd9
e3b99c98dc1fe       783ce936493c58ae575ad127b1df6ead8c5db95efa5d6520719710f922e63bbb   About a minute ago   Running             kube-multus             32                  0a232e9bd59b9
6759dc87819d8       91248eceeca05dbc758c25a899157188089fdfe8ea60881c69e5076b13a8f4b1   5 hours ago          Running             machine-config-daemon   1                   de2cf6af0bdc7
6af7371e89284       fee40f617519d527b967a7fdc511a2845cff5542b26187212f3f64d5531e8dcd   5 hours ago          Running             ovs-daemons             1                   068960453cbd9
[core@compute-0 ~]$ sudo crictl logs db645356905bf
================== ovnkube.sh --- version: 3 ================
 ==================== command: ovn-node
 =================== hostname: compute-0
 =================== daemonset version 3
 =================== Image built from ovn-kubernetes ref: refs/heads/rhaos-4.2-rhel-7  commit: 33303df05b248b007dc5250fa75469e4ea3eda0f
=============== ovn-node - (wait for ovs)
=============== ovn-node - (wait for ready_to_start_node)
info: Waiting for ready_to_start_node  to come up, waiting 1s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
[core@compute-0 ~]$ sudo crictl logs 6af7371e89284
================== ovnkube.sh --- version: 3 ================
 ==================== command: ovs-server
 =================== hostname: compute-0
 =================== daemonset version 3
 =================== Image built from ovn-kubernetes ref: refs/heads/rhaos-4.2-rhel-7  commit: 33303df05b248b007dc5250fa75469e4ea3eda0f
Starting ovsdb-server.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
Configuring Open vSwitch system IDs.
Enabling remote OVSDB managers.
Inserting openvswitch module.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
Starting ovs-vswitchd.
Enabling remote OVSDB managers.
iptables binary not installed, not adding a rule for udp to port 6081.

==> /var/log/openvswitch/ovs-vswitchd.log <==
2019-09-12T08:17:08.260Z|00033|bridge|INFO|bridge br-local: added interface br-local on port 65534
2019-09-12T08:17:08.265Z|00034|bridge|INFO|bridge br-local: added interface br-nexthop on port 1
2019-09-12T08:17:08.265Z|00035|bridge|INFO|bridge br-int: using datapath ID 0000066040b8b04e
2019-09-12T08:17:08.265Z|00036|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2019-09-12T08:17:08.266Z|00037|bridge|INFO|bridge br-local: using datapath ID 00005a997c8de042
2019-09-12T08:17:08.266Z|00038|connmgr|INFO|br-local: added service controller "punix:/var/run/openvswitch/br-local.mgmt"
2019-09-12T08:17:08.273Z|00039|bridge|WARN|could not open network device 3917aa0995f1b2b (No such device)
2019-09-12T08:17:08.273Z|00040|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.11.0
2019-09-12T08:17:08.284Z|00041|bridge|WARN|could not open network device 3917aa0995f1b2b (No such device)
2019-09-12T08:17:08.296Z|00042|bridge|WARN|could not open network device 3917aa0995f1b2b (No such device)

==> /var/log/openvswitch/ovsdb-server.log <==
2019-09-12T08:17:08.069Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2019-09-12T08:17:08.077Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.11.0

==> /var/log/openvswitch/ovs-vswitchd.log <==
2019-09-12T08:17:15.319Z|00001|ofproto_dpif_xlate(handler7)|WARN|Invalid Geneve tunnel metadata on bridge br-int while processing udp,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:9e:bf:6a,dl_dst=32:8c:db:81:00:04,nw_src=10.128.0.17,nw_dst=10.129.0.3,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=34680,tp_dst=5353
2019-09-12T08:17:18.275Z|00043|memory|INFO|73980 kB peak resident set size after 10.1 seconds                                                                                                 
2019-09-12T08:17:18.275Z|00044|memory|INFO|handlers:2 ports:7 revalidators:2 rules:9 udpif keys:1

==> /var/log/openvswitch/ovsdb-server.log <==
2019-09-12T08:17:18.086Z|00003|memory|INFO|7772 kB peak resident set size after 10.0 seconds
2019-09-12T08:17:18.087Z|00004|memory|INFO|cells:564 json-caches:1 monitors:2 sessions:1

==> /var/log/openvswitch/ovs-vswitchd.log <==
2019-09-12T08:18:30.330Z|00001|ofproto_dpif_xlate(handler6)|WARN|Invalid Geneve tunnel metadata on bridge br-int while processing udp,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:9e:bf:6a,dl_dst=32:8c:db:81:00:04,nw_src=10.128.0.17,nw_dst=10.129.0.3,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=47357,tp_dst=5353

That last message about invalid tunnel metadata is then printed in a loop.
Searching for that string I found an OpenStack bug, https://bugzilla.redhat.com/show_bug.cgi?id=1671347, exhibiting something similar; there the issue was due to the OVS version, but I'm unsure whether that is the root cause of this bug.
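
For completeness, the OVS version in play is already visible in the ovs-daemons output above (2.11.0); it can also be confirmed from the node with crictl, reusing the container ID from the crictl ps listing (a sketch):

[core@compute-0 ~]$ sudo crictl exec 6af7371e89284 ovs-vswitchd --version
[core@compute-0 ~]$ sudo crictl exec 6af7371e89284 ovs-vsctl --version

That would at least tell us whether this is an OVS build in the range affected by the OpenStack bug above.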

Comment 6 Dan Williams 2019-09-12 14:59:39 UTC
==> /var/log/openvswitch/ovs-vswitchd.log <==
2019-09-12T08:18:30.330Z|00001|ofproto_dpif_xlate(handler6)|WARN|Invalid Geneve tunnel metadata on bridge br-int while processing udp,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:9e:bf:6a,dl_dst=32:8c:db:81:00:04,nw_src=10.128.0.17,nw_dst=10.129.0.3,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=47357,tp_dst=5353

Lorenzo, any idea about this?

Comment 7 Ricardo Carrillo Cruz 2019-09-13 16:33:15 UTC
I've been unable to locate a QE resource during EMEA hours to create a new cluster.
I will follow up on Monday; Lorenzo and I agreed on internal IRC that we will take a look together then.

Comment 8 Ricardo Carrillo Cruz 2019-09-16 09:47:59 UTC
Created attachment 1615508 [details]
kube_apiserver_exited

Comment 9 Ricardo Carrillo Cruz 2019-09-16 09:49:02 UTC
Created attachment 1615510 [details]
kube_apiserver_running

Comment 10 Ricardo Carrillo Cruz 2019-09-16 09:53:10 UTC
The latest cluster does not exhibit the Geneve errors.
Something I'm noting here from the exited kube-apiserver log:

E0916 01:08:28.364249       1 controller.go:148] Unable to remove old endpoints from kubernetes service: StorageError: key not found, Code: 1, Key: /kubernetes.io/masterleases/139.178.76.8, 
ResourceVersion: 0, AdditionalErrorMsg: 
I0916 01:08:28.394110       1 log.go:172] http: TLS handshake error from [::1]:46818: remote error: tls: bad certificate
I0916 01:08:28.413175       1 log.go:172] http: TLS handshake error from [::1]:46614: remote error: tls: bad certificate
I0916 01:08:28.413278       1 log.go:172] http: TLS handshake error from [::1]:46616: remote error: tls: bad certificate
E0916 01:08:28.428460       1 reflector.go:126] github.com/openshift/client-go/user/informers/externalversions/factory.go:101: Failed to list *v1.Group: the server could not find the requested resource (get groups.user.openshift.io)
E0916 01:08:28.428600       1 reflector.go:126] github.com/openshift/client-go/quota/informers/externalversions/factory.go:101: Failed to list *v1.ClusterResourceQuota: the server could not find the requested resource (get clusterresourcequotas.quota.openshift.io)
E0916 01:08:28.428663       1 reflector.go:126] github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: Failed to list *v1.OAuthClient: the server could not find the requested resource (get oauthclients.oauth.openshift.io)
E0916 01:08:28.436070       1 reflector.go:126] github.com/openshift/client-go/security/informers/externalversions/factory.go:101: Failed to list *v1.SecurityContextConstraints: the server could not find the requested resource (get securitycontextconstraints.security.openshift.io)


Eventually, the kube-apiserver starts reporting that it cannot connect to etcd:

0916 01:09:15.726794       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{etcd-0.zzhao917.qe.devcluster.openshift.com:2379 <nil>}]
W0916 01:09:15.720350       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {etcd-0.zzhao917.qe.devcluster.openshift.com:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 139.178.76.8:2379: connect: connection refused". Reconnecting...
W0916 01:09:15.720391       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {etcd-0.zzhao917.qe.devcluster.openshift.com:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 139.178.76.8:2379: connect: connection refused". Reconnecting...
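
One way to tell whether this is the overlay dropping traffic or etcd itself being down is a raw TCP check from the master against the address in the error (a sketch; it uses only bash built-ins, which are available on RHCOS):

[core@control-plane-0 ~]$ timeout 3 bash -c '</dev/tcp/139.178.76.8/2379' && echo "2379 open" || echo "2379 closed or unreachable"

"connection refused", as in the grpc errors above, normally means the packets do reach the host but nothing is listening on 2379, i.e. the etcd member is not running, rather than a network path problem.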

Comment 11 zhaozhanqi 2019-09-16 10:25:48 UTC
Found the same issue with the 'openshift-sdn' network type. Not sure whether it is caused by recent changes.

Comment 13 Ricardo Carrillo Cruz 2019-09-17 14:37:14 UTC
Yet another failed installation from Weibin:

https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/67810/console

Installation on vSphere seems flaky, which makes this ticket really hard to debug.

Comment 15 Ricardo Carrillo Cruz 2019-09-18 13:19:36 UTC
We agreed to get a must-gather from these clusters, but I'm unable to: the gathers time out because the scheduler and API are never functional.

Comment 16 Ricardo Carrillo Cruz 2019-09-18 13:22:34 UTC
Install logs fail with:

module.bootstrap.vsphere_virtual_machine.vm: Destroying... (ID: 420b07b7-8379-ae6f-f385-1c9068ae5017)
module.dns.aws_route53_record.api-external: Still modifying... (ID: Z1W9WHC6VPK9SB_api.zzhao918o.qe.devcluster.openshift.com_A_api, 10s elapsed)
module.dns.aws_route53_record.api-internal: Still modifying... (ID: Z1W9WHC6VPK9SB_api-int.zzhao918o.qe.devcluster.openshift.com_A_api, 10s elapsed)
module.bootstrap.vsphere_virtual_machine.vm: Still destroying... (ID: 420b07b7-8379-ae6f-f385-1c9068ae5017, 10s elapsed)
module.bootstrap.vsphere_virtual_machine.vm: Destruction complete after 13s
module.dns.aws_route53_record.api-external: Still modifying... (ID: Z1W9WHC6VPK9SB_api.zzhao918o.qe.devcluster.openshift.com_A_api, 20s elapsed)
module.dns.aws_route53_record.api-internal: Still modifying... (ID: Z1W9WHC6VPK9SB_api-int.zzhao918o.qe.devcluster.openshift.com_A_api, 20s elapsed)
module.dns.aws_route53_record.api-external: Still modifying... (ID: Z1W9WHC6VPK9SB_api.zzhao918o.qe.devcluster.openshift.com_A_api, 30s elapsed)
module.dns.aws_route53_record.api-internal: Still modifying... (ID: Z1W9WHC6VPK9SB_api-int.zzhao918o.qe.devcluster.openshift.com_A_api, 30s elapsed)
module.dns.aws_route53_record.api-internal: Modifications complete after 35s (ID: Z1W9WHC6VPK9SB_api-int.zzhao918o.qe.devcluster.openshift.com_A_api)
module.dns.aws_route53_record.api-external: Modifications complete after 36s (ID: Z1W9WHC6VPK9SB_api.zzhao918o.qe.devcluster.openshift.com_A_api)

Apply complete! Resources: 0 added, 3 changed, 2 destroyed.
+ '/home/installer4/workspace/Launch Environment Flexy/private-openshift-misc/v3-launch-templates/functionality-testing/aos-4_2/hosts/wait_approve_nodes_csr.sh' 1 1
+ '/home/installer4/workspace/Launch Environment Flexy/private-openshift-misc/v3-launch-templates/functionality-testing/aos-4_2/hosts/wait_patch_imageregistry_storage.sh'
Set the image registry storage to an empty directory.....
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
configs.imageregistry.operator.openshift.io is not gernerated, waiting...
!!!!!!!!!!
Something wrong, pls check your cluster - 'oc get configs.imageregistry.operator.openshift.io cluster'
+ exit 3
tools/launch_instance.rb:621:in `installation_task': shell command failed execution, see logs (RuntimeError)
	from tools/launch_instance.rb:747:in `block in launch_template'
	from tools/launch_instance.rb:746:in `each'
	from tools/launch_instance.rb:746:in `launch_template'
	from tools/launch_instance.rb:55:in `block (2 levels) in run'
	from /usr/share/gems/gems/commander-4.4.7/lib/commander/command.rb:182:in `call'
	from /usr/share/gems/gems/commander-4.4.7/lib/commander/command.rb:153:in `run'
	from /usr/share/gems/gems/commander-4.4.7/lib/commander/runner.rb:446:in `run_active_command'
	from /usr/share/gems/gems/commander-4.4.7/lib/commander/runner.rb:68:in `run!'
	from /usr/share/gems/gems/commander-4.4.7/lib/commander/delegates.rb:15:in `run!'
	from tools/launch_instance.rb:92:in `run'
	from tools/launch_instance.rb:879:in `<main>'
waiting for operation up to 36000 seconds..

[08:12:35] INFO> Exit Status: 3
deleting /home/installer4/workspace/Launch Environment Flexy/workdir/awscreds20190918-6748-bmdjs7
+ ret=1
+ '[' X1 == X0 ']'
+ result=FAIL
+ '[' -n '' ']'
+ exit 1
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording fingerprints
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
Started calculate disk usage of workspace
Finished Calculation of disk usage of workspace in 0 seconds
Finished: FAILURE

But I believe that's not the root cause; however, I found https://bugzilla.redhat.com/show_bug.cgi?id=1702615, which shows something similar and seems to have been hit intermittently on vSphere installations.

Comment 17 Ricardo Carrillo Cruz 2019-09-18 14:34:59 UTC
[rcarrillo@edu-playground ~]$ oc adm must-gather
[must-gather      ] OUT the server could not find the requested resource (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-8j6cm created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-fz8tl created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-must-gather:latest created
[must-gather-hlsr7] OUT gather did not start: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-fz8tl deleted
[must-gather      ] OUT namespace/openshift-must-gather-8j6cm deleted
error: gather did not start for pod must-gather-hlsr7: timed out waiting for the condition

Comment 18 Joseph Callen 2019-09-18 14:36:33 UTC
I am sure this was checked, but what about MTU? By default a switch in vSphere is set to 1500.

Comment 19 Ricardo Carrillo Cruz 2019-09-18 14:41:06 UTC
This is a suggestion I made to QE, since the Geneve docs suggest it should be 1600.
On the QE clusters the NIC MTU is 1500.
However, the QE engineers confirmed that they cannot tweak the MTU with their tooling.
So it may or may not be the cause, but it is worth pursuing.
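
For context on why Geneve wants the bigger MTU: the tunnel adds an outer Ethernet/IP/UDP header plus the Geneve header and options on top of every pod-to-pod packet, so with the underlay left at 1500 a full-sized inner frame no longer fits and the encapsulated packets get dropped or fragmented. A rough way to check the values currently in use on a node (a sketch; the NIC name ens192 is an assumption, substitute whatever ip link reports):

[core@compute-0 ~]$ ip link show ens192 | grep -o 'mtu [0-9]*'    # physical NIC / underlay
[core@compute-0 ~]$ ip link show br-int | grep -o 'mtu [0-9]*'    # OVS integration bridge used by ovn-kubernetes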

Comment 20 zhaozhanqi 2019-09-19 03:50:29 UTC
Hi, I filed another bug with the RHCOS team requesting a new CoreOS image with a default MTU of 1600 so we can try it: https://bugzilla.redhat.com/show_bug.cgi?id=1753470

Comment 23 Ricardo Carrillo Cruz 2019-09-23 11:23:15 UTC
I did an installation of vSphere + OVN on the dev cluster last Friday night and got a bootstrap log bundle.
Something interesting I'm seeing:

events.json
-----------

        {
            "apiVersion": "v1",
            "count": 1,
            "eventTime": null,
            "firstTimestamp": "2019-09-20T18:31:35Z",
            "involvedObject": {
                "apiVersion": "apps/v1",
                "kind": "ReplicaSet",
                "name": "ovnkube-master-5d9dd77574",
                "namespace": "openshift-ovn-kubernetes",
                "resourceVersion": "1403",
                "uid": "e05726c6-dbd4-11e9-a39a-0050568bc774"
            },
            "kind": "Event",
            "lastTimestamp": "2019-09-20T18:31:35Z",
            "message": "Created pod: ovnkube-master-5d9dd77574-wj5rf",
            "metadata": {
                "creationTimestamp": "2019-09-20T18:31:35Z",
                "name": "ovnkube-master-5d9dd77574.15c6391804ba0b02",
                "namespace": "openshift-ovn-kubernetes",
                "resourceVersion": "1412",
                "selfLink": "/api/v1/namespaces/openshift-ovn-kubernetes/events/ovnkube-master-5d9dd77574.15c6391804ba0b02",
                "uid": "e05afaf3-dbd4-11e9-a39a-0050568bc774"
            },
            "reason": "SuccessfulCreate",
            "reportingComponent": "",
            "reportingInstance": "",
            "source": {
                "component": "replicaset-controller"
            },
            "type": "Normal"
        },

So there's a successful pod scheduling of ovnkube-master. However, it gets evicted, I believe because control-plane-2 is cordoned and drained by the machine-config-daemon:

2019-09-20T18:39:29.314999465+00:00 stderr F I0920 18:39:29.314985   16934 daemon.go:480] Starting MachineConfigDaemon
2019-09-20T18:39:29.315120536+00:00 stderr F I0920 18:39:29.315108   16934 daemon.go:487] Enabling Kubelet Healthz Monitor
2019-09-20T18:39:38.026618453+00:00 stderr F I0920 18:39:38.026282   16934 node.go:24] No machineconfiguration.openshift.io/currentConfig annotation on node control-plane-2: map[ovn_host_sub
net:10.128.2.0/23 volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
2019-09-20T18:39:38.031476085+00:00 stderr F I0920 18:39:38.031428   16934 node.go:45] Setting initial node config: rendered-master-08066581e2819243940b9ac7d9dd7105
2019-09-20T18:39:38.083695696+00:00 stderr F I0920 18:39:38.083621   16934 daemon.go:643] In bootstrap mode
2019-09-20T18:39:38.083811979+00:00 stderr F E0920 18:39:38.083790   16934 writer.go:127] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-08066581e2
819243940b9ac7d9dd7105" not found
2019-09-20T18:39:40.056750104+00:00 stderr F I0920 18:39:40.056709   16934 daemon.go:643] In bootstrap mode
2019-09-20T18:39:40.056858868+00:00 stderr F E0920 18:39:40.056827   16934 writer.go:127] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-08066581e2
819243940b9ac7d9dd7105" not found
2019-09-20T18:39:56.072028535+00:00 stderr F I0920 18:39:56.071983   16934 daemon.go:643] In bootstrap mode
2019-09-20T18:39:56.072028535+00:00 stderr F I0920 18:39:56.072010   16934 daemon.go:671] Current+desired config: rendered-master-08066581e2819243940b9ac7d9dd7105
2019-09-20T18:39:56.078525571+00:00 stderr F I0920 18:39:56.078486   16934 daemon.go:847] Bootstrap pivot required to: registry.svc.ci.openshift.org/ocp/4.2-2019-09-20-090332@sha256:28adb865
febdab11d5317008558f41c2902afc9ee8d7a7dbca50c33ee13d7fa3
2019-09-20T18:39:56.078575859+00:00 stderr F I0920 18:39:56.078540   16934 update.go:863] Updating OS to registry.svc.ci.openshift.org/ocp/4.2-2019-09-20-090332@sha256:28adb865febdab11d53170
08558f41c2902afc9ee8d7a7dbca50c33ee13d7fa3
2019-09-20T18:39:56.086691516+00:00 stdout F Starting RPM-OSTree System Management Daemon...
2019-09-20T18:39:56.086691516+00:00 stdout F Reading config file '/etc/rpm-ostreed.conf'
2019-09-20T18:39:56.086691516+00:00 stdout F In idle state; will auto-exit in 62 seconds
2019-09-20T18:39:56.086691516+00:00 stdout F Started RPM-OSTree System Management Daemon.
2019-09-20T18:39:56.086691516+00:00 stdout F client(id:cli dbus:1.228 unit:crio-a4988a524d531204a1463bda1366e7598e904d675a3713f032624c49434afdf1.scope uid:0) added; new total=1
2019-09-20T18:39:56.086691516+00:00 stdout F client(id:cli dbus:1.228 unit:crio-a4988a524d531204a1463bda1366e7598e904d675a3713f032624c49434afdf1.scope uid:0) vanished; remaining=0
2019-09-20T18:39:56.086691516+00:00 stdout F In idle state; will auto-exit in 60 seconds
2019-09-20T18:39:56.086691516+00:00 stdout F client(id:cli dbus:1.230 unit:crio-a4988a524d531204a1463bda1366e7598e904d675a3713f032624c49434afdf1.scope uid:0) added; new total=1
2019-09-20T18:39:56.086691516+00:00 stdout F client(id:cli dbus:1.230 unit:crio-a4988a524d531204a1463bda1366e7598e904d675a3713f032624c49434afdf1.scope uid:0) vanished; remaining=0
2019-09-20T18:39:56.086691516+00:00 stdout F In idle state; will auto-exit in 64 seconds
2019-09-20T18:39:56.326084900+00:00 stderr F I0920 18:39:56.326051   16934 update.go:984] Update prepared; beginning drain
2019-09-20T18:39:56.336240103+00:00 stderr F I0920 18:39:56.336203   16934 update.go:89] cordoned node "control-plane-2"
2019-09-20T18:39:56.383090552+00:00 stderr F I0920 18:39:56.383038   16934 update.go:93] deleting pods with local storage: insights-operator-c84bf5dc9-2zfqz; ignoring DaemonSet-managed pods: dns-default-x48jp, machine-config-daemon-89g54, machine-config-server-jbwv9, multus-admission-controller-sw6th, multus-b4g4l, ovnkube-node-rplz2
2019-09-20T18:41:04.927775110+00:00 stderr F I0920 18:41:04.927716   16934 update.go:89] pod "openshift-kube-scheduler-operator-7bdc9b4854-4h6q2" removed (evicted)
2019-09-20T18:41:05.122219675+00:00 stderr F I0920 18:41:05.122182   16934 update.go:89] pod "machine-api-operator-6b5d54cdd8-fm846" removed (evicted)
2019-09-20T18:41:05.317089835+00:00 stderr F I0920 18:41:05.317046   16934 update.go:89] pod "kube-apiserver-operator-69ff97554b-g8xzm" removed (evicted)
2019-09-20T18:41:05.517243196+00:00 stderr F I0920 18:41:05.517192   16934 update.go:89] pod "kube-controller-manager-operator-f8888dd9d-8w8pz" removed (evicted)
2019-09-20T18:41:05.717153067+00:00 stderr F I0920 18:41:05.717114   16934 update.go:89] pod "catalog-operator-5bd75f766b-9lgfn" removed (evicted)
2019-09-20T18:41:05.917118494+00:00 stderr F I0920 18:41:05.917077   16934 update.go:89] pod "ovnkube-master-5d9dd77574-wj5rf" removed (evicted)

I checked the other control-plane nodes, and they all exhibit the same logs, so that may be why the nodes show as non-schedulable.

Comment 24 Ricardo Carrillo Cruz 2019-09-23 11:36:58 UTC
After the reboot, here's the log for the MCD on control-plane-2:

2019-09-20T18:43:01.729759412+00:00 stderr F I0920 18:43:01.729424    1583 start.go:68] Version: machine-config-daemon-4.2.0-201907161330-229-g1dc26ed3-dirty (1dc26ed39bef507cf38c7c8f289fa59
0f33ff5d1)
2019-09-20T18:43:01.743650685+00:00 stderr F I0920 18:43:01.743575    1583 start.go:78] Calling chroot("/rootfs")
2019-09-20T18:43:01.744003811+00:00 stderr F I0920 18:43:01.743976    1583 rpm-ostree.go:356] Running captured: rpm-ostree status --json
2019-09-20T18:43:01.848745944+00:00 stderr F I0920 18:43:01.848033    1583 daemon.go:208] Booted osImageURL: registry.svc.ci.openshift.org/ocp/4.2-2019-09-20-090332@sha256:28adb865febdab11d5
317008558f41c2902afc9ee8d7a7dbca50c33ee13d7fa3 (42.80.20190920.1)
2019-09-20T18:43:01.852181781+00:00 stderr F I0920 18:43:01.851723    1583 update.go:984] Starting to manage node: control-plane-2
2019-09-20T18:43:01.857027567+00:00 stderr F I0920 18:43:01.856990    1583 rpm-ostree.go:356] Running captured: rpm-ostree status
2019-09-20T18:43:01.898342655+00:00 stderr F I0920 18:43:01.898302    1583 daemon.go:715] State: idle
2019-09-20T18:43:01.898342655+00:00 stderr F AutomaticUpdates: disabled
2019-09-20T18:43:01.898342655+00:00 stderr F Deployments:
2019-09-20T18:43:01.898342655+00:00 stderr F * pivot://registry.svc.ci.openshift.org/ocp/4.2-2019-09-20-090332@sha256:28adb865febdab11d5317008558f41c2902afc9ee8d7a7dbca50c33ee13d7fa3
2019-09-20T18:43:01.898342655+00:00 stderr F               CustomOrigin: Managed by pivot tool
2019-09-20T18:43:01.898342655+00:00 stderr F                    Version: 42.80.20190920.1 (2019-09-20T07:18:44Z)
2019-09-20T18:43:01.898342655+00:00 stderr F 
2019-09-20T18:43:01.898342655+00:00 stderr F   pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:53389c9b4a00d7afebb98f7bd9d20348deb1d77ca4baf194f0ae1b582b7e965b
2019-09-20T18:43:01.898342655+00:00 stderr F               CustomOrigin: Provisioned from oscontainer
2019-09-20T18:43:01.898342655+00:00 stderr F                    Version: 410.8.20190520.0 (2019-05-20T22:55:04Z)
2019-09-20T18:43:01.898342655+00:00 stderr F I0920 18:43:01.898325    1583 rpm-ostree.go:356] Running captured: journalctl --list-boots
2019-09-20T18:43:01.904445803+00:00 stderr F I0920 18:43:01.904410    1583 daemon.go:722] journalctl --list-boots:
2019-09-20T18:43:01.904445803+00:00 stderr F -2 a5f06cd48abd4ab4b9cf6895b96da2ee Fri 2019-09-20 18:23:39 UTC—Fri 2019-09-20 18:26:10 UTC
2019-09-20T18:43:01.904445803+00:00 stderr F -1 1200e18b5bc740e98c50db172756ef11 Fri 2019-09-20 18:26:16 UTC—Fri 2019-09-20 18:41:13 UTC
2019-09-20T18:43:01.904445803+00:00 stderr F  0 aa133ec568be4530b97525704147d928 Fri 2019-09-20 18:41:19 UTC—Fri 2019-09-20 18:43:01 UTC
2019-09-20T18:43:01.904479399+00:00 stderr F I0920 18:43:01.904438    1583 daemon.go:480] Starting MachineConfigDaemon
2019-09-20T18:43:01.904575801+00:00 stderr F I0920 18:43:01.904560    1583 daemon.go:487] Enabling Kubelet Healthz Monitor
2019-09-20T18:43:31.854155192+00:00 stderr F E0920 18:43:31.854091    1583 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://172.30.0.1:443/ap
i/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: i/o timeout
2019-09-20T18:43:31.854820225+00:00 stderr F E0920 18:43:31.854788    1583 reflector.go:126] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:
101: Failed to list *v1.MachineConfig: Get https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: i/o timeout
2019-09-20T18:44:02.855646777+00:00 stderr F E0920 18:44:02.855569    1583 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://172.30.0.1:443/ap
i/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: i/o timeout
2019-09-20T18:44:02.856507878+00:00 stderr F E0920 18:44:02.856437    1583 reflector.go:126] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1.MachineConfig: Get https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: i/o timeout
2019-09-20T18:44:33.856361284+00:00 stderr F E0920 18:44:33.856279    1583 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: i/o timeout

Comment 25 Ricardo Carrillo Cruz 2019-09-23 11:45:41 UTC
172.30.0.1 is the ClusterIP of the 'kubernetes' service in front of the apiserver:

        {
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {
                "creationTimestamp": "2019-09-20T18:30:44Z",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                },
                "name": "kubernetes",
                "namespace": "default",
                "resourceVersion": "180",
                "selfLink": "/api/v1/namespaces/default/services/kubernetes",
                "uid": "c206e494-dbd4-11e9-a39a-0050568bc774"
            },
            "spec": {
                "clusterIP": "172.30.0.1",
                "ports": [
                    {
                        "name": "https",
                        "port": 443,
                        "protocol": "TCP",
                        "targetPort": 6443
                    }
                ],
                "sessionAffinity": "None",
                "type": "ClusterIP"
            },
            "status": {
                "loadBalancer": {}
            }
        },


So, after the reboot, the MCD is unable to reach the apiserver, and thus it's not possible to uncordon the nodes.
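
A quick way to confirm that from the node itself (a sketch; the ClusterIP and port come from the Service definition above, and curl is present on RHCOS):

[core@control-plane-2 ~]$ curl -k --connect-timeout 5 https://172.30.0.1:443/version

Any HTTP/TLS response (even a 403) would mean the ClusterIP is being translated to a working apiserver backend; a timeout, matching the i/o timeout errors in the MCD log, means the node's OVN datapath never came back after the reboot.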

Comment 27 Ricardo Carrillo Cruz 2019-09-26 14:21:33 UTC
We determined it cannot work as things stand today.
OVN does not provide HA: because the MCD cordons the control-plane nodes prior to rebooting them,
there is no working network afterwards to uncordon them so the installation can complete.

I'm trying to see whether a manual uncordon of the nodes during bootstrap suffices (see the sketch below), and will
open a doc bug for the workaround if so.
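
For the record, the manual workaround being tested is simply uncordoning the control-plane nodes once the API answers again (a sketch; node names are the ones used in this cluster):

[rcarrillo@edu-playground ~]$ for node in control-plane-0 control-plane-1 control-plane-2; do oc adm uncordon "$node"; done
[rcarrillo@edu-playground ~]$ oc get nodes    # nodes should now report Ready with no SchedulingDisabled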

Comment 28 Ricardo Carrillo Cruz 2019-10-02 12:34:18 UTC
Waiting for https://github.com/openshift/cluster-network-operator/pull/329/files#diff-54b09156f80dfb820afa115de13b8f32R17 to merge.

That PR fixes a couple of issues I encountered when doing the manual uncordon, namely a lock issue on the sb-db and an idempotency issue around
recreating the ovnkube-db endpoint.

After that lands, I will give the uncordon workaround a go and, if it works, file doc bugs for it.

Comment 29 Casey Callendrello 2019-11-04 18:58:26 UTC
It's time to re-run this and see if it works.

Comment 30 Anurag saxena 2019-11-15 13:18:50 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1762614 is blocking verification of this bug.

Comment 36 Ricardo Carrillo Cruz 2019-11-20 18:17:09 UTC
Pasting results from a failed installation:

<snip>
[ricky@ricky-laptop vsphere]$ ../openshift-install wait-for install-complete                                      
INFO Waiting up to 30m0s for the cluster at https://api.rcarrillocruz.devcluster.openshift.com:6443 to initialize... 
ERROR Cluster operator dns Degraded is True with NoDNS: No DNS resource exists 
INFO Cluster operator dns Progressing is True with Reconciling: Not all DNS DaemonSets available.
Moving to release version "4.3.0-0.ci-2019-11-20-121416".
Moving to coredns image version "registry.svc.ci.openshift.org/ocp/4.3-2019-11-20-121416@sha256:9a7e8297f7b70ee3a11fcbe4a78c59a5861e1afda5657a7437de6934bdc2458e".
Moving to openshift-cli image version "registry.svc.ci.openshift.org/ocp/4.3-2019-11-20-121416@sha256:2f5d38a12ee6193325847c0d0c6f7d64d5c7c06a8477f4c34a0aae153463d6c5". 
INFO Cluster operator dns Available is False with DNSUnavailable: No DNS DaemonSets available  
INFO Cluster operator insights Progressing is True with : Initializing the operator                                                                                                           
INFO Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 3 
INFO Cluster operator kube-apiserver Available is False with AvailableZeroNodesActive: Available: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 3         
INFO Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 3                              
INFO Cluster operator kube-controller-manager Available is False with AvailableZeroNodesActive: Available: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 3
                                                                                                                                                                                              
INFO Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 4                                       
INFO Cluster operator kube-scheduler Available is False with AvailableZeroNodesActive: Available: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 4         
INFO Cluster operator machine-config Available is False with :                                                                                                                                
INFO Cluster operator machine-config Progressing is True with : Cluster is bootstrapping 4.3.0-0.ci-2019-11-20-121416                                                                         
INFO Cluster operator openshift-apiserver Available is Unknown with NoData:                                                                                                                   
INFO Cluster operator openshift-controller-manager Progressing is True with ProgressingDesiredStateNotYetAchieved: Progressing: daemonset/controller-manager: observed generation is 7, desired generation is 8.
INFO Cluster operator operator-lifecycle-manager Progressing is True with : Deployed 0.13.0                                                                                                   
INFO Cluster operator operator-lifecycle-manager-catalog Progressing is True with : Deployed 0.13.0                                                                                           
INFO Cluster operator operator-lifecycle-manager-packageserver Available is False with :                                                                                                      
INFO Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.13.0                                                                               
FATAL failed to initialize the cluster: Working towards 4.3.0-0.ci-2019-11-20-121416: 74% complete     
[ricky@ricky-laptop vsphere]$ oc get co
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                                                          True        False         False      54m
dns                                        unknown                        False       True          True       50m
insights                                   4.3.0-0.ci-2019-11-20-121416   True        True          False      51m
kube-apiserver                             4.3.0-0.ci-2019-11-20-121416   False       True          False      51m
kube-controller-manager                    4.3.0-0.ci-2019-11-20-121416   False       True          False      51m
kube-scheduler                             4.3.0-0.ci-2019-11-20-121416   False       True          False      51m
machine-api                                4.3.0-0.ci-2019-11-20-121416   True        False         False      50m
machine-config                             4.3.0-0.ci-2019-11-20-121416   False       True          False      51m
network                                    4.3.0-0.ci-2019-11-20-121416   True        False         False      49m
openshift-apiserver                        4.3.0-0.ci-2019-11-20-121416   Unknown     False         False      51m
openshift-controller-manager                                              True        True          False      49m
operator-lifecycle-manager                 4.3.0-0.ci-2019-11-20-121416   True        True          False      50m
operator-lifecycle-manager-catalog         4.3.0-0.ci-2019-11-20-121416   True        True          False      50m
operator-lifecycle-manager-packageserver                                  False       True          False      50m
service-ca                                 4.3.0-0.ci-2019-11-20-121416   True        False         False      50m
[ricky@ricky-laptop vsphere]$ oc get co kube-apiserver -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:                                  
  creationTimestamp: "2019-11-20T17:22:57Z"
  generation: 1
  name: kube-apiserver            
  resourceVersion: "5580"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/kube-apiserver
  uid: 20476291-c297-4ab1-8509-8870c89fc631
spec: {}                                 
status:                 
  conditions:      
  - lastTransitionTime: "2019-11-20T17:22:57Z"
    message: |-                          
      NodeControllerDegraded: All master node(s) are ready
      NodeInstallerDegraded: 1 nodes are failing on revision 2:
      NodeInstallerDegraded: static pod of revision has been installed, but is not ready while new revision 2 is pending
      StaticPodsDegraded: nodes/control-plane-1 pods/kube-apiserver-control-plane-1 container="kube-apiserver-2" is not ready
      StaticPodsDegraded: pods "kube-apiserver-control-plane-2" not found
      StaticPodsDegraded: pods "kube-apiserver-control-plane-0" not found
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-11-20T17:23:02Z"
    message: 'Progressing: 3 nodes are at revision 0; 0 nodes have achieved new revision
      3'
    reason: Progressing
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-11-20T17:22:58Z"
    message: 'Available: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have
      achieved new revision 3'
    reason: AvailableZeroNodesActive
    status: "False"
    type: Available
  - lastTransitionTime: "2019-11-20T17:22:57Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
...
</snip>

Note that the CNO is available, yet the apiserver is not.
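
To dig into why the masters stay at revision 0, one possible next step is to look at the static and installer pods in the kube-apiserver namespace (a sketch; the pod name is taken from the StaticPodsDegraded message above):

[ricky@ricky-laptop vsphere]$ oc -n openshift-kube-apiserver get pods -o wide
[ricky@ricky-laptop vsphere]$ oc -n openshift-kube-apiserver describe pod kube-apiserver-control-plane-1
[ricky@ricky-laptop vsphere]$ oc -n openshift-kube-apiserver-operator logs deployment/kube-apiserver-operator --tail=50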

Comment 37 Ricardo Carrillo Cruz 2019-11-22 09:41:13 UTC
Getting consistent results:

<snip>
[ricky@ricky-laptop vsphere]$ openshift-install wait-for install-complete
INFO Waiting up to 30m0s for the cluster at https://api.rcarrillocruz.devcluster.openshift.com:6443 to initialize... 
FATAL failed to initialize the cluster: Working towards 4.3.0-0.ci-2019-11-21-023035: 74% complete: timed out waiting for the condition 
[ricky@ricky-laptop openshift-installer]$ oc get co
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                                                          True        False         False      15h
dns                                        4.3.0-0.ci-2019-11-21-023035   True        False         False      15h
insights                                   4.3.0-0.ci-2019-11-21-023035   True        True          False      15h
kube-apiserver                             4.3.0-0.ci-2019-11-21-023035   False       True          False      15h
kube-controller-manager                    4.3.0-0.ci-2019-11-21-023035   False       True          False      15h
kube-scheduler                             4.3.0-0.ci-2019-11-21-023035   False       True          False      15h
machine-api                                4.3.0-0.ci-2019-11-21-023035   True        False         False      15h
machine-config                             4.3.0-0.ci-2019-11-21-023035   False       True          False      15h
network                                    4.3.0-0.ci-2019-11-21-023035   True        False         False      15h
openshift-apiserver                        4.3.0-0.ci-2019-11-21-023035   Unknown     False         False      15h
openshift-controller-manager                                              False       True          False      15h
operator-lifecycle-manager                 4.3.0-0.ci-2019-11-21-023035   True        True          False      15h
operator-lifecycle-manager-catalog         4.3.0-0.ci-2019-11-21-023035   True        True          False      15h
operator-lifecycle-manager-packageserver                                  False       True          False      15h
service-ca                                 4.3.0-0.ci-2019-11-21-023035   True        False         False      15h
</snip>

So it doesn't appear to be a network issue; the network is up, since the CNO is available.
I will do another run and paste the results.

Comment 38 Ricardo Carrillo Cruz 2019-11-22 13:12:47 UTC
Did another run; the CNO is up and running, but kube-apiserver is not:

<snip>
[ricky@ricky-laptop openshift-installer]$ oc get co
NAME                                 VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                                                    True        False         False      42m
dns                                  4.3.0-0.ci-2019-11-21-195648   True        True          False      39m
insights                             4.3.0-0.ci-2019-11-21-195648   True        True          False      40m
kube-apiserver                       4.3.0-0.ci-2019-11-21-195648   False       True          False      40m
kube-controller-manager                                             False       True          False      40m
kube-scheduler                                                      False       True          False      40m
machine-api                          4.3.0-0.ci-2019-11-21-195648   True        False         False      39m
machine-config                       4.3.0-0.ci-2019-11-21-195648   True        False         False      38m
network                              4.3.0-0.ci-2019-11-21-195648   True        False         False      34m
openshift-apiserver                  4.3.0-0.ci-2019-11-21-195648   Unknown     False         False      40m
openshift-controller-manager         4.3.0-0.ci-2019-11-21-195648   True        True          False      39m
operator-lifecycle-manager-catalog   4.3.0-0.ci-2019-11-21-195648   True        True          False      39m
service-ca                           4.3.0-0.ci-2019-11-21-195648   True        False         False      40m
</snip>

Spotted this in the kube-apiserver container log (oc -n openshift-kube-apiserver logs kube-apiserver-control-plane-0 -c kube-apiserver-2):

<snip>
E1122 12:37:34.077007       1 writers.go:105] apiserver was unable to write a JSON response: client disconnected
E1122 12:37:34.077031       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"client disconnected"}
E1122 12:37:34.078791       1 runtime.go:76] Observed a panic: runtime error: invalid memory address or nil pointer dereference
goroutine 13684 [running]:
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc00799f9e0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:108 +0x107
panic(0x5248aa0, 0xda2d0d0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1(0xc010d60000, 0x7f733bee3798, 0xc000b77e70, 0xc000b77e60, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:88 +0x1e0
panic(0x5248aa0, 0xda2d0d0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
compress/gzip.(*Writer).Write(0xc01bec5b80, 0xc00c769e50, 0x43, 0x43, 0x30, 0x55cbf00, 0xc01242bc01)
	/usr/local/go/src/compress/gzip/gzip.go:168 +0x23c
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.(*deferredResponseWriter).Write(0xc00cbb7220, 0xc00c769e50, 0x43, 0x43, 0xc00c769e50, 0x43, 0x43)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:182 +0x571
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.SerializeObject(0x5d0c330, 0x23, 0x7f733bed0810, 0xc015b99a70, 0x96645a0, 0xc0163f7a00, 0xc006d79600, 0xc8, 0x9630860, 0xc00b81ba40)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:117 +0x3ea
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.WriteObjectNegotiated(0x966c7a0, 0xc0020a9860, 0x96644e0, 0xc000d931e0, 0x0, 0x0, 0x5c68b89, 0x2, 0x96645a0, 0xc0163f7a00, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:251 +0x559
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.transformResponseObject(0x96af660, 0xc00b3b47b0, 0xc000d931e0, 0xc01236c818, 0xc006d79600, 0x96645a0, 0xc0163f7a00, 0xc8, 0x0, 0x0, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/response.go:119 +0x356
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.ListResource.func1(0x96645a0, 0xc0163f7a00, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/get.go:275 +0xd08
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints.restfulListResource.func1(0xc00b3b46c0, 0xc00b81b730)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/installer.go:1083 +0x8f
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics.InstrumentRouteFunc.func1(0xc00b3b46c0, 0xc00b81b730)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:323 +0x254
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc001220090, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:288 +0xa3b
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).Dispatch(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:199
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x5c8719a, 0xe, 0xc001220090, 0xc000c0a1c0, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:146 +0x4e4
github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver.(*proxyHandler).ServeHTTP(0xc00cf9a190, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go:121 +0x162
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc010d90700, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:248 +0x38d
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc004aec1c0, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234 +0x85
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x5c8bbe6, 0xf, 0xc0052bb7a0, 0xc004aec1c0, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:154 +0x6c3
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:64 +0x4fa
net/http.HandlerFunc.ServeHTTP(0xc0028db840, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:160 +0x5c7
net/http.HandlerFunc.ServeHTTP(0xc00476e270, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:50 +0x1f6f
net/http.HandlerFunc.ServeHTTP(0xc0028db880, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1(0x7f73257ba410, 0xc0163f79c8, 0xc006d79500)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:110 +0x50f
net/http.HandlerFunc.ServeHTTP(0xc0028db940, 0x7f73257ba410, 0xc0163f79c8, 0xc006d79500)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1(0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:110 +0x696
net/http.HandlerFunc.ServeHTTP(0xc000cc3cc0, 0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithCORS.func1(0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/cors.go:75 +0x1d7
net/http.HandlerFunc.ServeHTTP(0xc004a46900, 0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc00799f9e0, 0xc004ad4c80, 0x96b0660, 0xc0163f79c8, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:113 +0xb3
created by github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:99 +0x1b1

goroutine 13683 [running]:
github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x4ef0680, 0xc010c39a20)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0104d5c20, 0x1, 0x1)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x4ef0680, 0xc010c39a20)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc004ad4c80, 0x96647a0, 0xc00b81b2d0, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:119 +0x43d
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1(0x96647a0, 0xc00b81b2d0, 0xc006d79300)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/waitgroup.go:58 +0x122
net/http.HandlerFunc.ServeHTTP(0xc00476e570, 0x96647a0, 0xc00b81b2d0, 0xc006d79300)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x2b8
net/http.HandlerFunc.ServeHTTP(0xc00476e630, 0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithCacheControl.func1(0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/cachecontrol.go:31 +0xa8
net/http.HandlerFunc.ServeHTTP(0xc004ad4ca0, 0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:89 +0x29c
net/http.HandlerFunc.ServeHTTP(0xc004ad4cc0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:51 +0x105
net/http.HandlerFunc.ServeHTTP(0xc004ad4d40, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/kubernetes/openshift-kube-apiserver/openshiftkubeapiserver.translateLegacyScopeImpersonation.func1(0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/openshift-kube-apiserver/openshiftkubeapiserver/patch_handlerchain.go:62 +0x18f
net/http.HandlerFunc.ServeHTTP(0xc004ad4d60, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc00476e6f0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:189 +0x51
net/http.serverHandler.ServeHTTP(0xc0021cf1e0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:2774 +0xa8
net/http.initNPNRequest.ServeHTTP(0xc00d596000, 0xc0021cf1e0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:3323 +0x8d
github.com/openshift/origin/vendor/golang.org/x/net/http2.(*serverConn).runHandler(0xc0068ed200, 0xc0163f79b8, 0xc006d79000, 0xc008721960)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/server.go:2149 +0x89
created by github.com/openshift/origin/vendor/golang.org/x/net/http2.(*serverConn).processHeaders
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/server.go:1883 +0x4f4
E1122 12:37:34.078837       1 wrap.go:39] apiserver panic'd on GET /api/v1/pods?limit=500&resourceVersion=0
I1122 12:37:34.078941       1 log.go:172] http2: panic serving 139.178.89.202:50552: runtime error: invalid memory address or nil pointer dereference
goroutine 13684 [running]:
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc00799f9e0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:108 +0x107
panic(0x5248aa0, 0xda2d0d0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1(0xc010d60000, 0x7f733bee3798, 0xc000b77e70, 0xc000b77e60, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:88 +0x1e0
panic(0x5248aa0, 0xda2d0d0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
compress/gzip.(*Writer).Write(0xc01bec5b80, 0xc00c769e50, 0x43, 0x43, 0x30, 0x55cbf00, 0xc01242bc01)
	/usr/local/go/src/compress/gzip/gzip.go:168 +0x23c
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.(*deferredResponseWriter).Write(0xc00cbb7220, 0xc00c769e50, 0x43, 0x43, 0xc00c769e50, 0x43, 0x43)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:182 +0x571
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.SerializeObject(0x5d0c330, 0x23, 0x7f733bed0810, 0xc015b99a70, 0x96645a0, 0xc0163f7a00, 0xc006d79600, 0xc8, 0x9630860, 0xc00b81ba40)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:117 +0x3ea
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.WriteObjectNegotiated(0x966c7a0, 0xc0020a9860, 0x96644e0, 0xc000d931e0, 0x0, 0x0, 0x5c68b89, 0x2, 0x96645a0, 0xc0163f7a00, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:251 +0x559
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.transformResponseObject(0x96af660, 0xc00b3b47b0, 0xc000d931e0, 0xc01236c818, 0xc006d79600, 0x96645a0, 0xc0163f7a00, 0xc8, 0x0, 0x0, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/response.go:119 +0x356
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.ListResource.func1(0x96645a0, 0xc0163f7a00, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/get.go:275 +0xd08
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints.restfulListResource.func1(0xc00b3b46c0, 0xc00b81b730)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/installer.go:1083 +0x8f
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics.InstrumentRouteFunc.func1(0xc00b3b46c0, 0xc00b81b730)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:323 +0x254
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc001220090, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:288 +0xa3b
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).Dispatch(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:199
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x5c8719a, 0xe, 0xc001220090, 0xc000c0a1c0, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:146 +0x4e4
github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver.(*proxyHandler).ServeHTTP(0xc00cf9a190, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go:121 +0x162
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc010d90700, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:248 +0x38d
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc004aec1c0, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234 +0x85
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x5c8bbe6, 0xf, 0xc0052bb7a0, 0xc004aec1c0, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:154 +0x6c3
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:64 +0x4fa
net/http.HandlerFunc.ServeHTTP(0xc0028db840, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:160 +0x5c7
net/http.HandlerFunc.ServeHTTP(0xc00476e270, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:50 +0x1f6f
net/http.HandlerFunc.ServeHTTP(0xc0028db880, 0x96644a0, 0xc0163f79e8, 0xc006d79600)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1(0x7f73257ba410, 0xc0163f79c8, 0xc006d79500)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:110 +0x50f
net/http.HandlerFunc.ServeHTTP(0xc0028db940, 0x7f73257ba410, 0xc0163f79c8, 0xc006d79500)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1(0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:110 +0x696
net/http.HandlerFunc.ServeHTTP(0xc000cc3cc0, 0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithCORS.func1(0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/cors.go:75 +0x1d7
net/http.HandlerFunc.ServeHTTP(0xc004a46900, 0x7f73257ba410, 0xc0163f79c8, 0xc006d79400)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc00799f9e0, 0xc004ad4c80, 0x96b0660, 0xc0163f79c8, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:113 +0xb3
created by github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:99 +0x1b1

goroutine 13683 [running]:
github.com/openshift/origin/vendor/golang.org/x/net/http2.(*serverConn).runHandler.func1(0xc0163f79b8, 0xc0104d5faf, 0xc0068ed200)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/server.go:2142 +0x16b
panic(0x4ef0680, 0xc010c39a20)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0104d5c20, 0x1, 0x1)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x4ef0680, 0xc010c39a20)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc004ad4c80, 0x96647a0, 0xc00b81b2d0, 0xc006d79400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:119 +0x43d
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1(0x96647a0, 0xc00b81b2d0, 0xc006d79300)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/waitgroup.go:58 +0x122
net/http.HandlerFunc.ServeHTTP(0xc00476e570, 0x96647a0, 0xc00b81b2d0, 0xc006d79300)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x2b8
net/http.HandlerFunc.ServeHTTP(0xc00476e630, 0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithCacheControl.func1(0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/cachecontrol.go:31 +0xa8
net/http.HandlerFunc.ServeHTTP(0xc004ad4ca0, 0x96647a0, 0xc00b81b2d0, 0xc006d79100)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:89 +0x29c
net/http.HandlerFunc.ServeHTTP(0xc004ad4cc0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:51 +0x105
net/http.HandlerFunc.ServeHTTP(0xc004ad4d40, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/kubernetes/openshift-kube-apiserver/openshiftkubeapiserver.translateLegacyScopeImpersonation.func1(0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/openshift-kube-apiserver/openshiftkubeapiserver/patch_handlerchain.go:62 +0x18f
net/http.HandlerFunc.ServeHTTP(0xc004ad4d60, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:1995 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc00476e6f0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:189 +0x51
net/http.serverHandler.ServeHTTP(0xc0021cf1e0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:2774 +0xa8
net/http.initNPNRequest.ServeHTTP(0xc00d596000, 0xc0021cf1e0, 0x9657d20, 0xc0163f79b8, 0xc006d79000)
	/usr/local/go/src/net/http/server.go:3323 +0x8d
github.com/openshift/origin/vendor/golang.org/x/net/http2.(*serverConn).runHandler(0xc0068ed200, 0xc0163f79b8, 0xc006d79000, 0xc008721960)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/server.go:2149 +0x89
created by github.com/openshift/origin/vendor/golang.org/x/net/http2.(*serverConn).processHeaders
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/server.go:1883 +0x4f4
</snip>
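
The trace above shows the kube-apiserver panicking with a nil pointer dereference in the gzip response writer right after a "client disconnected" error, while the operator keeps reporting Available=False. A hedged way to see whether the panics are recurring on every master (reusing the pod/container naming from the log command above, and assuming all three control-plane pods exist in this run) is to count the "apiserver panic'd" lines per node:

<snip>
# Count panic occurrences per master; a non-zero count on every node would
# point at a systemic request-handling problem rather than a single bad pod.
for node in control-plane-0 control-plane-1 control-plane-2; do
  echo "== ${node} =="
  oc -n openshift-kube-apiserver logs "kube-apiserver-${node}" -c kube-apiserver-2 \
    | grep -c "apiserver panic'd" || true
done
</snip>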

Comment 39 liujia 2019-12-05 01:42:58 UTC
*** Bug 1779005 has been marked as a duplicate of this bug. ***

Comment 41 Weibin Liang 2020-01-20 17:19:53 UTC
OVN installation passed in private-openshift-misc/v3-launch-templates/functionality-testing/aos-4_4/upi-on-vsphere/versioned-installer-vsphere_slave setup

[root@dhcp-41-193 FILE]# oc get network cluster -o yaml | grep OVN
  networkType: OVNKubernetes
  networkType: OVNKubernetes
[root@dhcp-41-193 FILE]# 

[root@dhcp-41-193 FILE]# oc get node
NAME              STATUS   ROLES    AGE   VERSION
compute-0         Ready    worker   53m   v1.17.0
control-plane-0   Ready    master   53m   v1.17.0
[root@dhcp-41-193 FILE]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-17-124118   True        False         31m     Cluster version is 4.4.0-0.nightly-2020-01-17-124118

Comment 42 Red Hat Bugzilla 2023-09-14 05:43:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

