Bug 1805034
| Summary: | Bootstrap fails when installing 4.4 nightly on single node | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Praveen Kumar <prkumar> |
| Component: | Etcd | Assignee: | Sam Batschelet <sbatsche> |
| Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4 | CC: | alpatel, benjamin.dabelow, cfergeau, dbelenky, eparis, fdeutsch, fromani, fsimonce, gercan, kboumedh, mfojtik, mfuruta, mnewby, moddi, ngompa13, oshoval, skolicha, sspeiche, sttts, ykashtan |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | Feature: Support single-node cluster installations. Reason: Needed for development-only CRC testing. Result: With the proper environment set and the patch applied, single-node clusters are supported. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| : | 1821748 (view as bug list) | | |
| Last Closed: | 2020-07-13 17:16:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Bug Depends On: | |
| Bug Blocks: | 1821748 | | |
| Attachments: | bootstrap-logs (attachment 1664284) | | |
Description
Praveen Kumar
2020-02-20 06:35:36 UTC
Created attachment 1664284 [details]
bootstrap-logs
Bootstrap and control plane logs.
Waiting on CEO; moving to etcd. AFAIK this is the PR which broke this: https://github.com/openshift/cluster-etcd-operator/pull/157/files#diff-16c82eb805d9624f37fc2f0121ddc6eaR46 We have a solution we are working on for 4.4; with luck it will ship. It is currently being tested.

The current PR, https://github.com/openshift/cluster-etcd-operator/pull/266, which we (the CRC team) are actively testing, looks able to create a single-node cluster on libvirt. However, the cert now generated by the etcd operator depends on the cluster-internal IP, which in the libvirt case is the IP configured by the libvirt provider, instead of using SRV records (as was the case up to 4.3.x). For CRC we create a bundle and then run the generated bundle on other platforms, where we cannot force a static IP, so running the bundle on those platforms will cause issues and we will still be blocked :(

Just an update: we thought that if we could create a virtual network interface [0] and have it picked up by OpenShift, we could deal with this etcd cert issue, but per our experiment OpenShift does not take that network's IP; it uses whatever the actual interface has, and this is not going to work out for us atm :(

```
[core@crc-p5vnv-master-0 ~]$ ifconfig ens3
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.130.11  netmask 255.255.255.0  broadcast 192.168.130.255
        inet6 fe80::53fd:8725:ea4a:8093  prefixlen 64  scopeid 0x20<link>
        ether 52:fd:fc:07:21:82  txqueuelen 1000  (Ethernet)
        RX packets 25456  bytes 9639852 (9.1 MiB)
        RX errors 0  dropped 10  overruns 0  frame 0
        TX packets 34018  bytes 38511681 (36.7 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[core@crc-p5vnv-master-0 ~]$ ifconfig ens3:0
ens3:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.126.11  netmask 255.255.255.0  broadcast 192.168.126.255
        ether 52:fd:fc:07:21:82  txqueuelen 1000  (Ethernet)

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.130.1   0.0.0.0         UG    100    0        0 ens3
10.88.0.0       0.0.0.0         255.255.0.0     U     0      0        0 cni-podman0
10.128.0.0      0.0.0.0         255.252.0.0     U     0      0        0 tun0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 tun0
192.168.126.0   0.0.0.0         255.255.255.0   U     0      0        0 ens3
192.168.126.0   0.0.0.0         255.255.255.0   U     100    0        0 ens3
192.168.130.0   0.0.0.0         255.255.255.0   U     100    0        0 ens3

$ oc get nodes -o wide
NAME                 STATUS   ROLES           AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                CONTAINER-RUNTIME
crc-p5vnv-master-0   Ready    master,worker   7h17m   v1.17.1   192.168.130.11   <none>        Red Hat Enterprise Linux CoreOS 45.81.202003231628-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
```
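For reference, an alias like the ens3:0 shown above can be created along the lines below (a sketch following the approach in [0]; the exact commands used in our experiment are not recorded in this report):

```
# Sketch: attach a second, fixed address to ens3 as alias ens3:0 (per [0]).
# 192.168.126.11/24 matches the alias shown in the ifconfig output above.
sudo ip addr add 192.168.126.11/24 dev ens3 label ens3:0
sudo ip link set dev ens3 up
```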
The etcd member logs then show every connection being rejected because the serving cert does not match:

```
$ oc logs etcd-crc-p5vnv-master-0 -n openshift-etcd -c etcd
[...]
2020-03-24 15:36:50.006277 I | embed: rejected connection from "10.128.0.79:33292" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:51.219809 I | embed: rejected connection from "192.168.130.11:33952" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:51.717504 I | embed: rejected connection from "10.128.0.79:33326" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:51.805955 I | embed: rejected connection from "192.168.130.11:33962" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:52.581834 I | embed: rejected connection from "192.168.130.11:33968" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:53.981921 I | embed: rejected connection from "192.168.130.11:33982" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:54.804054 I | embed: rejected connection from "10.128.0.79:33360" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:55.943713 I | embed: rejected connection from "192.168.130.11:34006" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:56.544935 I | embed: rejected connection from "192.168.130.11:34012" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:57.032870 I | embed: rejected connection from "192.168.130.11:34026" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:57.767818 I | embed: rejected connection from "192.168.130.11:34032" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:58.176810 I | embed: rejected connection from "192.168.130.11:34040" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:58.643069 I | embed: rejected connection from "192.168.130.11:34044" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:58.867680 I | embed: rejected connection from "192.168.130.11:34048" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:36:58.874256 I | embed: rejected connection from "10.128.0.79:33418" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:37:02.810137 I | embed: rejected connection from "192.168.130.11:34108" (error "remote error: tls: bad certificate", ServerName "")
2020-03-24 15:37:02.844207 I | embed: rejected connection from "192.168.130.11:34110" (error "remote error: tls: bad certificate", ServerName "")
```

[0] https://linuxconfig.org/configuring-virtual-network-interfaces-in-linux

Another update: we are able to test #266 (which was closed without being merged) by using the dummy network and changing the kubelet systemd unit file (adding `--node-ip`; thanks, Alay). But #279 is now the one that should resolve this, and I am concerned about how we can automate the bundle process, since it needs manual intervention to patch the `etcd` resource as soon as the bootstrap makes the API available :( It would be great if there were a way to make this change part of the manifests, so that it could be added before starting cluster creation.

I verified this bug, and it needs the manual intervention [0] of patching the `etcd` resource as soon as the bootstrap makes the API available. After applying `oc patch etcd cluster -p='{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}' --type=merge`, I can see the bootstrap succeed without any issue.
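For reference, a minimal sketch of how that manual step could be scripted during bundle creation (the polling loop and its timing are assumptions; the patch itself is the one quoted above):

```
# Sketch: wait until the bootstrap control plane serves the etcd operator CR,
# then apply the single-member override. The retry loop is an assumption;
# the patch is the one used for verification above.
until oc get etcd cluster >/dev/null 2>&1; do
  sleep 5
done
oc patch etcd cluster --type=merge \
  -p='{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
```

If the operator honored it at install time, committing the same spec as a manifest before `openshift-install create cluster` would remove the need for the loop entirely, which is what the update above is asking for.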
[0] https://github.com/openshift/cluster-etcd-operator/pull/279#issue-393886988

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409