Bug 1833160
| Summary: | OCP 4.3.15 UPI bare metal installation only two of three nodes are active | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Ellis <sellis> |
| Component: | RHCOS | Assignee: | Ben Howard <behoward> |
| Status: | CLOSED NOTABUG | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.3.z | CC: | bbreard, imcleod, jligon, miabbott, nstielau |
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-05-15 19:09:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
> Actual results:
oc get nodes
NAME STATUS ROLES AGE VERSION
etcd-2.test.bionode.io Ready master,worker 142m v1.16.2
localhost Ready master,worker 142m v1.16.2
This makes it seem like the nodes are failing to get the correct hostnames/node names, so I am moving this to the RHCOS team to triage.
It looks like the problem occurs when any of the following happens: (1) the DNS server gets overloaded with queries, (2) there are reverse DNS issues, or (3) there are IPv6 address resolution issues. I've moved my config to just using DNSMasq and tuned the config. Currently I can reliably deploy 4.3.19.

Based on comment #4, it looks like this was related to DNS resolution problems. I don't think there is much that can be done on the RHCOS side for DNS resolution issues; closing as NOTABUG. If you think there is more that should be done here, please reopen.
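The forward and reverse DNS causes suspected above can be checked before an install with something like the following. This is a minimal sketch, not part of the original report: the function name `check_dns` and its injectable `forward`/`reverse` parameters are hypothetical, and the default resolver covers IPv4 only, so it does not exercise the IPv6 case.

```python
import socket


def check_dns(hostnames, forward=None, reverse=None):
    """Return a list of human-readable DNS problems for the given hostnames.

    `forward` maps a name to an IPv4 address and `reverse` maps an address
    back to a name. Both default to the system resolver and can be replaced
    with stubs (hypothetical hook, added here only to make testing easy).
    """
    forward = forward or socket.gethostbyname
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    problems = []
    for name in hostnames:
        try:
            addr = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{name}: forward lookup failed ({exc})")
            continue
        try:
            ptr = reverse(addr)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append(f"{name}: reverse lookup of {addr} failed ({exc})")
            continue
        # Compare case-insensitively and ignore a trailing root dot.
        if ptr.rstrip(".").lower() != name.rstrip(".").lower():
            problems.append(f"{name}: {addr} reverses to {ptr}")
    return problems
```

Calling `check_dns([f"etcd-{i}.test.bionode.io" for i in range(3)])` on the installer host would list any master whose address does not resolve back to its own name, which is one plausible way a node could end up registered as `localhost`.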
Description of problem:
OCP 4.3.15 on bare metal UPI. The environment deploys with no issues with the 4.3.9 installer. With 4.3.15 the install reports:
Error while reconciling 4.3.15: the cluster operator openshift-apiserver has not yet successfully rolled out
We also only have two of the 3 masters as active nodes with etcd running.

Version-Release number of the following components:
openshift-install-linux-4.3.15.tar.gz
openshift-client-linux-4.3.15.tar.gz

How reproducible:
Consistent

Steps to Reproduce:
mkdir baremetal
cp install-config-redpill.yaml baremetal/install-config.yaml
openshift-install create manifests --dir=baremetal
# We need the masters to be schedulable so we don't run this step
#sed -i "s/mastersSchedulable: true/mastersSchedulable: false/" baremetal/manifests/cluster-scheduler-02-config.yml
# Then generate the ign files
openshift-install create ignition-configs --dir=baremetal
openshift-install --dir=baremetal wait-for bootstrap-complete \
    --log-level=info

Bootstrap completes.

oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.3.15 True False 31m Error while reconciling 4.3.15: the cluster operator monitoring is degraded

Actual results:
oc get nodes
NAME STATUS ROLES AGE VERSION
etcd-2.test.bionode.io Ready master,worker 142m v1.16.2
localhost Ready master,worker 142m v1.16.2

All the nodes should be etcd-[0-2].test.bionode.io.
No SRV records were requested during bootstrap.

Expected results:
NAME STATUS ROLES AGE VERSION
etcd-0.test.bionode.io Ready master,worker 7m19s v1.16.2
etcd-1.test.bionode.io Ready master,worker 7m41s v1.16.2
etcd-2.test.bionode.io Ready master,worker 7m20s v1.16.2

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
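The mismatch between the actual and expected node lists can also be detected mechanically. The sketch below is illustrative only: `unexpected_nodes` is a hypothetical helper name, and it assumes the plain whitespace-separated table that `oc get nodes` prints in this report, with a header row followed by one node per line.

```python
def unexpected_nodes(oc_output, expected):
    """Compare the NAME column of `oc get nodes` output with expected names."""
    lines = [ln for ln in oc_output.strip().splitlines() if ln.strip()]
    # Skip the header row; the node name is the first column of each line.
    names = {ln.split()[0] for ln in lines[1:]}
    expected = set(expected)
    return {
        "missing": sorted(expected - names),
        "unexpected": sorted(names - expected),
    }


# Output copied from the "Actual results" section of this report.
sample = """\
NAME STATUS ROLES AGE VERSION
etcd-2.test.bionode.io Ready master,worker 142m v1.16.2
localhost Ready master,worker 142m v1.16.2
"""
expected = [f"etcd-{i}.test.bionode.io" for i in range(3)]
print(unexpected_nodes(sample, expected))
```

For the output in this report, the helper reports `etcd-0` and `etcd-1` as missing and `localhost` as unexpected, which matches the symptom described: a master that registered under the wrong hostname.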