Bug 1995021
| Summary: | resolv.conf and corefile sync slows down/stops after keepalived container restart | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Eldar Weiss <eweiss> | |
| Component: | Machine Config Operator | Assignee: | Yossi Boaron <yboaron> | |
| Machine Config Operator sub component: | platform-baremetal | QA Contact: | Eldar Weiss <eweiss> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | unspecified | CC: | aos-bugs, bnemec, bperkins, kgarriso, mkrejci, skumari, tsedovic, vvoronko | |
| Version: | 4.9 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.10.0 | |||
| Hardware: | Unspecified | |||
| OS: | All | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause: An old version of the kubernetes client library in the baremetal-runtimecfg project.
Consequence: When a VIP failed over, sometimes client connections were not closed in a timely fashion. This could result in long delays for monitor containers that rely on talking to the API.
Fix: Updated the client library.
Result: Connections are now closed as expected on VIP failovers, so the monitor does not hang for an excessively long time.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 2033966 (view as bug list) | Environment: | ||
| Last Closed: | 2022-03-10 16:05:18 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2033966 | |||
Please provide a must gather if possible, along with information about the deployment.

Hi Ben, are you or your team still looking at this bug?

Sorry, I think we missed this one because it wasn't in the baremetal subcomponent. I'm going to move it so we catch it in our triage meeting tomorrow.

Issue is resolved.
Expected results:
The Corefile is synced with resolv.conf: the nameserver added to resolv.conf appears in the Corefile's "forward" section, and the sync takes only a few seconds.
Version-Release number of selected component (if applicable), verified on:
4.10.0-0.ci-2021-12-15-195801
Actual results:
I added "8.8.8.6" as a nameserver:
[core@master-0-0 ~]$ date
Thu Dec 16 15:20:46 UTC 2021
[core@master-0-0 ~]$ sudo vi /var/run/NetworkManager/resolv.conf
[core@master-0-0 ~]$ cat vi /var/run/NetworkManager/resolv.conf
cat: vi: No such file or directory
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe62:929f%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.6
Then I checked the Corefile to confirm that the nameserver had been added:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile | grep forward
forward . fe80::5054:ff:fe62:929f%br-ex fd2e:6f44:5dd8::1 8.8.8.6 {
[core@master-0-0 ~]$ date
Thu Dec 16 15:21:18 UTC 2021
Took less than a minute.
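The manual check above (compare the nameservers in resolv.conf against the "forward" line of the Corefile) can be sketched in Python as follows. This is only an illustration, not how baremetal-runtimecfg performs the sync. The function names are hypothetical, and the sketch assumes that every nameserver in resolv.conf is expected to appear verbatim on the first "forward" line, which matches the transcripts in this bug but may not hold for every deployment. The sample data is copied from the verification output above.

```python
def resolv_nameservers(resolv_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def corefile_forward_upstreams(corefile_text):
    """Return the upstreams on the first 'forward' line of Corefile text."""
    for line in corefile_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "forward":
            # fields[1] is the zone ("."); drop the opening brace if present.
            return [f for f in fields[2:] if f != "{"]
    return []


def missing_from_corefile(resolv_text, corefile_text):
    """Return resolv.conf nameservers that are absent from the forward line.

    Assumption (not confirmed by the bug report): a fully synced Corefile
    lists every resolv.conf nameserver verbatim.
    """
    upstreams = set(corefile_forward_upstreams(corefile_text))
    return [s for s in resolv_nameservers(resolv_text) if s not in upstreams]


# Sample data taken from the master-0-0 transcript above.
RESOLV = """# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe62:929f%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.6
"""
COREFILE_FORWARD = (
    "forward . fe80::5054:ff:fe62:929f%br-ex fd2e:6f44:5dd8::1 8.8.8.6 {"
)

if __name__ == "__main__":
    # An empty list means the forward line contains every nameserver.
    print(missing_from_corefile(RESOLV, COREFILE_FORWARD))
```

An empty result corresponds to the synced state shown in the transcript; any addresses returned are the ones the Corefile has not picked up yet.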
This should be backported ASAP.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
Description of problem:
Adding a nameserver to a node's NetworkManager resolv.conf does not add the nameserver to the Corefile if the keepalived container was restarted very recently.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-14-065522

Steps to Reproduce:
On any master node:
1. Get the keepalived container's ID: sudo crictl ps --name keepalived
2. Stop the container: sudo crictl stop *CONTAINER ID*
3. Modify /var/run/NetworkManager/resolv.conf by adding a nameserver (example: 'nameserver 8.8.8.8').
4. Check whether the nameserver you added appears in /etc/coredns/Corefile.

Actual results:
[core@master-0-1 ~]$ cat /var/run/NetworkManager/resolv.conf
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe08:ccbe%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.8
[core@master-0-1 ~]$ cat /etc/coredns/Corefile
. {
errors
health :18080
forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 {
policy sequential
}

Expected results:
[core@master-0-1 ~]$ cat /etc/coredns/Corefile
. {
errors
health :18080
forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 8.8.8.8 {
policy sequential
}

Additional info:
1) This can also happen when removing a nameserver and waiting for it to be removed from the Corefile.
2) At times the sync does happen under the above condition, but it takes several minutes.
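Because the symptom is a delayed sync rather than a hard failure, it can help to measure how long the Corefile takes to pick up the change after step 3. The following Python sketch polls for the new nameserver and reports the elapsed time. The helper names `wait_for_sync` and `corefile_lists` are hypothetical, the timeout and interval defaults are arbitrary, and the check assumes the nameserver appears verbatim on the first "forward" line of /etc/coredns/Corefile.

```python
import time


def corefile_lists(nameserver, path="/etc/coredns/Corefile"):
    """Return True if the first 'forward' line in the Corefile lists nameserver."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "forward":
                # fields[1] is the zone; the upstreams follow it.
                return nameserver in fields[2:]
    return False


def wait_for_sync(is_synced, timeout=300.0, interval=2.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll is_synced() until it returns True.

    Returns the elapsed seconds on success, or None if the timeout expires.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if is_synced():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Run on the node right after editing resolv.conf (step 3).
    elapsed = wait_for_sync(lambda: corefile_lists("8.8.8.8"))
    if elapsed is None:
        print("Corefile did not pick up the nameserver before the timeout")
    else:
        print(f"Corefile synced after {elapsed:.1f} seconds")
```

A result of a few seconds matches the expected behavior, while a timeout or a delay of several minutes matches the behavior reported here.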