Bug 1995021

Summary: resolv.conf and corefile sync slows down/stops after keepalived container restart
Product: OpenShift Container Platform Reporter: Eldar Weiss <eweiss>
Component: Machine Config Operator Assignee: Yossi Boaron <yboaron>
Machine Config Operator sub component: platform-baremetal QA Contact: Eldar Weiss <eweiss>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: aos-bugs, bnemec, bperkins, kgarriso, mkrejci, skumari, tsedovic, vvoronko
Version: 4.9   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The baremetal-runtimecfg project used an old version of the Kubernetes client library. Consequence: When a VIP failed over, client connections were sometimes not closed in a timely fashion. This could result in long delays for monitor containers that rely on talking to the API. Fix: Updated the client library. Result: Connections are now closed as expected on VIP failovers, so the monitor does not hang for an excessively long time.
Story Points: ---
Clone Of:
: 2033966 Environment:
Last Closed: 2022-03-10 16:05:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2033966    

Description Eldar Weiss 2021-08-18 09:53:08 UTC
Description of problem:
Adding a nameserver to a node's NetworkManager resolv.conf does not add the nameserver to the Corefile if the keepalived container was restarted very recently.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-14-065522

Steps to Reproduce:
On any master node:
1. Get the keepalived container's ID:

sudo crictl ps --name keepalived

2. Stop the container:

sudo crictl stop *CONTAINER ID*

3. Modify /var/run/NetworkManager/resolv.conf by adding a nameserver (example: 'nameserver 8.8.8.8').

4. Check whether the nameserver you added appears in /etc/coredns/Corefile.
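
The comparison in the last step can also be scripted. The following is a minimal sketch and not part of the original report: the function names (nameservers, forwarders, missing_forwarders) are hypothetical, and it assumes the upstream servers are listed on the Corefile's "forward ." line, as in the output shown in this bug.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print each nameserver address from a resolv.conf-style file ($1).
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Print each upstream listed on any "forward <zone> ..." line of a Corefile ($1).
# Assumes the upstreams follow the zone on the same line.
forwarders() {
    awk '$1 == "forward" { for (i = 3; i <= NF; i++) if ($i != "{") print $i }' "$1"
}

# Print every nameserver in resolv.conf ($1) that is absent from the Corefile ($2).
missing_forwarders() {
    for ns in $(nameservers "$1"); do
        forwarders "$2" | grep -qxF -- "$ns" || echo "$ns"
    done
}
```

For example, running `missing_forwarders /var/run/NetworkManager/resolv.conf /etc/coredns/Corefile` on the node prints nothing when every nameserver is present in the Corefile, and prints the missing addresses otherwise.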

Actual results:

[core@master-0-1 ~]$ cat /var/run/NetworkManager/resolv.conf
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe08:ccbe%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.8

[core@master-0-1 ~]$ cat /etc/coredns/Corefile 
. {
    errors
    health :18080
    forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 {
        policy sequential
    }

Expected results:
[core@master-0-1 ~]$ cat /etc/coredns/Corefile 
. {
    errors
    health :18080
    forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 8.8.8.8 {
        policy sequential
    }

Additional info:
1) This can also happen when removing a nameserver and waiting for it to be removed from the Corefile.
2) At times, the sync does happen in the above condition, but takes several minutes.
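
To put a number on the delay, the sync can be timed with a polling loop. This is a rough sketch under stated assumptions, not the procedure used in this report: in_sync and wait_for_sync are hypothetical names, the default timeout (300 seconds) and polling interval (5 seconds) are arbitrary, and only added nameservers are checked (removals are not detected).

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Succeed when every nameserver in resolv.conf ($1) appears on a
# "forward" line of the Corefile ($2).
in_sync() {
    for ns in $(awk '$1 == "nameserver" { print $2 }' "$1"); do
        awk -v ns="$ns" '
            $1 == "forward" { for (i = 3; i <= NF; i++) if ($i == ns) found = 1 }
            END { exit !found }
        ' "$2" || return 1
    done
}

# Poll until the files agree, then print the elapsed seconds.
# $3 = timeout in seconds (default 300), $4 = poll interval (default 5).
wait_for_sync() {
    resolv=$1
    corefile=$2
    timeout=${3:-300}
    interval=${4:-5}
    start=$(date +%s)
    while ! in_sync "$resolv" "$corefile"; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$timeout" ]; then
            echo "not synced after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo "synced after $(( $(date +%s) - start ))s"
}
```

Calling `wait_for_sync /var/run/NetworkManager/resolv.conf /etc/coredns/Corefile` right after editing resolv.conf reports roughly how long the Corefile took to pick up the change, to within one polling interval.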

Comment 1 Kirsten Garrison 2021-08-18 17:14:24 UTC
Please provide a must gather if possible along with information about the deployment

Comment 5 Sinny Kumari 2021-11-22 16:46:59 UTC
Hi Ben,

Are you or your team still looking at this bug?

Comment 6 Ben Nemec 2021-11-23 22:54:06 UTC
Sorry, I think we missed this one because it wasn't in the baremetal subcomponent. I'm going to move it so we catch it in our triage meeting tomorrow.

Comment 8 Eldar Weiss 2021-12-16 15:24:31 UTC
Issue is resolved.


Expected results:
The Corefile is synced with resolv.conf: the added nameserver appears in its "forward" section, and the sync takes only a few seconds.

Version-Release number of selected component (if applicable), verified on:
4.10.0-0.ci-2021-12-15-195801

Actual results:
I added "8.8.8.6" as a nameserver:
[core@master-0-0 ~]$ date
Thu Dec 16 15:20:46 UTC 2021
[core@master-0-0 ~]$ sudo vi /var/run/NetworkManager/resolv.conf
[core@master-0-0 ~]$ cat vi /var/run/NetworkManager/resolv.conf
cat: vi: No such file or directory
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe62:929f%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.6

Then I checked the Corefile to make sure the nameserver was added:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile | grep forward
    forward . fe80::5054:ff:fe62:929f%br-ex fd2e:6f44:5dd8::1 8.8.8.6 {
[core@master-0-0 ~]$ date
Thu Dec 16 15:21:18 UTC 2021

Took less than a minute.


This should be backported ASAP.

Comment 12 errata-xmlrpc 2022-03-10 16:05:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056