Bug 1995021

Summary:	resolv.conf and corefile sync slows down/stops after keepalived container restart
Product:	OpenShift Container Platform	Reporter:	Eldar Weiss <eweiss>
Component:	Machine Config Operator	Assignee:	Yossi Boaron <yboaron>
Machine Config Operator sub component:	platform-baremetal	QA Contact:	Eldar Weiss <eweiss>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	unspecified	CC:	aos-bugs, bnemec, bperkins, kgarriso, mkrejci, skumari, tsedovic, vvoronko
Version:	4.9
Target Milestone:	---
Target Release:	4.10.0
Hardware:	Unspecified
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The baremetal-runtimecfg project used an old version of the Kubernetes client library. Consequence: When a VIP failed over, sometimes client connections were not closed in a timely fashion. This could result in long delays for monitor containers that rely on talking to the API. Fix: Updated the client library. Result: Connections are now closed as expected on VIP failovers so the monitor does not hang for an excessively long time.	Story Points:	---
Clone Of:
Clones:	2033966 (view as bug list)		Environment:
Last Closed:	2022-03-10 16:05:18 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2033966

Description Eldar Weiss 2021-08-18 09:53:08 UTC

Description of problem:
Adding a nameserver to a node's NetworkManager resolv.conf does not add the nameserver to the Corefile if the keepalived container was restarted very recently.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-14-065522

Steps to Reproduce:
On any master node:
1. Get the keepalived container's ID:

sudo crictl ps --name keepalived

2. Stop the container:

sudo crictl stop *CONTAINER ID*

3. Modify /var/run/NetworkManager/resolv.conf by adding a nameserver (example: 'nameserver 8.8.8.8').

4. Check whether the nameserver you added appears in /etc/coredns/Corefile.

Actual results:

[core@master-0-1 ~]$ cat /var/run/NetworkManager/resolv.conf
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe08:ccbe%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.8

[core@master-0-1 ~]$ cat /etc/coredns/Corefile 
. {
    errors
    health :18080
    forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 {
        policy sequential
    }

Expected results:
[core@master-0-1 ~]$ cat /etc/coredns/Corefile 
. {
    errors
    health :18080
    forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 8.8.8.8 {
        policy sequential
    }

Additional info:
1) This can also happen when removing a nameserver and waiting for it to be removed from the Corefile.
2) At times, the sync does happen in the above condition, but takes several minutes.
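
The manual comparison of the two files can be scripted. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes that every "nameserver" entry in resolv.conf is expected to appear on the first "forward" line of the Corefile, as in the expected results above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing resolv.conf against the Corefile.

# Print each nameserver address from a resolv.conf-style file, one per line.
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Print each upstream from the first "forward" line of a Corefile, one per line.
# Fields are: forward <zone> <upstream>... {
corefile_forwarders() {
    awk '$1 == "forward" {
             for (i = 3; i <= NF; i++) if ($i != "{") print $i
             exit
         }' "$1"
}

# Print nameservers that are in resolv.conf but not in the Corefile.
# Returns 0 when nothing is missing, 1 otherwise.
missing_forwarders() {
    local resolv="$1" corefile="$2" ns missing=0
    for ns in $(resolv_nameservers "$resolv"); do
        if ! corefile_forwarders "$corefile" | grep -qxF -- "$ns"; then
            echo "$ns"
            missing=1
        fi
    done
    return "$missing"
}
```

On a node this could be run as: missing_forwarders /var/run/NetworkManager/resolv.conf /etc/coredns/Corefile (reading the Corefile may require sudo). Any address it prints has not yet been synced.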

Comment 1 Kirsten Garrison 2021-08-18 17:14:24 UTC

Please provide a must gather if possible along with information about the deployment

Comment 5 Sinny Kumari 2021-11-22 16:46:59 UTC

Hi Ben,

Are you or your team still looking at this bug?

Comment 6 Ben Nemec 2021-11-23 22:54:06 UTC

Sorry, I think we missed this one because it wasn't in the baremetal subcomponent. I'm going to move it so we catch it in our triage meeting tomorrow.

Comment 8 Eldar Weiss 2021-12-16 15:24:31 UTC

Issue is resolved.


Expected results:
The Corefile is synced with resolv.conf: the nameserver added to resolv.conf appears in the Corefile's "forward" section, and the sync takes only a few seconds.

Version-Release number of selected component (if applicable), verified on:
4.10.0-0.ci-2021-12-15-195801

Actual results:
I added "8.8.8.6" as a nameserver:
[core@master-0-0 ~]$ date
Thu Dec 16 15:20:46 UTC 2021
[core@master-0-0 ~]$ sudo vi /var/run/NetworkManager/resolv.conf
[core@master-0-0 ~]$ cat vi /var/run/NetworkManager/resolv.conf
cat: vi: No such file or directory
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe62:929f%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.6

Then I checked the Corefile to confirm that the nameserver was added:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile | grep forward
    forward . fe80::5054:ff:fe62:929f%br-ex fd2e:6f44:5dd8::1 8.8.8.6 {
[core@master-0-0 ~]$ date
Thu Dec 16 15:21:18 UTC 2021

Took less than a minute.
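
To measure the sync delay more precisely than with two manual "date" calls, a polling loop could be used. This is only a sketch and was not part of the verification above: the function name, the one-second polling interval, and the default 120-second timeout are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a nameserver shows up on a "forward"
# line of the Corefile and report how long that took.
# Usage: wait_for_forwarder <nameserver> <corefile> [timeout_seconds]
wait_for_forwarder() {
    local ns="$1" corefile="$2" timeout="${3:-120}" start now
    start=$(date +%s)
    while :; do
        # Split the forward line(s) into one token per line and look for
        # an exact match of the nameserver address.
        if grep -E '^[[:space:]]*forward[[:space:]]' "$corefile" \
                | tr -s '[:space:]' '\n' | grep -qxF -- "$ns"; then
            now=$(date +%s)
            echo "synced after $((now - start))s"
            return 0
        fi
        now=$(date +%s)
        if [ $((now - start)) -ge "$timeout" ]; then
            echo "not synced after ${timeout}s" >&2
            return 1
        fi
        sleep 1
    done
}
```

Started right after editing resolv.conf, for example as wait_for_forwarder 8.8.8.6 /etc/coredns/Corefile, it prints the elapsed seconds once the entry appears, or fails when the timeout is reached.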


This should be backported ASAP.

Comment 12 errata-xmlrpc 2022-03-10 16:05:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056