Bug 1995021 - resolv.conf and corefile sync slows down/stops after keepalived container restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.9
Hardware: Unspecified
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Yossi Boaron
QA Contact: Eldar Weiss
URL:
Whiteboard:
Depends On:
Blocks: 2033966
Reported: 2021-08-18 09:53 UTC by Eldar Weiss
Modified: 2022-03-10 16:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The baremetal-runtimecfg project used an old version of the Kubernetes client library. Consequence: When a VIP failed over, client connections were sometimes not closed in a timely fashion. This could cause long delays for the monitor containers that rely on talking to the API. Fix: Updated the client library. Result: Connections are now closed as expected on VIP failovers, so the monitor does not hang for an excessively long time.
Clone Of:
Clones: 2033966
Environment:
Last Closed: 2022-03-10 16:05:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift baremetal-runtimecfg pull 164 0 None open Bug 1995021: upgrade k8s.io/client-go 2021-12-08 15:15:38 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:05:37 UTC

Description Eldar Weiss 2021-08-18 09:53:08 UTC
Description of problem:
Adding a nameserver to a node's NetworkManager resolv.conf does not add the nameserver to the Corefile if the keepalived container was restarted very recently.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-14-065522

Steps to Reproduce:
On any master node:
1. Get the keepalived container's ID by using:

sudo crictl ps --name keepalived

2. Stop the container by using:

sudo crictl stop *CONTAINER ID*

3. Modify /var/run/NetworkManager/resolv.conf by adding a nameserver (example: 'nameserver 8.8.8.8')

4. Check whether the nameserver you added now appears in /etc/coredns/Corefile (for example, with cat /etc/coredns/Corefile).
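
The check in step 4 can also be scripted. The following is only a sketch, not part of the original report: the function names are made up for illustration, and it assumes that every nameserver in resolv.conf is expected to show up on the Corefile's "forward ." line, as in the expected results in this report.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any OpenShift tool).

# Print the nameserver addresses in a resolv.conf-style file, one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Print the upstream addresses on the Corefile "forward ." line, one per line.
# Fields are: forward, ".", then the addresses, then an optional "{".
list_forwarders() {
    awk '$1 == "forward" {
        for (i = 3; i <= NF; i++) {
            if ($i == "{") break
            print $i
        }
    }' "$1"
}

# Print each nameserver that is missing from the forward line.
# Returns 0 if the two files agree, 1 if anything is missing.
missing_forwarders() {
    resolv=$1
    corefile=$2
    status=0
    for ns in $(list_nameservers "$resolv"); do
        if ! list_forwarders "$corefile" | grep -qxF -- "$ns"; then
            echo "$ns"
            status=1
        fi
    done
    return $status
}
```

On a node this would be called as: missing_forwarders /var/run/NetworkManager/resolv.conf /etc/coredns/Corefile. Empty output means the Corefile already lists every nameserver.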

Actual results:

[core@master-0-1 ~]$ cat /var/run/NetworkManager/resolv.conf
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe08:ccbe%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.8

[core@master-0-1 ~]$ cat /etc/coredns/Corefile 
. {
    errors
    health :18080
    forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 {
        policy sequential
    }

Expected results:
[core@master-0-1 ~]$ cat /etc/coredns/Corefile 
. {
    errors
    health :18080
    forward . fe80::5054:ff:fe08:ccbe%br-ex fd2e:6f44:5dd8::1 8.8.8.8 {
        policy sequential
    }

Additional info:
1) This can also happen when removing a nameserver and waiting for it to be removed from the Corefile.
2) At times, the sync does happen in the above condition, but takes several minutes.
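
To put a number on "several minutes", the delay can be measured with a polling loop. This is a rough sketch under stated assumptions: wait_for_forwarder is a hypothetical helper, the 300-second default timeout is arbitrary, and it relies on date +%s being available (true for GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: wait until ADDRESS appears as an upstream on a
# "forward" line of COREFILE, polling once per second.
# Prints the seconds waited on success; returns 1 once TIMEOUT seconds pass.
wait_for_forwarder() {
    address=$1
    corefile=$2
    timeout=${3:-300}   # assumed default, adjust as needed
    start=$(date +%s)
    while :; do
        elapsed=$(( $(date +%s) - start ))
        # Compare whole fields so 8.8.8.8 does not match 18.8.8.80.
        if awk -v addr="$address" '
            $1 == "forward" { for (i = 3; i <= NF; i++) if ($i == addr) found = 1 }
            END { exit !found }' "$corefile" 2>/dev/null; then
            echo "$elapsed"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${elapsed}s" >&2
            return 1
        fi
        sleep 1
    done
}
```

For example, right after editing resolv.conf: wait_for_forwarder 8.8.8.8 /etc/coredns/Corefile 600.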

Comment 1 Kirsten Garrison 2021-08-18 17:14:24 UTC
Please provide a must gather if possible along with information about the deployment

Comment 5 Sinny Kumari 2021-11-22 16:46:59 UTC
Hi Ben,

Are you or your team still looking at this bug?

Comment 6 Ben Nemec 2021-11-23 22:54:06 UTC
Sorry, I think we missed this one because it wasn't in the baremetal subcomponent. I'm going to move it so we catch it in our triage meeting tomorrow.

Comment 8 Eldar Weiss 2021-12-16 15:24:31 UTC
Issue is resolved.


Expected results:
The Corefile is synced with resolv.conf: the nameserver added to resolv.conf appears in the Corefile's "forward" section, and the sync takes only a few seconds.

Version-Release number of selected component (if applicable), verified on:
4.10.0-0.ci-2021-12-15-195801

Actual results:
I added "8.8.8.6" as a nameserver:
[core@master-0-0 ~]$ date
Thu Dec 16 15:20:46 UTC 2021
[core@master-0-0 ~]$ sudo vi /var/run/NetworkManager/resolv.conf
[core@master-0-0 ~]$ cat vi /var/run/NetworkManager/resolv.conf
cat: vi: No such file or directory
# Generated by NetworkManager
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver fe80::5054:ff:fe62:929f%br-ex
nameserver fd2e:6f44:5dd8::1
nameserver 8.8.8.6

Then I checked the Corefile to confirm that the nameserver had been added:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile | grep forward
    forward . fe80::5054:ff:fe62:929f%br-ex fd2e:6f44:5dd8::1 8.8.8.6 {
[core@master-0-0 ~]$ date
Thu Dec 16 15:21:18 UTC 2021

Took less than a minute.


This should be backported ASAP.

Comment 12 errata-xmlrpc 2022-03-10 16:05:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

