Bug 1801638

Summary: On a DHCP6 lease renew, the node gets in NotReady state
Product: OpenShift Container Platform Reporter: Juan Manuel Parrilla Madrid <jparrill>
Component: Installer Assignee: Antoni Segura Puimedon <asegurap>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: asegurap, rbryant, vvoronko
Version: 4.3.z   
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1801662 (view as bug list) Environment:
Last Closed: 2020-03-10 23:53:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1801662    
Bug Blocks:    

Description Juan Manuel Parrilla Madrid 2020-02-11 12:43:19 UTC
Description of problem:

When a master renews its DHCP6 lease, resolv.conf loses some entries: the nameserver (NS) entry and the search XXXX statement. As a result, kubelet marks the node as NotReady, with all the consequences that entails.

Version-Release number of selected component (if applicable):


How reproducible:

Deploy an IPv6 disconnected bare-metal cluster using this build: 4.3.0-0.nightly-2020-02-06-120247-ipv6.6, then wait; it will happen eventually.

Actual results:

NetworkManager triggers the prepender script (/etc/NetworkManager/dispatcher.d/30-resolv-prepender), and the resulting /etc/resolv.conf loses its main entries.

Expected results:

resolv.conf should look like this:

# Generated by KNI resolv prepender NM dispatcher script
search xxx.xxx.xxx.xxx.xxx.redhat.com
nameserver fd35:919d:4042:2:c7ed:9a9f:a9ec:7  # <== VIP
nameserver fd35:919d:4042:2::1000             # <== dnsmasq


Additional info:

Build info: 4.3.0-0.nightly-2020-02-06-120247-ipv6.6

- 30-resolv-prepender:
===========
#!/bin/bash
IFACE=$1
STATUS=$2
# If $DHCP6_FQDN_FQDN contains a "-"
[[ "$DHCP6_FQDN_FQDN" =~ - ]] && hostname $DHCP6_FQDN_FQDN
case "$STATUS" in
    up|down|dhcp4-change|dhcp6-change)
    logger -s "NM resolv-prepender triggered by ${1} ${2}."
    NAMESERVER_IP="fd35:919d:4042:2:c7ed:9a9f:a9ec:7"
    set +e
    if [[ -n "$NAMESERVER_IP" ]]; then
        logger -s "NM resolv-prepender: Prepending 'nameserver $NAMESERVER_IP' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)"
        sed "/^search .*$/a nameserver $NAMESERVER_IP" /var/run/NetworkManager/resolv.conf > /etc/resolv.conf
    else
        logger -s "NM resolv-prepender: Couldn't find a Virtual IP, just updating resolv.conf"
        cp /var/run/NetworkManager/resolv.conf /etc/resolv.conf
    fi
    ;;
    *)
    ;;
esac
===========

- resolv.conf:
===========
# Generated by NetworkManager
nameserver fd35:919d:4042:2::1000
===========

- dhclient.conf:
===========
supersede domain-search "kni7.cloud.lab.eng.bos.redhat.com";
===========
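
A note on the files above: the script inserts the VIP with a sed append that only fires on a line starting with "search", and the NetworkManager-generated resolv.conf shown here has no such line, so the VIP nameserver is never written. The sketch below reproduces that with a temporary file and contrasts it with an unconditional prepend. It assumes GNU sed (one-line "a" form, as in the script). The function names prepend_after_search and prepend_always are hypothetical, and the second one is only an illustration, not necessarily the fix shipped in the erratum. It also does not restore the missing search line, since the report does not say where that value should come from.

```shell
#!/bin/bash
# Sketch only: works on a temporary copy, never touches /etc/resolv.conf.

VIP="fd35:919d:4042:2:c7ed:9a9f:a9ec:7"   # value taken from the script above

# Same sed expression as 30-resolv-prepender: append the VIP after "search".
prepend_after_search() {
    sed "/^search .*$/a nameserver $1" "$2"
}

# Hypothetical alternative: header, any search lines, the VIP, then the rest.
prepend_always() {
    local vip=$1 src=$2
    echo "# Generated by KNI resolv prepender NM dispatcher script"
    grep '^search ' "$src"
    echo "nameserver $vip"
    grep -v -e '^search ' -e '^#' "$src"
}

# Stand-in for /var/run/NetworkManager/resolv.conf as shown in this report.
nm_conf=$(mktemp)
printf '# Generated by NetworkManager\nnameserver fd35:919d:4042:2::1000\n' > "$nm_conf"

broken=$(prepend_after_search "$VIP" "$nm_conf")
fixed=$(prepend_always "$VIP" "$nm_conf")
rm -f "$nm_conf"

echo "--- sed append, input has no search line ---"
echo "$broken"
echo "--- unconditional prepend ---"
echo "$fixed"
```

With the input from this report, the first output contains only the dnsmasq nameserver, while the second lists the VIP first.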

Comment 1 Juan Manuel Parrilla Madrid 2020-02-12 16:29:40 UTC
Working fine on 4.3.0-0.nightly-2020-02-10-055634-ipv6.3 (validated on my side).

Comment 4 Victor Voronkov 2020-02-26 16:10:07 UTC
Verified on 4.3.0-0.nightly-2020-02-21-091838-ipv6.3

After a few hours and several lease renewals, everything seems fine:

[core@master-0 ~]$ sudo cat /var/lib/NetworkManager/dhclient6-6b03ac63-8515-451d-88c1-3e51b630a8b1-ens4.lease | grep ens4
  interface "ens4";
  interface "ens4";
  interface "ens4";
  interface "ens4";
  interface "ens4";
  interface "ens4";
  interface "ens4";
  interface "ens4";
[core@master-0 ~]$ cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search vvoron-cluster.qe.lab.redhat.com
nameserver fd2e:6f44:5dd8:c956:0:0:0:2
nameserver fe80::5054:ff:fe20:d53b%ens4
nameserver fd2e:6f44:5dd8:c956::1

[kni@provisionhost-0 ~]$ oc get nodes
NAME                                        STATUS   ROLES    AGE    VERSION
master-0.vvoron-cluster.qe.lab.redhat.com   Ready    master   176m   v1.16.2
master-1.vvoron-cluster.qe.lab.redhat.com   Ready    master   176m   v1.16.2
master-2.vvoron-cluster.qe.lab.redhat.com   Ready    master   176m   v1.16.2
worker-0.vvoron-cluster.qe.lab.redhat.com   Ready    worker   158m   v1.16.2
worker-1.vvoron-cluster.qe.lab.redhat.com   Ready    worker   158m   v1.16.2
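
The manual check above could be scripted roughly as follows. This is a minimal sketch: check_resolv is a hypothetical helper, and it assumes the VIP is expected to be the first nameserver in the file, as in the expected results in the description.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the file has a search line and the
# first nameserver entry equals the expected VIP.
check_resolv() {
    local file=$1 vip=$2 first
    grep -q '^search ' "$file" || { echo "missing search line"; return 1; }
    # Take the address from the first "nameserver" line only.
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    [ "$first" = "$vip" ] || {
        echo "first nameserver is '${first}', expected '${vip}'"
        return 1
    }
    echo "ok"
}

# Usage on a node (pass whichever address is the VIP in your cluster):
#   check_resolv /etc/resolv.conf <VIP>
```

Running it periodically across lease renewals would flag the regression described in this bug without waiting for the node to go NotReady.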

Comment 6 errata-xmlrpc 2020-03-10 23:53:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0676