Bug 1862874

Summary: haproxy pod on one of the masters in CrashLoopBackOff after deploy of OpenShift BareMetal IPv6
Product: OpenShift Container Platform Reporter: Amit Ugol <augol>
Component: Machine Config Operator Assignee: Yossi Boaron <yboaron>
Status: CLOSED ERRATA QA Contact: Aleksandra Malykhin <amalykhi>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.5 CC: asegurap, bperkins, lshilin, miabbott, mnguyen, rbartal, shardy, yboaron
Target Milestone: --- Keywords: Triaged
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: On the baremetal platform, an infra-dns container runs on each host to resolve node names and other internal DNS records. An NM dispatcher script updates the host's resolv.conf to point to the infra-dns container. When pods are created, they inherit their DNS configuration file (/etc/resolv.conf) from the host. In this case, the HAProxy pod was created before the NM script updated the host's resolv.conf. Consequence: The HAProxy pod repeatedly failed because the api-int internal DNS record was not resolvable. Fix: Verify that the resolv.conf of the HAProxy pod is identical to the host's resolv.conf file. Result: The HAProxy container runs with no errors.
Story Points: ---
Clone Of: 1849432 Environment:
Last Closed: 2020-10-19 14:54:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1849432    
Bug Blocks:    
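
The fix described in the Doc Text comes down to comparing two files. The following is only a minimal sketch of such a check, not the code shipped in the advisory: the function name resolv_conf_in_sync and the /host mount point in the usage comment are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a liveness-style check: succeed only when the container's
# resolv.conf is byte-identical to the host's copy.
# Both paths are parameters; nothing here is taken from the actual fix.

resolv_conf_in_sync() {
    host_file="$1"
    container_file="$2"
    # cmp -s prints nothing; it returns 0 only when the files are identical,
    # 1 when they differ, and >1 when a file cannot be read.
    cmp -s "$host_file" "$container_file"
}

# Example use inside a probe (hypothetical paths, assuming the host's
# /etc is mounted under /host in the container):
#   resolv_conf_in_sync /host/etc/resolv.conf /etc/resolv.conf || exit 1
```

A probe built this way fails as soon as the host file changes, which would cause the container to be restarted and pick up the current DNS configuration.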

Comment 2 Micah Abbott 2020-10-05 13:34:48 UTC
@Aleksandra, I've made you QA Contact on this BZ since you were able to verify the parent BZ.

Comment 3 Aleksandra Malykhin 2020-10-07 08:31:55 UTC
OCP 4.5: 4.5.0-0.nightly-2020-10-05-204452

Steps to reproduce:

1. Change the resolv.conf file on one of the masters (I added the line "nameserver 127.0.0.1"):
[core@master-0-0 ~]$ sudo vi  /etc/resolv.conf 


2. Verify that the haproxy-monitor pod was restarted: 
[core@master-0-0 ~]$ sudo crictl ps | grep haproxy-monitor
f5f09a971e61f       9fae0d9500dcd3c705e17d6c9c7afc41adb7713de390ae7cc751e5408201798e  3 seconds ago       Running   haproxy-monitor 4  4c9f0b437e7c9


3. Confirm that the haproxy-monitor pod was restarted because of a failed liveness probe:
[core@master-0-0 ~]$ journalctl -u kubelet | grep "liveness probe"
Oct 06 14:24:55 master-0-0 hyperkube[2343]: I1006 14:24:55.565294    2343 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kni-infra", Name:"haproxy-master-0-0", UID:"ccda7847ab5cefa71f1f91518c07131c", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{haproxy-monitor}"}): type: 'Normal' reason: 'Killing' Container haproxy-monitor failed liveness probe, will be restarted


4. Verify that the container's resolv.conf file is the same as on the master node (the "nameserver 127.0.0.1" line was added in step 1):
[core@master-0-0 ~]$ sudo crictl exec -it 2a94810301fa6   /bin/sh
sh-4.2# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver 192.168.123.112
nameserver 127.0.0.1
nameserver 192.168.123.1


5. The pod was restarted and is in the Running state:
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-kni-infra | grep haproxy-master-0-0
haproxy-master-0-0          2/2     Running   5          78m
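
Step 1 above edits the file interactively with vi. For repeated test runs, the same change could be scripted. The sketch below is an assumption-laden illustration: the helper name add_nameserver is hypothetical, and it appends the entry at the end of the file, so the entry's position may differ from the output shown in step 4.

```shell
#!/bin/sh
# Sketch: append a nameserver entry to a resolv.conf-style file, once.
# add_nameserver is a hypothetical helper, not part of any OpenShift tooling.

add_nameserver() {
    file="$1"
    ip="$2"
    line="nameserver $ip"
    # -x matches the whole line, -F treats the pattern as a fixed string,
    # -q suppresses output; skip the append if the entry already exists.
    if ! grep -qxF "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (needs root privileges on a real node):
#   add_nameserver /etc/resolv.conf 127.0.0.1
```

Because the helper checks for an exact existing line first, running it twice leaves a single entry.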



Before backport:
When the resolv.conf file is changed on one of the masters (step 1), nothing happens.

Comment 6 errata-xmlrpc 2020-10-19 14:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228