Bug 1862874

Summary:	haproxy pod on one of masters in CrashLoopBackOff after deploy of OpenShift BareMetall ipv6
Product:	OpenShift Container Platform	Reporter:	Amit Ugol <augol>
Component:	Machine Config Operator	Assignee:	Yossi Boaron <yboaron>
Status:	CLOSED ERRATA	QA Contact:	Aleksandra Malykhin <amalykhi>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.5	CC:	asegurap, bperkins, lshilin, miabbott, mnguyen, rbartal, shardy, yboaron
Target Milestone:	---	Keywords:	Triaged
Target Release:	4.5.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: In baremetal platform, an infra-dns container runs in each host to support node names resolution and other internal DNS records. to complete the picture, an NM script updates host's resolv.conf to point to the infra-dns container. Additionally, when pods created they inherit their DNS configuration file (/etc/resolv.conf) from the host. In this case, the HAProxy pod was created before NM scripts update the host's resolv.conf. Consequence: HAProxy pod repeatedly failed because the api-int internal DNS record is not resolvable. Fix: Verify that resov.conf of HAProxy pod is identical to host's resolv.conf file. Result: HAProxy container runs with no error.	Story Points:	---
Clone Of:	1849432	Environment:
Last Closed:	2020-10-19 14:54:24 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1849432
Bug Blocks:

Comment 2 Micah Abbott 2020-10-05 13:34:48 UTC

@Alexsandra I've made you QA Contact on this BZ since you were able to verify the parent BZ.

Comment 3 Aleksandra Malykhin 2020-10-07 08:31:55 UTC

OCP 4.5: 4.5.0-0.nightly-2020-10-05-204452

Steps to reproduce:

1. Change resolv.conf file on one of the masters (I addede the line "nameserver 127.0.0.1")
[core@master-0-0 ~]$ sudo vi  /etc/resolv.conf 


2. Verify that the haproxy-monitor pod was restarted: 
[core@master-0-0 ~]$ sudo crictl ps | grep haproxy-monitor
f5f09a971e61f       9fae0d9500dcd3c705e17d6c9c7afc41adb7713de390ae7cc751e5408201798e  3 seconds ago       Running   haproxy-monitor 4  4c9f0b437e7c9


3. Be sure that the haproxy-monitor pod was restarted because the "failed liveness probe":
[core@master-0-0 ~]$ journalctl -u kubelet | grep "liveness probe"
Oct 06 14:24:55 master-0-0 hyperkube[2343]: I1006 14:24:55.565294    2343 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kni-infra", Name:"haproxy-master-0-0", UID:"ccda7847ab5cefa71f1f91518c07131c", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{haproxy-monitor}"}): type: 'Normal' reason: 'Killing' Container haproxy-monitor failed liveness probe, will be restarted


4. Verify that container resolv.conf file is the same as on the master node (first line was added on the step 1):
[core@master-0-0 ~]$ sudo crictl exec -it 2a94810301fa6   /bin/sh
sh-4.2# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver 192.168.123.112
nameserver 127.0.0.1
nameserver 192.168.123.1


5. The pod was restarted and in the Running state:
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-kni-infra | grep haproxy-master-0-0
haproxy-master-0-0          2/2     Running   5          78m



Before backport:
When resolv.conf file changed on one of the masters (step 1) nothing happens.

Comment 6 errata-xmlrpc 2020-10-19 14:54:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228