Bug 1862874 - haproxy pod on one of the masters in CrashLoopBackOff after deploy of OpenShift BareMetal IPv6
Summary: haproxy pod on one of masters in CrashLoopBackOff after deploy of OpenShift B...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Yossi Boaron
QA Contact: Aleksandra Malykhin
URL:
Whiteboard:
Depends On: 1849432
Blocks:
Reported: 2020-08-03 05:35 UTC by Amit Ugol
Modified: 2020-10-19 14:54 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: On the bare metal platform, an infra-dns container runs on each host to support node name resolution and other internal DNS records. To complete the picture, a NetworkManager (NM) dispatcher script updates the host's resolv.conf to point to the infra-dns container. In addition, when pods are created they inherit their DNS configuration file (/etc/resolv.conf) from the host. In this case, the HAProxy pod was created before the NM script updated the host's resolv.conf.
Consequence: The HAProxy pod repeatedly failed because the api-int internal DNS record was not resolvable.
Fix: Verify that the resolv.conf of the HAProxy pod is identical to the host's resolv.conf file.
Result: The HAProxy container runs without errors.
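The fix described above boils down to "the pod's resolv.conf must equal the host's". A minimal Python sketch of such a check follows, written with Kubernetes exec-probe semantics in mind (exit code 0 means healthy, non-zero means unhealthy). It is not the machine-config-operator implementation: the function names and the `/host/etc/resolv.conf` mount path are hypothetical assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical paths: the host's /etc/resolv.conf is assumed to be mounted
# into the container at HOST_RESOLV_CONF; POD_RESOLV_CONF is the container's own copy.
HOST_RESOLV_CONF = "/host/etc/resolv.conf"
POD_RESOLV_CONF = "/etc/resolv.conf"


def resolv_conf_in_sync(host_path: str = HOST_RESOLV_CONF,
                        pod_path: str = POD_RESOLV_CONF) -> bool:
    """Return True only if both files exist and are byte-for-byte identical."""
    try:
        return Path(host_path).read_bytes() == Path(pod_path).read_bytes()
    except OSError:
        # A missing or unreadable file counts as "out of sync", so the check fails.
        return False


def probe_exit_code(host_path: str = HOST_RESOLV_CONF,
                    pod_path: str = POD_RESOLV_CONF) -> int:
    """Map the comparison onto exec-probe semantics: 0 = healthy, 1 = unhealthy."""
    return 0 if resolv_conf_in_sync(host_path, pod_path) else 1
```

Used as an exec liveness probe, a script would end with `sys.exit(probe_exit_code())`; once consecutive failures reach the probe's `failureThreshold`, the kubelet kills and restarts the container, which matches the "failed liveness probe, will be restarted" event recorded in Comment 3.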
Clone Of: 1849432
Environment:
Last Closed: 2020-10-19 14:54:24 UTC
Target Upstream Version:
Embargoed:


Links
- GitHub openshift/machine-config-operator pull 1974 (closed): Bug 1862874: [baremetal & friends] verify resolv.conf in HAProxy static pod synced with host resolv.conf (last updated 2021-01-07 17:03:42 UTC)
- Red Hat Product Errata RHBA-2020:4228 (last updated 2020-10-19 14:54:40 UTC)

Comment 2 Micah Abbott 2020-10-05 13:34:48 UTC
@Aleksandra I've made you the QA Contact on this BZ since you were able to verify the parent BZ.

Comment 3 Aleksandra Malykhin 2020-10-07 08:31:55 UTC
OCP 4.5: 4.5.0-0.nightly-2020-10-05-204452

Steps to reproduce:

1. Change the resolv.conf file on one of the masters (I added the line "nameserver 127.0.0.1"):
[core@master-0-0 ~]$ sudo vi  /etc/resolv.conf 


2. Verify that the haproxy-monitor container was restarted:
[core@master-0-0 ~]$ sudo crictl ps | grep haproxy-monitor
f5f09a971e61f       9fae0d9500dcd3c705e17d6c9c7afc41adb7713de390ae7cc751e5408201798e  3 seconds ago       Running   haproxy-monitor 4  4c9f0b437e7c9


3. Confirm that the haproxy-monitor container was restarted because of a failed liveness probe:
[core@master-0-0 ~]$ journalctl -u kubelet | grep "liveness probe"
Oct 06 14:24:55 master-0-0 hyperkube[2343]: I1006 14:24:55.565294    2343 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kni-infra", Name:"haproxy-master-0-0", UID:"ccda7847ab5cefa71f1f91518c07131c", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{haproxy-monitor}"}): type: 'Normal' reason: 'Killing' Container haproxy-monitor failed liveness probe, will be restarted


4. Verify that the container's resolv.conf file is the same as the one on the master node (the "nameserver 127.0.0.1" line was added in step 1):
[core@master-0-0 ~]$ sudo crictl exec -it 2a94810301fa6   /bin/sh
sh-4.2# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search ocp-edge-cluster-0.qe.lab.redhat.com
nameserver 192.168.123.112
nameserver 127.0.0.1
nameserver 192.168.123.1


5. The pod was restarted and is in the Running state:
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-kni-infra | grep haproxy-master-0-0
haproxy-master-0-0          2/2     Running   5          78m
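Step 3 relies on grepping the kubelet journal for "liveness probe". A slightly stricter filter is sketched below in Python: it only reports events whose reason is 'Killing' and whose message mentions a failed liveness probe, and it extracts the namespace, pod, and container. The regular expression is an assumption derived from the single event line quoted in step 3; other kubelet versions may format events differently. `ProbeKill` and `parse_liveness_kill` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern built from the one kubelet event line shown in step 3 (an assumption
# for other kubelet versions): namespace, pod name, container, reason, message.
_EVENT_RE = re.compile(
    r'Namespace:"(?P<namespace>[^"]*)", Name:"(?P<pod>[^"]*)".*?'
    r'FieldPath:"spec\.containers\{(?P<container>[^}]*)\}".*?'
    r"reason: '(?P<reason>[^']*)' (?P<message>.*)$"
)


class ProbeKill(NamedTuple):
    namespace: str
    pod: str
    container: str
    message: str


def parse_liveness_kill(line: str) -> Optional[ProbeKill]:
    """Return the parsed event if the line records a liveness-probe kill, else None."""
    m = _EVENT_RE.search(line)
    if m is None:
        return None
    # Keep only 'Killing' events caused by a failed liveness probe.
    if m.group("reason") != "Killing" or "failed liveness probe" not in m.group("message"):
        return None
    return ProbeKill(m.group("namespace"), m.group("pod"),
                     m.group("container"), m.group("message"))
```

Lines from `journalctl -u kubelet` can be passed through `parse_liveness_kill` one at a time; for the step 3 line it would report container `haproxy-monitor` in pod `openshift-kni-infra/haproxy-master-0-0`.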



Before the backport:
When the resolv.conf file was changed on one of the masters (step 1), nothing happened.
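Step 4 compares the host and container copies of resolv.conf by eye. A minimal Python sketch of a programmatic comparison is shown below, assuming both files have already been read into strings (for example, the host file and the output of `cat /etc/resolv.conf` inside the container). It parses only `search`/`domain` and `nameserver` lines and reports nameservers missing from, or extra in, the container's copy; `parse_resolv_conf`, `nameserver_diff`, and `ResolvConf` are hypothetical helpers, and other directives such as `options` are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvConf:
    search: list[str] = field(default_factory=list)
    nameservers: list[str] = field(default_factory=list)


def parse_resolv_conf(text: str) -> ResolvConf:
    """Extract search domains and nameservers, in file order."""
    conf = ResolvConf()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comment lines
        key, *values = line.split()
        if key == "nameserver" and values:
            conf.nameservers.append(values[0])
        elif key in ("search", "domain"):
            # search and domain are mutually exclusive; the last instance wins.
            conf.search = values
    return conf


def nameserver_diff(host: ResolvConf, pod: ResolvConf) -> tuple[list[str], list[str]]:
    """Return (nameservers missing in the pod, nameservers only in the pod)."""
    missing = [ns for ns in host.nameservers if ns not in pod.nameservers]
    extra = [ns for ns in pod.nameservers if ns not in host.nameservers]
    return missing, extra
```

With the container output from step 4 as the host side and a hypothetical stale copy that lacks the "nameserver 127.0.0.1" line as the pod side, `nameserver_diff` would report `127.0.0.1` as missing. Because the resolver queries nameservers in the order listed, a byte-for-byte comparison is still the stricter test; this parser only helps explain what differs.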

Comment 6 errata-xmlrpc 2020-10-19 14:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228

