Bug 1956653 - Removal of CoreDNS pod from remote worker causes api-int resolution error
Summary: Removal of CoreDNS pod from remote worker causes api-int resolution error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: John Wilkins
QA Contact: Rei
Docs Contact: Tomas 'Sheldon' Radej
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-04 07:40 UTC by Rei
Modified: 2021-08-31 15:13 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-31 15:13:00 UTC
Target Upstream Version:
Embargoed:
vvoronko: needinfo-



Description Rei 2021-05-04 07:40:02 UTC
Description of problem:
The epic that removes the CoreDNS pod from the workers causes the api-int record to disappear.

Version-Release number of selected component (if applicable):


How reproducible:
See the epic below: CoreDNS was removed from the workers, and the workers can no longer resolve api-int.
https://issues.redhat.com/browse/KNIDEPLOY-4329

Steps to Reproduce:
1. $ ssh kni@provisionhost-0-0
2. $ cd ~/clusterconfigs/manifests
3. $ vi cluster-network-avoid-workers-99-config.yaml

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-fix-ipi-rwn
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - name: nodeip-configuration.service
        enabled: true
        contents: |
          [Unit]
          Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
          Wants=network-online.target
          After=network-online.target ignition-firstboot-complete.service
          Before=kubelet.service crio.service
          [Service]
          Type=oneshot
          ExecStart=/bin/bash -c "exit 0 "
          [Install]
          WantedBy=multi-user.target
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/kubernetes/manifests/keepalived.yaml
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/kubernetes/manifests/mdns-publisher.yaml
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/kubernetes/manifests/coredns.yaml
---
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/master: ""
4. $ cp install-config.yaml ~/clusterconfigs 
5. $ ./openshift-baremetal-install --dir ~/clusterconfigs create manifests
6. Configure the master nodes to be schedulable by setting the mastersSchedulable field to true, which allows new pods to be placed on the master nodes (by default, master nodes are not schedulable):
   $ sed -i "s;mastersSchedulable: false;mastersSchedulable: true;g" clusterconfigs/manifests/cluster-scheduler-02-config.yml
7. Run the OpenShift installer:
   $ ./openshift-baremetal-install --dir ~/clusterconfigs --log-level debug create cluster
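To see what the stuck workers are failing to resolve, the internal API name can be built from the cluster name and base domain. The values below are hypothetical placeholders, not from this bug report:

```shell
# Hypothetical placeholder values -- substitute your own cluster
# name and base domain.
CLUSTER_NAME="ocp-edge"
BASE_DOMAIN="example.com"

# The internal API record the worker must resolve during deployment.
API_INT="api-int.${CLUSTER_NAME}.${BASE_DOMAIN}"
echo "${API_INT}"

# From a stuck worker, check resolution (fails in this bug):
#   dig +short "${API_INT}"
```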

Actual results:
The worker does not deploy; it gets stuck failing to resolve api-int.

Expected results:
This is the expected result; we need to understand where and how the api-int record is provided to the system.

Additional info:
You can bypass this issue by adding the api-int record to the external DNS server.
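If the external DNS server happens to be dnsmasq, the workaround record can be sketched as below. The cluster name, base domain, API VIP, and config file path are hypothetical placeholders:

```shell
# Hypothetical placeholder values -- substitute your own.
CLUSTER_NAME="ocp-edge"
BASE_DOMAIN="example.com"
API_VIP="192.168.111.5"

# dnsmasq static A record: address=/<fqdn>/<ip>
RECORD="address=/api-int.${CLUSTER_NAME}.${BASE_DOMAIN}/${API_VIP}"
echo "${RECORD}"

# On the external dnsmasq server, append the record and restart, e.g.:
#   echo "${RECORD}" | sudo tee -a /etc/dnsmasq.d/ocp-api-int.conf
#   sudo systemctl restart dnsmasq
```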

Comment 1 Victor Voronkov 2021-05-04 08:09:34 UTC
With Ingress_VIP removal as the default deployment mode, no worker will be successfully deployed; blocker flag set.

Comment 3 Yossi Boaron 2021-05-27 15:25:36 UTC
@vvoronko 


I think that the workaround of adding the api-int entry to the external DNS should do the trick.

After setting the node selector to the masters for the default ingress controller (as described in the required steps, see [1]), the router pods should run only on the master nodes, and we shouldn't have any issues with the ingress VIP.
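On a live cluster, the same node-selector change applied by the IngressController manifest in the reproduction steps could be sketched as a patch. This is an illustrative sketch, not a command taken from this bug report:

```shell
# The nodePlacement spec matching the IngressController manifest
# from the reproduction steps: pin router pods to master nodes.
PATCH='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}'
echo "${PATCH}"

# Against a live cluster, it would be applied with:
#   oc patch ingresscontroller default -n openshift-ingress-operator \
#     --type merge -p "${PATCH}"
```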


Could you please explain your comment (comment #1)?

[1] 
https://github.com/beekhof/openshift-docs/commit/afc331d1416b415396bea6a71282b205b69d7e8b#diff-83fa1c2c018928f312ed3a7126e1cf6daa189ed8802f83066735d224ffcc93f6R133

Comment 4 Victor Voronkov 2021-05-31 13:22:50 UTC
The current bug is related to api_int only; there are no other new issues, as was explained to yboaron.

Comment 7 Rei 2021-08-30 07:53:35 UTC
Read the docs /LGTM

Comment 8 John Wilkins 2021-08-30 15:42:54 UTC
Merged and cherry-picked: https://github.com/openshift/openshift-docs/pull/35881

