Bug 1956653

Summary: Removal of CoreDNS pod from remote worker causes api_int resolution error
Product: OpenShift Container Platform
Reporter: Rei <rhalle>
Component: Documentation
Assignee: John Wilkins <jowilkin>
Status: CLOSED CURRENTRELEASE
QA Contact: Rei <rhalle>
Severity: high
Docs Contact: Tomas 'Sheldon' Radej <tradej>
Priority: high
Version: 4.8
CC: aos-bugs, hpokorny, jokerman, mcornea, rhalle, vvoronko, yboaron, yprokule
Target Milestone: ---
Flags: vvoronko: needinfo-
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-31 15:13:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rei 2021-05-04 07:40:02 UTC
Description of problem:
The epic that removes CoreDNS from the workers (and eventually removes it entirely) causes the api-int record to disappear.

Version-Release number of selected component (if applicable):


How reproducible:
See the epic below: CoreDNS was removed from the workers, and the workers can no longer resolve api-int.
https://issues.redhat.com/browse/KNIDEPLOY-4329 

Steps to Reproduce:
1. $ ssh kni@provisionhost-0-0
2. $ cd ~/clusterconfigs/manifests
3. $ vi cluster-network-avoid-workers-99-config.yaml

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-fix-ipi-rwn
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - name: nodeip-configuration.service
        enabled: true
        contents: |
          [Unit]
          Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
          Wants=network-online.target
          After=network-online.target ignition-firstboot-complete.service
          Before=kubelet.service crio.service
          [Service]
          Type=oneshot
          ExecStart=/bin/bash -c "exit 0 "
          [Install]
          WantedBy=multi-user.target
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/kubernetes/manifests/keepalived.yaml
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/kubernetes/manifests/mdns-publisher.yaml
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/kubernetes/manifests/coredns.yaml
---
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/master: ""
4. $ cp install-config.yaml ~/clusterconfigs 
5. $ ./openshift-baremetal-install --dir ~/clusterconfigs create manifests
6. Configure the master nodes to be schedulable by setting the mastersSchedulable field to true, which allows new pods to be placed on the master nodes (by default, master nodes are not schedulable):
   $ sed -i "s;mastersSchedulable: false;mastersSchedulable: true;g" clusterconfigs/manifests/cluster-scheduler-02-config.yml
7. Run the OpenShift installer:
   $ ./openshift-baremetal-install --dir ~/clusterconfigs --log-level debug create cluster
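The mastersSchedulable substitution in step 6 can be sanity-checked on a throwaway copy of the scheduler manifest before touching the real clusterconfigs directory. This is a minimal sketch: the /tmp path is a placeholder, and the manifest body is an approximation of cluster-scheduler-02-config.yml, not the exact file the installer generates.

```shell
# Create a scratch file approximating the scheduler manifest (placeholder path).
cat > /tmp/cluster-scheduler-02-config.yml <<'EOF'
apiVersion: config.openshift.io/v1
kind: Scheduler
spec:
  mastersSchedulable: false
EOF

# Apply the same substitution used in step 6.
sed -i "s;mastersSchedulable: false;mastersSchedulable: true;g" /tmp/cluster-scheduler-02-config.yml

# Show the result; the field should now read true.
grep mastersSchedulable /tmp/cluster-scheduler-02-config.yml
```

If the grep still shows false, the sed expression did not match and the real manifest should be inspected before running the installer.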

Actual results:
The worker does not deploy; it gets stuck on api-int resolution.

Expected results:
This is the expected result; we should understand where and how the api-int record is provided to the system.

Additional info:
You can bypass this issue by adding an api-int record to the external DNS server.
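As a sketch of that workaround, the external DNS server needs an A record for api-int pointing at the cluster's internal API address. The zone, hostname, TTL, and IP below are placeholders, not values from this bug:

```
; Hypothetical BIND-style zone entry; cluster name, domain, and IP are placeholders.
api-int.mycluster.example.com.  300  IN  A  192.0.2.10
```

After reloading the zone, resolving api-int.<cluster>.<domain> from a worker node should return that address, allowing the worker to reach the internal API endpoint even without the CoreDNS pod.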

Comment 1 Victor Voronkov 2021-05-04 08:09:34 UTC
With Ingress_VIP removal as the default deployment mode, no worker will be successfully deployed; blocker flag set.

Comment 3 Yossi Boaron 2021-05-27 15:25:36 UTC
@vvoronko 


I think the workaround of adding an api-int entry to the external DNS should work.

After setting the node selector to masters for the default ingress controller (as described in the required steps, see [1]), the router pods should run only on master nodes, and we shouldn't have any issues with the ingress VIP.

Could you please explain your comment (comment #1)?

[1] 
https://github.com/beekhof/openshift-docs/commit/afc331d1416b415396bea6a71282b205b69d7e8b#diff-83fa1c2c018928f312ed3a7126e1cf6daa189ed8802f83066735d224ffcc93f6R133

Comment 4 Victor Voronkov 2021-05-31 13:22:50 UTC
The current bug is related to api_int only; there are no other new issues, as was explained to yboaron.

Comment 7 Rei 2021-08-30 07:53:35 UTC
Read the docs /LGTM

Comment 8 John Wilkins 2021-08-30 15:42:54 UTC
Merged and cherry-picked: https://github.com/openshift/openshift-docs/pull/35881