Bug 1908145

Summary: kube-scheduler-recovery-controller container crash loop when router pod is co-scheduled
Product: OpenShift Container Platform
Reporter: Seth Jennings <sjenning>
Component: kube-scheduler
Assignee: Mike Dame <mdame>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: high
Priority: high
Version: 4.7
CC: aos-bugs, jluhrsen, mfojtik, wking
Keywords: TestBlockerForLayeredProduct
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Environment: [sig-arch] Managed cluster should have no crashlooping pods in core namespaces over four minutes [Suite:openshift/conformance/parallel]
Last Closed: 2021-02-24 15:44:54 UTC
Type: Bug

Description Seth Jennings 2020-12-15 22:48:42 UTC
Description of problem:

On installs where the masters are schedulable and an ingress router pod can land on a master, the router already binds port 10443. The new kube-scheduler-recovery-controller container waits for that port to become free before starting, so its wait loop never exits, the 3-minute timeout kills it with a nonzero exit code, and kubelet restarts it in a crash loop.

https://github.com/openshift/router/blob/cca042b8b1ef6c3acc176c6c0a04908b5dd45b2e/images/router/haproxy/conf/haproxy-config.template#L357

      name: kube-scheduler-recovery-controller
      command:
        - /bin/bash
        - '-euxo'
        - pipefail
        - '-c'
...
      args:
        - >
          timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10443 \))" ]; do sleep 1; done'

          exec cluster-kube-scheduler-operator cert-recovery-controller --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-scheduler-cert-syncer-kubeconfig/kubeconfig --namespace=${POD_NAMESPACE} --listen=0.0.0.0:10443 -v=2
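
To confirm the conflict in place, a minimal diagnostic sketch (assuming cluster-admin access; the node name is taken from the router output below): list the owner of TCP port 10443 on an affected master. On a co-scheduled node this shows the router's haproxy, which is why the wait loop above never sees the port free.

$ oc debug node/master-0.ocp-dev.variantweb.net -- chroot /host \
    ss -Htlnp '( sport = :10443 )'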

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-15-081329

How reproducible:
Always

Steps to Reproduce:
1. Install a 3-node compact cluster, i.e. make the masters schedulable so each node carries both the master and worker roles (see the sketch below).
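
One way to make the masters schedulable on an existing cluster (a sketch using the Scheduler config API; adjust for your version):

$ oc patch schedulers.config.openshift.io cluster --type merge \
    -p '{"spec":{"mastersSchedulable":true}}'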

Actual results:
$ oc get pod | grep kube-sche
openshift-kube-scheduler-master-0.ocp-dev.variantweb.net   2/3     CrashLoopBackOff   20         140m
openshift-kube-scheduler-master-1.ocp-dev.variantweb.net   3/3     Running            1          142m
openshift-kube-scheduler-master-2.ocp-dev.variantweb.net   2/3     CrashLoopBackOff   21         146m

$ oc get pod -n openshift-ingress -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP             NODE                              NOMINATED NODE   READINESS GATES
router-default-86dcd458d8-7vbj2   1/1     Running   0          152m   10.42.11.118   master-0.ocp-dev.variantweb.net   <none>           <none>
router-default-86dcd458d8-vvxgb   1/1     Running   0          152m   10.42.11.120   master-2.ocp-dev.variantweb.net   <none>           <none>

Expected results:

kube-scheduler pod should be able to run successfully on the same node as the router

Additional info:

Comment 1 W. Trevor King 2020-12-15 22:58:09 UTC
Both components should probably also register their ports in [1] or somewhere in that doc.

[1]: https://github.com/openshift/enhancements/blob/5f2529a2a02a73aad17620d643e89eed189f14e3/enhancements/network/host-port-registry.md#localhost-only

Comment 2 Maciej Szulik 2020-12-16 13:00:55 UTC
Mike, we'll probably need to pick a different port for the recovery controller; sync with Tomas if in doubt.

I'm marking this blocker+ since it affects the stability of the product when running in a schedulable-masters configuration.

Comment 3 Seth Jennings 2020-12-16 15:22:06 UTC
FYI, the port used by the router has been updated in the port registry:
https://github.com/openshift/enhancements/pull/568

Comment 4 Mike Dame 2020-12-17 18:08:54 UTC
I opened 2 PRs: 
- https://github.com/openshift/cluster-kube-scheduler-operator/pull/311, to change the port in kube-scheduler to 11443 (just a guess, need to confirm this value works)
- https://github.com/openshift/enhancements/pull/569, to add that, and the kube-controller-manager port for the same controller, to the registry
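
A quick sanity check before settling on the new port (a sketch, assuming cluster-admin access): confirm nothing on any master already listens on 11443.

$ for node in $(oc get nodes -l node-role.kubernetes.io/master -o name); do
    echo "== ${node}"
    oc debug "${node}" -- chroot /host ss -Htlnp '( sport = :11443 )'
  done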

Comment 5 W. Trevor King 2020-12-23 21:41:14 UTC
*** Bug 1910417 has been marked as a duplicate of this bug. ***

Comment 6 W. Trevor King 2020-12-23 21:46:46 UTC
Dropping a reference to at least one of the e2e test-cases this kills (for compact clusters), to make this issue more discoverable in Sippy.

Comment 7 RamaKasturi 2021-01-06 07:07:00 UTC
A similar issue was hit when performing an upgrade from 4.2 to a 4.7 nightly build; per discussion with dev, the PR above should fix that issue as well.

Comment 9 RamaKasturi 2021-01-11 18:00:44 UTC
Verified with the latest build below: the port has been changed from 10443 to 11443. Also retried the 4.2 to 4.7 upgrade that failed before the fix; it now passes with the latest 4.7 build.

[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2021-01-10-070949]$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-10-070949   True        False         6h7m    Cluster version is 4.7.0-0.nightly-2021-01-10-070949

Post action: # oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-10-070949   True        False         False      3m42s
baremetal                                  4.7.0-0.nightly-2021-01-10-070949   True        False         False      25m
cloud-credential                           4.7.0-0.nightly-2021-01-10-070949   True        False         False      4h19m
cluster-autoscaler                         4.7.0-0.nightly-2021-01-10-070949   True        False         False      4h11m
config-operator                            4.7.0-0.nightly-2021-01-10-070949   True        False         False      138m
console                                    4.7.0-0.nightly-2021-01-10-070949   True        False         False      23m

port before the fix:
===========================
  - args:
    - |
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10443 \))" ]; do sleep 1; done'

      exec cluster-kube-scheduler-operator cert-recovery-controller --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-scheduler-cert-syncer-kubeconfig/kubeconfig  --namespace=${POD_NAMESPACE} --listen=0.0.0.0:10443 -v=2


port after the fix:
=============================
- args:
    - |
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 11443 \))" ]; do sleep 1; done'

      exec cluster-kube-scheduler-operator cert-recovery-controller --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-scheduler-cert-syncer-kubeconfig/kubeconfig  --namespace=${POD_NAMESPACE} --listen=0.0.0.0:11443 -v=2
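
For a quick check on a live cluster (a sketch; greps the recovery controller's listen address straight out of the pod specs):

$ oc -n openshift-kube-scheduler get pods -o yaml \
    | grep -o 'listen=0.0.0.0:[0-9]*' | sort -u
# expected after the fix: listen=0.0.0.0:11443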

Comment 10 RamaKasturi 2021-01-12 10:44:56 UTC
Based on comment 9 moving the bug to verified state.

Comment 11 RamaKasturi 2021-01-12 10:47:34 UTC
(In reply to RamaKasturi from comment #10)
> Based on comment 9 moving the bug to verified state.

Also tried both UPI & IPI installs where the nodes have both the master & worker roles, but could not reproduce the crash.

Comment 14 errata-xmlrpc 2021-02-24 15:44:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633