Bug 1780318
| Summary: | Machine Config Daemon Daemon Set does not set universal Toleration (and therefore gets booted if taints are set on a node) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wolfgang Kulhanek <wkulhane> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.2.z | CC: | amurdaca, andcosta, bward, clasohm, cmarches, cshulman, dahernan, enorling, jmalde, jnaess, jorge.martinezgarcia, kgarriso, ktadimar, lbac, mas-hatada, mdhanve, mharri, mvardhan, mzali, nstephan, oarribas, obockows, palonsor, pamoedom, rahmed, rdomnu, rekhan, rh-container, rnoma, scott.worthington, smilner, sreber, susuresh, vfarias, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 15:54:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1723620, 1846354 | | |
The same issue occurs with the node-ca DaemonSet from the openshift-image-registry namespace. After tainting any node, the node-ca DaemonSet no longer expects the pod to run on that node, and after redeploying the DaemonSet the pod does not get deployed on the tainted node. Version: 4.2.9

Going to fix this in 4.4; will have a PR ready soon and attempt a backport for 4.3.

Do we need a separate BZ for the node-ca issue?

@Marcel, yes, please open a new BZ for that so that that group can get working on it. This BZ fix will only cover the openshift-machine-config-operator namespace.

It looks like the universal toleration that was referenced as working was removed from the `dns-default` DaemonSet in 4.2.13: https://github.com/openshift/cluster-dns-operator/commit/6be3d017118b89203f00b9a915ffdfdb9975f145#diff-042823b431ba4ecede455c65596f6b52

I don't want to muddy this BZ since these would be for different components, but it does seem to be a larger problem with the cluster and system DaemonSets: tainting nodes should be something a customer can easily do without potentially breaking some of our core functionality (i.e. DNS, node-ca, MCD, etc.). I'm using the following workaround for MCD, but I'm not sure if this may cause issues or if it'll be overwritten by operator updates:
$ oc -n openshift-machine-config-operator patch ds machine-config-daemon --type='merge' -p "$(cat <<- EOF
spec:
template:
spec:
tolerations:
- operator: "Exists"
EOF
)"
Well, the DNS Operator is actually watching the dns-default DaemonSet, so my workaround for the other two (node-ca and Machine Config Daemon) doesn't work here because it gets immediately reverted.

There are issues with OCS nodes as well; those nodes have a taint like:

taints:
- effect: NoSchedule
  key: node.ocs.openshift.io/storage
  value: "true"

Of course the problem is the same: the lack of a universal toleration for the MCO daemons.
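For comparison, a toleration scoped only to that OCS taint versus the universal toleration (the form used by dns-default, and what the fix eventually applies to the MCD) would look roughly like this; a sketch for illustration, not taken from any shipped manifest:

tolerations:
# Tolerates only the specific OCS storage taint:
- key: node.ocs.openshift.io/storage
  operator: Equal
  value: "true"
  effect: NoSchedule
# Tolerates every taint, regardless of key, value, or effect:
- operator: Exists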
*** Bug 1836871 has been marked as a duplicate of this bug. ***

*** Bug 1843347 has been marked as a duplicate of this bug. ***

Verified on 4.6.0-0.nightly-2020-06-24-071932: the machine-config-daemon DaemonSet tolerates all taints and does not get evicted.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2020-06-24-071932 True False 138m Cluster version is 4.6.0-0.nightly-2020-06-24-071932
$ oc -n openshift-machine-config-operator get ds machine-config-daemon -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
deprecated.daemonset.template.generation: "1"
creationTimestamp: "2020-06-24T12:25:58Z"
generation: 1
managedFields:
- apiVersion: apps/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:deprecated.daemonset.template.generation: {}
f:spec:
f:revisionHistoryLimit: {}
f:selector:
f:matchLabels:
.: {}
f:k8s-app: {}
f:template:
f:metadata:
f:labels:
.: {}
f:k8s-app: {}
f:name: {}
f:spec:
f:containers:
k:{"name":"machine-config-daemon"}:
.: {}
f:args: {}
f:command: {}
f:env:
.: {}
k:{"name":"NODE_NAME"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:fieldRef:
.: {}
f:apiVersion: {}
f:fieldPath: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:resources:
.: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:securityContext:
.: {}
f:privileged: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/rootfs"}:
.: {}
f:mountPath: {}
f:name: {}
k:{"name":"oauth-proxy"}:
.: {}
f:args: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":9001,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:hostPort: {}
f:name: {}
f:protocol: {}
f:resources:
.: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/etc/tls/cookie-secret"}:
.: {}
f:mountPath: {}
f:name: {}
k:{"mountPath":"/etc/tls/private"}:
.: {}
f:mountPath: {}
f:name: {}
f:dnsPolicy: {}
f:hostNetwork: {}
f:hostPID: {}
f:nodeSelector:
.: {}
f:kubernetes.io/os: {}
f:priorityClassName: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext: {}
f:serviceAccount: {}
f:serviceAccountName: {}
f:terminationGracePeriodSeconds: {}
f:tolerations: {}
f:volumes:
.: {}
k:{"name":"cookie-secret"}:
.: {}
f:name: {}
f:secret:
.: {}
f:defaultMode: {}
f:secretName: {}
k:{"name":"proxy-tls"}:
.: {}
f:name: {}
f:secret:
.: {}
f:defaultMode: {}
f:secretName: {}
k:{"name":"rootfs"}:
.: {}
f:hostPath:
.: {}
f:path: {}
f:type: {}
f:name: {}
f:updateStrategy:
f:rollingUpdate:
.: {}
f:maxUnavailable: {}
f:type: {}
manager: machine-config-operator
operation: Update
time: "2020-06-24T12:25:58Z"
- apiVersion: apps/v1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:currentNumberScheduled: {}
f:desiredNumberScheduled: {}
f:numberAvailable: {}
f:numberReady: {}
f:observedGeneration: {}
f:updatedNumberScheduled: {}
manager: kube-controller-manager
operation: Update
time: "2020-06-24T13:59:25Z"
name: machine-config-daemon
namespace: openshift-machine-config-operator
resourceVersion: "73508"
selfLink: /apis/apps/v1/namespaces/openshift-machine-config-operator/daemonsets/machine-config-daemon
uid: 43e22aff-eec5-4457-b308-83f1839877b9
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: machine-config-daemon
template:
metadata:
creationTimestamp: null
labels:
k8s-app: machine-config-daemon
name: machine-config-daemon
spec:
containers:
- args:
- start
command:
- /usr/bin/machine-config-daemon
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c96d46d62085af77d8e5a5d292fcf7a45e49bccdcde3d803331fb7cc093bb291
imagePullPolicy: IfNotPresent
name: machine-config-daemon
resources:
requests:
cpu: 20m
memory: 50Mi
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /rootfs
name: rootfs
- args:
- --https-address=:9001
- --provider=openshift
- --openshift-service-account=machine-config-daemon
- --upstream=http://127.0.0.1:8797
- --tls-cert=/etc/tls/private/tls.crt
- --tls-key=/etc/tls/private/tls.key
- --cookie-secret-file=/etc/tls/cookie-secret/cookie-secret
- '--openshift-sar={"resource": "namespaces", "verb": "get"}'
- '--openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc5f074db5fd073b9778958c8396e46d2020ef3636ada76ca99cb924aee42734
imagePullPolicy: IfNotPresent
name: oauth-proxy
ports:
- containerPort: 9001
hostPort: 9001
name: metrics
protocol: TCP
resources:
requests:
cpu: 20m
memory: 50Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/tls/private
name: proxy-tls
- mountPath: /etc/tls/cookie-secret
name: cookie-secret
dnsPolicy: ClusterFirst
hostNetwork: true
hostPID: true
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: machine-config-daemon
serviceAccountName: machine-config-daemon
terminationGracePeriodSeconds: 600
tolerations:
- operator: Exists
volumes:
- hostPath:
path: /
type: ""
name: rootfs
- name: proxy-tls
secret:
defaultMode: 420
secretName: proxy-tls
- name: cookie-secret
secret:
defaultMode: 420
secretName: cookie-secret
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
status:
currentNumberScheduled: 6
desiredNumberScheduled: 6
numberAvailable: 6
numberMisscheduled: 0
numberReady: 6
observedGeneration: 1
updatedNumberScheduled: 6
$ oc adm taint node ip-10-0-192-187.us-west-2.compute.internal infra=reserved:NoExecute; oc adm taint node ip-10-0-192-187.us-west-2.compute.internal infra=reserved:NoSchedule
node/ip-10-0-192-187.us-west-2.compute.internal tainted
node/ip-10-0-192-187.us-west-2.compute.internal tainted
$ oc get node/ip-10-0-192-187.us-west-2.compute.internal -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
machine.openshift.io/machine: openshift-machine-api/mnguyen46-clbq2-worker-us-west-2c-6czlq
machineconfiguration.openshift.io/currentConfig: rendered-worker-55ee30d203654c470ba6ceb25267baad
machineconfiguration.openshift.io/desiredConfig: rendered-worker-55ee30d203654c470ba6ceb25267baad
machineconfiguration.openshift.io/reason: ""
machineconfiguration.openshift.io/state: Done
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2020-06-24T12:33:55Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: m5.large
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: us-west-2
failure-domain.beta.kubernetes.io/zone: us-west-2c
kubernetes.io/arch: amd64
kubernetes.io/hostname: ip-10-0-192-187
kubernetes.io/os: linux
node-role.kubernetes.io/worker: ""
node.kubernetes.io/instance-type: m5.large
node.openshift.io/os_id: rhcos
topology.kubernetes.io/region: us-west-2
topology.kubernetes.io/zone: us-west-2c
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:machine.openshift.io/machine: {}
manager: nodelink-controller
operation: Update
time: "2020-06-24T12:33:56Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:machineconfiguration.openshift.io/desiredConfig: {}
manager: machine-config-controller
operation: Update
time: "2020-06-24T13:28:28Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:spec:
f:podCIDR: {}
f:podCIDRs:
.: {}
v:"10.128.4.0/24": {}
manager: kube-controller-manager
operation: Update
time: "2020-06-24T13:28:30Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:machineconfiguration.openshift.io/currentConfig: {}
f:machineconfiguration.openshift.io/reason: {}
f:machineconfiguration.openshift.io/state: {}
manager: machine-config-daemon
operation: Update
time: "2020-06-24T13:33:28Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:volumes.kubernetes.io/controller-managed-attach-detach: {}
f:labels:
.: {}
f:beta.kubernetes.io/arch: {}
f:beta.kubernetes.io/instance-type: {}
f:beta.kubernetes.io/os: {}
f:failure-domain.beta.kubernetes.io/region: {}
f:failure-domain.beta.kubernetes.io/zone: {}
f:kubernetes.io/arch: {}
f:kubernetes.io/hostname: {}
f:kubernetes.io/os: {}
f:node-role.kubernetes.io/worker: {}
f:node.kubernetes.io/instance-type: {}
f:node.openshift.io/os_id: {}
f:topology.kubernetes.io/region: {}
f:topology.kubernetes.io/zone: {}
f:spec:
f:providerID: {}
f:status:
f:addresses:
.: {}
k:{"type":"Hostname"}:
.: {}
f:address: {}
f:type: {}
k:{"type":"InternalDNS"}:
.: {}
f:address: {}
f:type: {}
k:{"type":"InternalIP"}:
.: {}
f:address: {}
f:type: {}
f:allocatable:
.: {}
f:attachable-volumes-aws-ebs: {}
f:cpu: {}
f:ephemeral-storage: {}
f:hugepages-1Gi: {}
f:hugepages-2Mi: {}
f:memory: {}
f:pods: {}
f:capacity:
.: {}
f:attachable-volumes-aws-ebs: {}
f:cpu: {}
f:ephemeral-storage: {}
f:hugepages-1Gi: {}
f:hugepages-2Mi: {}
f:memory: {}
f:pods: {}
f:conditions:
.: {}
k:{"type":"DiskPressure"}:
.: {}
f:lastHeartbeatTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"MemoryPressure"}:
.: {}
f:lastHeartbeatTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"PIDPressure"}:
.: {}
f:lastHeartbeatTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Ready"}:
.: {}
f:lastHeartbeatTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:daemonEndpoints:
f:kubeletEndpoint:
f:Port: {}
f:images: {}
f:nodeInfo:
f:architecture: {}
f:bootID: {}
f:containerRuntimeVersion: {}
f:kernelVersion: {}
f:kubeProxyVersion: {}
f:kubeletVersion: {}
f:machineID: {}
f:operatingSystem: {}
f:osImage: {}
f:systemUUID: {}
manager: kubelet
operation: Update
time: "2020-06-24T15:08:26Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:spec:
f:taints: {}
manager: oc
operation: Update
time: "2020-06-24T15:12:07Z"
name: ip-10-0-192-187.us-west-2.compute.internal
resourceVersion: "118463"
selfLink: /api/v1/nodes/ip-10-0-192-187.us-west-2.compute.internal
uid: be2ae7c9-895d-4ae5-8485-6d2588e281f2
spec:
podCIDR: 10.128.4.0/24
podCIDRs:
- 10.128.4.0/24
providerID: aws:///us-west-2c/i-089f07959ebbb8046
taints:
- effect: NoSchedule
key: infra
value: reserved
- effect: NoExecute
key: infra
value: reserved
status:
addresses:
- address: 10.0.192.187
type: InternalIP
- address: ip-10-0-192-187.us-west-2.compute.internal
type: Hostname
- address: ip-10-0-192-187.us-west-2.compute.internal
type: InternalDNS
allocatable:
attachable-volumes-aws-ebs: "25"
cpu: 1500m
ephemeral-storage: "114381692328"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 6785784Ki
pods: "250"
capacity:
attachable-volumes-aws-ebs: "25"
cpu: "2"
ephemeral-storage: 125277164Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 7936760Ki
pods: "250"
conditions:
- lastHeartbeatTime: "2020-06-24T15:08:26Z"
lastTransitionTime: "2020-06-24T12:33:55Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2020-06-24T15:08:26Z"
lastTransitionTime: "2020-06-24T12:33:55Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2020-06-24T15:08:26Z"
lastTransitionTime: "2020-06-24T12:33:55Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2020-06-24T15:08:26Z"
lastTransitionTime: "2020-06-24T12:35:26Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:769432fdae52f909658c471bf9aa8d0f00fb398bb4976bec084c75037d87fa60
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 1231206281
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3803cac3f455ceaf28f3cf4d54eb9f59abe637fc2d72c55a862ac38e86867b80
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 469491383
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:24253e74e3605ea85550043f6e60f7a093322720db10d14748f38758d457dbc0
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 467404413
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:14fc53788f079e5488ebd6b28ee0aa711da4cd350b38e47977a64bd1849ce1fb
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 429941173
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c96d46d62085af77d8e5a5d292fcf7a45e49bccdcde3d803331fb7cc093bb291
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 427656308
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c3100d82504baa06eedeb04fd7975309a26fb0b018ec0ef5450db5fac0ca2377
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 365617141
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b75f3665ee9a0997ba358645d3ccfa251b8c7e054ca205c46c99b0c02733bfac
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 342695679
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c074911aad91a9ea7f76c639d274342559ae3cd66eec7227a5aa97e16f195cea
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 339758252
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e28b037e24caedb67a710a14c0f12eb18da4a5e0f387734244644ac460e9b4ec
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 334996356
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c59497362e2f48263eed3334504917ab09184cffb9a89fafa8d672495c2eae3
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 328353707
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:35f0a5a3d5bc51bf8fa36a24b19dc9d228d0d22bae8d4cabf2f9931aad117cb8
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 325492179
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f8110c86aa7b083d282c15d8b4385c8f3c97fe7cd369f88e3a296a928a532f5d
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 319513399
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cec1abdf919e4369359fbd313d9b1a1683e2c7ec350c099d0c35b843813ae5b9
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 305796978
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5d74cab786393064a4adebe8f2800b11c882184d7fd2f32f13b00ab369384486
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 304698519
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4307c0005681eff4859d03233d25b083efb25c09470354b4c16315e6e90e22ce
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 302616144
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:db64b2bdc80ee3cc1b9b783965d43379fdbd80172390d0a0fb0fcbddc62260b8
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 291056751
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c2aca69defe14789e920a5bf8019d6067008658e212b3fb160a8fd2ecb00d75
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 290286086
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:13854524d2ad6658266c7c53dbdef9da69e253d3fa34b96a817b5c9f70238f3f
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 288703578
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ab66a3f333cebb895f34d30005e55f1f3e6e0e77e426c07b8af2aa69e4df4298
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 281435470
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:58281c442148b0861aaf7627b3fc7b3f85ddbb86ea66726c9fd3942b79f464a9
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 277404760
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:036a0261d5e17290c5c13fe66e9c02984a397ae3a6a480ee6fa71ed6fcf07275
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 277227537
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc5f074db5fd073b9778958c8396e46d2020ef3636ada76ca99cb924aee42734
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 267697621
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:71cafcab89330c0d97ca1c007c456a540b55b658c91260486727e84895a7c358
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 258267683
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a44f3b6cb78cc303f2430b5a560b5607498dfcc052f32f2bb74baaabfad95745
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 257280584
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34d23d7f0e1bceca591cbc81bae0840bdff44cd526de9e54a7b77d0640ee69a3
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 255896460
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:da566e590a91b32f1488543beab96200e6753b6381b1b32d377bd245a8dac38b
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 251102117
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eac35fad0c0fbcccdfeb8c21b4adf0f0414a47b45ca27fe20e7e4baf0c130e65
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 241981432
- names:
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a1902b52381c35775a51e98292db78607b154471130fcbb0032197871197433b
- quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
sizeBytes: 237383939
nodeInfo:
architecture: amd64
bootID: 99a3774a-1449-4880-8d21-e64df733c944
containerRuntimeVersion: cri-o://1.19.0-26.dev.rhaos4.6.git44d065f.el8-dev
kernelVersion: 4.18.0-211.el8.x86_64
kubeProxyVersion: v1.18.3+314a8f2
kubeletVersion: v1.18.3+314a8f2
machineID: ec26d7691c9c73560ed77a3580e083cf
operatingSystem: linux
osImage: Red Hat Enterprise Linux CoreOS 46.82.202006231340-0 (Ootpa)
systemUUID: ec26d769-1c9c-7356-0ed7-7a3580e083cf
$ oc get pods -n openshift-machine-config-operator --field-selector spec.nodeName=ip-10-0-192-187.us-west-2.compute.internal
NAME READY STATUS RESTARTS AGE
machine-config-daemon-lw8kf 2/2 Running 0 160m
$ oc -n openshift-machine-config-operator get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
machine-config-daemon 6 6 6 6 6 kubernetes.io/os=linux 169m
machine-config-server 3 3 3 3 3 node-role.kubernetes.io/master= 168m
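As a quicker spot-check than dumping the full DaemonSet YAML, the tolerations field can be pulled directly with jsonpath (assuming the same cluster as above); it should show the single universal operator: Exists entry visible in the full output earlier:

$ oc -n openshift-machine-config-operator get ds machine-config-daemon -o jsonpath='{.spec.template.spec.tolerations}'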
For reference, here are the backports/cherry-picks of https://github.com/openshift/machine-config-operator/pull/1760:

OCP 4.3: https://github.com/openshift/machine-config-operator/pull/1816 - Bug 1846358
OCP 4.4: https://github.com/openshift/machine-config-operator/pull/1815 - Bug 1846357
OCP 4.5: https://github.com/openshift/machine-config-operator/pull/1814 - Bug 1846354

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
Description of problem:
The machine-config-daemon DaemonSet in the openshift-machine-config-operator project does not set a universal toleration (unlike other system DaemonSets such as DNS or SDN). Therefore the Machine Config Daemon pod gets removed from a node if any kind of taint is set on the node. For example, if we set up an infra node and taint it to keep non-infra components off the node, the Machine Config Daemon pod also gets removed.

Version-Release number of selected component (if applicable):
Observed on 4.2.4.

How reproducible:
Every time.

Steps to Reproduce:
1. Taint a node (oc adm taint node infra-1a-gz8dn infra=reserved:NoExecute; oc adm taint node infra-1a-gz8dn infra=reserved:NoSchedule)
2. The DaemonSet pod gets removed.
3. MachineConfig updates will no longer get propagated to the node.

Additional info:
The DNS DaemonSet has this toleration:

tolerations:
- operator: Exists

The Machine Config Daemon DaemonSet will (most likely) need the same toleration.
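Since the same missing toleration showed up across several components' DaemonSets (DNS, node-ca, MCD), a quick way to survey which DaemonSets declare tolerations at all is a custom-columns query; a hypothetical one-liner for triage, not part of the original report:

$ oc get ds --all-namespaces -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TOLERATIONS:.spec.template.spec.tolerations'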