Bug 1729510
| Summary: | MCD does not wait for nodes to drain | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Antonio Murdaca <amurdaca> | |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> | |
| Status: | CLOSED ERRATA | QA Contact: | Micah Abbott <miabbott> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.1.z | CC: | agarcial, jchaloup, jhou, mgugino, miabbott, talessio, wzheng | |
| Target Milestone: | --- | |||
| Target Release: | 4.1.z | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | 1729243 | | | |
| : | 1743846 (view as bug list) | Environment: | | |
| Last Closed: | 2019-09-10 15:59:27 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | | |
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | | |
| Cloudforms Team: | --- | Target Upstream Version: | | |
| Embargoed: | | | | |
| Bug Depends On: | 1729512, 1737379, 1743846 | | | |
| Bug Blocks: | | | | |
Description
Antonio Murdaca
2019-07-12 12:49:59 UTC
*** Bug 1744948 has been marked as a duplicate of this bug. ***

Discussed with Antonio how to reproduce this for the MCO side of things. Applying any new MachineConfig triggers a drain of the affected node as the new MC is rolled out, so the steps look like this (an optional pool-side watch is sketched after the steps):
1. Create cluster
2. Create MachineConfig for worker
3. `oc logs -f <worker MCD pod>`
4. Watch for messages about draining the node
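For reference, a minimal sketch (not part of the original steps) of following the rollout from the pool side, assuming the default `worker` pool name:
```
### (Optional) Watch the worker MachineConfigPool transition through Updating -> Updated
$ oc get machineconfigpool worker -w

### (Optional) Or watch the nodes get cordoned/uncordoned as the MCD drains and reboots them
$ oc get nodes -w
```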
```
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.1.0-0.nightly-2019-08-30-035445 True False 47m Cluster version is 4.1.0-0.nightly-2019-08-30-035445
### Figure out which nodes are workers
$ oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-140-86.us-west-2.compute.internal Ready master 62m v1.13.4+3bd346709
ip-10-0-143-236.us-west-2.compute.internal Ready worker 55m v1.13.4+3bd346709
ip-10-0-155-84.us-west-2.compute.internal Ready worker 55m v1.13.4+3bd346709
ip-10-0-157-48.us-west-2.compute.internal Ready master 62m v1.13.4+3bd346709
ip-10-0-166-99.us-west-2.compute.internal Ready worker 54m v1.13.4+3bd346709
ip-10-0-168-59.us-west-2.compute.internal Ready master 63m v1.13.4+3bd346709
### Find a MCD pod on a worker node
$ oc -n openshift-machine-config-operator describe po machine-config-daemon-4njpl | grep Node:
Node: ip-10-0-155-84.us-west-2.compute.internal/10.0.155.84
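### (Optional, not in the original transcript) Mapping MCD pods to nodes can also be done
### in one shot with `-o wide`, which prints the NODE column for each pod:
$ oc -n openshift-machine-config-operator get pods -o wide | grep machine-config-daemon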
### Create a MachineConfig
$ cat ../machineConfigs/chrony.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 60-examplecorp-chrony
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 0644
        path: /etc/chrony.conf
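### (Optional sanity check, not part of the original steps) Decode the embedded base64
### payload to confirm what will be written to /etc/chrony.conf:
$ echo 'c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK' | base64 -d
server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
server baz.example.net maxdelay 0.4 offline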
### Apply the MachineConfig
$ oc apply -f ../machineConfigs/chrony.yaml
machineconfig.machineconfiguration.openshift.io/60-examplecorp-chrony created
### Watch the MCD pod for drain messages
$ oc -n openshift-machine-config-operator logs -f pod/machine-config-daemon-4njpl
I0903 13:30:57.790426 4480 start.go:67] Version: 4.1.14-201908291507-dirty (e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab)
I0903 13:30:57.791293 4480 start.go:100] Starting node writer
I0903 13:30:57.796310 4480 run.go:22] Running captured: chroot /rootfs rpm-ostree status --json
I0903 13:30:58.038685 4480 daemon.go:200] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b68f108c63ae9f7245621be001cda06d0d909565dc63624530144570c11dba3 (410.8.20190830.0)
I0903 13:30:58.044703 4480 start.go:196] Calling chroot("/rootfs")
I0903 13:30:58.044739 4480 start.go:206] Starting MachineConfigDaemon
I0903 13:30:58.044811 4480 update.go:847] Starting to manage node: ip-10-0-155-84.us-west-2.compute.internal
I0903 13:30:58.074577 4480 run.go:22] Running captured: rpm-ostree status
I0903 13:30:58.473083 4480 daemon.go:740] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b68f108c63ae9f7245621be001cda06d0d909565dc63624530144570c11dba3
CustomOrigin: Managed by pivot tool
Version: 410.8.20190830.0 (2019-08-30T02:31:46Z)
pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/ootpa@sha256:683a6a866a8ec789fedb5da63b6a2ff68c1b0788ec90e7def778f0c4c13197a4
CustomOrigin: Provisioned from oscontainer
Version: 410.8.20190520.0 (2019-05-20T20:10:04Z)
I0903 13:30:58.473113 4480 run.go:22] Running captured: journalctl --list-boots
I0903 13:30:58.526842 4480 daemon.go:747] journalctl --list-boots:
-1 d7495f00adbb4f6e836df08416314f18 Tue 2019-09-03 13:25:34 UTC—Tue 2019-09-03 13:28:33 UTC
0 7aa2913ce3b943ad8dfbca79db301fac Tue 2019-09-03 13:28:46 UTC—Tue 2019-09-03 13:30:58 UTC
I0903 13:30:58.526939 4480 daemon.go:494] Enabling Kubelet Healthz Monitor
I0903 13:30:59.451443 4480 node.go:24] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-155-84.us-west-2.compute.internal: map[machine.openshift.io/machine:openshift-machine-api/miabbott-4-1-14-5t67r-worker-us-west-2b-vcpjd volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
I0903 13:30:59.457046 4480 node.go:45] Setting initial node config: rendered-worker-ed526cce1edf9d8c4d643d17760ccd25
I0903 13:30:59.501338 4480 update.go:734] logger doesn't support --jounald, grepping the journal
I0903 13:30:59.542539 4480 update.go:847] error loading pending config open /etc/machine-config-daemon/state.json: no such file or directory
I0903 13:30:59.554603 4480 daemon.go:667] In bootstrap mode
I0903 13:30:59.554627 4480 daemon.go:695] Current+desired config: rendered-worker-ed526cce1edf9d8c4d643d17760ccd25
I0903 13:30:59.561204 4480 daemon.go:854] No bootstrap pivot required; unlinking bootstrap node annotations
I0903 13:30:59.561255 4480 update.go:847] Validating against current config rendered-worker-ed526cce1edf9d8c4d643d17760ccd25
I0903 13:30:59.563831 4480 daemon.go:898] Validating against current config rendered-worker-ed526cce1edf9d8c4d643d17760ccd25
I0903 13:30:59.576253 4480 daemon.go:904] Validated on-disk state
I0903 13:30:59.585952 4480 daemon.go:951] In desired config rendered-worker-ed526cce1edf9d8c4d643d17760ccd25
E0903 13:36:10.918151 4480 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=9, ErrCode=NO_ERROR, debug=""
W0903 13:36:11.141549 4480 reflector.go:270] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.MachineConfig ended with: too old resource version: 12587 (14989)
I0903 14:26:18.060959 4480 update.go:185] Checking reconcilable for config rendered-worker-ed526cce1edf9d8c4d643d17760ccd25 to rendered-worker-24ffb68b002cb7511301664cb41c0a68
I0903 14:26:18.062693 4480 update.go:847] Starting update from rendered-worker-ed526cce1edf9d8c4d643d17760ccd25 to rendered-worker-24ffb68b002cb7511301664cb41c0a68
I0903 14:26:18.065515 4480 update.go:399] Updating files
I0903 14:26:18.065528 4480 update.go:601] Writing file "/etc/tmpfiles.d/cleanup-cni.conf"
I0903 14:26:18.067103 4480 update.go:601] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf"
I0903 14:26:18.068593 4480 update.go:601] Writing file "/var/lib/kubelet/config.json"
I0903 14:26:18.070069 4480 update.go:601] Writing file "/etc/kubernetes/ca.crt"
I0903 14:26:18.071754 4480 update.go:601] Writing file "/etc/sysctl.d/forward.conf"
I0903 14:26:18.073279 4480 update.go:601] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy"
I0903 14:26:18.074157 4480 update.go:601] Writing file "/etc/containers/registries.conf"
I0903 14:26:18.075597 4480 update.go:601] Writing file "/etc/containers/storage.conf"
I0903 14:26:18.077298 4480 update.go:601] Writing file "/etc/crio/crio.conf"
I0903 14:26:18.079188 4480 update.go:601] Writing file "/etc/kubernetes/cloud.conf"
I0903 14:26:18.080075 4480 update.go:601] Writing file "/etc/kubernetes/kubelet-ca.crt"
I0903 14:26:18.081642 4480 update.go:601] Writing file "/etc/kubernetes/kubelet.conf"
I0903 14:26:18.083098 4480 update.go:601] Writing file "/etc/chrony.conf"
I0903 14:26:18.084600 4480 update.go:556] Writing systemd unit "kubelet.service"
I0903 14:26:18.086184 4480 update.go:573] Enabling systemd unit "kubelet.service"
I0903 14:26:18.086275 4480 update.go:493] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink
I0903 14:26:18.086287 4480 update.go:418] Deleting stale data
I0903 14:26:18.086306 4480 update.go:663] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
I0903 14:26:18.089973 4480 update.go:812] logger doesn't support --jounald, logging json directly
I0903 14:26:18.093733 4480 update.go:847] Update prepared; beginning drain
I0903 14:26:18.101639 4480 update.go:89] cordoned node "ip-10-0-155-84.us-west-2.compute.internal"
I0903 14:26:18.144107 4480 update.go:93] ignoring DaemonSet-managed pods: tuned-jvg2z, dns-default-268kq, node-ca-rwrml, machine-config-daemon-4njpl, node-exporter-ldmv7, multus-tpwj4, ovs-4c5sg, sdn-4lml2; deleting pods with local storage: alertmanager-main-2, kube-state-metrics-64c4bdf979-m6jsj, prometheus-adapter-7f77c6997f-dbbnz, prometheus-adapter-7f77c6997f-ptlrr
I0903 14:26:22.056307 4480 update.go:89] pod "router-default-5cf946464-l6st8" removed (evicted)
I0903 14:26:22.856358 4480 update.go:89] pod "certified-operators-c4ff7fc9b-8kclm" removed (evicted)
I0903 14:26:24.256264 4480 update.go:89] pod "prometheus-adapter-7f77c6997f-ptlrr" removed (evicted)
I0903 14:26:24.656182 4480 update.go:89] pod "redhat-operators-7699569b86-fnvtm" removed (evicted)
I0903 14:26:26.456437 4480 update.go:89] pod "image-registry-f7cd488bf-xmth9" removed (evicted)
I0903 14:26:26.656932 4480 update.go:89] pod "alertmanager-main-2" removed (evicted)
I0903 14:26:27.456095 4480 update.go:89] pod "telemeter-client-b65c57ff7-kh6h9" removed (evicted)
I0903 14:26:27.856365 4480 update.go:89] pod "community-operators-6f95cf67c6-fh8l6" removed (evicted)
I0903 14:26:28.056288 4480 update.go:89] pod "prometheus-adapter-7f77c6997f-dbbnz" removed (evicted)
I0903 14:26:29.411252 4480 update.go:89] pod "kube-state-metrics-64c4bdf979-m6jsj" removed (evicted)
I0903 14:26:29.411283 4480 update.go:89] drained node "ip-10-0-155-84.us-west-2.compute.internal"
I0903 14:26:29.411291 4480 update.go:847] drain complete
I0903 14:26:29.413821 4480 update.go:847] initiating reboot: Node will reboot into config rendered-worker-24ffb68b002cb7511301664cb41c0a68
I0903 14:26:29.471061 4480 start.go:215] Shutting down MachineConfigDaemon
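### (Optional, not in the original transcript) After the reboot, the node's
### machineconfiguration.openshift.io annotations should converge on the new rendered config:
$ oc describe node ip-10-0-155-84.us-west-2.compute.internal | grep machineconfiguration.openshift.io/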
### Verify MachineConfigPool is updated and all nodes are healthy
$ oc get machineconfig
NAME GENERATEDBYCONTROLLER IGNITIONVERSION CREATED
00-master e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
00-worker e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
01-master-container-runtime e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
01-master-kubelet e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
01-worker-container-runtime e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
01-worker-kubelet e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
60-examplecorp-chrony 2.2.0 12m
99-master-f1c24a07-ce4d-11e9-99b1-02f047b21df0-registries e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
99-master-ssh 2.2.0 75m
99-worker-f1c3fa01-ce4d-11e9-99b1-02f047b21df0-registries e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
99-worker-ssh 2.2.0 75m
rendered-master-8b9815ae15d192479af438faa89e5271 e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
rendered-worker-24ffb68b002cb7511301664cb41c0a68 e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 12m
rendered-worker-ed526cce1edf9d8c4d643d17760ccd25 e17ddba2f24258f7ab7bb0eb034208cd3a0d1bab 2.2.0 74m
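### (Optional, not in the original transcript) Confirm the new rendered worker config
### actually carries the chrony file:
$ oc get machineconfig rendered-worker-24ffb68b002cb7511301664cb41c0a68 -o yaml | grep chrony.conf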
$ oc describe machineconfigpool/worker
Name: worker
Namespace:
Labels: <none>
Annotations: <none>
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfigPool
Metadata:
Creation Timestamp: 2019-09-03T13:22:57Z
Generation: 3
Resource Version: 33272
Self Link: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
UID: f1c3fa01-ce4d-11e9-99b1-02f047b21df0
Spec:
Configuration:
Name: rendered-worker-24ffb68b002cb7511301664cb41c0a68
Source:
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 00-worker
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 01-worker-container-runtime
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 01-worker-kubelet
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 60-examplecorp-chrony
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 99-worker-f1c3fa01-ce4d-11e9-99b1-02f047b21df0-registries
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 99-worker-ssh
Machine Config Selector:
Match Labels:
machineconfiguration.openshift.io/role: worker
Max Unavailable: <nil>
Node Selector:
Match Labels:
node-role.kubernetes.io/worker:
Paused: false
Status:
Conditions:
Last Transition Time: 2019-09-03T13:23:31Z
Message:
Reason:
Status: False
Type: RenderDegraded
Last Transition Time: 2019-09-03T13:23:36Z
Message:
Reason:
Status: False
Type: NodeDegraded
Last Transition Time: 2019-09-03T13:23:36Z
Message:
Reason:
Status: False
Type: Degraded
Last Transition Time: 2019-09-03T14:35:45Z
Message:
Reason: All nodes are updated with rendered-worker-24ffb68b002cb7511301664cb41c0a68
Status: True
Type: Updated
Last Transition Time: 2019-09-03T14:35:45Z
Message:
Reason:
Status: False
Type: Updating
Configuration:
Name: rendered-worker-24ffb68b002cb7511301664cb41c0a68
Source:
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 00-worker
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 01-worker-container-runtime
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 01-worker-kubelet
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 60-examplecorp-chrony
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 99-worker-f1c3fa01-ce4d-11e9-99b1-02f047b21df0-registries
API Version: machineconfiguration.openshift.io/v1
Kind: MachineConfig
Name: 99-worker-ssh
Degraded Machine Count: 0
Machine Count: 3
Observed Generation: 3
Ready Machine Count: 3
Unavailable Machine Count: 0
Updated Machine Count: 3
Events: <none>
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-140-86.us-west-2.compute.internal Ready master 76m v1.13.4+3bd346709
ip-10-0-143-236.us-west-2.compute.internal Ready worker 69m v1.13.4+3bd346709
ip-10-0-155-84.us-west-2.compute.internal Ready worker 69m v1.13.4+3bd346709
ip-10-0-157-48.us-west-2.compute.internal Ready master 76m v1.13.4+3bd346709
ip-10-0-166-99.us-west-2.compute.internal Ready worker 69m v1.13.4+3bd346709
ip-10-0-168-59.us-west-2.compute.internal Ready master 77m v1.13.4+3bd346709
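### (Optional, not in the original transcript) A one-line, cluster-wide summary of pool status;
### both pools should show UPDATED=True, UPDATING=False, DEGRADED=False once the rollout settles:
$ oc get machineconfigpool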
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2594