Bug 1766513
| Summary: | MCD could shadow errors since the force validation file isn't deleted once used | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Antonio Murdaca <amurdaca> | |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> | |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.3.0 | CC: | kgarriso | |
| Target Milestone: | --- | |||
| Target Release: | 4.3.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1766514 (view as bug list) | Environment: | ||
| Last Closed: | 2020-01-23 11:09:46 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1766514, 1766515 | |||
|
Description
Antonio Murdaca
2019-10-29 10:27:04 UTC
Any updates on this? We have 2 backports stalled waiting on it.

Verified on 4.3.0-0.nightly-2019-11-18-175710
== Apply the following MC to have the file managed by MCO ==
$ cat file.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"MachineConfig","metadata":{"annotations":{},"labels":{"machineconfiguration.openshift.io/role":"worker"},"name":"test-file"},"spec":{"config":{"ignition":{"version":"2.2.0"},"storage":{"files":[{"contents":{"source":"data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK"},"filesystem":"root","mode":420,"path":"/etc/test"}]}}}}
  creationTimestamp: "2019-11-19T15:42:50Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-file
  resourceVersion: "286928"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigs/test-file
  uid: 9c0fd040-c96e-4fd1-bfa4-122551bf0f8c
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 420
        path: /etc/test
$ oc apply -f file.yaml
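As an optional sanity check (not part of the original verification steps), the payload embedded in the MC can be decoded locally: the Ignition `source` field is a base64 data URL, and `mode: 420` is the decimal form of octal 0644.

```shell
# Decode the data-URL payload from the MachineConfig above to see
# exactly what the MCO will write to /etc/test.
src="c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK"
echo "$src" | base64 -d

# Ignition file modes are decimal; 420 decimal == 0644 octal.
printf '%o\n' 420
```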
== Edit /etc/test, which is now managed by the MCO ==
$ oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-132-107.ec2.internal Ready worker 17h v1.16.2
ip-10-0-143-180.ec2.internal Ready master 17h v1.16.2
ip-10-0-155-243.ec2.internal Ready worker 17h v1.16.2
ip-10-0-157-93.ec2.internal Ready master 17h v1.16.2
ip-10-0-166-244.ec2.internal Ready master 17h v1.16.2
ip-10-0-168-238.ec2.internal Ready worker 17h v1.16.2
$ oc debug node/ip-10-0-132-107.ec2.internal
Starting pod/ip-10-0-132-107ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2#
sh-4.2#
sh-4.2# chroot /host
sh-4.4# vi /etc/test
sh-4.4# cat /etc/test
server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
MY CHANGES HERE
server baz.example.net maxdelay 0.4 offline
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
== restart MCD pod on that node ==
$ oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector spec.nodeName=ip-10-0-132-107.ec2.internal
NAME READY STATUS RESTARTS AGE
machine-config-daemon-pffnf 2/2 Running 3 17h
$ oc -n openshift-machine-config-operator log machine-config-daemon-pffnf -c machine-config-daemon
I1119 15:51:41.294634 2141 start.go:74] Version: v4.3.0-201911161914-dirty (1c4436a0952ff43315da8d9a4aa1b5841192aeb1)
I1119 15:51:41.358299 2141 start.go:84] Calling chroot("/rootfs")
I1119 15:51:41.360123 2141 rpm-ostree.go:365] Running captured: rpm-ostree status --json
I1119 15:51:43.252332 2141 daemon.go:209] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eeb7352d393d6aafcb75d1d57f5578c06af9f8ab21d86f00e7cda8388744b320 (43.81.201911181553.0)
I1119 15:51:43.320652 2141 metrics.go:106] Registering Prometheus metrics
I1119 15:51:43.320872 2141 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
I1119 15:51:43.356798 2141 update.go:1050] Starting to manage node: ip-10-0-132-107.ec2.internal
I1119 15:51:43.386440 2141 rpm-ostree.go:365] Running captured: rpm-ostree status
I1119 15:51:43.873395 2141 daemon.go:778] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eeb7352d393d6aafcb75d1d57f5578c06af9f8ab21d86f00e7cda8388744b320
CustomOrigin: Managed by machine-config-operator
Version: 43.81.201911181553.0 (2019-11-18T15:58:44Z)
ostree://025d77d2680e3a2ce60b3b832106319fefb81cfefbf1bf26fcf4bd1aeae0712d
Version: 43.81.201911081536.0 (2019-11-08T15:41:32Z)
I1119 15:51:43.873421 2141 rpm-ostree.go:365] Running captured: journalctl --list-boots
I1119 15:51:43.890836 2141 daemon.go:785] journalctl --list-boots:
-2 9426aac3147844b8b0703f2a7e70fb1e Mon 2019-11-18 23:05:26 UTC—Mon 2019-11-18 23:07:16 UTC
-1 27c0bbf49686428091cefaaf6ee2e0ac Mon 2019-11-18 23:07:49 UTC—Tue 2019-11-19 15:50:46 UTC
0 56fcd0cea1d3495fa43887d9ac004008 Tue 2019-11-19 15:51:22 UTC—Tue 2019-11-19 15:51:43 UTC
I1119 15:51:43.890857 2141 daemon.go:528] Starting MachineConfigDaemon
I1119 15:51:43.890966 2141 daemon.go:535] Enabling Kubelet Healthz Monitor
E1119 15:51:47.483423 2141 reflector.go:123] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1.MachineConfig: Get https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: no route to host
E1119 15:51:47.483436 2141 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Node: Get https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: no route to host
I1119 15:51:49.939436 2141 daemon.go:731] Current config: rendered-worker-a071bd1866b9d81e64e2b56318138074
I1119 15:51:49.939462 2141 daemon.go:732] Desired config: rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 15:51:49.952134 2141 update.go:1050] Disk currentConfig rendered-worker-e63e8ae93003df690179be3266b9621e overrides node annotation rendered-worker-a071bd1866b9d81e64e2b56318138074
I1119 15:51:49.956574 2141 daemon.go:955] Validating against pending config rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 15:51:49.970491 2141 daemon.go:971] Validated on-disk state
I1119 15:51:49.985618 2141 daemon.go:1005] Completing pending config rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 15:51:49.993916 2141 update.go:1050] completed update for config rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 15:51:49.999053 2141 daemon.go:1021] In desired config rendered-worker-e63e8ae93003df690179be3266b9621e
$ oc -n openshift-machine-config-operator delete pod machine-config-daemon-pffnf
pod "machine-config-daemon-pffnf" deleted
== Verify degraded state in MCD logs after it restarts ==
$ oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector spec.nodeName=ip-10-0-132-107.ec2.internal
NAME READY STATUS RESTARTS AGE
machine-config-daemon-4cfqh 2/2 Running 0 12m
$ oc -n openshift-machine-config-operator log machine-config-daemon-4cfqh -c machine-config-daemon
...snip...
E1119 16:54:09.246433 71013 writer.go:130] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 16:55:09.262041 71013 daemon.go:724] Current+desired config: rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 16:55:09.269002 71013 daemon.go:958] Validating against current config rendered-worker-e63e8ae93003df690179be3266b9621e
E1119 16:55:09.270049 71013 daemon.go:1350] content mismatch for file /etc/test: server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
A: MY CHANGES HERE
server baz.example.net maxdelay 0.4 offline
B: server baz.example.net maxdelay 0.4 offline
E1119 16:55:09.270077 71013 writer.go:130] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-e63e8ae93003df690179be3266b9621e
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT
master rendered-master-6e7212ec4d57e884f64e32eabff065bb True False False 3 3 3 0
worker rendered-worker-e63e8ae93003df690179be3266b9621e False True True 3 2 2 1
== Touch the force file /run/machine-config-daemon-force ==
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-132-107.ec2.internal Ready worker 17h v1.16.2
ip-10-0-143-180.ec2.internal Ready master 17h v1.16.2
ip-10-0-155-243.ec2.internal Ready worker 17h v1.16.2
ip-10-0-157-93.ec2.internal Ready master 17h v1.16.2
ip-10-0-166-244.ec2.internal Ready master 17h v1.16.2
ip-10-0-168-238.ec2.internal Ready worker 17h v1.16.2
[mnguyen@pet30 4.3-nightly]$ ls
openshift-client-linux-4.3.0-0.nightly-2019-11-18-175710.tar.gz README.md testcluster
openshift-install release.txt
openshift-install-linux-4.3.0-0.nightly-2019-11-18-175710.tar.gz sha256sum.txt
[mnguyen@pet30 4.3-nightly]$ oc debug node/ip-10-0-132-107.ec2.internal
Starting pod/ip-10-0-132-107ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2#
sh-4.2# chroot /host
sh-4.4# cat /etc/test
server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
MY CHANGES HERE
server baz.example.net maxdelay 0.4 offline
sh-4.4# touch /run/machine-config-daemon-force
sh-4.4# ls /run/
NetworkManager containers dmeventd-server issue.d mdadm ostree-booted samba sudo
agetty.reload crio faillock kubernetes media podman secrets systemd
auditd.pid criu gluster lock mount rpc.statd.pid sepermit tmpfiles.d
chrony cryptsetup ignition.env log multus rpcbind setrans udev
console dbus initctl lvm openshift-sdn rpcbind.sock sshd.pid user
console-login-helper-messages dmeventd-client initramfs machine-config-daemon-force openvswitch runc sssd.pid utmp
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
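The force file is meant to be a one-shot override: when the MCD sees /run/machine-config-daemon-force it skips on-disk validation. The bug tracked here is that the file was not deleted after being honored, so later drift (like the second edit below) could be silently masked. A rough shell sketch of the consume-on-use semantics the fix introduces (the real MCD is Go; this is only an illustration, not its actual code):

```shell
# Hypothetical sketch: honor the force file once, then delete it so it
# cannot shadow a later validation failure.
maybe_validate() {
  local forcefile="$1"
  if [ -e "$forcefile" ]; then
    rm -f "$forcefile"          # the fix: one-shot, remove after use
    echo "forced: skipping on-disk validation"
  else
    echo "validating on-disk state"
  fi
}

# Example: the first run consumes the force file, the second validates again.
f=$(mktemp)
maybe_validate "$f"   # forced: skipping on-disk validation
maybe_validate "$f"   # validating on-disk state
```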
== Restart MCD on node again ==
$ oc -n openshift-machine-config-operator delete pod/machine-config-daemon-4cfqh
pod "machine-config-daemon-4cfqh" deleted
== Verify that the MCP is no longer degraded and there are no error messages in MCD logs ==
$ oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector spec.nodeName=ip-10-0-132-107.ec2.internal
NAME READY STATUS RESTARTS AGE
machine-config-daemon-k5w9h 2/2 Running 0 9s
$ oc -n openshift-machine-config-operator log machine-config-daemon-k5w9h -c machine-config-daemon
I1119 17:03:18.700477 97533 start.go:74] Version: v4.3.0-201911161914-dirty (1c4436a0952ff43315da8d9a4aa1b5841192aeb1)
I1119 17:03:18.706312 97533 start.go:84] Calling chroot("/rootfs")
I1119 17:03:18.706418 97533 rpm-ostree.go:365] Running captured: rpm-ostree status --json
I1119 17:03:18.877078 97533 daemon.go:209] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eeb7352d393d6aafcb75d1d57f5578c06af9f8ab21d86f00e7cda8388744b320 (43.81.201911181553.0)
I1119 17:03:18.879443 97533 metrics.go:106] Registering Prometheus metrics
I1119 17:03:18.879586 97533 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
I1119 17:03:18.881125 97533 update.go:1050] Starting to manage node: ip-10-0-132-107.ec2.internal
I1119 17:03:18.886955 97533 rpm-ostree.go:365] Running captured: rpm-ostree status
I1119 17:03:18.964138 97533 daemon.go:778] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eeb7352d393d6aafcb75d1d57f5578c06af9f8ab21d86f00e7cda8388744b320
CustomOrigin: Managed by machine-config-operator
Version: 43.81.201911181553.0 (2019-11-18T15:58:44Z)
ostree://025d77d2680e3a2ce60b3b832106319fefb81cfefbf1bf26fcf4bd1aeae0712d
Version: 43.81.201911081536.0 (2019-11-08T15:41:32Z)
I1119 17:03:18.964162 97533 rpm-ostree.go:365] Running captured: journalctl --list-boots
I1119 17:03:18.976745 97533 daemon.go:785] journalctl --list-boots:
-2 9426aac3147844b8b0703f2a7e70fb1e Mon 2019-11-18 23:05:26 UTC—Mon 2019-11-18 23:07:16 UTC
-1 27c0bbf49686428091cefaaf6ee2e0ac Mon 2019-11-18 23:07:49 UTC—Tue 2019-11-19 15:50:46 UTC
0 56fcd0cea1d3495fa43887d9ac004008 Tue 2019-11-19 15:51:22 UTC—Tue 2019-11-19 17:03:18 UTC
I1119 17:03:18.976772 97533 daemon.go:528] Starting MachineConfigDaemon
I1119 17:03:18.976874 97533 daemon.go:535] Enabling Kubelet Healthz Monitor
$ oc get mcp/worker
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT
worker rendered-worker-e63e8ae93003df690179be3266b9621e True False False 3 3 3 0
== Change /etc/test again and verify the MCP is degraded ==
$ oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-132-107.ec2.internal Ready worker 17h v1.16.2
ip-10-0-143-180.ec2.internal Ready master 18h v1.16.2
ip-10-0-155-243.ec2.internal Ready worker 17h v1.16.2
ip-10-0-157-93.ec2.internal Ready master 18h v1.16.2
ip-10-0-166-244.ec2.internal Ready master 18h v1.16.2
ip-10-0-168-238.ec2.internal Ready worker 17h v1.16.2
$ oc debug node/ip-10-0-132-107.ec2.internal
Starting pod/ip-10-0-132-107ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# vi /etc/test
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT
master rendered-master-6e7212ec4d57e884f64e32eabff065bb True False False 3 3 3 0
worker rendered-worker-e63e8ae93003df690179be3266b9621e False True True 3 2 2 1
$ oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector spec.nodeName=ip-10-0-132-107.ec2.internal
NAME READY STATUS RESTARTS AGE
machine-config-daemon-k5w9h 2/2 Running 0 3m25s
$ oc -n openshift-machine-config-operator delete pod/machine-config-daemon-k5w9h
pod "machine-config-daemon-k5w9h" deleted
$ oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector spec.nodeName=ip-10-0-132-107.ec2.internal
NAME READY STATUS RESTARTS AGE
machine-config-daemon-jbw99 2/2 Running 0 5s
$ oc -n openshift-machine-config-operator log machine-config-daemon-jbw99 -c machine-config-daemon
...snip...
E1119 17:07:17.742472 102988 writer.go:130] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 17:07:25.760929 102988 daemon.go:724] Current+desired config: rendered-worker-e63e8ae93003df690179be3266b9621e
I1119 17:07:25.767167 102988 daemon.go:958] Validating against current config rendered-worker-e63e8ae93003df690179be3266b9621e
E1119 17:07:25.767269 102988 daemon.go:1350] content mismatch for file /etc/test: server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
A: MY CHANGES HERE
SECOND CHANGES HERE
server baz.example.net maxdelay 0.4 offline
B: server baz.example.net maxdelay 0.4 offline
E1119 17:07:25.767294 102988 writer.go:130] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-e63e8ae93003df690179be3266b9621e
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062