Bug 1892448 - MCDPivotError alert/metric missing
Summary: MCDPivotError alert/metric missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Kirsten Garrison
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1893409
 
Reported: 2020-10-28 19:18 UTC by Kirsten Garrison
Modified: 2021-02-24 15:29 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:28:37 UTC
Target Upstream Version:


Links
System 	ID 	Private 	Priority 	Status 	Summary 	Last Updated
Github 	openshift machine-config-operator pull 2189 	0 	None 	closed 	Bug 1892448: daemon: add back metrics for pivot error 	2021-02-11 06:09:25 UTC
Red Hat Product Errata 	RHSA-2020:5633 	0 	None 	None 	None 	2021-02-24 15:29:05 UTC

Comment 2 Michael Nguyen 2020-11-20 00:59:10 UTC
Verified on 4.7.0-0.nightly-2020-11-18-125028.

To trigger an MCDPivotError, create a bastion host so that you can SSH to one of the nodes. Then run this script on that node to keep deleting the ostree repo extracted from the machine-os-content.

while true; do rm -rf /run/mco-machine-os-content/os-content*/srv/repo; sleep 1; done

Run a cluster upgrade that includes an OS update.

Run `oc -n openshift-monitoring get routes` to get the route for the Prometheus web UI. Log in and click 'Alerts'. Verify that there is an active alert for MCDPivotError.
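
The verification here was done in the web UI. As a rough sketch only, the same check could be scripted against the JSON returned by the Prometheus HTTP API endpoint `/api/v1/alerts`. The function name `firing_alerts` and the sample payload are illustrative assumptions (the payload is trimmed down from the values quoted in this comment), and authenticating against the OpenShift route is not shown.

```python
import json


def firing_alerts(payload, alertname):
    """Return the alerts named `alertname` that are in the 'firing' state.

    `payload` is the body of a Prometheus /api/v1/alerts response,
    either as a JSON string or as an already-parsed dict.
    """
    data = json.loads(payload) if isinstance(payload, str) else payload
    if data.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    alerts = data.get("data", {}).get("alerts", [])
    return [
        alert
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == alertname
        and alert.get("state") == "firing"
    ]


# Illustrative payload, trimmed from the alert shown in this comment.
SAMPLE = {
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {
                    "alertname": "MCDPivotError",
                    "node": "ip-10-0-156-42.us-west-2.compute.internal",
                    "severity": "warning",
                },
                "state": "firing",
                "activeAt": "2020-11-20T00:52:32.044526034Z",
                "value": "1.605833515369452e+09",
            }
        ]
    },
}

for alert in firing_alerts(SAMPLE, "MCDPivotError"):
    print(alert["labels"]["node"], alert["state"])
```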

--------------------


MCDPivotError (1 active)

alert: MCDPivotError
expr: mcd_pivot_err > 0
labels:
  severity: warning
annotations:
  message: 'Error detected in pivot logs on {{ $labels.node }} '
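
To make the rule's behaviour concrete, here is a minimal Python sketch of what `mcd_pivot_err > 0` amounts to: every series whose current value is above zero yields one alert, labelled with the rule's severity and a message rendered from the `node` label. This only mimics the rule for illustration and is not how Prometheus evaluates it. The function name `evaluate_pivot_rule` and the second sample node (`example-node`) are hypothetical.

```python
def evaluate_pivot_rule(samples):
    """Mimic the MCDPivotError rule over (labels, value) samples.

    Series with a value of 0 or less are filtered out, as the
    `> 0` comparison in the rule expression would do.
    """
    alerts = []
    for labels, value in samples:
        if value > 0:
            alerts.append(
                {
                    "labels": {
                        **labels,
                        "alertname": "MCDPivotError",
                        "severity": "warning",
                    },
                    # Same text as the rule's message annotation.
                    "message": "Error detected in pivot logs on "
                    + labels.get("node", "")
                    + " ",
                }
            )
    return alerts


samples = [
    # Value taken from the alert shown in this comment.
    ({"node": "ip-10-0-156-42.us-west-2.compute.internal"}, 1.605833515369452e09),
    # Hypothetical healthy node, included only to show the filter.
    ({"node": "example-node"}, 0.0),
]

for alert in evaluate_pivot_rule(samples):
    print(alert["message"])
```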

Labels 	State 	Active Since 	Value
alertname="MCDPivotError" container="oauth-proxy" endpoint="metrics" err="failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:119e7a1d4f52c626e893cf3623bb0f7499613b207066dfdb6afb8b4a5a1f834a : with stdout output: : error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-286336622/srv/repo:eec52ee9a0a6d9917e68bdda14c92eceea96ed27966d32236fce47cd595245d5 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:119e7a1d4f52c626e893cf3623bb0f7499613b207066dfdb6afb8b4a5a1f834a --custom-origin-description Managed by machine-config-operator: error: opendir(/run/mco-machine-os-content/os-content-286336622/srv/repo): No such file or directory : exit status 1" instance="10.0.156.42:9001" job="machine-config-daemon" namespace="openshift-machine-config-operator" node="ip-10-0-156-42.us-west-2.compute.internal" pivot_target="quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:119e7a1d4f52c626e893cf3623bb0f7499613b207066dfdb6afb8b4a5a1f834a" pod="machine-config-daemon-r76n5" service="machine-config-daemon" severity="warning" 	firing 	2020-11-20 00:52:32.044526034 +0000 UTC 	1.605833515369452e+09
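
The Value column (1.605833515369452e+09) has the magnitude of a Unix timestamp in seconds. That reading is an inference from the number itself and is not stated anywhere in this bug. Under that assumption, the short sketch below converts it to UTC. The result falls on 2020-11-20, roughly half a minute before the Active Since time above, which would be consistent with the metric recording when the pivot error occurred. The helper name `sample_value_to_utc` is illustrative.

```python
from datetime import datetime, timezone


def sample_value_to_utc(value):
    """Interpret a metric sample value as Unix seconds (assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


observed = sample_value_to_utc("1.605833515369452e+09")
print(observed.strftime("%Y-%m-%d %H:%M:%S %Z"))
```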

Comment 5 errata-xmlrpc 2021-02-24 15:28:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

