Bug 1892448

Summary: MCDPivotError alert/metric missing
Product: OpenShift Container Platform
Reporter: Kirsten Garrison <kgarriso>
Component: Machine Config Operator
Assignee: Kirsten Garrison <kgarriso>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.6
CC: jerzhang, skumari
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-24 15:28:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1893409

Comment 2 Michael Nguyen 2020-11-20 00:59:10 UTC
Verified on 4.7.0-0.nightly-2020-11-18-125028.

To trigger an MCDPivotError alert, create a bastion host so you can SSH to one of the nodes. Then run this script on the node to repeatedly delete the ostree repo extracted from machine-os-content.

while true; do rm -rf /run/mco-machine-os-content/os-content*/srv/repo; sleep 1; done
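The loop above runs until it is interrupted. A minimal sketch of a bounded variant follows; the function name `delete_os_content_repos` and its three parameters (base directory, iteration count, sleep interval) are hypothetical and not part of the original reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper: repeatedly delete the extracted ostree repo so that
# the rpm-ostree rebase cannot find it. All three parameters are illustrative;
# the default base directory is the path used in the loop above.
delete_os_content_repos() {
    base="${1:-/run/mco-machine-os-content}"
    iterations="${2:-600}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # The glob stays outside the quotes so that os-content* expands.
        rm -rf "$base"/os-content*/srv/repo
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling `delete_os_content_repos` with no arguments on the node would delete the repo once per second for roughly ten minutes and then stop, so the node is not left with a stray loop after the test.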

Run a cluster upgrade that includes an OS update.

Run `oc -n openshift-monitoring get routes` to get the Prometheus web login.  Log in and click 'Alerts'.  Verify that there is an active alert for MCDPivotError.
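The same check can probably be done from the command line instead of the web UI. The sketch below assumes the Prometheus route is named `prometheus-k8s`, that the current `oc` session token is accepted by that route, and that the response is compact JSON; the helper names `fetch_alerts` and `count_alerts_named` are hypothetical.

```shell
#!/bin/sh
# Count alerts named $1 in a Prometheus /api/v1/alerts JSON response on stdin.
# This is a plain-text match on compact JSON, so it does not distinguish
# pending alerts from firing ones.
count_alerts_named() {
    grep -o "\"alertname\":\"$1\"" | wc -l | tr -d ' '
}

# Fetch the active alerts from the cluster Prometheus.
# Assumptions: logged in with oc, route name prometheus-k8s, token accepted.
fetch_alerts() {
    host="$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')"
    token="$(oc whoami -t)"
    curl -sk -H "Authorization: Bearer $token" "https://$host/api/v1/alerts"
}
```

With a logged-in session, `fetch_alerts | count_alerts_named MCDPivotError` should print a non-zero count while the alert is active.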


MCDPivotError (1 active)

alert: MCDPivotError
expr: mcd_pivot_err > 0
labels:
  severity: warning
annotations:
  message: 'Error detected in pivot logs on {{ $labels.node }} '

State: firing
Active Since: 2020-11-20 00:52:32.044526034 +0000 UTC
Value: 1.605833515369452e+09
Labels:
  alertname="MCDPivotError"
  container="oauth-proxy"
  endpoint="metrics"
  err="failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:119e7a1d4f52c626e893cf3623bb0f7499613b207066dfdb6afb8b4a5a1f834a : with stdout output: : error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-286336622/srv/repo:eec52ee9a0a6d9917e68bdda14c92eceea96ed27966d32236fce47cd595245d5 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:119e7a1d4f52c626e893cf3623bb0f7499613b207066dfdb6afb8b4a5a1f834a --custom-origin-description Managed by machine-config-operator: error: opendir(/run/mco-machine-os-content/os-content-286336622/srv/repo): No such file or directory : exit status 1"
  instance=""
  job="machine-config-daemon"
  namespace="openshift-machine-config-operator"
  node="ip-10-0-156-42.us-west-2.compute.internal"
  pivot_target="quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:119e7a1d4f52c626e893cf3623bb0f7499613b207066dfdb6afb8b4a5a1f834a"
  pod="machine-config-daemon-r76n5"
  service="machine-config-daemon"
  severity="warning"
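The reported value, 1.605833515369452e+09, reads like Unix epoch seconds and falls within a minute of the "Active Since" time. That is an inference from the numbers, not something the bug states about what `mcd_pivot_err` stores. Under that assumption, a small sketch (the helper name `epoch_to_utc` is hypothetical) converts such a value to a UTC timestamp for comparison:

```shell
#!/bin/sh
# Hypothetical helper: convert a value such as 1.605833515369452e+09 to a
# UTC timestamp, assuming it holds Unix epoch seconds.
epoch_to_utc() {
    # Truncate the floating-point value to whole seconds.
    secs="$(awk -v v="$1" 'BEGIN { printf "%d", v }')"
    # GNU and BusyBox date accept -d @SECONDS; BSD date uses -r SECONDS.
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null ||
        date -u -r "$secs" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For the value above, `epoch_to_utc 1.605833515369452e+09` gives 2020-11-20 00:51:55 UTC, shortly before the alert became active at 00:52:32.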

Comment 5 errata-xmlrpc 2021-02-24 15:28:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.