Bug 1906570

Summary:	Number of disruptions caused by reboots on a cluster cannot be measured
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Monitoring	Assignee:	Sergiusz Urbaniak <surbania>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.7	CC:	alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, mnguyen, pkrupa, surbania
Target Milestone:	---
Target Release:	4.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-02-24 15:41:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Clayton Coleman 2020-12-10 19:19:55 UTC

We currently lack a metric that tells us how many reboots have occurred on a cluster. A reboot impacts availability, tells us when admins or hardware decide to take an outage, and might be an accidental outcome of our software changing something incorrectly. By tracking a counter of reboots per node, we can track the total number of reboots over time and gain better insight into how machines are managed in different environments.

The wtmp log (accessible via the last command on RHCOS) is effectively a counter of boots. Our node_exporter should read wtmp on startup and write the number of boots to a file picked up by the textfile collector, so that each node reports its own count. We should then sum that value over all nodes and report the cluster-wide number of reboots back to telemetry.
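A minimal sketch of the per-node step and the cluster roll-up, under stated assumptions. It assumes the glibc x86_64 wtmp layout (fixed 384-byte struct utmp records starting with a little-endian 16-bit ut_type) and counts records whose ut_type is BOOT_TIME (2), which are the entries last lists as "reboot". The metric name node_boots_total, the file name boots.prom and the helper functions are hypothetical; the bug report does not name them. The file is written to a temporary name and renamed so the textfile collector (pointed at a directory via node_exporter's --collector.textfile.directory flag) never reads a half-written .prom file.

```python
import os
import struct
import tempfile

# Assumption: glibc on x86_64, where each wtmp record (struct utmp) is
# 384 bytes and begins with a little-endian 16-bit ut_type field.
UTMP_RECORD_SIZE = 384
BOOT_TIME = 2  # ut_type of a boot record (shown as "reboot" by last)


def count_boots(data: bytes) -> int:
    """Count BOOT_TIME records in the raw contents of a wtmp file."""
    boots = 0
    # Skip a trailing partial record, e.g. one being appended concurrently.
    usable = len(data) - len(data) % UTMP_RECORD_SIZE
    for offset in range(0, usable, UTMP_RECORD_SIZE):
        (ut_type,) = struct.unpack_from("<h", data, offset)
        if ut_type == BOOT_TIME:
            boots += 1
    return boots


def render_textfile(boots: int, metric: str = "node_boots_total") -> str:
    """Render the count in Prometheus text exposition format.

    The metric name is hypothetical and only used for illustration.
    """
    return (
        f"# HELP {metric} Boot records found in wtmp.\n"
        f"# TYPE {metric} counter\n"
        f"{metric} {boots}\n"
    )


def write_textfile(directory: str, boots: int, name: str = "boots.prom") -> str:
    """Write the .prom file atomically: temp file first, then rename."""
    path = os.path.join(directory, name)
    # The ".tmp" suffix keeps the collector (which reads *.prom) from
    # picking up the file before it is complete.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(render_textfile(boots))
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let node_exporter read it
    os.replace(tmp, path)
    return path


def cluster_total(per_node: dict[str, int]) -> int:
    """Sum per-node boot counts into one cluster-wide number."""
    return sum(per_node.values())
```

On a node this would be called as write_textfile(textfile_dir, count_boots(open("/var/log/wtmp", "rb").read())). In practice the cluster-wide sum would more likely be done on the Prometheus side, for example with a PromQL sum() over the per-node series. Exposing the value as a counter also lets increase() treat a drop in the count (for instance if wtmp is rotated) as a counter reset rather than a negative number of reboots.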

This is part of gaining overall insight into the disruption we inject into customer clusters.

Comment 8 errata-xmlrpc 2021-02-24 15:41:57 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633