Bug 1789260
| Summary: | update from version 4.2.12-s390x to 4.2.13-s390x fails because of broken node-exporter | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexander Klein <alklein> | |
| Component: | Multi-Arch | Assignee: | David Benoit <dbenoit> | |
| Status: | CLOSED ERRATA | QA Contact: | Barry Donahue <bdonahue> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.2.z | CC: | lcosic, palonsor, pgier, psundara, spasquie, vlaad, wking, yselkowi | |
| Target Milestone: | --- | Flags: | alklein: needinfo- | |
| Target Release: | 4.2.z | |||
| Hardware: | s390x | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1791413 (view as bug list) | Environment: | ||
| Last Closed: | 2020-02-24 16:52:45 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1785594, 1791413 | |||
|
Description
Alexander Klein
2020-01-09 08:24:26 UTC
> I think it is a race condition where the monitoring operator degrades after the install finishes.
What is the Degraded Reason/Message? I'd like to search for them in Telemetry data to double-check that this is just an s390x issue and just a 4.2.13 issue.
Also, we've pulled the update edges leading into 4.2.13 out of the recommended-update graph while we look into this [1].
[1]: https://github.com/openshift/cincinnati-graph-data/pull/17
The last condition in `oc describe co/monitoring` is:
Message: Failed to rollout the stack. Error: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of node-exporter: daemonset node-exporter is not ready. status: (desired: 5, updated: 5, ready: 0, unavailable: 5)
Reason: UpdatingnodeExporterFailed
Status: True
Type: Degraded
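For anyone searching logs or Telemetry data for the same failure, the rollout counters can be pulled out of the Degraded message with a small script. The following is a minimal sketch, assuming the message keeps the exact `status: (desired: …, updated: …, ready: …, unavailable: …)` shape quoted above; the names `parse_rollout_status` and `all_pods_unready` are hypothetical helpers, not part of any OpenShift tooling.

```python
import re

# Pattern mirrors the "status: (...)" suffix of the Degraded message quoted
# in this bug. It is an assumption that other releases use the same wording.
STATUS_RE = re.compile(
    r"status: \(desired: (\d+), updated: (\d+), ready: (\d+), unavailable: (\d+)\)"
)


def parse_rollout_status(message: str) -> dict:
    """Extract the DaemonSet rollout counters from a Degraded message."""
    match = STATUS_RE.search(message)
    if match is None:
        raise ValueError("no DaemonSet rollout status found in message")
    keys = ("desired", "updated", "ready", "unavailable")
    return dict(zip(keys, (int(group) for group in match.groups())))


def all_pods_unready(status: dict) -> bool:
    """True when pods were scheduled but none of them became ready."""
    return status["desired"] > 0 and status["ready"] == 0


MESSAGE = (
    "Failed to rollout the stack. Error: running task Updating node-exporter "
    "failed: reconciling node-exporter DaemonSet failed: updating DaemonSet "
    "object failed: waiting for DaemonSetRollout of node-exporter: daemonset "
    "node-exporter is not ready. status: (desired: 5, updated: 5, ready: 0, "
    "unavailable: 5)"
)

if __name__ == "__main__":
    status = parse_rollout_status(MESSAGE)
    print(status, all_pods_unready(status))
```

In the case reported here all five pods were updated but none became ready, which is consistent with node-exporter crashing on every node rather than with a partial rollout.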
It looks like (#572) enabled the routine from PR (#46), which may need support added for s390x.
References:
(#572) https://github.com/openshift/cluster-monitoring-operator/pull/572/files#diff-ea42f2a19bba3e07005d6b437ed1a902R26
(#46 a) https://github.com/openshift/node_exporter/pull/46/files#diff-0e15beb18c136e9ae4e6a0eec3700896R39
(#46 b) https://github.com/openshift/node_exporter/pull/46/files#diff-0e15beb18c136e9ae4e6a0eec3700896R80-R115

The problem is that the format of /proc/cpuinfo differs between s390x and x86 platforms, and the procfs library doesn't handle these differences very well. As a workaround, we can skip gathering these metrics on s390x for now, as suggested in the related PR (https://github.com/openshift/node_exporter/pull/52). For a longer-term upstream fix, can someone upload an example of /proc/cpuinfo from an s390x system?

Example of /proc/cpuinfo for s390x:
vendor_id : IBM/S390
# processors : 4
bogomips per cpu: 3033.00
max thread id : 0
features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
facilities : 0 1 2 3 4 6 7 8 9 10 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 40 41 42 43 44 45 46 47 48 49 50 51 52 53 55 57 73 74 75 76 77 80 81 82 128 129 131
cache0 : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1 : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2 : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8
cache3 : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4 : level=3 type=Unified scope=Shared size=65536K line_size=256 associativity=16
cache5 : level=4 type=Unified scope=Shared size=491520K line_size=256 associativity=30
processor 0: version = FF, identification = 2733E8, machine = 2964
processor 1: version = FF, identification = 2733E8, machine = 2964
processor 2: version = FF, identification = 2733E8, machine = 2964
processor 3: version = FF, identification = 2733E8, machine = 2964
cpu number : 0
cpu MHz dynamic : 5000
cpu MHz static : 5000
cpu number : 1
cpu MHz dynamic : 5000
cpu MHz static : 5000
cpu number : 2
cpu MHz dynamic : 5000
cpu MHz static : 5000
cpu number : 3
cpu MHz dynamic : 5000
cpu MHz static : 5000

Created an upstream PR to add support for reading arm, ppc, and s390x cpuinfo files.
https://github.com/prometheus/procfs/pull/257

This is still broken on 4.2.14, which is the version the 'latest' download is pointing to.

Although 'latest' is not pointing to it, 4.2.16 is the latest version: https://mirror.openshift.com/pub/openshift-v4/s390x/clients/ocp/4.2.16
Could you try this? It worked for me.

I upgraded the cluster with oc adm upgrade; this is fixed in 4.2.16z.

Need to set Target Release, or the Errata sweeper won't move this to ON_QA.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0460

*** Bug 1795508 has been marked as a duplicate of this bug. ***
*** Bug 1829332 has been marked as a duplicate of this bug. ***
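To illustrate why an x86-oriented parser struggles with the s390x /proc/cpuinfo sample quoted in this thread, here is a minimal Python sketch that reads that layout. It is not the procfs implementation from the upstream PR; the function name `parse_s390x_cpuinfo` and the shape of the returned dictionary are assumptions made for illustration. The point is that s390x has a shared header, then one-line `processor N:` records, then per-CPU frequency blocks introduced by `cpu number`, instead of the repeated per-CPU stanzas found on x86.

```python
import re

# One-line per-processor record, as seen in the s390x sample in this bug.
PROCESSOR_RE = re.compile(
    r"^processor (\d+): version = (\w+),\s*identification = (\w+),\s*machine = (\w+)$"
)

# Shortened copy of the sample from this report (two CPUs instead of four).
SAMPLE = """\
vendor_id : IBM/S390
# processors : 2
bogomips per cpu: 3033.00
processor 0: version = FF, identification = 2733E8, machine = 2964
processor 1: version = FF, identification = 2733E8, machine = 2964
cpu number : 0
cpu MHz dynamic : 5000
cpu MHz static : 5000
cpu number : 1
cpu MHz dynamic : 5000
cpu MHz static : 5000
"""


def parse_s390x_cpuinfo(text: str) -> dict:
    """Split s390x cpuinfo text into global, per-processor and frequency data."""
    info = {"global": {}, "processors": {}, "frequencies": {}}
    current_cpu = None  # set once a "cpu number" line has been seen
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = PROCESSOR_RE.match(line)
        if match:
            info["processors"][int(match.group(1))] = {
                "version": match.group(2),
                "identification": match.group(3),
                "machine": match.group(4),
            }
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line; ignore it
        key, value = key.strip(), value.strip()
        if key == "cpu number":
            current_cpu = int(value)
            info["frequencies"][current_cpu] = {}
        elif key.startswith("cpu MHz") and current_cpu is not None:
            info["frequencies"][current_cpu][key] = float(value)
        else:
            info["global"][key] = value
    return info


if __name__ == "__main__":
    parsed = parse_s390x_cpuinfo(SAMPLE)
    print(parsed["global"]["vendor_id"], sorted(parsed["processors"]))
```

A parser written for x86 typically expects a `processor : N` line to open each per-CPU block, which never appears in this format. That mismatch fits the failure described in this bug, though the exact failure mode inside procfs is not shown in the thread.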