Bug 1378883 - oc adm diagnostics checks not existing systemd units
Summary: oc adm diagnostics checks not existing systemd units
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.9.0
Assignee: Luke Meyer
QA Contact: Xingxing Xia
URL:
Whiteboard:
Duplicates: 1432221
Depends On:
Blocks: 1547245 1547246
 
Reported: 2016-09-23 12:57 UTC by Mike Fiedler
Modified: 2020-08-13 08:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The systemd units for masters changed without the corresponding diagnostics being updated.
Consequence: The diagnostics silently checked for master systemd units that don't exist instead of the ones that do, so problems with the correct units that might have been reported were not.
Fix: Diagnostics now check for the correct master unit names.
Result: Problems with master systemd units and their logs can now be found.
Clone Of:
Clones: 1547245 1547246
Environment:
Last Closed: 2018-03-28 14:05:01 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0489 | 0 | None | None | None | 2018-03-28 14:05:44 UTC

Description Mike Fiedler 2016-09-23 12:57:06 UTC
Description of problem:

Running oadm diagnostics on 3.3.0.32 always returns 3 errors for non-existent systemd units:

ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-master`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-node`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show kubernetes`: exit status 1
       Cannot analyze systemd units.


The correct units in my HA config are atomic-openshift-master-controllers, atomic-openshift-master-api, and atomic-openshift-node.


Version-Release number of selected component (if applicable):


3.3.0.32

How reproducible: always


Steps to Reproduce:
1.  On an HA (multi-master) master, run:  oadm diagnostics --master-config=/etc/origin/master/master-config.yaml


Actual results:

ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-master`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-node`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show kubernetes`: exit status 1
       Cannot analyze systemd units.


Expected results:

Status and diagnostics for the systemd units that actually exist.

Comment 3 Luke Meyer 2016-09-23 19:03:35 UTC
This is a regression. It's supposed to quietly skip units that aren't actually there; I think perhaps systemctl is returning an error code where it didn't before, but in any case, this needs a fix. (BTW it *is* checking for atomic-openshift-node but needs to be updated to handle the split in the master units.)
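
The "quietly skip" behavior described above could look roughly like the following. This is only an illustrative sketch in Python, not the Go code in locate_units.go; the function names, the injectable runner, and the choice to treat any non-zero exit from `systemctl show` as "unit not present" are all assumptions made for the example.

```python
import subprocess
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical runner type: takes a unit name, returns (exit_code, stdout).
Runner = Callable[[str], Tuple[int, str]]


def systemctl_show(unit: str) -> Tuple[int, str]:
    """Run `systemctl show <unit>` and return (exit_code, stdout)."""
    try:
        proc = subprocess.run(
            ["systemctl", "show", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        # systemctl itself could not be started; nothing to analyze.
        return 127, ""
    return proc.returncode, proc.stdout


def locate_units(names: Iterable[str], run: Runner = systemctl_show) -> Dict[str, str]:
    """Return {unit name: raw `systemctl show` output} for queryable units.

    Assumption for this sketch: a non-zero exit means the unit is not
    present, so it is skipped without reporting an error.
    """
    found = {}
    for name in names:
        code, out = run(name)
        if code != 0:
            continue  # skip quietly instead of raising a DS1004-style error
        found[name] = out
    return found
```

Passing a fake runner makes the skip logic easy to exercise on a machine without systemd, for example `locate_units(["origin-master"], run=lambda unit: (1, ""))` returns an empty dict.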

Comment 4 Mike Fiedler 2016-09-23 19:11:05 UTC
systemd.x86_64  219-30.el7

Comment 5 Luke Meyer 2016-10-10 19:29:41 UTC
Red Hat hasn't released systemd-219-30.el7 AFAICS. Should we expect to see this in the wild soon?

I don't see this with released version systemd-219-19.el7_2.13. If I run `systemctl show something-bogus`, I get back a unit description (for a non-existent unit) and no error.
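
When `systemctl show` does return a description for a non-existent unit, as described above, the output is a list of Key=Value lines, and the LoadState property is "not-found" for a unit systemd cannot locate. A minimal sketch of telling a real unit from a bogus one on that basis follows; the helper names are hypothetical and this is not taken from the diagnostics code.

```python
from typing import Dict


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse the Key=Value lines printed by `systemctl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key] = value
    return props


def unit_exists(text: str) -> bool:
    """True if the output reports a LoadState other than "not-found".

    Empty or unparseable output is treated as "unit does not exist".
    """
    state = parse_show_output(text).get("LoadState")
    return state is not None and state != "not-found"
```

Checking LoadState rather than the exit status keeps the result the same whether or not a given systemd build returns an error code for an unknown unit, as long as it still prints the properties.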

Comment 6 Mike Fiedler 2016-10-10 20:13:08 UTC
That version is from the ops mirror repo:   https://mirror.openshift.com/enterprise/rhel/rhel7next/os

Should be in the wild in RHEL 7.3

Comment 7 Mike Fiedler 2016-10-11 18:21:47 UTC
Looks like docker-1.12 brings this level of systemd along as well.

Comment 9 Luke Meyer 2016-10-12 17:31:58 UTC
I think this is being counted as a regression in systemd and addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1380259 - as such I'm inclined not to try to work around the changed systemctl behavior.

Comment 10 Luke Meyer 2018-02-08 21:40:42 UTC
*** Bug 1432221 has been marked as a duplicate of this bug. ***

Comment 11 Luke Meyer 2018-02-09 02:43:21 UTC
The original report was due to a regression in systemctl which was eventually addressed in a systemd update. Checking for both Origin and OCP units is normal; it was just the error received in doing so that was a problem.

However I still needed to update the master unit names for the split into -controllers and -api.

https://github.com/openshift/origin/pull/18542 does that.

Comment 12 Luke Meyer 2018-02-09 18:55:58 UTC
Backports:
3.8: https://github.com/openshift/origin/pull/18555
3.7: https://github.com/openshift/origin/pull/18556
3.6: https://github.com/openshift/origin/pull/18557

It does not seem worth filing bugs for earlier releases, nor backporting earlier than 3.6, but both can be done as needed.

Comment 14 XiaochuanWang 2018-02-23 02:41:31 UTC
Not reproduced on oc/openshift v3.9.0-0.47.0.
systemd units are quietly skipped; the only output is one note: "[Note] Performing systemd discovery"

Comment 15 XiaochuanWang 2018-02-23 08:15:18 UTC
Checked the backports. The issue is not reproduced; the result is the same as on 3.9.
oc/openshift v3.6.173.0.104
oc/openshift v3.7.31

Comment 16 XiaochuanWang 2018-02-23 08:29:01 UTC
Also checked on oc/openshift v3.8.32, which shows the same note: "[Note] Performing systemd discovery"
This bug can be marked as verified.

Comment 19 errata-xmlrpc 2018-03-28 14:05:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

