Bug 1378883 - oc adm diagnostics checks not existing systemd units
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Command Line Interface
Version: 3.3.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Luke Meyer
QA Contact: Xingxing Xia
Duplicates: 1432221
Depends On:
Blocks: 1547246 1547245
Reported: 2016-09-23 08:57 EDT by Mike Fiedler
Modified: 2018-03-28 10:05 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The systemd units for masters changed without the corresponding diagnostics being updated.
Consequence: The diagnostics silently checked for master systemd units that don't exist instead of the ones that do, so problems with the correct units that might have been reported weren't.
Fix: Diagnostics now check for the correct master unit names.
Result: Problems with master systemd units and logs can now be found.
Story Points: ---
Clone Of:
Clones: 1547245 1547246
Environment:
Last Closed: 2018-03-28 10:05:01 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 None None None 2018-03-28 10:05 EDT

Description Mike Fiedler 2016-09-23 08:57:06 EDT
Description of problem:

Running oadm diagnostics on 3.3.0.32 always returns 3 errors for non-existent systemd units

ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-master`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-node`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show kubernetes`: exit status 1
       Cannot analyze systemd units.


The correct units in my HA config are atomic-openshift-master-controllers, atomic-openshift-master-api and atomic-openshift-node.


Version-Release number of selected component (if applicable):


3.3.0.32

How reproducible: always


Steps to Reproduce:
1.  On an HA (multi-master) master, run:  oadm diagnostics --master-config=/etc/origin/master/master-config.yaml


Actual results:

ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-master`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show origin-node`: exit status 1
       Cannot analyze systemd units.
       
ERROR: [DS1004 from controller openshift/origin/pkg/diagnostics/systemd/locate_units.go]
       Unable to run `systemctl show kubernetes`: exit status 1
       Cannot analyze systemd units.


Expected results:

status/diagnostics for actual systemd units
Comment 3 Luke Meyer 2016-09-23 15:03:35 EDT
This is a regression. It's supposed to quietly skip units that aren't actually there; I think perhaps systemctl is returning an error code where it didn't before, but in any case, this needs a fix. (BTW it *is* checking for atomic-openshift-node but needs to be updated to handle the split in the master units.)
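
The "quietly skip" behaviour described above could be sketched roughly as follows. This is an illustration in Python only, not the code in locate_units.go; the names `systemctl_show`, `locate_units` and the `run` parameter are hypothetical. It covers just the exit-status half of the problem: a failed `systemctl show` is treated as "unit not present" rather than reported as an error.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# Hypothetical runner type: takes a unit name, returns (exit status, stdout).
Runner = Callable[[str], Tuple[int, str]]


def systemctl_show(unit: str) -> Tuple[int, str]:
    """Run `systemctl show <unit>` and return its exit status and output."""
    proc = subprocess.run(
        ["systemctl", "show", unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def locate_units(names: Iterable[str], run: Runner = systemctl_show) -> List[str]:
    """Return the units whose query succeeded, skipping the rest quietly."""
    found = []
    for name in names:
        status, _output = run(name)
        if status != 0:
            # A failed query is treated as "unit not present", not as an error.
            continue
        found.append(name)
    return found
```

Passing the runner as a parameter keeps the skip logic testable on a machine without systemd.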
Comment 4 Mike Fiedler 2016-09-23 15:11:05 EDT
systemd.x86_64  219-30.el7
Comment 5 Luke Meyer 2016-10-10 15:29:41 EDT
Red Hat hasn't released systemd-219-30.el7 AFAICS. Should we expect to see this in the wild soon?

I don't see this with released version systemd-219-19.el7_2.13. If I run `systemctl show something-bogus`, I get back a unit description (for a non-existent unit) and no error.
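
Because older systemctl versions answer with a unit description even for a bogus unit, a caller has to look at the properties rather than the exit status to tell whether the unit exists. A minimal sketch, assuming the `Key=Value` line format that `systemctl show` prints and assuming that a missing unit is reported with `LoadState=not-found` (worth confirming on the systemd version in use); the function names are hypothetical and this is not the origin implementation:

```python
from typing import Dict


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse `systemctl show` output (one Key=Value per line) into a dict."""
    props = {}
    for line in text.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def unit_exists(props: Dict[str, str]) -> bool:
    """Assumption: a unit with no unit file shows LoadState=not-found."""
    return props.get("LoadState", "not-found") != "not-found"
```

With this check, `something-bogus` would be classified as absent even when `systemctl show` exits with status 0.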
Comment 6 Mike Fiedler 2016-10-10 16:13:08 EDT
That version is from the ops mirror repo:   https://mirror.openshift.com/enterprise/rhel/rhel7next/os

Should be in the wild in RHEL 7.3
Comment 7 Mike Fiedler 2016-10-11 14:21:47 EDT
Looks like docker-1.12 brings this level of systemd along as well.
Comment 9 Luke Meyer 2016-10-12 13:31:58 EDT
I think this is being counted as a regression in systemd and addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1380259 - as such I'm inclined not to try to work around the changed systemctl behavior.
Comment 10 Luke Meyer 2018-02-08 16:40:42 EST
*** Bug 1432221 has been marked as a duplicate of this bug. ***
Comment 11 Luke Meyer 2018-02-08 21:43:21 EST
The original report was due to a regression in systemctl which was eventually addressed in a systemd update. Checking for both Origin and OCP units is normal; it was just the error received in doing so that was a problem.

However I still needed to update the master unit names for the split into -controllers and -api.

https://github.com/openshift/origin/pull/18542 does that.
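
For illustration only, the set of unit names to probe after the split could be generated as below. This is a hedged sketch, not the list from the pull request: the atomic-openshift names for -api, -controllers and -node come from this report, while the `origin-master-api` and `origin-master-controllers` names, and keeping the unsplit `-master` units in the list, are assumptions made by analogy.

```python
from typing import List

# Prefixes for the Origin and OCP packagings of the same services.
PREFIXES = ("origin", "atomic-openshift")
# Roles after the master split; the unsplit "master" entry is an assumption.
ROLES = ("master", "master-api", "master-controllers", "node")


def candidate_units() -> List[str]:
    """Return every prefix/role combination as a systemd unit name."""
    return [f"{prefix}-{role}" for prefix in PREFIXES for role in ROLES]
```

Units from this list that are not installed on a given host would simply be skipped by the discovery step.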
Comment 12 Luke Meyer 2018-02-09 13:55:58 EST
Backports:
3.8: https://github.com/openshift/origin/pull/18555
3.7: https://github.com/openshift/origin/pull/18556
3.6: https://github.com/openshift/origin/pull/18557

It does not seem worth filing bugs for earlier releases, nor backporting earlier than 3.6, but both can be done as needed.
Comment 14 XiaochuanWang 2018-02-22 21:41:31 EST
Not reproduced on oc/openshift v3.9.0-0.47.0
The systemd units are quietly skipped; the only output is the hint "[Note] Performing systemd discovery".
Comment 15 XiaochuanWang 2018-02-23 03:15:18 EST
Checked the backports. The issue is not reproduced; the result is the same as on 3.9.
oc/openshift v3.6.173.0.104
oc/openshift v3.7.31
Comment 16 XiaochuanWang 2018-02-23 03:29:01 EST
Checked oc/openshift v3.8.32 as well; it also shows only the note "[Note] Performing systemd discovery".
This bug can be verified.
Comment 19 errata-xmlrpc 2018-03-28 10:05:01 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
