1376819 – "systemctl start corosync-qnetd" returns 0 even if corosync-qnetd fails to start


Summary: "systemctl start corosync-qnetd" returns 0 even if corosync-qnetd fails to start

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	corosync
Sub Component:
Version:	7.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Jan Friesse
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-09-16 13:43 UTC by Tomas Jelinek
Modified:	2019-08-06 13:10 UTC
CC List:	4 users
Fixed In Version:	corosync-2.4.3-5.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-08-06 13:10:11 UTC
Target Upstream Version:
Embargoed:

Attachments
qnetd: Check existence of NSS DB dir before fork (3.64 KB, patch) 2019-03-19 15:36 UTC, Jan Friesse
Use RuntimeDirectory instead of tmpfiles.d (7.05 KB, patch) 2019-03-21 07:30 UTC, Jan Friesse
configure: add --with-initconfigdir option (9.31 KB, patch) 2019-03-21 07:43 UTC, Jan Friesse

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2245	0	None	None	None	2019-08-06 13:10:24 UTC

Description Tomas Jelinek 2016-09-16 13:43:00 UTC

Description of problem:
"systemctl start corosync-qnetd" returns 0 even if corosync-qnetd failed to start. "systemctl status corosync-qnetd" shows the failed status correctly.


Version-Release number of selected component (if applicable):
corosync-qnetd-2.4.0-4.el7


How reproducible:
always, easily


Steps to Reproduce:
[root@rh72-node3:~]# ls -l /etc/corosync/qdevice/net/
total 0
[root@rh72-node3:~]# systemctl start corosync-qnetd.service
[root@rh72-node3:~]# echo $?
0
[root@rh72-node3:~]# systemctl status corosync-qnetd.service
● corosync-qnetd.service - Corosync Qdevice Network daemon
   Loaded: loaded (/usr/lib/systemd/system/corosync-qnetd.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2016-09-16 15:34:01 CEST; 8s ago
  Process: 6602 ExecStart=/usr/bin/corosync-qnetd -f $COROSYNC_QNETD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 6602 (code=exited, status=1/FAILURE)

Sep 16 15:34:01 rh72-node3 systemd[1]: Started Corosync Qdevice Network daemon.
Sep 16 15:34:01 rh72-node3 systemd[1]: Starting Corosync Qdevice Network daemon...
Sep 16 15:34:01 rh72-node3 systemd[1]: corosync-qnetd.service: main process exited, code=exited, status=1/FAILURE
Sep 16 15:34:01 rh72-node3 systemd[1]: Unit corosync-qnetd.service entered failed state.
Sep 16 15:34:01 rh72-node3 systemd[1]: corosync-qnetd.service failed.


Actual results:
[root@rh72-node3:~]# systemctl start corosync-qnetd.service
[root@rh72-node3:~]# echo $?
0


Expected results:
[root@rh72-node3:~]# systemctl start corosync-qnetd.service
[root@rh72-node3:~]# echo $?
1

Comment 1 Jan Friesse 2016-09-16 14:06:44 UTC

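The problem is caused by the behavior of the "simple" unit type: systemd reports the start as successful as soon as the process has been forked, without waiting for the daemon to finish initializing. The solution seems to be migrating to notify. It's 7.4 material.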
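As an interim workaround while the unit type is still "simple", a caller could confirm that the unit is actually active after starting it. The following is only a minimal sketch: the function name start_and_verify and the one-second default delay are assumptions, not part of corosync or of any fix discussed here.

```shell
#!/bin/sh
# Hypothetical helper: start a unit, then confirm it is still active.
# With Type=simple, "systemctl start" can return 0 even if the daemon
# exits right away, so the exit code alone is not trusted here.
start_and_verify() {
    sv_unit="$1"
    sv_delay="${2:-1}"   # seconds to wait before checking (assumed value)

    systemctl start "$sv_unit" || return 1
    sleep "$sv_delay"
    # "is-active --quiet" exits 0 only when the unit is in the active state
    systemctl is-active --quiet "$sv_unit"
}
```

Usage: `start_and_verify corosync-qnetd.service; echo $?` prints a non-zero value when the daemon died shortly after start. A fixed delay is inherently racy, which is why changing the unit type is the cleaner solution.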

Comment 6 Jan Friesse 2019-03-19 15:36:30 UTC

Created attachment 1545733 [details]
qnetd: Check existence of NSS DB dir before fork

qnetd: Check existence of NSS DB dir before fork

Previously, when a user tried to start corosync-qnetd without an
initialized NSS database, a generic (unhelpful and misleading)
NSS error was logged:
"NSS error (-8015): The certificate/key database is in an old,
unsupported format.".

The solution is to check whether the NSS DB directory can be opened and
to display the (usually much more informative) result of the strerror
function.

This check is performed before fork, so the init system can return an
error code during start.

To make error reporting work with systemd, the unit type also has to
change from simple to forking.

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Christine Caulfield <ccaulfie>
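The idea of checking the directory before the daemon forks can be illustrated from the shell side. This is a rough sketch only, not the code from the patch (which performs the check inside corosync-qnetd itself): the function name check_nss_db_dir is hypothetical, and the default path is simply the directory listed in the test transcripts in this bug.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail early, with a readable message,
# if the NSS DB directory is missing or cannot be read.
check_nss_db_dir() {
    # Default path taken from the transcripts in this bug; pass another
    # directory as $1 if the NSS DB lives elsewhere.
    nss_dir="${1:-/etc/corosync/qnetd}"

    if [ ! -d "$nss_dir" ] || [ ! -r "$nss_dir" ]; then
        echo "Can't open NSS DB directory: $nss_dir" >&2
        return 1
    fi
    return 0
}
```

Because the check runs before anything is started in the background, its exit code is available to the caller immediately, which is the same reason the patch places the check before fork.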

Comment 7 Jan Friesse 2019-03-19 15:53:11 UTC

For QA: The patch solves only the described scenario, where the NSS DB doesn't exist (which is probably the most common reason for failure). All other "failures" are handled after fork, so the behavior is the same as before.

What I've tested:
# ls -la /etc/corosync/qnetd 
ls: cannot access /etc/corosync/qnetd: No such file or directory

# systemctl start corosync-qnetd; echo $?
Job for corosync-qnetd.service failed because the control process exited with error code. See "systemctl status corosync-qnetd.service" and "journalctl -xe" for details.
1

# journalctl  _COMM=corosync-qnetd
* corosync-qnetd[*]: Can't open NSS DB directory (2): No such file or directory

After creating the CA:
# systemctl start corosync-qnetd; echo $?
0

# systemctl status corosync-qnetd; echo $?
● corosync-qnetd.service - Corosync Qdevice Network daemon
   Loaded: loaded (/usr/lib/systemd/system/corosync-qnetd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-03-19 16:51:27 CET; 2s ago
     Docs: man:corosync-qnetd
  Process: 17632 ExecStart=/usr/bin/corosync-qnetd $COROSYNC_QNETD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 17633 (corosync-qnetd)
...
0

Please note that the error message is now (hopefully) more understandable: "No such file or directory" vs "NSS error (-8015): The certificate/key database is in an old, unsupported format.".
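Since the fix depends on the unit type changing from simple to forking, it may be useful to confirm which type the installed unit uses. A small sketch, assuming a hypothetical helper named unit_type; it strips the "Type=" prefix by hand rather than relying on newer systemctl options:

```shell
#!/bin/sh
# Hypothetical helper: print the service type of a unit (e.g. "simple",
# "forking", "notify") as reported by "systemctl show".
unit_type() {
    ut_out=$(systemctl show -p Type "$1") || return 1
    # "systemctl show -p Type" prints a line of the form "Type=<value>"
    printf '%s\n' "${ut_out#Type=}"
}
```

Usage: `unit_type corosync-qnetd.service`.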

Comment 8 Jan Friesse 2019-03-21 07:30:11 UTC

Created attachment 1546365 [details]
Use RuntimeDirectory instead of tmpfiles.d

Use RuntimeDirectory instead of tmpfiles.d

This reverts part of commit 32123f6bb2ebc4f9ac7865945cc85a9c9b903dc6.

A simple directive is a much lighter solution to the same problem, and
automatically follows the specified User.  I copied the 0770 modes from
the corresponding init scripts; they could use a little documentation.

Signed-off-by: Ferenc Wágner <wferi>
Reviewed-by: Jan Friesse <jfriesse>
(cherry picked from commit c733e9417ef1d2f31268e9b6f99a8fc7712fcea7)

Comment 9 Jan Friesse 2019-03-21 07:43:48 UTC

Created attachment 1546367 [details]
configure: add --with-initconfigdir option

configure: add --with-initconfigdir option

Default value is /etc/sysconfig and resulting
INITCONFIGDIR is used to reduce duplication in init system
integration code.

Signed-off-by: Ferenc Wágner <wferi>
Reviewed-by: Jan Friesse <jfriesse>
(cherry picked from commit d7208e88370d2bce40b45224a3971eeb68c22d3c)

Comment 11 Michal Mazourek 2019-06-12 06:36:32 UTC

BEFORE (corosync-qnetd-2.4.0-4.el7)
======
## Check that the NSS DB doesn't exist
[root@host-027 ~]# ls -l /etc/corosync/qnetd
ls: cannot access /etc/corosync/qnetd: No such file or directory

## Start the service without NSS DB, which should fail
[root@host-027 ~]# systemctl start corosync-qnetd.service
[root@host-027 ~]# echo $?
0

## systemctl shows status correctly
[root@host-027 ~]# systemctl status corosync-qnetd.service
● corosync-qnetd.service - Corosync Qdevice Network daemon
   Loaded: loaded (/usr/lib/systemd/system/corosync-qnetd.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2019-06-07 01:58:57 CDT; 2s ago
  Process: 8941 ExecStart=/usr/bin/corosync-qnetd -f $COROSYNC_QNETD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 8941 (code=exited, status=1/FAILURE)

Jun 07 01:58:57 host-027.virt.lab.msp.redhat.com systemd[1]: Started Corosync Qdevice Network daemon.
Jun 07 01:58:57 host-027.virt.lab.msp.redhat.com systemd[1]: corosync-qnetd.service: main process ex...RE
Jun 07 01:58:57 host-027.virt.lab.msp.redhat.com systemd[1]: Unit corosync-qnetd.service entered fai...e.
Jun 07 01:58:57 host-027.virt.lab.msp.redhat.com systemd[1]: corosync-qnetd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

AFTER (corosync-qnetd-2.4.3-6.el7)
=====
## NSS DB doesn't exist
[root@host-027 ~]# ls -l /etc/corosync/qnetd
ls: cannot access /etc/corosync/qnetd: No such file or directory

## systemctl start is now failing, as it should
[root@host-027 ~]# systemctl start corosync-qnetd.service
Job for corosync-qnetd.service failed because the control process exited with error code. See "systemctl status corosync-qnetd.service" and "journalctl -xe" for details.
[root@host-027 ~]# echo $?
1
[root@host-027 ~]# journalctl _COMM=corosync-qnetd
...
* corosync-qnetd[9809]: Can't open NSS DB directory (2): No such file or directory
...

RESULT
======
After the fix, starting the corosync-qnetd service returns an error status when the NSS DB does not exist.
Note: Both versions work fine once the NSS DB has been created with:
# corosync-qnetd-certutil -i

Verified for version corosync-2.4.3-6.el7

Comment 13 errata-xmlrpc 2019-08-06 13:10:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2245
