Bug 1638376 - pcsd fails to start if configured to listen on specific address
Summary: pcsd fails to start if configured to listen on specific address
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.3
Hardware: Unspecified
OS: All
medium
low
Target Milestone: rc
: ---
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-11 12:19 UTC by Oliver Falk
Modified: 2020-03-09 10:45 UTC
10 users

Fixed In Version: pcs-0.9.167-1.el7
Doc Type: Bug Fix
Doc Text:
.The `pcsd` service now starts when the network is ready
Previously, when a user configured `pcsd` to bind to a specific IP address and that address was not yet available when `pcsd` started during boot, `pcsd` failed to start and had to be started manually. With this fix, `pcsd.service` depends on `network-online.target`. As a result, `pcsd` starts once the network is ready and can bind to the configured IP address.
Clone Of:
: 1640477 (view as bug list)
Environment:
Last Closed: 2019-08-06 13:10:01 UTC
Target Upstream Version:
Embargoed:


Attachments
proposed fix (1.58 KB, patch)
2018-11-20 12:57 UTC, Tomas Jelinek
no flags


Links
Red Hat Knowledge Base (Solution) 3655221, last updated 2018-12-03 21:03:41 UTC
Red Hat Product Errata RHBA-2019:2244, last updated 2019-08-06 13:10:12 UTC

Description Oliver Falk 2018-10-11 12:19:35 UTC
Description of problem:
When pcsd is configured to listen on a specific address via the PCSD_BIND_ADDR variable in /etc/sysconfig/pcsd, it fails to start if the network comes up after pcsd.


Version-Release number of selected component (if applicable):
pcs-0.9.x on RHEL 7.3 (customer environment), but reproducible on RHEL 7.6 (beta) as well.

How reproducible:
Always.

Steps to Reproduce:
1. Configure PCSD_BIND_ADDR in /etc/sysconfig/pcsd
2. Restart the system

Actual results:
pcsd does not start at boot when it is configured to use a specific address and the network comes up later.

Expected results:
pcsd waits for the network to be up (network.target)

Additional info:
Easyfix:
Add the following two lines to pcsd.service (under [Unit] section):

    Requires=network.target
    After=network.target

Comment 2 Tomas Jelinek 2018-10-12 12:00:11 UTC
I am not able to reproduce this on my systems. I do not think this is reproducible with a static IP configuration. Even after switching my cluster to DHCP, I was not able to reproduce it. My virtual machines and the DHCP server on their hypervisor seem to be too fast. Reproducibility is definitely not "always".

The proposed directives

    Requires=network.target
    After=network.target

are unlikely to fix the issue: network.target is reached before IP addresses are assigned to network interfaces. We need network-online.target.

Using the IP_FREEBIND socket option is not possible because the Ruby libraries and framework that pcsd is built on do not provide a means to set that option.
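On a system still running a pcs build without this fix (earlier than pcs-0.9.167-1.el7), the same network-online.target ordering could in principle be added locally through a systemd drop-in rather than by editing the packaged unit. The following is a minimal sketch under that assumption; the helper name `install_pcsd_dropin` and the file name `network-online.conf` are hypothetical, and the default directory assumes the standard /etc/systemd/system override location.

```shell
# Hypothetical helper: add network-online.target ordering to pcsd through a
# drop-in, leaving the packaged unit in /usr/lib/systemd/system untouched.
# Usage: install_pcsd_dropin [DIR]  (DIR defaults to the standard override path)
install_pcsd_dropin() {
    local dir="${1:-/etc/systemd/system/pcsd.service.d}"
    mkdir -p "$dir" || return 1
    # The same two directives that the fixed pcsd.service contains.
    cat > "$dir/network-online.conf" <<'EOF'
[Unit]
Requires=network-online.target
After=network-online.target
EOF
}
```

Run it as root and follow it with `systemctl daemon-reload` so systemd picks up the drop-in. A drop-in survives package updates, whereas changes made directly to the file under /usr/lib/systemd/system would be overwritten. Note that network-online.target only delays startup until addresses are configured when a wait-online service (for example NetworkManager-wait-online.service) is enabled.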

Comment 3 Oliver Falk 2018-10-12 12:37:17 UTC
Tomas, thanks for checking. I only reproduced it with an IP address that is not actually assigned on my system; that reproduces every time.

Indeed, network-online.target makes more sense; I completely agree with you! IP_FREEBIND also occurred to me, but I didn't find it documented anywhere, so I didn't mention it.

Comment 4 Peter Vreman 2018-10-12 13:02:07 UTC
Tomas,

Please verify on a physical server with multiple network cards.
The problem is 100% reproducible on an HPE MC990X server, even with a single chassis.

Peter

Comment 5 Peter Vreman 2018-10-12 13:08:39 UTC
The use of network.target instead of network-online.target came from almost all other daemons using it:

grep -R After /usr/lib/systemd/system | grep network | grep -v online

Comment 7 Tomas Jelinek 2018-10-12 13:20:11 UTC
(In reply to Peter Vreman from comment #5)
> The use of network instead of network-online came from allmost all other
> daemons using this:
> 
> grep -R After /usr/lib/systemd/system | grep network | grep -v online

Other daemons likely use network.target because they are usually configured to listen on all addresses, or they use IP_FREEBIND. We cannot do the latter. The former is pcsd's default configuration, but the user changed it, so that option is ruled out as well.

There is a similar bug against corosync, which was resolved by configuring corosync to wait for network-online.target.

Comment 8 Peter Vreman 2018-10-12 13:34:40 UTC
Tomas,

I can confirm using network-online.target also works.
My first attempt at a fix was to copy the missing [Unit] lines from corosync.service (including the 'nocluster' check).
Then I noticed in the boot sequence that most daemons started earlier, so I switched to network.target.

Peter

Comment 9 Tomas Jelinek 2018-10-18 07:58:30 UTC
We will go the network-online.target way.

Comment 12 Tomas Jelinek 2018-11-20 12:57:54 UTC
Created attachment 1507408
proposed fix

Comment 19 Ivan Devat 2019-03-21 10:18:31 UTC
After Fix:

[kid76 ~] $ rpm -q pcs
pcs-0.9.167-1.el7.x86_64

[kid76 ~] $ cat /usr/lib/systemd/system/pcsd.service|grep network-online
Requires=network-online.target
After=network-online.target
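Grepping the packaged unit file, as above, does not account for drop-ins or other local overrides. A hedged alternative is to check the effective ordering reported by `systemctl show -p After pcsd.service`, which prints a single `After=` line listing every ordering dependency. The helper below is a hypothetical sketch that reads that output on standard input, so the parsing logic can be checked without a running systemd.

```shell
# Hypothetical helper: succeed only if the "After=" line read on stdin
# (as printed by "systemctl show -p After <unit>") lists network-online.target.
after_includes_network_online() {
    # Strip the "After=" prefix, put one unit per line, and look for an exact match.
    sed -n 's/^After=//p' | tr ' ' '\n' | grep -qx 'network-online\.target'
}
```

Usage on a cluster node: `systemctl show -p After pcsd.service | after_includes_network_online && echo "pcsd is ordered after network-online.target"`.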

Comment 23 errata-xmlrpc 2019-08-06 13:10:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2244

