Bug 1415218 - sshd sometimes does not correctly write pid file
Summary: sshd sometimes does not correctly write pid file
Keywords:
Status: CLOSED DUPLICATE of bug 1381997
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openssh
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jakub Jelen
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-20 15:18 UTC by Gabriele Cerami
Modified: 2020-06-11 13:13 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-23 14:57:01 UTC
Target Upstream Version:
Embargoed:



Description Gabriele Cerami 2017-01-20 15:18:00 UTC
Description of problem:
In our OpenStack tests, openssh-server is installed in a basic CentOS 7.3 image and launched by specific Puppet modules.
We're noticing an intermittent problem when systemd starts the service; very often this log entry appears in the journal:

PID file /var/run/sshd.pid not readable (yet?) after start.

and sometimes, after some retries, systemd fails to mark the sshd service as started even though sshd is actually up, running, and logging connections.
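
To gauge how often a given run hits this, the journal text can be scanned for the warning. The following is only a minimal sketch: the message string is copied from the log entry above, while the function name and the idea of feeding it a plain-text journal dump are assumptions made for illustration.

```python
# Warning text copied from the report; everything else here is hypothetical.
PIDFILE_WARNING = "PID file /var/run/sshd.pid not readable (yet?) after start."


def count_pidfile_warnings(journal_text: str) -> int:
    """Count journal lines that contain the systemd PID-file warning."""
    return sum(1 for line in journal_text.splitlines() if PIDFILE_WARNING in line)
```

A count above zero only shows that the warning was logged; whether the unit was finally marked as failed has to be checked separately.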

The first occurrence of the failure was on 10 January. The process was using
openssh-server version 6.6.1p1-25 until mid-December.

Version-Release number of selected component (if applicable):
Version: 6.6.1p1
Release: 31-el7

How reproducible:
The failure is intermittent but seems to happen fairly often: we had 53 failures in about 500 runs over the last day.

Steps to Reproduce:
1. install openssh-server
2. systemctl start sshd
3. wait for service to be started

Actual results:
sometimes, the service is marked as failed

Expected results:
the service always starts correctly

Additional info:
this is an example log of a failing run

http://logs.openstack.org/73/422673/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/5b95cac/logs/undercloud/var/log/journal-text.txt.gz

navigable logs for the host can be found at 

http://logs.openstack.org/73/422673/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/5b95cac/logs/undercloud/

and a complete collection of the host (/etc/ and /var/log dirs) is compressed at

http://logs.openstack.org/73/422673/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/5b95cac/logs/undercloud.tar.xz

Comment 1 Jakub Jelen 2017-01-20 15:44:22 UTC
This is a known problem. sshd always writes the PID file correctly, but systemd is unable to track it (it tries to read the file at the wrong moments, does not re-read it, and is unnecessarily noisy). We have already filed two related bugs, #1381997 and #1398360.
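
To illustrate the timing issue described here, the sketch below shows what a reader that tolerates a late-appearing PID file could look like: it keeps re-reading until the file holds a valid PID or a deadline passes. This is not systemd's actual logic; the function name, timeout, and polling interval are hypothetical choices.

```python
import time
from typing import Optional


def wait_for_pidfile(path: str, timeout: float = 5.0, interval: float = 0.05) -> Optional[int]:
    """Poll `path` until it holds a positive integer PID or `timeout` expires.

    Returns the PID, or None if no valid PID appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                text = f.read().strip()
            # An empty or partially written file is treated as "not ready yet".
            if text.isdigit() and int(text) > 0:
                return int(text)
        except OSError:
            pass  # File missing or unreadable so far; retry below.
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A single read attempt made before the daemon has written the file would fail, while a reader that retries like this would succeed once the file appears.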

Unfortunately, there is no known solution except "building against systemd" as Debian does now (which is not acceptable to upstream [1]). We certainly want to address this in RHEL 7.4, so please be patient, unless the priority is high enough that it needs to be fixed earlier.

[1] https://bugzilla.mindrot.org/show_bug.cgi?id=2641

Comment 2 Michele Baldessari 2017-01-20 16:09:16 UTC
Thanks for the quick info Jakub, much appreciated ;)

We started seeing this only recently with version 6.6.1p1-31-el7 and did not see it previously with 6.6.1p1-25-el7. If we somehow managed to revert to that version, would that be an okay workaround, or were we just lucky and the -25 version is affected as well?

Thanks,
Michele

Comment 3 Michele Baldessari 2017-01-20 16:29:09 UTC
Or are there any other workarounds? It seems it is not entirely trivial to pin openssh to a previous version in our CI infrastructure. Thanks

Comment 4 Jakub Jelen 2017-01-20 16:41:24 UTC
The previous version had a problem with a related issue (bug #1291172), which got fixed, but this awkward behavior showed up (and slipped through our testing).

You can revert the commit below (a change in the service file) to restore the old behavior:

--- a/sshd.service
+++ b/sshd.service
@@ -5,8 +5,10 @@ After=network.target sshd-keygen.service
 Wants=sshd-keygen.service
 
 [Service]
+Type=forking
+PIDFile=/var/run/sshd.pid
 EnvironmentFile=/etc/sysconfig/sshd
-ExecStart=/usr/sbin/sshd -D $OPTIONS
+ExecStart=/usr/sbin/sshd $OPTIONS
 ExecReload=/bin/kill -HUP $MAINPID
 KillMode=process
 Restart=on-failure
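
For CI environments where patching the packaged file by hand is awkward, the revert can be expressed as a small text transformation. This is a rough sketch under assumptions: the function name is hypothetical, and it simply undoes the three changed lines of the diff above (drops Type=forking and PIDFile=, and restores -D on ExecStart).

```python
def revert_to_foreground(unit_text: str) -> str:
    """Undo the service-file change shown in the diff above.

    Drops the Type=forking and PIDFile= lines and puts -D back on ExecStart,
    so that sshd runs in the foreground again.
    """
    out = []
    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped == "Type=forking" or stripped.startswith("PIDFile="):
            continue  # Lines added by the commit; remove them.
        if stripped.startswith("ExecStart=/usr/sbin/sshd") and " -D" not in stripped:
            line = line.replace("/usr/sbin/sshd", "/usr/sbin/sshd -D", 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Writing the result to a copy of the unit under /etc/systemd/system (which takes precedence over the packaged unit) and running systemctl daemon-reload should keep the change from being overwritten by a package update.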

Comment 6 Perry Myers 2017-02-02 13:26:58 UTC
@jjelen: This bug affects some percentage (10%) of all RHOSP users trying to install the RHOSP undercloud on top of RHEL 7.3.

Unfortunately, the workaround proposed above, which might help us in our CI environments, is not really suitable to bake into our RHOSP Director images.

I have set the prio/sev of this to High because it results in an install failure that will affect customers and our field users. The only reason it is not urgent is that it is not 100% reproducible and only affects people intermittently.

Can we please investigate a resolution to this issue and prioritize the fix for a backport to 7.3.z once the fix is uncovered?

Comment 18 Rob Young 2017-02-23 14:28:53 UTC
@salmy Please see comment 17 above for the context and urgency of this. @michele and @gcerami can provide a risk assessment of adding it to z-stream.

Comment 20 Jakub Jelen 2017-02-23 14:57:01 UTC
Closing this bug.

Please let's continue the discussion in the duplicate. I will update the other bug with the proposed solution so that other customers can also verify it.

*** This bug has been marked as a duplicate of bug 1381997 ***

