Description of problem:
We used systemctl kill ... to deliver SIGUSR1 to smart_proxy_dynflow_core to make it reopen its logs. While systemctl kill delivered the signal reliably, it delivered it to every process in the service's control group. This became a problem when the service forked additional processes (when running ansible-playbook). pkill, by contrast, signals only the processes with a matching name. ansible-playbook reacts to SIGUSR{1,2} by printing "User defined signal 1" and exiting, which is undesired.

Version-Release number of selected component (if applicable):
6.4

How reproducible:
Always

Steps to Reproduce:
1. Run sleep 1000 using the ansible command template
2. When the job runs, run logrotate -f /etc/logrotate.conf -v
3. (optional) Watch the logrotate output, as it may not rotate the logs if they were rotated recently

Actual results:
Logs are rotated, ansible processes die. A possible IOError is shown in the job's output.

Expected results:
Logs are rotated, running ansible processes are left intact.

Additional info:
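The failure mode in the description can be reproduced in miniature: a process with no SIGUSR1 handler (here a plain sleep standing in for ansible-playbook) is simply terminated by the signal, because terminating the process is the default disposition for SIGUSR1. This is a minimal sketch assuming a POSIX shell on Linux, not the actual Satellite setup:

```shell
# Start a child with no SIGUSR1 handler, as ansible-playbook effectively has none.
sleep 60 &
victim=$!

# Deliver SIGUSR1, as systemctl kill did to every process in the cgroup.
kill -USR1 "$victim"

# The child dies; its exit status is 128 + signal number (SIGUSR1 is 10 on Linux).
wait "$victim"
echo "exit status: $?"   # prints: exit status: 138
```

The "User defined signal 1" message quoted in the description is how the shell reports exactly this default termination.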
Created redmine issue http://projects.theforeman.org/issues/24927 from this bug
Upstream bug assigned to aruzicka
*** Bug 1628145 has been marked as a duplicate of this bug. ***
snap 23, not 63
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-1.el7sat.noarch

using manual reproducer described in comment #0:

0) Remove offending file preventing logrotate from rotating the relevant logs
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch the ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20180926'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926
...
>>> relevant log rotated

3) Upon logrotate, the ps tree jobs disappeared and the REX job failed with IOError:
---
1: PLAY [all] *********************************************************************
2:
3: TASK [Gathering Facts] *********************************************************
4: ok: [host1.example.com]
5:
6: TASK [shell] *******************************************************************
7: Runner error: IOError - closed stream
8: Exit status: EXCEPTION
---
Upon logrotate, the ps tree jobs do not disappear and the REX job succeeds. However, logrotate now reports an error:
---
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181003'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181003
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 991 gid = 988
running postrotate script
error: error running shared postrotate script for '/var/log/foreman-proxy/smart_proxy_dynflow_core.log '
---
So I'd like to FailedQA this again.

What do you mean?
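For context, the shared postrotate script reported above lives in a logrotate stanza. The following is an illustrative sketch of what such a stanza could look like with the anchored pkill pattern; the rotation options and the trailing "|| true" (which keeps logrotate from flagging an error when no process matched) are assumptions, not the shipped configuration:

```
/var/log/foreman-proxy/smart_proxy_dynflow_core.log {
    weekly
    rotate 5
    compress
    missingok
    sharedscripts
    postrotate
        # Anchored pattern so only the service itself is signaled,
        # not a wrapper process whose command line contains the path.
        pkill --signal SIGUSR1 --full '^/usr/bin/smart_proxy_dynflow_core' || true
    endscript
}
```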
Let's consider using:

pkill --signal SIGUSR1 --full ^/usr/bin/smart_proxy_dynflow_core

Killing ^/usr/bin/smart_proxy_dynflow_core (instead of /usr/bin/smart_proxy_dynflow_core) would avoid killing a parent process that has the same string in its command line, such as:

sh -c "pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core ..."
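The effect of the anchor can be checked without touching any real process, by matching the two relevant command lines the way pkill --full does. The command-line strings below are illustrative stand-ins for the service process and the sh -c wrapper running the postrotate script:

```shell
# Stand-in command lines: the service itself, and the sh -c wrapper whose
# arguments also contain the binary path (both are assumptions for illustration).
real='/usr/bin/smart_proxy_dynflow_core -c /etc/smart_proxy_dynflow_core/settings.yml'
wrapper='sh -c "pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core"'

# Unanchored pattern matches both lines -> pkill would signal the wrapper too.
printf '%s\n%s\n' "$real" "$wrapper" | grep -c '/usr/bin/smart_proxy_dynflow_core'
# prints: 2

# Anchored pattern matches only the command line that starts with the binary path.
printf '%s\n%s\n' "$real" "$wrapper" | grep -c '^/usr/bin/smart_proxy_dynflow_core'
# prints: 1
```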
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-2.el7sat.noarch

The pkill command is too broad, as it also kills the parent (logrotate-controlled) process.
VERIFIED.

@satellite-6.5.0-5.beta.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-3.el7sat.noarch

using manual reproducer described in comment #0:

0) Remove offending file preventing logrotate from rotating the relevant logs
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-`date +%Y%m%d`.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch the ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating pattern: /var/log/foreman-proxy/smart_proxy_dynflow_core.log  forced from command line (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/foreman-proxy/smart_proxy_dynflow_core.log
  log needs rotating
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181126'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181126
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 993 gid = 991
running postrotate script
compressing log with: /bin/gzip
set default create context to system_u:object_r:var_log_t:s0
...
>>> relevant log rotated successfully (no errors)

3) After logrotate, the ps tree jobs are still present. The REX job succeeded after 1000 seconds:

1:
2:PLAY [all] *********************************************************************
3:
4:TASK [Gathering Facts] *********************************************************
5:
6:ok: [host1.example.com]
7:
8:TASK [command] *****************************************************************
9:
10:changed: [host1.example.com]
11:
12:TASK [debug] *******************************************************************
13:ok: [host1.example.com] => {
14:    "out": {
15:        "changed": true,
16:        "cmd": "sleep 1000",
17:        "delta": "0:16:40.005265",
18:        "end": "2018-11-26 18:57:50.891947",
19:        "failed": false,
20:        "rc": 0,
21:        "start": "2018-11-26 18:41:10.886682",
22:        "stderr": "",
23:        "stderr_lines": [],
24:        "stdout": "",
25:        "stdout_lines": []
26:    }
27:}
28:
29:PLAY RECAP *********************************************************************
30:host1.example.com : ok=3 changed=1 unreachable=0 failed=0
31:
32:
33:Exit status: 0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:1222