Bug 1628505 - Ansible processes might get killed when logrotate runs for smart_proxy_dynflow_core
Summary: Ansible processes might get killed when logrotate runs for smart_proxy_dynflow_core
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: 6.5.0
Assignee: Adam Ruzicka
QA Contact: Lukas Pramuk
URL:
Whiteboard:
Duplicates: 1628145
Depends On:
Blocks: 1628145
 
Reported: 2018-09-13 09:29 UTC by Adam Ruzicka
Modified: 2022-03-13 15:33 UTC
CC: 9 users

Fixed In Version: tfm-rubygem-smart_proxy_dynflow_core-0.2.0-4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1646745
Environment:
Last Closed: 2019-05-14 12:37:59 UTC
Target Upstream Version:
Embargoed:




Links:
Foreman Issue Tracker 24927 (Normal, Closed): Ansible processes might get killed when logrotate runs for smart_proxy_dynflow_core (last updated 2020-02-05 14:23:32 UTC)
Red Hat Product Errata RHSA-2019:1222 (last updated 2019-05-14 12:38:05 UTC)

Description Adam Ruzicka 2018-09-13 09:29:08 UTC
Description of problem:
We used systemctl kill ... to deliver SIGUSR1 to smart_proxy_dynflow_core to make it reopen its logs. While systemctl kill delivered the signal reliably, it delivered it to every process in the service's control group, which became a problem once the service forked additional processes (i.e. while running ansible-playbook). pkill, by contrast, signals only the processes whose name matches.
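To make the difference concrete, a minimal sketch of the two delivery mechanisms (the unit name smart_proxy_dynflow_core is assumed from the package name):

Signals every PID in the service's control group, including forked ansible-playbook processes, since systemctl kill defaults to --kill-who=all:
# systemctl kill --signal=SIGUSR1 smart_proxy_dynflow_core

Signals only processes matching a pattern; note that plain pkill matches the kernel's process name, which is truncated to 15 characters, so a long name like smart_proxy_dynflow_core needs --full matching against the command line:
# pkill --signal SIGUSR1 --full '/usr/bin/smart_proxy_dynflow_core'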

Ansible-playbook reacts to SIGUSR{1,2} by printing "User defined signal 1" and exiting, which is undesired.
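That message is the default disposition of SIGUSR1 (terminate) being reported by the shell; any process that installs no handler behaves the same way. A quick interactive-shell sketch, not specific to ansible-playbook:

# sleep 1000 &
# kill -USR1 %1
[1]+  User defined signal 1   sleep 1000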


Version-Release number of selected component (if applicable):
6.4


How reproducible:
Always

Steps to Reproduce:
1. Run "sleep 1000" using the Ansible command template
2. When the job runs, run logrotate -f /etc/logrotate.conf -v
3. (optional) Watch the logrotate output as it may not rotate the logs if they were rotated recently
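(The snippet being rotated in step 2 lives under /etc/logrotate.d/. The following is only a sketch of its likely shape; the directives and the postrotate body are assumed from the description above, not copied from the shipped file:)

/var/log/foreman-proxy/smart_proxy_dynflow_core.log {
    missingok
    notifempty
    sharedscripts
    postrotate
        # Problematic variant described above: signals the whole control group
        systemctl kill --signal=SIGUSR1 smart_proxy_dynflow_core
    endscript
}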

Actual results:
Logs are rotated, ansible processes die. An IOError may be shown in the job's output.

Expected results:
Logs are rotated, running ansible processes are left intact.

Additional info:

Comment 1 Adam Ruzicka 2018-09-13 09:31:43 UTC
Created redmine issue http://projects.theforeman.org/issues/24927 from this bug

Comment 2 Satellite Program 2018-09-13 10:08:06 UTC
Upstream bug assigned to aruzicka

Comment 3 Satellite Program 2018-09-13 10:08:08 UTC
Upstream bug assigned to aruzicka

Comment 5 Ivan Necas 2018-09-18 11:23:29 UTC
*** Bug 1628145 has been marked as a duplicate of this bug. ***

Comment 7 Patrick Creech 2018-09-24 14:43:31 UTC
snap 23, not 63

Comment 8 Lukas Pramuk 2018-09-26 08:38:41 UTC
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-1.el7sat.noarch

Using the manual reproducer described in comment #0:

0) Remove the offending file that prevents logrotate from rotating the relevant log
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20180926'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926
...

>>> relevant log rotated

3) Upon logrotate, the jobs disappeared from the ps tree and the REX job failed with an IOError:

---
   1:PLAY [all] *********************************************************************
   2:
   3:TASK [Gathering Facts] *********************************************************
   4:ok: [host1.example.com]
   5:
   6:TASK [shell] *******************************************************************
   7:Runner error: IOError - closed stream
   8:Exit status: EXCEPTION
---

Comment 14 Lukas Pramuk 2018-10-03 18:05:50 UTC
Upon logrotate, the jobs do not disappear from the ps tree and the REX job succeeds.
However, logrotate now reports an error:
---
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181003'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181003
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 991 gid = 988
running postrotate script
error: error running shared postrotate script for '/var/log/foreman-proxy/smart_proxy_dynflow_core.log '
---

So I'd like to failqa this again. What do you mean?
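(For context: the "shared postrotate script" wording comes from logrotate's sharedscripts directive, which runs the postrotate block once for all logs matched by the stanza; logrotate prints this error whenever that block exits non-zero. Assuming the snippet sits in its own file under /etc/logrotate.d/ (filename assumed here), it can be re-run in isolation to debug the script:)

# logrotate -fv /etc/logrotate.d/smart_proxy_dynflow_core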

Comment 15 Lukas Pramuk 2018-10-03 20:15:16 UTC
Let's consider using "pkill --signal SIGUSR1 --full ^/usr/bin/smart_proxy_dynflow_core"

Killing ^/usr/bin/smart_proxy_dynflow_core (instead of /usr/bin/smart_proxy_dynflow_core) would avoid killing the parent process, which has the same string in its command line:

sh -c "pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core ..."
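One way to check this before wiring a pattern into the postrotate script is to preview the matches with pgrep, which takes the same matching options as pkill (-f/--full matches against the full command line, -a lists each matching PID with its command line):

# pgrep -fa '/usr/bin/smart_proxy_dynflow_core'
# pgrep -fa '^/usr/bin/smart_proxy_dynflow_core'

The first pattern also lists the sh -c wrapper above, since its command line contains the string; the anchored pattern matches only command lines that begin with the path.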

Comment 16 Lukas Pramuk 2018-10-04 09:55:43 UTC
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-2.el7sat.noarch

The pkill command matches too widely, as it also kills the parent (logrotate-controlled) process.

Comment 22 Lukas Pramuk 2018-11-27 00:04:01 UTC
VERIFIED.

@satellite-6.5.0-5.beta.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-3.el7sat.noarch

Using the manual reproducer described in comment #0:

0) Remove the offending file that prevents logrotate from rotating the relevant log
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-`date +%Y%m%d`.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating pattern: /var/log/foreman-proxy/smart_proxy_dynflow_core.log  forced from command line (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/foreman-proxy/smart_proxy_dynflow_core.log
  log needs rotating
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181126'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181126
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 993 gid = 991
running postrotate script
compressing log with: /bin/gzip
set default create context to system_u:object_r:var_log_t:s0
...

>>> relevant log rotated successfully (no errors)

3) After logrotate, the jobs are still present in the ps tree and the REX job succeeded after 1000 secs:

   1:
   2:PLAY [all] *********************************************************************
   3:
   4:TASK [Gathering Facts] *********************************************************
   5:
   6:ok: [host1.example.com]
   7:
   8:TASK [command] *****************************************************************
   9:
  10:changed: [host1.example.com]
  11:
  12:TASK [debug] *******************************************************************
  13:ok: [host1.example.com] => {
  14:    "out": {
  15:        "changed": true, 
  16:        "cmd": "sleep 1000", 
  17:        "delta": "0:16:40.005265", 
  18:        "end": "2018-11-26 18:57:50.891947", 
  19:        "failed": false, 
  20:        "rc": 0, 
  21:        "start": "2018-11-26 18:41:10.886682", 
  22:        "stderr": "", 
  23:        "stderr_lines": [], 
  24:        "stdout": "", 
  25:        "stdout_lines": []
  26:    }
  27:}
  28:
  29:PLAY RECAP *********************************************************************
  30:host1.example.com : ok=3    changed=1    unreachable=0    failed=0   
  31:
  32:
  33:Exit status: 0

Comment 24 errata-xmlrpc 2019-05-14 12:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222

