Bug 1628505

Summary: Ansible processes might get killed when logrotate runs for smart_proxy_dynflow_core

Product: Red Hat Satellite
Component: Remote Execution
Version: 6.4
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Adam Ruzicka <aruzicka>
Assignee: Adam Ruzicka <aruzicka>
QA Contact: Lukas Pramuk <lpramuk>
CC: aruzicka, egolov, inecas, ktordeur, mmccune, mvanderw, ofalk, pcreech, sghai
Keywords: Triaged
Target Milestone: 6.5.0
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tfm-rubygem-smart_proxy_dynflow_core-0.2.0-4
Doc Type: If docs needed, set a value
Cloned To: 1646745 (view as bug list)
Bug Blocks: 1628145
Last Closed: 2019-05-14 12:37:59 UTC
Type: Bug

Description Adam Ruzicka 2018-09-13 09:29:08 UTC
Description of problem:
We used systemctl kill ... to deliver SIGUSR1 to smart_proxy_dynflow_core to make it reopen its logs. While systemctl kill delivered the signal reliably, it delivered it to every process in the service's control group. That became a problem when the service forked additional processes (e.g. while running ansible-playbook). pkill, by contrast, signals only the processes whose name matches.

ansible-playbook reacts to SIGUSR{1,2} by printing "User defined signal 1" and exiting, which is undesired.


Version-Release number of selected component (if applicable):
6.4


How reproducible:
Always

Steps to Reproduce:
1. Run sleep 1000 using the ansible command template
2. When the job runs, run logrotate -f /etc/logrotate.conf -v
3. (optional) Watch the logrotate output as it may not rotate the logs if they were rotated recently

Actual results:
Logs are rotated, ansible processes die. A possible IOError is shown in the job's output.

Expected results:
Logs are rotated, running ansible processes are left intact.

Additional info:

Comment 1 Adam Ruzicka 2018-09-13 09:31:43 UTC
Created redmine issue http://projects.theforeman.org/issues/24927 from this bug

Comment 2 Satellite Program 2018-09-13 10:08:06 UTC
Upstream bug assigned to aruzicka

Comment 3 Satellite Program 2018-09-13 10:08:08 UTC
Upstream bug assigned to aruzicka

Comment 5 Ivan Necas 2018-09-18 11:23:29 UTC
*** Bug 1628145 has been marked as a duplicate of this bug. ***

Comment 7 Patrick Creech 2018-09-24 14:43:31 UTC
snap 23, not 63

Comment 8 Lukas Pramuk 2018-09-26 08:38:41 UTC
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-1.el7sat.noarch

using manual reproducer described in comment #0:

0) Remove the leftover file that prevents logrotate from rotating the relevant log
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20180926'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926
...

>>> relevant log rotated

3) Upon logrotate, the jobs disappeared from the ps tree and the REX job failed with an IOError:

---
PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************

ok: [host1.example.com]

TASK [shell] *******************************************************************

Runner error: IOError - closed stream

Exit status: EXCEPTION
---

Comment 14 Lukas Pramuk 2018-10-03 18:05:50 UTC
Upon logrotate, the jobs do not disappear from the ps tree and the REX job succeeds.
However, logrotate now reports an error:
---
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181003'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181003
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 991 gid = 988
running postrotate script
error: error running shared postrotate script for '/var/log/foreman-proxy/smart_proxy_dynflow_core.log '
---

So I'd like to FailQA this again. What do you think?

Comment 15 Lukas Pramuk 2018-10-03 20:15:16 UTC
Let's consider using "pkill --signal SIGUSR1 --full ^/usr/bin/smart_proxy_dynflow_core".

Matching ^/usr/bin/smart_proxy_dynflow_core (instead of /usr/bin/smart_proxy_dynflow_core) would avoid killing the parent process, which has the same string in its command line:

sh -c "pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core ..."
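The effect of the anchor can be shown without signalling anything, using grep as a stand-in for the regex matching that pkill --full performs against each process's full command line (the wrapper command line below is illustrative):

```shell
pattern='/usr/bin/smart_proxy_dynflow_core'

# Command line of the daemon itself:
daemon='/usr/bin/smart_proxy_dynflow_core'
# Command line of the wrapper shell spawned for the postrotate script; it
# contains the pattern as a substring, so an unanchored match hits it too:
wrapper='sh -c pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core'

# match PATTERN CMDLINE -> "match" / "no-match"
match() { printf '%s\n' "$2" | grep -q "$1" && echo match || echo no-match; }

match "$pattern"  "$daemon"    # match
match "$pattern"  "$wrapper"   # match    <- unanchored pattern hits the wrapper
match "^$pattern" "$daemon"    # match
match "^$pattern" "$wrapper"   # no-match <- anchored pattern spares it
```

With the leading "^", only a process whose command line starts with the binary path is signalled, so the postrotate wrapper shell is left alone.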

Comment 16 Lukas Pramuk 2018-10-04 09:55:43 UTC
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-2.el7sat.noarch

The pkill command matches too broadly: it also kills the parent (logrotate-controlled) process.

Comment 22 Lukas Pramuk 2018-11-27 00:04:01 UTC
VERIFIED.

@satellite-6.5.0-5.beta.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-3.el7sat.noarch

using manual reproducer described in comment #0:

0) Remove the leftover file that prevents logrotate from rotating the relevant log
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-`date +%Y%m%d`.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating pattern: /var/log/foreman-proxy/smart_proxy_dynflow_core.log  forced from command line (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/foreman-proxy/smart_proxy_dynflow_core.log
  log needs rotating
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181126'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181126
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 993 gid = 991
running postrotate script
compressing log with: /bin/gzip
set default create context to system_u:object_r:var_log_t:s0
...

>>> relevant log rotated successfully (no errors)

3) After logrotate, the jobs are still present in the ps tree and the REX job succeeded after 1000 seconds:

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************

ok: [host1.example.com]

TASK [command] *****************************************************************

changed: [host1.example.com]

TASK [debug] *******************************************************************
ok: [host1.example.com] => {
    "out": {
        "changed": true,
        "cmd": "sleep 1000",
        "delta": "0:16:40.005265",
        "end": "2018-11-26 18:57:50.891947",
        "failed": false,
        "rc": 0,
        "start": "2018-11-26 18:41:10.886682",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "",
        "stdout_lines": []
    }
}

PLAY RECAP *********************************************************************
host1.example.com : ok=3    changed=1    unreachable=0    failed=0

Exit status: 0

Comment 24 errata-xmlrpc 2019-05-14 12:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222