Description of problem:
We used systemctl kill ... to deliver SIGUSR1 to smart_proxy_dynflow_core to make it reopen its logs. While systemctl kill delivered the signal reliably, it delivered it to every process in the service's control group. This became a problem when the service forked additional processes (when running ansible-playbook). pkill, by contrast, signals only the processes with a matching name. ansible-playbook reacts to SIGUSR{1,2} by printing "User defined signal 1" and exiting, which is undesired.

Version-Release number of selected component (if applicable):
6.4

How reproducible:
Always

Steps to Reproduce:
1. Run sleep 1000 using the ansible command template
2. When the job runs, run logrotate -f /etc/logrotate.conf -v
3. (optional) Watch the logrotate output, as it may not rotate the logs if they were rotated recently

Actual results:
Logs are rotated, ansible processes die. A possible IOError is shown in the job's output.

Expected results:
Logs are rotated, running ansible processes are left intact.

Additional info:
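The failure mode in the description can be reproduced in miniature: a process with no SIGUSR1 handler (here a plain sleep standing in for ansible-playbook) is simply terminated by the signal, because terminating the process is the default disposition for SIGUSR1. This is a minimal sketch assuming a POSIX shell on Linux, not the actual Satellite setup:

```shell
# Start a child with no SIGUSR1 handler, as ansible-playbook effectively has none.
sleep 60 &
victim=$!

# Deliver SIGUSR1, as systemctl kill did to every process in the cgroup.
kill -USR1 "$victim"

# The child dies; its exit status is 128 + signal number (SIGUSR1 is 10 on Linux).
wait "$victim"
echo "exit status: $?"   # prints: exit status: 138
```

The "User defined signal 1" message quoted in the description is how the shell reports exactly this default termination.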
Created redmine issue http://projects.theforeman.org/issues/24927 from this bug
Upstream bug assigned to aruzicka
*** Bug 1628145 has been marked as a duplicate of this bug. ***
snap 23, not 63
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-1.el7sat.noarch

using manual reproducer described in comment #0:

0) Remove offending file preventing logrotate from rotating the relevant logs
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch the ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20180926'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20180926
...
>>> relevant log rotated

3) Upon logrotate, the ps tree jobs disappeared and the REX job failed with IOError:
---
1: PLAY [all] *********************************************************************
2:
3: TASK [Gathering Facts] *********************************************************
4: ok: [host1.example.com]
5:
6: TASK [shell] *******************************************************************
7: Runner error: IOError - closed stream
8: Exit status: EXCEPTION
---
Upon logrotate, the ps tree jobs do not disappear and the REX job succeeds. However, logrotate now reports an error:
---
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181003'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181003
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 991 gid = 988
running postrotate script
error: error running shared postrotate script for '/var/log/foreman-proxy/smart_proxy_dynflow_core.log '
---
So I'd like to FailedQA this again.

What do you mean?
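For context, the shared postrotate script reported above lives in a logrotate stanza. The following is an illustrative sketch of what such a stanza could look like with the anchored pkill pattern; the rotation options and the trailing "|| true" (which keeps logrotate from flagging an error when no process matched) are assumptions, not the shipped configuration:

```
/var/log/foreman-proxy/smart_proxy_dynflow_core.log {
    weekly
    rotate 5
    compress
    missingok
    sharedscripts
    postrotate
        # Anchored pattern so only the service itself is signaled,
        # not a wrapper process whose command line contains the path.
        pkill --signal SIGUSR1 --full '^/usr/bin/smart_proxy_dynflow_core' || true
    endscript
}
```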
Let's consider using:

pkill --signal SIGUSR1 --full ^/usr/bin/smart_proxy_dynflow_core

Killing ^/usr/bin/smart_proxy_dynflow_core (instead of /usr/bin/smart_proxy_dynflow_core) would avoid killing a parent process that has the same string in its command line, such as:

sh -c "pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core ..."
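The effect of the anchor can be checked without touching any real process, by matching the two relevant command lines the way pkill --full does. The command-line strings below are illustrative stand-ins for the service process and the sh -c wrapper running the postrotate script:

```shell
# Stand-in command lines: the service itself, and the sh -c wrapper whose
# arguments also contain the binary path (both are assumptions for illustration).
real='/usr/bin/smart_proxy_dynflow_core -c /etc/smart_proxy_dynflow_core/settings.yml'
wrapper='sh -c "pkill --signal SIGUSR1 --full /usr/bin/smart_proxy_dynflow_core"'

# Unanchored pattern matches both lines -> pkill would signal the wrapper too.
printf '%s\n%s\n' "$real" "$wrapper" | grep -c '/usr/bin/smart_proxy_dynflow_core'
# prints: 2

# Anchored pattern matches only the command line that starts with the binary path.
printf '%s\n%s\n' "$real" "$wrapper" | grep -c '^/usr/bin/smart_proxy_dynflow_core'
# prints: 1
```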
FailedQA.

@satellite-6.4.0-15.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-2.el7sat.noarch

The pkill command is too broad, as it also kills the parent (logrotate-controlled) process.
VERIFIED.

@satellite-6.5.0-5.beta.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.1-3.el7sat.noarch

using manual reproducer described in comment #0:

0) Remove offending file preventing logrotate from rotating the relevant logs
# rm -rf /var/log/foreman-proxy/smart_proxy_dynflow_core.log-`date +%Y%m%d`.gz

1) Schedule Ansible Command "sleep 1000" against a host and watch the ps tree
# watch "ps -efH | grep ^foreman+"

2) Run logrotate -f /etc/logrotate.conf -v
...
rotating pattern: /var/log/foreman-proxy/smart_proxy_dynflow_core.log  forced from command line (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/foreman-proxy/smart_proxy_dynflow_core.log
  log needs rotating
rotating log /var/log/foreman-proxy/smart_proxy_dynflow_core.log, log->rotateCount is 5
dateext suffix '-20181126'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
fscreate context set to system_u:object_r:var_log_t:s0
renaming /var/log/foreman-proxy/smart_proxy_dynflow_core.log to /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20181126
creating new /var/log/foreman-proxy/smart_proxy_dynflow_core.log mode = 0644 uid = 993 gid = 991
running postrotate script
compressing log with: /bin/gzip
set default create context to system_u:object_r:var_log_t:s0
...
>>> relevant log rotated successfully (no errors)

3) After logrotate, the ps tree jobs are still present. The REX job succeeded after 1000 seconds:

1:
2:PLAY [all] *********************************************************************
3:
4:TASK [Gathering Facts] *********************************************************
5:
6:ok: [host1.example.com]
7:
8:TASK [command] *****************************************************************
9:
10:changed: [host1.example.com]
11:
12:TASK [debug] *******************************************************************
13:ok: [host1.example.com] => {
14:    "out": {
15:        "changed": true,
16:        "cmd": "sleep 1000",
17:        "delta": "0:16:40.005265",
18:        "end": "2018-11-26 18:57:50.891947",
19:        "failed": false,
20:        "rc": 0,
21:        "start": "2018-11-26 18:41:10.886682",
22:        "stderr": "",
23:        "stderr_lines": [],
24:        "stdout": "",
25:        "stdout_lines": []
26:    }
27:}
28:
29:PLAY RECAP *********************************************************************
30:host1.example.com : ok=3 changed=1 unreachable=0 failed=0
31:
32:
33:Exit status: 0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:1222