nginx.service: - Drop syslog.target from After=, not needed. - Drop TimeoutStopSec=5? Default systemd global configuration (/etc/systemd/system.conf) sets it to 90s. - Replace ExecStop with "KillSignal=SIGQUIT". See man systemd.kill. nginx.logrotate: Not sure why 'systemctl kill' was replaced. Was not just a matter of using '--kill-who=main'? 'main' process for Type=forking daemons is the pid file one for systemd. /bin/systemctl kill --signal=SIGUSR1 --kill-who=main nginx.service 2>/dev/null || true ? And please backport these changes to F21 ("KillMode=mixed" addition especially).
Hi. Thanks for the bug report. (In reply to Marcos Mello from comment #0) > - Drop syslog.target from After=, not needed. Done. > - Drop TimeoutStopSec=5? Default systemd global configuration > (/etc/systemd/system.conf) sets it to 90s. The default is too long. If you send a SIGQUIT but for some reason nginx gets stuck, you probably don't want to wait around 90 seconds for a SIGTERM. > - Replace ExecStop with "KillSignal=SIGQUIT". See man systemd.kill. Done, though this doesn't really solve any real problems. > Not sure why 'systemctl kill' was replaced. I changed from `/usr/bin/systemctl` to `/bin/kill` as a quickfix for an SELinux issue a long time ago. The issue was fixed, so I'll change it back but again I don't think this solves any real problems. > And please backport these changes to F21 ("KillMode=mixed" addition > especially). Done. Builds and updates to follow.
nginx-1.8.0-8.fc22 has been submitted as an update for Fedora 22. https://admin.fedoraproject.org/updates/nginx-1.8.0-8.fc22
nginx-1.6.3-2.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/nginx-1.6.3-2.fc21
nginx-1.6.3-3.el7 has been submitted as an update for Fedora EPEL 7. https://admin.fedoraproject.org/updates/nginx-1.6.3-3.el7
(In reply to Jamie Nguyen from comment #1) > Hi. Thanks for the bug report. > > (In reply to Marcos Mello from comment #0) > > - Drop syslog.target from After=, not needed. > > Done. > > > > - Drop TimeoutStopSec=5? Default systemd global configuration > > (/etc/systemd/system.conf) sets it to 90s. > > The default is too long. If you send a SIGQUIT but for some reason nginx > gets stuck, you probably don't want to wait around 90 seconds for a SIGTERM. > Now with KillSignal=SIGQUIT and KillMode=mixed, there is no SIGTERM anymore. systemd will send SIGQUIT to main pid and after 5 seconds all processes will be SIGKILLed if still there in the cgroup. This does not affect my light load server, but on busy ones perhaps it can? > > > - Replace ExecStop with "KillSignal=SIGQUIT". See man systemd.kill. > > Done, though this doesn't really solve any real problems. > > > > Not sure why 'systemctl kill' was replaced. > > I changed from `/usr/bin/systemctl` to `/bin/kill` as a quickfix for an > SELinux issue a long time ago. The issue was fixed, so I'll change it back > but again I don't think this solves any real problems. > > > > And please backport these changes to F21 ("KillMode=mixed" addition > > especially). > > Done. > > > Builds and updates to follow. Thank you for your prompt reply!
(In reply to Marcos Mello from comment #5) > Now with KillSignal=SIGQUIT and KillMode=mixed, there is no SIGTERM anymore. > systemd will send SIGQUIT to main pid and after 5 seconds all processes will > be SIGKILLed if still there in the cgroup. This does not affect my light > load server, but on busy ones perhaps it can? Yes I meant SIGKILL. If the server is very busy and the admin needs a longer timeout, they can override this default with a systemd snippet. (NB: Debian also sets TimeoutStopSec=5 in their service file.) > Thank you for your prompt reply! No problem!
Yeah, makes sense.
On CentOS 7, with default KillMode=control-group, I see sometimes this on shutdown (worker pids): [info] 1516#0: epoll_wait() failed (4: Interrupted system call) Probably some race because all processes are receiving SIGQUIT. KillMode=process solves it signaling only the main pid. "mixed" is more robust, but we have to wait for RHEL 8...
Package nginx-1.8.0-8.fc22: * should fix your issue, * was pushed to the Fedora 22 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing nginx-1.8.0-8.fc22' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2015-10025/nginx-1.8.0-8.fc22 then log in and leave karma (feedback).
(In reply to Marcos Mello from comment #8) > On CentOS 7, with default KillMode=control-group, I see sometimes this on > shutdown (worker pids): > > [info] 1516#0: epoll_wait() failed (4: Interrupted system call) In the situation you described, are any processes (either master or workers) left behind after `systemctl stop`?
Nope. They are all finished. I do not have access to the CentOS machine now. You can reproduce it in Fedora though. Comment KillMode, raise log level to info, and configure nginx to use multiple workers (4 in this example). PPID PID COMMAND 1 2795 nginx: master process /usr/sbin/nginx 2795 2796 \_ nginx: worker process 2795 2797 \_ nginx: worker process 2795 2798 \_ nginx: worker process 2795 2799 \_ nginx: worker process Every pid received SIGQUIT: 2015/06/15 19:49:08 [notice] 2792#0: using the "epoll" event method 2015/06/15 19:49:08 [notice] 2792#0: nginx/1.6.3 2015/06/15 19:49:08 [notice] 2792#0: built by gcc 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC) 2015/06/15 19:49:08 [notice] 2792#0: OS: Linux 4.0.4-202.fc21.x86_64 2015/06/15 19:49:08 [notice] 2792#0: getrlimit(RLIMIT_NOFILE): 1024:4096 2015/06/15 19:49:08 [notice] 2795#0: start worker processes 2015/06/15 19:49:08 [notice] 2795#0: start worker process 2796 2015/06/15 19:49:08 [notice] 2795#0: start worker process 2797 2015/06/15 19:49:08 [notice] 2795#0: start worker process 2798 2015/06/15 19:49:08 [notice] 2795#0: start worker process 2799 2015/06/15 19:49:14 [notice] 2797#0: signal 3 (SIGQUIT) received, shutting down 2015/06/15 19:49:14 [info] 2797#0: epoll_wait() failed (4: Interrupted system call) 2015/06/15 19:49:14 [notice] 2797#0: gracefully shutting down 2015/06/15 19:49:14 [notice] 2797#0: exiting 2015/06/15 19:49:14 [notice] 2795#0: signal 3 (SIGQUIT) received, shutting down 2015/06/15 19:49:14 [notice] 2796#0: signal 3 (SIGQUIT) received, shutting down 2015/06/15 19:49:14 [notice] 2796#0: gracefully shutting down 2015/06/15 19:49:14 [notice] 2796#0: exiting 2015/06/15 19:49:14 [notice] 2797#0: exit 2015/06/15 19:49:14 [notice] 2796#0: exit 2015/06/15 19:49:14 [notice] 2798#0: signal 3 (SIGQUIT) received, shutting down 2015/06/15 19:49:14 [notice] 2798#0: gracefully shutting down 2015/06/15 19:49:14 [notice] 2798#0: exiting 2015/06/15 19:49:14 [notice] 2798#0: exit 2015/06/15 19:49:14 [notice] 2799#0: signal 3 (SIGQUIT) received, shutting down 2015/06/15 19:49:14 [notice] 2799#0: gracefully shutting down 2015/06/15 19:49:14 [notice] 2799#0: exiting 2015/06/15 19:49:14 [notice] 2799#0: exit 2015/06/15 19:49:14 [notice] 2795#0: signal 17 (SIGCHLD) received 2015/06/15 19:49:14 [notice] 2795#0: worker process 2796 exited with code 0 2015/06/15 19:49:14 [notice] 2795#0: signal 29 (SIGIO) received 2015/06/15 19:49:14 [notice] 2795#0: signal 17 (SIGCHLD) received 2015/06/15 19:49:14 [notice] 2795#0: worker process 2797 exited with code 0 2015/06/15 19:49:14 [notice] 2795#0: signal 29 (SIGIO) received 2015/06/15 19:49:14 [notice] 2795#0: signal 17 (SIGCHLD) received 2015/06/15 19:49:14 [notice] 2795#0: worker process 2799 exited with code 0 2015/06/15 19:49:14 [notice] 2795#0: signal 29 (SIGIO) received 2015/06/15 19:49:14 [notice] 2795#0: signal 17 (SIGCHLD) received 2015/06/15 19:49:14 [notice] 2795#0: worker process 2798 exited with code 0 2015/06/15 19:49:14 [notice] 2795#0: exit Worker 2797 is complaining. Master (2795) was already cleaning things up I think. It seems quite racy and is easily reproducible. A "clean" log (KillMode=process): PPID PID COMMAND 1 2855 nginx: master process /usr/sbin/nginx 2855 2856 \_ nginx: worker process 2855 2857 \_ nginx: worker process 2855 2858 \_ nginx: worker process 2855 2859 \_ nginx: worker process 2015/06/15 19:56:55 [notice] 2852#0: using the "epoll" event method 2015/06/15 19:56:55 [notice] 2852#0: nginx/1.6.3 2015/06/15 19:56:55 [notice] 2852#0: built by gcc 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC) 2015/06/15 19:56:55 [notice] 2852#0: OS: Linux 4.0.4-202.fc21.x86_64 2015/06/15 19:56:55 [notice] 2852#0: getrlimit(RLIMIT_NOFILE): 1024:4096 2015/06/15 19:56:55 [notice] 2855#0: start worker processes 2015/06/15 19:56:55 [notice] 2855#0: start worker process 2856 2015/06/15 19:56:55 [notice] 2855#0: start worker process 2857 2015/06/15 19:56:55 [notice] 2855#0: start worker process 2858 2015/06/15 19:56:55 [notice] 2855#0: start worker process 2859 2015/06/15 19:58:01 [notice] 2855#0: signal 3 (SIGQUIT) received, shutting down 2015/06/15 19:58:01 [notice] 2856#0: gracefully shutting down 2015/06/15 19:58:01 [notice] 2856#0: exiting 2015/06/15 19:58:01 [notice] 2857#0: gracefully shutting down 2015/06/15 19:58:01 [notice] 2857#0: exiting 2015/06/15 19:58:01 [notice] 2856#0: exit 2015/06/15 19:58:01 [notice] 2857#0: exit 2015/06/15 19:58:01 [notice] 2859#0: gracefully shutting down 2015/06/15 19:58:01 [notice] 2858#0: gracefully shutting down 2015/06/15 19:58:01 [notice] 2859#0: exiting 2015/06/15 19:58:01 [notice] 2858#0: exiting 2015/06/15 19:58:01 [notice] 2859#0: exit 2015/06/15 19:58:01 [notice] 2858#0: exit 2015/06/15 19:58:01 [notice] 2855#0: signal 17 (SIGCHLD) received 2015/06/15 19:58:01 [notice] 2855#0: worker process 2857 exited with code 0 2015/06/15 19:58:01 [notice] 2855#0: signal 29 (SIGIO) received 2015/06/15 19:58:01 [notice] 2855#0: signal 17 (SIGCHLD) received 2015/06/15 19:58:01 [notice] 2855#0: worker process 2859 exited with code 0 2015/06/15 19:58:01 [notice] 2855#0: signal 29 (SIGIO) received 2015/06/15 19:58:01 [notice] 2855#0: signal 17 (SIGCHLD) received 2015/06/15 19:58:01 [notice] 2855#0: worker process 2856 exited with code 0 2015/06/15 19:58:01 [notice] 2855#0: worker process 2858 exited with code 0 2015/06/15 19:58:01 [notice] 2855#0: exit nginx man page says "While there is no need to explicitly control worker processes normally, they support some signals too...". Master process is responsible for workers shutdown. KillMode=mixed is better than KillMode=process, since it will force-kill everything after the timeout. Too bad we cannot use it in CentOS 7. :-(
I agree, KillMode=process is probably better than KillMode=control-group. Your issue looks easily reproducible. Also, I've never seen rogue workers left behind after master has been killed so it's probably OK just to do KillMode=process.
nginx-1.8.0-9.fc22 has been submitted as an update for Fedora 22. https://admin.fedoraproject.org/updates/nginx-1.8.0-9.fc22
nginx-1.6.3-3.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/nginx-1.6.3-3.fc21
nginx-1.6.3-5.el7 has been submitted as an update for Fedora EPEL 7. https://admin.fedoraproject.org/updates/nginx-1.6.3-5.el7
Thanks again! BTW, you released a F20 update. Does it worth backport same EPEL 7 nginx.{service,logrotate} changes to it (F20 uses systemd 208 too)? If you have spare time only... F20 is almost EOLed.
nginx-1.8.0-9.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
nginx-1.6.3-3.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
I am seeing this on CentOS 7: type=USER_AVC msg=audit(1435918621.481:550): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: denied { stop } for auid=0 uid=0 gid=0 path="/usr/lib/systemd/system/nginx.service" cmdline="/usr/bin/systemctl kill --signal=SIGUSR1 --kill-who=main nginx.service" scontext=system_u:system_r:logrotate_t:s0-s0:c0.c1023 tcontext=system_u:object_r:httpd_unit_file_t:s0 tclass=service exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?' But despite it nginx logs *are* being rotated. Where should I report CentOS selinux-policy bugs?
(In reply to Marcos Mello from comment #19) > Where should I report CentOS selinux-policy bugs? https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%207&component=selinux-policy
Done: https://bugzilla.redhat.com/show_bug.cgi?id=1239045
(In reply to Jamie Nguyen from comment #1) > > Not sure why 'systemctl kill' was replaced. > > I changed from `/usr/bin/systemctl` to `/bin/kill` as a quickfix for an > SELinux issue a long time ago. The issue was fixed, so I'll change it back > but again I don't think this solves any real problems. The new SELinux denials do in fact prevent NGINX from re-opening the log files. This is true for both Fedora and CentOS. Thus I'm reverting back to /bin/kill on all branches. I am very unlikely to switch back to /usr/bin/systemctl in the foreseeable future, unless a compelling reason presents itself.
nginx-1.6.3-6.el7 has been submitted as an update for Fedora EPEL 7. https://admin.fedoraproject.org/updates/nginx-1.6.3-6.el7
Well, systemctl kill is future proof, if one day nginx gains Type=notify support and a blessed* foreground mode, etc, pid file is not needed (and can be ommited or even disabled). When selinux-policy gets fixed, perhaps in the future you can put it back. Sorry for not testing this right. * upstream puts some restrictions on 'daemon off'.
If one day nginx doesn't need a pid file anymore, at that point I'll consider switching back to systemctl.
nginx-1.6.3-6.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.