+++ This bug was initially created as a clone of Bug #1295830 +++

Via BZ https://bugzilla.redhat.com/show_bug.cgi?id=1275324 we increased the stop timeout to 100 seconds. Initially the 100s recommendation came from the DefaultTimeoutStopSec=90s setting in /etc/systemd/system.conf, and I believe the 120s recommendation (https://bugzilla.redhat.com/show_bug.cgi?id=1275324#c15) came from anecdotal evidence observed during test runs, but was lost in the noise of the above BZ.

So I took a look at the RHEL 7.2 systemd source and noticed that the correct formula is actually:

    DefaultTimeoutStopSec * 2 + <scheduling-delta*>

* I assume we need a bit of time to make sure that systemd is scheduled, that it sends a SIGKILL, and that everything (process structures, mainly) is gone. Five seconds seems quite reasonable (i.e. if systemd does not get to run within 5 seconds, you likely have other issues anyway).

This is because in src/core/service.c:

    static int service_dispatch_timer(sd_event_source *source, usec_t usec, void *userdata) {
    ...
            case SERVICE_STOP_SIGTERM:
                    if (s->kill_context.send_sigkill) {
                            log_unit_warning(UNIT(s)->id, "%s stop-sigterm timed out. Killing.", UNIT(s)->id);
                            service_enter_signal(s, SERVICE_STOP_SIGKILL, SERVICE_FAILURE_TIMEOUT);
                    } else {
                            log_unit_warning(UNIT(s)->id, "%s stop-sigterm timed out. Skipping SIGKILL.", UNIT(s)->id);
                            service_enter_stop_post(s, SERVICE_FAILURE_TIMEOUT);
                    }
                    break;
    ...

The man page seems to confirm that systemd will wait one Timeout timespan for the initial stop request. Then it will send a SIGTERM and wait for another Timeout to occur and, if the service is still around, it sends a SIGKILL.

"""
TimeoutStopSec=
    Configures the time to wait for stop. If a service is asked to stop but does not terminate in the specified time, it will be terminated forcibly via SIGTERM, and after another delay of this time with SIGKILL (See KillMode= in systemd.kill(5)). Takes a unit-less value in seconds, or a time span value such as "5min 20s". Pass 0 to disable the timeout logic. Defaults to TimeoutStartSec= in manager configuration file.
"""

This also matches what we have observed: services that were still around after more than 100 seconds.

So we need to change the pcs default timeout according to the following formula:

    DefaultTimeoutStopSec * 2 + X = 180 + X ~= 185s

This is under the assumption that DefaultTimeoutStopSec in system.conf is left at the RHEL default of 90 seconds.

Since pacemaker will fence a node when a service fails to stop within the configured timeout, this change should avoid most of the spurious fencing events that occurred when a service was still around after the old 100-second timeout.
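For anyone who wants to check the arithmetic, here is a minimal sketch of the formula above. The function and parameter names are made up for illustration and are not part of pcs or systemd. It assumes DefaultTimeoutStopSec is given as a plain number of seconds.

```python
def recommended_stop_timeout(default_timeout_stop_sec: int = 90,
                             scheduling_delta_sec: int = 5) -> int:
    """Return a pacemaker stop timeout in seconds.

    Hypothetical helper: one timeout span before SIGTERM, one more
    before SIGKILL, plus a small delta for systemd to get scheduled
    and clean up.
    """
    if default_timeout_stop_sec < 0 or scheduling_delta_sec < 0:
        raise ValueError("timeouts must be non-negative")
    return default_timeout_stop_sec * 2 + scheduling_delta_sec


if __name__ == "__main__":
    # RHEL default of 90s with the assumed 5s scheduling delta.
    print(recommended_stop_timeout())
```

If DefaultTimeoutStopSec has been changed in system.conf, pass the new value in; the delta is a judgment call, not something systemd defines.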
*** Bug 1291474 has been marked as a duplicate of this bug. ***
Hi,

I can see that all OpenStack services are set to a start/stop timeout of 200s, except RabbitMQ and Redis. Is that by design?

Ofer

[root@overcloud-controller-2 ~]# pcs resource --full | grep stop -C 1
Operations: start interval=0s timeout=20s (ip-192.0.2.12-start-interval-0s)
            stop interval=0s timeout=20s (ip-192.0.2.12-stop-interval-0s)
            monitor interval=10s timeout=20s (ip-192.0.2.12-monitor-interval-10s)
--
Operations: start interval=0s timeout=200s (haproxy-start-interval-0s)
            stop interval=0s timeout=200s (haproxy-stop-interval-0s)
            monitor interval=60s (haproxy-monitor-interval-60s)
--
Operations: start interval=0s timeout=120 (galera-start-interval-0s)
            stop interval=0s timeout=120 (galera-stop-interval-0s)
            monitor interval=20 timeout=30 (galera-monitor-interval-20)
--
Operations: start interval=0s timeout=20s (ip-192.0.2.11-start-interval-0s)
            stop interval=0s timeout=20s (ip-192.0.2.11-stop-interval-0s)
            monitor interval=10s timeout=20s (ip-192.0.2.11-monitor-interval-10s)
--
Operations: start interval=0s timeout=120 (redis-start-interval-0s)
            stop interval=0s timeout=120 (redis-stop-interval-0s)
            monitor interval=45 timeout=60 (redis-monitor-interval-45)
--
Operations: start interval=0s timeout=370s (mongod-start-interval-0s)
            stop interval=0s timeout=200s (mongod-stop-interval-0s)
            monitor interval=60s (mongod-monitor-interval-60s)
--
Operations: start interval=0s timeout=100 (rabbitmq-start-interval-0s)
            stop interval=0s timeout=90 (rabbitmq-stop-interval-0s)
            monitor interval=10 timeout=40 (rabbitmq-monitor-interval-10)
--
Operations: start interval=0s timeout=200s (memcached-start-interval-0s)
            stop interval=0s timeout=200s (memcached-stop-interval-0s)
            monitor interval=60s (memcached-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-nova-scheduler-start-interval-0s)
            stop interval=0s timeout=200s (openstack-nova-scheduler-stop-interval-0s)
            monitor interval=60s start-delay=10s (openstack-nova-scheduler-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (neutron-l3-agent-start-interval-0s)
            stop interval=0s timeout=200s (neutron-l3-agent-stop-interval-0s)
            monitor interval=60s (neutron-l3-agent-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-ceilometer-alarm-notifier-start-interval-0s)
            stop interval=0s timeout=200s (openstack-ceilometer-alarm-notifier-stop-interval-0s)
            monitor interval=60s (openstack-ceilometer-alarm-notifier-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-heat-engine-start-interval-0s)
            stop interval=0s timeout=200s (openstack-heat-engine-stop-interval-0s)
            monitor interval=60s (openstack-heat-engine-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-ceilometer-api-start-interval-0s)
            stop interval=0s timeout=200s (openstack-ceilometer-api-stop-interval-0s)
            monitor interval=60s (openstack-ceilometer-api-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (neutron-metadata-agent-start-interval-0s)
            stop interval=0s timeout=200s (neutron-metadata-agent-stop-interval-0s)
            monitor interval=60s (neutron-metadata-agent-monitor-interval-60s)
--
Operations: start interval=0s timeout=40 (neutron-ovs-cleanup-start-interval-0s)
            stop interval=0s timeout=300 (neutron-ovs-cleanup-stop-interval-0s)
            monitor interval=10 timeout=20 (neutron-ovs-cleanup-monitor-interval-10)
--
Operations: start interval=0s timeout=40 (neutron-netns-cleanup-start-interval-0s)
            stop interval=0s timeout=300 (neutron-netns-cleanup-stop-interval-0s)
            monitor interval=10 timeout=20 (neutron-netns-cleanup-monitor-interval-10)
--
Operations: start interval=0s timeout=200s (openstack-heat-api-start-interval-0s)
            stop interval=0s timeout=200s (openstack-heat-api-stop-interval-0s)
            monitor interval=60s (openstack-heat-api-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-cinder-scheduler-start-interval-0s)
            stop interval=0s timeout=200s (openstack-cinder-scheduler-stop-interval-0s)
            monitor interval=60s (openstack-cinder-scheduler-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-nova-api-start-interval-0s)
            stop interval=0s timeout=200s (openstack-nova-api-stop-interval-0s)
            monitor interval=60s start-delay=10s (openstack-nova-api-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-heat-api-cloudwatch-start-interval-0s)
            stop interval=0s timeout=200s (openstack-heat-api-cloudwatch-stop-interval-0s)
            monitor interval=60s (openstack-heat-api-cloudwatch-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-ceilometer-collector-start-interval-0s)
            stop interval=0s timeout=200s (openstack-ceilometer-collector-stop-interval-0s)
            monitor interval=60s (openstack-ceilometer-collector-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-keystone-start-interval-0s)
            stop interval=0s timeout=200s (openstack-keystone-stop-interval-0s)
            monitor interval=60s (openstack-keystone-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-nova-consoleauth-start-interval-0s)
            stop interval=0s timeout=200s (openstack-nova-consoleauth-stop-interval-0s)
            monitor interval=60s start-delay=10s (openstack-nova-consoleauth-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-glance-registry-start-interval-0s)
            stop interval=0s timeout=200s (openstack-glance-registry-stop-interval-0s)
            monitor interval=60s (openstack-glance-registry-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-ceilometer-notification-start-interval-0s)
            stop interval=0s timeout=200s (openstack-ceilometer-notification-stop-interval-0s)
            monitor interval=60s (openstack-ceilometer-notification-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-cinder-api-start-interval-0s)
            stop interval=0s timeout=200s (openstack-cinder-api-stop-interval-0s)
            monitor interval=60s (openstack-cinder-api-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (neutron-dhcp-agent-start-interval-0s)
            stop interval=0s timeout=200s (neutron-dhcp-agent-stop-interval-0s)
            monitor interval=60s (neutron-dhcp-agent-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-glance-api-start-interval-0s)
            stop interval=0s timeout=200s (openstack-glance-api-stop-interval-0s)
            monitor interval=60s (openstack-glance-api-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (neutron-openvswitch-agent-start-interval-0s)
            stop interval=0s timeout=200s (neutron-openvswitch-agent-stop-interval-0s)
            monitor interval=60s (neutron-openvswitch-agent-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-nova-novncproxy-start-interval-0s)
            stop interval=0s timeout=200s (openstack-nova-novncproxy-stop-interval-0s)
            monitor interval=60s start-delay=10s (openstack-nova-novncproxy-monitor-interval-60s)
--
Operations: start interval=0s timeout=30 (delay-start-interval-0s)
            stop interval=0s timeout=30 (delay-stop-interval-0s)
            monitor interval=10 timeout=30 (delay-monitor-interval-10)
--
Operations: start interval=0s timeout=200s (neutron-server-start-interval-0s)
            stop interval=0s timeout=200s (neutron-server-stop-interval-0s)
            monitor interval=60s (neutron-server-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (httpd-start-interval-0s)
            stop interval=0s timeout=200s (httpd-stop-interval-0s)
            monitor interval=60s (httpd-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-ceilometer-central-start-interval-0s)
            stop interval=0s timeout=200s (openstack-ceilometer-central-stop-interval-0s)
            monitor interval=60s (openstack-ceilometer-central-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-ceilometer-alarm-evaluator-start-interval-0s)
            stop interval=0s timeout=200s (openstack-ceilometer-alarm-evaluator-stop-interval-0s)
            monitor interval=60s (openstack-ceilometer-alarm-evaluator-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-heat-api-cfn-start-interval-0s)
            stop interval=0s timeout=200s (openstack-heat-api-cfn-stop-interval-0s)
            monitor interval=60s (openstack-heat-api-cfn-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-cinder-volume-start-interval-0s)
            stop interval=0s timeout=200s (openstack-cinder-volume-stop-interval-0s)
            monitor interval=60s (openstack-cinder-volume-monitor-interval-60s)
--
Operations: start interval=0s timeout=200s (openstack-nova-conductor-start-interval-0s)
            stop interval=0s timeout=200s (openstack-nova-conductor-stop-interval-0s)
            monitor interval=60s start-delay=10s (openstack-nova-conductor-monitor-interval-60s)
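A listing this long is easier to check with a script than by eye. The following is only a rough sketch: it assumes the operation lines look exactly like the ones pasted above, that a timeout is a whole number with an optional "s" suffix, and that a unit-less value means seconds. The helper names and the sample text are illustrative, not part of pcs.

```python
import re
from typing import Dict, List

# Matches e.g. "stop interval=0s timeout=200s (haproxy-stop-interval-0s)".
# Group 1 is the timeout value, group 2 the resource name taken from the op id.
STOP_OP = re.compile(
    r"stop\s+interval=\S+\s+timeout=(\d+)s?\s+\((\S+)-stop-interval-\S+\)"
)


def stop_timeouts(pcs_output: str) -> Dict[str, int]:
    """Map each resource name to its stop timeout in seconds."""
    return {m.group(2): int(m.group(1)) for m in STOP_OP.finditer(pcs_output)}


def below(timeouts: Dict[str, int], minimum: int = 200) -> List[str]:
    """Return the sorted names of resources whose stop timeout is under minimum."""
    return sorted(name for name, secs in timeouts.items() if secs < minimum)


# Two entries copied from the listing above.
SAMPLE = """\
Operations: start interval=0s timeout=200s (haproxy-start-interval-0s)
            stop interval=0s timeout=200s (haproxy-stop-interval-0s)
            monitor interval=60s (haproxy-monitor-interval-60s)
--
Operations: start interval=0s timeout=100 (rabbitmq-start-interval-0s)
            stop interval=0s timeout=90 (rabbitmq-stop-interval-0s)
            monitor interval=10 timeout=40 (rabbitmq-monitor-interval-10)
"""

if __name__ == "__main__":
    print(below(stop_timeouts(SAMPLE)))
```

Note that this does not tell you which resources are systemd resources, so anything it reports still needs to be checked against the resource class before treating it as a problem.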
Hi Ofer, since Rabbit and Redis are not systemd resources, it's OK for them to have different timeouts. This bug is limited to the systemd resources.
Correct, the 200s comes from the systemd default timeout (90s) times 2, plus a delta: 90s * 2 + 20s = 200s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0264.html
*** Bug 1322387 has been marked as a duplicate of this bug. ***