Bug 1290599
Summary: Unable to shutdown controller nodes
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.2
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Reporter: Chris Dearborn <christopher_dearborn>
Assignee: Ken Gaillot <kgaillot>
QA Contact: cluster-qe <cluster-qe>
CC: abeekhof, arkady_kanevsky, cdevine, christopher_dearborn, cluster-maint, fdinitto, hbrock, jjarvis, John_walsh, kbader, kgaillot, kurt_hey, mburns, morazi, rajini.karthik, randy_perryman, rhel-osp-director-maint, rsussman, sclewis, sreichar, wayne_allen
Target Milestone: pre-dev-freeze
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-05-16 18:14:21 UTC
Bug Blocks: 1261979
I suspect that this is not a valid use case, though one that is particularly common. Since the controllers are configured in a Pacemaker cluster, you should first take the node out of the cluster before the reboot, then re-add it after it comes back. If you're attempting to simulate fencing or HA recovery, there are other ways to do that. Adding a member of the HA team to comment on that.

Andrew is currently on vacation. Can you please attach sosreports from the 3 controller nodes? In my experience, Pacemaker cannot shut down when one of the services is unable to stop, but we need to verify which service and why.

Created attachment 1106753 [details]
controller0 sos report

Created attachment 1106754 [details]
controller1 sos report

Created attachment 1106755 [details]
controller2 sos report
I ended up redeploying the overcloud, but the new deployment exhibits the same behavior. I attached the requested sosreports from the 3 controllers.
This is a Pacemaker bug. If one runs crm_simulate on:

sosreport-CDearborn.1290599-20151217094252/sos_commands/cluster/crm_report/overcloud-controller-2.localdomain/pengine/pe-input-5.bz2

you can see that an invalid transition is produced (even with the current upstream master):

Executing cluster transition:
 * Pseudo action:   redis-master_pre_notify_demote_0
 * Resource action: redis notify on overcloud-controller-2
 * Pseudo action:   redis-master_confirmed-pre_notify_demote_0
Transition failed: terminated
An invalid transition was produced

Actually, not so much. I went back to check, and sure enough, fencing is disabled. Almost certainly a stop operation failed (probably timed out, possibly due to the oslo bug), at which point the cluster can't continue, because fencing, the one thing it could use to make progress, is disabled. So this looks like an invalid test to me, based on the config plus an unreliable resource.

But fencing is disabled by default. It needs to be explicitly turned on. Shouldn't you be able to shut down the controller nodes when you use the default installation options?

In Pacemaker itself, and in clusters created by pcs cluster setup, fencing is enabled by default. Scripts that automate cluster deployment may explicitly disable it, but we don't recommend that, partly because of the stop issue discussed here. Also, Red Hat does not support clusters with fencing disabled.

If someone configures a constraint "A after B", that means that A must start after B, *and* A must stop before B. If A fails to stop, what can Pacemaker do? If fencing is available, it can kill the machine and be certain that A is stopped. But with fencing disabled, it's stuck. There's no safe way to continue the shutdown.
Here's a (facetious) example that illustrates why it's better to hang than to continue shutting down:

pcs constraint order nuclear_reactor after reactor_cooling

In this case, there are a number of non-Pacemaker errors in /var/log/messages (numerous Python ImportError backtraces, "missing timeouts", etc.) that could be meaningful. There are also repeated messages like:

Dec 17 09:21:06 overcloud-controller-1 heat-engine: 2015-12-17 09:21:06.858 14086 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.140.22:5672 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.

which is likely the failing stop action that's causing the hang. I'm not sure which "oslo bug" Beekhof is referring to, but that's worth looking into. Also make sure you have the constraints mentioned in BZ#1257414 (the easiest way is to check that "pcs resource defaults" includes resource-stickiness=INFINITY).

Ok, I went back and looked at the beta docs, and they do clearly state that turning on fencing is the final step in standing up an overcloud. Mea culpa. I went ahead and enabled fencing per the documentation, then tried shutting down a single controller node. The node did shut down, but it took around 12-15 minutes to do so. I then tried shutting down a second controller node, and it also took 12-15 minutes. Is this expected? This seems very long to me.

Usually, shutdown is quicker than that, but it depends on the specific resources being used. It should take about the same amount of time as stopping all the services manually would. After a node is shut down, does "pcs status" on one of the other nodes show any failed actions?

I did a shutdown -h now on overcloud-controller-1.
Doing a pcs status from controller-2 shows the following failed actions:

Failed Actions:
* neutron-ovs-cleanup_stop_0 on overcloud-controller-1 'unknown error' (1): call=327, status=complete, exitreason='none', last-rc-change='Thu Jan 28 14:33:06 2016', queued=0ms, exec=727ms
* rabbitmq_monitor_10000 on overcloud-controller-1 'not running' (7): call=227, status=complete, exitreason='none', last-rc-change='Thu Jan 28 14:32:57 2016', queued=0ms, exec=0ms

Kurt, has the Pacemaker configuration been changed since the sosreports were generated? There has been a series of fixes around stop action timeouts that could affect this problem, but I am not entirely sure which OSP8d builds they have landed in. pcs config should show, for systemd resources: "op stop timeout=200s". If not, then we need to apply the fix manually. If it does, then the system is just taking a long time to stop all the services, and that might be unrelated to Pacemaker. We would need to re-check all the logs for OpenStack service errors.
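[Editorial sketch, not part of the original thread] The manual timeout fix described above can be scripted. This illustrative shell sketch extracts each systemd resource name from `pcs config | grep systemd`-style output (the resource name is the second whitespace-separated field) and prints the corresponding `pcs resource update ... op stop timeout=200s` command. The two sample lines below are stand-ins for real output; on a live cluster you would pipe in the actual `pcs config | grep systemd` output and drop the `echo` to apply the updates.

```shell
# Dry-run sketch: generate the timeout-bump command for each systemd resource.
# The printf fakes two lines of `pcs config | grep systemd` output; replace it
# with the real pipeline on an actual controller node.
printf '%s\n' \
  'Resource: haproxy (class=systemd type=haproxy)' \
  'Resource: neutron-openvswitch-agent (class=systemd type=neutron-openvswitch-agent)' |
awk '{print $2}' |            # second field is the resource name
while read -r name; do
  # Drop the echo (and repeat for op start / op monitor) to actually apply:
  echo "pcs resource update $name op stop timeout=200s"
done
```

This only prints the commands; nothing is changed until the `echo` is removed, which keeps the sketch safe to run anywhere.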
Fabio,

Here is the output of pcs config for systemd:

[root@overcloud-controller-0 ~]# pcs config | grep systemd
 Resource: haproxy (class=systemd type=haproxy)
 Resource: mongod (class=systemd type=mongod)
 Resource: memcached (class=systemd type=memcached)
 Resource: openstack-nova-scheduler (class=systemd type=openstack-nova-scheduler)
 Resource: neutron-l3-agent (class=systemd type=neutron-l3-agent)
 Resource: openstack-ceilometer-alarm-notifier (class=systemd type=openstack-ceilometer-alarm-notifier)
 Resource: openstack-heat-engine (class=systemd type=openstack-heat-engine)
 Resource: openstack-ceilometer-api (class=systemd type=openstack-ceilometer-api)
 Resource: neutron-metadata-agent (class=systemd type=neutron-metadata-agent)
 Resource: openstack-heat-api (class=systemd type=openstack-heat-api)
 Resource: openstack-cinder-scheduler (class=systemd type=openstack-cinder-scheduler)
 Resource: openstack-nova-api (class=systemd type=openstack-nova-api)
 Resource: openstack-heat-api-cloudwatch (class=systemd type=openstack-heat-api-cloudwatch)
 Resource: openstack-ceilometer-collector (class=systemd type=openstack-ceilometer-collector)
 Resource: openstack-keystone (class=systemd type=openstack-keystone)
 Resource: openstack-nova-consoleauth (class=systemd type=openstack-nova-consoleauth)
 Resource: openstack-glance-registry (class=systemd type=openstack-glance-registry)
 Resource: openstack-ceilometer-notification (class=systemd type=openstack-ceilometer-notification)
 Resource: openstack-cinder-api (class=systemd type=openstack-cinder-api)
 Resource: neutron-dhcp-agent (class=systemd type=neutron-dhcp-agent)
 Resource: openstack-glance-api (class=systemd type=openstack-glance-api)
 Resource: neutron-openvswitch-agent (class=systemd type=neutron-openvswitch-agent)
 Resource: openstack-nova-novncproxy (class=systemd type=openstack-nova-novncproxy)
 Resource: neutron-server (class=systemd type=neutron-server)
 Resource: httpd (class=systemd type=httpd)
 Resource: openstack-ceilometer-central (class=systemd type=openstack-ceilometer-central)
 Resource: openstack-ceilometer-alarm-evaluator (class=systemd type=openstack-ceilometer-alarm-evaluator)
 Resource: openstack-heat-api-cfn (class=systemd type=openstack-heat-api-cfn)
 Resource: openstack-cinder-volume (class=systemd type=openstack-cinder-volume)
 Resource: openstack-nova-conductor (class=systemd type=openstack-nova-conductor)

Am I looking at the right thing?

(In reply to Kurt Hey from comment #17)
> [pcs config output quoted above]
> Am I looking at the right thing?

Close enough :) but if you grep for systemd you won't be able to see the timeouts configured. Try pcs config | less, then search for systemd, and you will see 3-4 lines below each resource with the timeout configuration.

Looks like all the timeouts are 90s, or at least most of them. Nothing at 200s.

(In reply to Kurt Hey from comment #19)
> Looks like all the timeouts are 90s or at least most of them. Nothing at
> 200s.

Ok, then the version of OSPd you guys used doesn't have the fix yet. You can bump it manually for each service with:

pcs resource update <name> op stop timeout=200s
pcs resource update <name> op monitor timeout=200s
pcs resource update <name> op start timeout=200s

where <name> is the systemd resource. Example: for

Resource: neutron-openvswitch-agent (class=systemd type=neutron-openvswitch-agent)

the name is neutron-openvswitch-agent (or awk '{print $2}').

Fabio

I just tried this too with beta 4. It did take like 12-15 minutes to shut down a controller node.
pcs status was clean before the shutdown. pcs status after the shutdown hung for a minute or two when trying to get the status of the controller that was down, but the status came up clean:

...
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
     Stopped: [ overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
     Stopped: [ overcloud-controller-2 ]
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-1
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
     Stopped: [ overcloud-controller-2 ]

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  <hang was here>
  overcloud-controller-2: Offline

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled

(In reply to Chris Dearborn from comment #21)
> I just tried this too with beta 4. It did take like 12-15 minutes to
> shutdown a controller node. pcs status was clean before the shutdown.

Ok, I don't know why it's taking that long. Can you get me sosreports from the Beta 4 please?

> pcs status after the shutdown hung for a minute or 2 when trying to get the
> status of the controller that was down,

This is expected. The pcs-to-pcsd connection towards that controller has to time out (TCP). We do have a bug to improve that.

> but the status came up clean:
> [pcs status output quoted above]

Ok, perfect!

(In reply to Fabio Massimo Di Nitto from comment #22)
> Ok, I don't know why it's taking that long. Can you get me sosreports from
> the Beta 4 please?

We can look at the sosreports, but if you're being affected by the oslo bug (almost certain if bumping the timeouts to 200s helped), then 15 minutes will disappear really fast if 5 or 6 systemd services are each using their maximum of 190s. Increasing the timeout just avoids failed actions by waiting long enough for systemd to handle the stop. So the lack of failed actions doesn't mean the services started shutting down cleanly or in a timely manner.

Most likely this is a duplicate of BZ#1288528, an oslo.service bug whose fix has recently passed QA but has not been incorporated into 8.0 yet. The signature is log messages containing "Cannot switch to MAINLOOP from MAINLOOP", which we do see in the original logs here. New reports from the beta 4 deployment would still be helpful to verify that those messages are still involved, and to check whether any additional issues may be relevant.

Besides the logs, probably the best course is to keep the higher timeouts (and, unfortunately, the long shutdown time) until the fix makes it into a new beta. If that is undesirable, then from the description of the bug, it sounds like systemd unit file overrides for the affected daemons with KillMode=process might work around the issue, but that has not been tested here.

The fix for BZ#1288528 was released 2016-04-07 as part of 8.0. Have you had a chance to test this behavior with that fix deployed?

Just tested this with OSP 8 GA, and I was able to shut down a controller node in just over 2 minutes, which seems very reasonable. Feel free to close this one out. Thanks for all the help!

*** This bug has been marked as a duplicate of bug 1288528 ***
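[Editorial sketch, not part of the original thread] The KillMode=process workaround mentioned above, which the comment notes was never tested here, would presumably take the shape of a systemd drop-in for each affected daemon. The unit name and file path below are illustrative only:

```ini
# Hypothetical, UNTESTED workaround sketch (unit name is illustrative).
# File: /etc/systemd/system/openstack-nova-api.service.d/killmode.conf
# With KillMode=process, systemd kills only the unit's main process at stop,
# leaving child processes alone, which might sidestep the oslo.service
# shutdown hang tracked in BZ#1288528.
[Service]
KillMode=process
```

A drop-in like this would need systemctl daemon-reload to take effect, and keeping the higher stop timeouts (as discussed above) remains the safer course until the oslo.service fix lands.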
Created attachment 1104514 [details]
Controller node console hang

Description of problem:
I tried to reboot the controller nodes, and all three appeared to hang indefinitely while shutting down. After 10 minutes I powered them off and back on again.

Version-Release number of selected component (if applicable):
8.0 Beta 2

How reproducible:
Try shutting down a controller node. It hung during shutdown when I executed a "reboot" command.

Steps to Reproduce:
1. Deploy the overcloud
2. Try to shut down the controller nodes

Actual results:
Controller node shutdown hangs at the same place on each node.

Expected results:
Controller nodes should shut down.

Additional info:
All 3 controllers hung at:

A stop job is running for Pacemaker...r Manager (time)

See attached image.