Bug 1261921

Summary: updating overcloud stack packages doesn't stop cluster and will cause it to be down
Product: Red Hat OpenStack Reporter: Ofer Blaut <oblaut>
Component: openstack-tripleo-heat-templates    Assignee: Jay Dobies <jason.dobies>
Status: CLOSED ERRATA QA Contact: Alexander Chuzhoy <sasha>
Severity: urgent Docs Contact:
Priority: high    
Version: 7.0 (Kilo)    CC: abeekhof, calfonso, dhill, dprince, emilien, fdinitto, gkeegan, jguiditt, kbasil, mburns, mcornea, michele, oblaut, ohochman, rhel-osp-director-maint, sasha, sbaker, vcojot, zbitter
Target Milestone: y1   
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-71.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-08 12:18:46 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1257642, 1257717, 1259905, 1264203    
Attachments:
  LOGS

Description Ofer Blaut 2015-09-10 12:46:40 UTC
Created attachment 1072167 [details]
LOGS

Description of problem:

To update packages on the overcloud we use: openstack overcloud update stack overcloud

It seems the update doesn't stop the cluster on the controllers, which causes cluster problems (see attached logs).

If the update is applied to a host running Pacemaker, the cluster must be stopped -> updated -> started again.
 
I also looked at /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/yum_update.sh 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. openstack overcloud update stack overcloud -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --templates -i


2. Check pcs status on that controller and see that the cluster was not stopped.
3. Check the error logs.
Actual results:


Expected results:


Additional info:

Comment 2 chris alfonso 2015-09-14 16:27:59 UTC
What are the specific problems that are caused by this? Does the cluster recover? Is the cluster actually down? Is the status consistent with the actual cluster state?

Comment 3 Zane Bitter 2015-09-14 18:03:51 UTC
The theory of this is that Puppet knows about the cluster, and therefore if we use Puppet to update the services then it will do the Right Thing by Pacemaker.

I'm setting this as Depending On bug 1259905 (which implements the Puppet integration). It *appears* that you're testing a puddle without that support, because the yum update command is updating openstack services directly and there's no evidence of a puppet run. With the patches for bug 1259905, yum should stop updating the openstack services (it will only update packages not managed by Puppet) and puppet should update them instead.

Comment 5 Fabio Massimo Di Nitto 2015-09-16 11:54:27 UTC
(In reply to chris alfonso from comment #2)
> What are the specific problems that are caused by this. Does the cluster
> recover? Is the cluster actually down? Is the status consistent with the
> actual cluster state?

Irrelevant really. RHEL HA / cluster does not support updates with the cluster running.

The correct sequence of node updates is (pseudo code):

for n in node_list; do
  connect to node n
  pcs cluster stop
  yum update ...
  reboot || pcs cluster start
done
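
For illustration, a rough, runnable version of that sequence might look like the following (a sketch only: the node names, SSH user and the wait-for-online check are assumptions, not part of any agreed design):

NODES="overcloud-controller-0 overcloud-controller-1 overcloud-controller-2"   # assumed names

for n in $NODES; do
  # stop pacemaker/corosync on this node only, then update it
  ssh heat-admin@$n "sudo pcs cluster stop"
  ssh heat-admin@$n "sudo yum -y update"
  # reboot if needed, otherwise just rejoin the cluster
  ssh heat-admin@$n "sudo pcs cluster start"
  # wait for the node to show up as Online again before touching the next one
  until ssh heat-admin@$n "sudo pcs status" | grep -q "Online:.*$n"; do
    sleep 10
  done
done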

Comment 6 Fabio Massimo Di Nitto 2015-09-16 14:51:42 UTC
(In reply to Fabio Massimo Di Nitto from comment #5)
> (In reply to chris alfonso from comment #2)
> > What are the specific problems that are caused by this. Does the cluster
> > recover? Is the cluster actually down? Is the status consistent with the
> > actual cluster state?
> 
> Irrelevant really. RHEL HA / cluster does not support updates with the
> cluster running.
> 
> The correct sequence of node updates is (pseudo code):
> 
> for n in node_list; do
>   connect to node n
>   pcs cluster stop
>   yum update ...
>   reboot || pcs cluster start

^^^^ wait for node to come back online before moving to the next node.

> done

Comment 7 Zane Bitter 2015-09-16 15:06:56 UTC
The good news is that we already have the orchestration in place to make sure only one node is updated at a time.

What is going to be tricky is only stopping the cluster when there is actually something to update. Two possibilities come to mind:

* Do the cluster stop + start only after the package update, and only if something actually changed.
* Always stop + start the cluster regardless of whether anything is changing. This may be fine for a 3+ node HA cluster? I think there's also a case though where we're deploying on a single controller but still using Pacemaker? It seems suboptimal in that case.

Any thoughts about whether either of these could work?

Comment 8 Fabio Massimo Di Nitto 2015-09-16 15:14:55 UTC
(In reply to Zane Bitter from comment #7)
> The good news is that we already have the orchestration in place to make
> sure only one node is updated at a time.

We will need to review this sequence once OSPd can deploy the Instance HA feature, because you will need a slightly different order of commands to update compute nodes. Something to keep in the back of our minds for later.

> 
> What is going to be tricky is only stopping the cluster when there is
> actually something to update. Two possibilities come to mind:
> 
> * Do the cluster stop + start only after the package update, and only if
> something actually changed.

You cannot do this, you need to stop before the update.

> * Always stop + start the cluster regardless of whether anything is
> changing. This may be fine for a 3+ node HA cluster? I think there's also a
> case though where we're deploying on a single controller but still using
> Pacemaker? It seems suboptimal in that case.
> 
> Any thoughts about whether either of these could work?

What about doing a yum dry run to check if there are updates and then take action?
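
For example (a sketch only, not the actual yum_update.sh logic): "yum check-update" exits with code 100 when updates are pending, so the script could branch on that before deciding whether to stop the cluster:

yum -q check-update
rc=$?
if [ $rc -eq 100 ]; then
  pcs cluster stop        # updates pending: take this node out of the cluster first
  yum -y update
  pcs cluster start
elif [ $rc -ne 0 ]; then
  echo "yum check-update failed (rc=$rc)" >&2
  exit 1
fi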

Comment 9 Zane Bitter 2015-09-16 19:13:36 UTC
(In reply to Fabio Massimo Di Nitto from comment #8)
> What about doing a yum dry run to check if there are updates and then take
> action?

That could be a possibility. As I mentioned in comment #3 though, we're aiming to have Puppet take care of the updates to the OpenStack services using ensure => latest. So this might add significant extra complexity in that case.

Comment 10 David Hill 2015-09-17 15:20:23 UTC
I'm not sure it's the same bug, but here is another overcloud that broke while updating: https://bugzilla.redhat.com/1263855

Comment 11 chris alfonso 2015-09-17 16:18:14 UTC
*** Bug 1263855 has been marked as a duplicate of this bug. ***

Comment 13 Zane Bitter 2015-09-18 15:59:57 UTC
So, to clarify, we're talking purely about z-stream updates here. No DB migrations. No RPC version changes. Just z-stream bug fixes.

Do we really have to stop the entire cluster once for every controller node to be updated? That doesn't sound very HA. In fact, it sounds a lot less HA than not using Pacemaker at all would be.

Can you go more into the specific reasons why the entire cluster has to be stopped while we install an updated version of an OpenStack service with a minor bug fix? It's really hard to know how best to work within the constraints that we're seeing here without knowing what is driving them.

Comment 14 Zane Bitter 2015-09-18 16:02:22 UTC
BTW the plan for reboots is that we will not automatically do them as part of the package update (i.e. minor z-stream update) process. There will be a separate "openstack overcloud ..." command to do a rolling reboot of nodes if e.g. the yum update has installed a new kernel and the user wants to start making use of it.

Comment 15 Zane Bitter 2015-09-18 17:59:47 UTC
OK, attempting to answer my own questions to some extent after discussions on IRC:

* The "cluster stop" suggestion is missing a parameter, which confused me as to its meaning. The command is in fact "cluster stop <node>", so it only stops pacemaker on one node: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_perform_a_failover.html

* The "cluster standby <node>" suggestion does seem better to me - keep Pacemaker running so the other cluster members know about it but just shut down the services it manages: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_test_cluster_failover.html

* As I mentioned in comment #14, we're planning to implement a separate reboot command. That one will definitely want to do "cluster stop <node>", but given that this script does not include a reboot I think we should be able to get away with "cluster standby <node>".

* There is probably no need to get Puppet involved in cases where Pacemaker is managing the services. In those cases, the knowledge of how to restart them in the correct order rests with Pacemaker, not Puppet.

* For OSP Director, we always deploy controller nodes with Pacemaker (even if there is only one of them). So we need Puppet integration only for compute nodes? (Or do those use Pacemaker too?) Are there any services managed by Puppet but not under Pacemaker control in a Pacemaker-enabled deployment?

* For upstream tripleo, however, deploying with Pacemaker remains optional so we'll need to play nice with that code in any event.

* Doing "yum list updates" seems like a good way of determining whether the cluster member actually needs to be placed in standby/stopped prior to the update - we could compare to the list of packages being managed by Puppet, or maybe just to the Pacemaker config somehow. We should probably use the output of this command to feed back into the actual yum update, so that we don't get race conditions.

* Even better might be to do "yum update -y --downloadonly" and then use the transaction saved in /tmp with "yum load-transaction" so that we get exactly the versions we were expecting and can't accidentally pull in anything else from an even-newer version of a package that may have appeared.

* For bonus points, we could use the "yum deplist" command to get the list of all packages which the openstack services depend on, so that we could be sure to restart the service even if only a library changed. Unfortunately you have to do it recursively to get everything though, which sounds very slow.
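
A sketch of the comparison in the "yum list updates" bullet above (the package list is hypothetical, and parsing the yum output is exactly the fragile part):

PACEMAKER_PKGS="openstack-nova-api openstack-glance-api mariadb-galera-server"   # hypothetical list

# Note: long package names wrap across lines in "yum list" output, so this
# parsing is best-effort only.
updates=$(yum -q list updates 2>/dev/null | awk '{print $1}' | cut -d. -f1)

needs_cluster_stop=false
for pkg in $PACEMAKER_PKGS; do
  if echo "$updates" | grep -qx "$pkg"; then
    needs_cluster_stop=true
    break
  fi
done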

Comment 16 Fabio Massimo Di Nitto 2015-09-18 18:01:46 UTC
(In reply to Zane Bitter from comment #13)
> So, to clarify, we're talking purely about z-stream updates here. No DB
> migrations. No RPC version changes. Just z-stream bug fixes.

Yes that is correct.

> 
> Do we really have to stop the entire cluster once for every controller node
> to be updated?

No? I never said the entire cluster has to be stopped at once. I said one node at a time has to be stopped, updated and then started again in sequence. See comment #5 and comment #6.

> That doesn't sound very HA. In fact, it sounds a lot less HA
> than not using Pacemaker at all would be.
> 
> Can you go more into the specific reasons why the entire cluster has to be
> stopped while we install an updated version of an OpenStack service with a
> minor bug fix? It's really hard to know how best to work within the
> constraints that we're seeing here without knowing what is driving them.

I am not sure where "this entire cluster" is coming from.

Comment 17 Zane Bitter 2015-09-18 18:35:12 UTC
Thanks Fabio, we were confused but I think we figured it out (see comment #15).

Comment 18 Fabio Massimo Di Nitto 2015-09-18 18:49:25 UTC
(In reply to Zane Bitter from comment #15)
> OK, attempting to answer my own questions to some extent after discussions
> on IRC:
> 
> * The "cluster stop" suggestion is missing a parameter, which confused me as
> to its meaning. The command is in fact "cluster stop <node>", so it only
> stops pacemaker on one node:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/
> Clusters_from_Scratch/_perform_a_failover.html
> 

The <node> parameter is required only if you need to stop a node other than the one where you are executing the command; otherwise it's implicit that you want to stop only "localhost".

> * The "cluster standby <node>" suggestion does seem better to me - keep
> Pacemaker running so the other cluster members know about it but just shut
> down the services it manages:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/
> Clusters_from_Scratch/_test_cluster_failover.html

I discourage it for this specific use case. If the update is also pulling in pacemaker & co. updates, you need to special-case them and restart them manually. Might as well take the safe path: stop pacemaker too, update, and start.

> 
> * As I mentioned in comment #14, we're planning to implement a separate
> reboot command. That one will definitely want to do "cluster stop <node>",
> but given that this script does not include a reboot I think we should be
> able to get away with "cluster standby <node>".

As above, please avoid standby mode unless you want to parse yum output.

> 
> * There is probably no need to get Puppet involved in cases where Pacemaker
> is managing the services. In those cases, the knowledge of how to restart
> them in the correct order rests with Pacemaker, not Puppet.

+1

> 
> * For OSP Director, we always deploy controller nodes with Pacemaker (even
> if there is only one of them). So we need Puppet integration only for
> compute nodes? (Or do those use Pacemaker too?) Are there any services
> managed by Puppet but not under Pacemaker control in a Pacemaker-enabled
> deployment?

At some point in OSP 8 pacemaker will also manage compute nodes, but we will need to work on those upgrade steps once we cross that bridge.

> 
> * For upstream tripleo, however, deploying with Pacemaker remains optional
> so we'll need to play nice with that code in any event.
> 
> * Doing "yum list updates" seems like a good way of determining whether the
> cluster member actually needs to be placed in standby/stopped prior to the
> update - we could compare to the list of packages being managed by Puppet,
> or maybe just to the Pacemaker config somehow. We should probably use the
> output of this command to feed back into the actual yum update, so that we
> don't get race conditions.

This is a very dangerous path to go down, unless you want to start rabbit-holing into all possible dependency trees of every single package.

Say a given C library foo is updated but it's not in the list of packages, though it affects all services. How do you plan to unfold all those dependencies?

IMHO, if there is an update, you want to take the safest bet and do the proper actions.

> 
> * Even better might be to do "yum update -y --downloadonly" and then use the
> transaction saved in /tmp with "yum load-transaction" so that we get exactly
> the versions we were expecting and can't accidentally pull in anything else
> from an even-newer version of a package that may have appeared.

As above, parsing data in this case is dangerous.

> 
> * For bonus points, we could use the "yum deplist command" to get the list
> of all packages which the openstack services are depending on, so that we
> could be sure to restart the service even if only a library changed.
> Unfortunately you have to do it recursively to get everything though, which
> sounds very slow.

That doesn't work for all packages unfortunately, so be careful. Due to some restrictions, not all packages can express a dependency properly. I can bring you a few examples if necessary (hence my comment above).

Comment 19 Zane Bitter 2015-09-20 16:52:34 UTC
(In reply to Fabio Massimo Di Nitto from comment #18)
> (In reply to Zane Bitter from comment #15)
> > * Doing "yum list updates" seems like a good way of determining whether the
> > cluster member actually needs to be placed in standby/stopped prior to the
> > update - we could compare to the list of packages being managed by Puppet,
> > or maybe just to the Pacemaker config somehow. We should probably use the
> > output of this command to feed back into the actual yum update, so that we
> > don't get race conditions.
> 
> this is a very dangerous path to play, unless you want to start rabbit
> holing into all possible dependency trees of every single package.
> 
> say given C library foo is updated but it´s not in the list of package, tho
> it affects all services.. how do you plan to unfold all those dependencies?
> 
> IMHO, if there is an update, you want to take the safest best and do the
> proper actions.
> 
> > 
> > * Even better might be to do "yum update -y --downloadonly" and then use the
> > transaction saved in /tmp with "yum load-transaction" so that we get exactly
> > the versions we were expecting and can't accidentally pull in anything else
> > from an even-newer version of a package that may have appeared.
> 
> As above, parsing data in this case is dangerous.
> 
> > 
> > * For bonus points, we could use the "yum deplist command" to get the list
> > of all packages which the openstack services are depending on, so that we
> > could be sure to restart the service even if only a library changed.
> > Unfortunately you have to do it recursively to get everything though, which
> > sounds very slow.
> 
> That doesn´t work for all packages unfortunately. So be careful. Due to some
> restrictions not all packages can express a dependency properly. I can bring
> you a few examples if necessary (hence my comment above).

Well, what we have now is: we run yum, and whatever services the %postupgrade scripts restart get restarted, though probably not in the correct order.

What we were planning is to get Puppet involved, so that Puppet would do the restarts in the order that is required when packages it knows about demand that. This is strictly better.

What I was suggesting here is to calculate the dependencies recursively using yum and restart the cluster member if any dependencies yum knows about have changed. This, while far from perfect, is strictly better than either of the previous two approaches. It may or may not be practical though - "yum deplist" is not recursive, so it could get messy very quickly.

I don't really see RDO Manager/Director's role here as introducing new system administration tools, so much as knowing about the topology of the OpenStack deployment such that it can orchestrate the existing administration tools, like yum. A sysadmin is still going to need to decide when a yum update is sufficient and when they want to reboot all the nodes, which we will provide a separate command to orchestrate.

Comment 20 Fabio Massimo Di Nitto 2015-09-20 17:45:49 UTC
(In reply to Zane Bitter from comment #19)
> (In reply to Fabio Massimo Di Nitto from comment #18)
> > (In reply to Zane Bitter from comment #15)
> > > * Doing "yum list updates" seems like a good way of determining whether the
> > > cluster member actually needs to be placed in standby/stopped prior to the
> > > update - we could compare to the list of packages being managed by Puppet,
> > > or maybe just to the Pacemaker config somehow. We should probably use the
> > > output of this command to feed back into the actual yum update, so that we
> > > don't get race conditions.
> > 
> > this is a very dangerous path to play, unless you want to start rabbit
> > holing into all possible dependency trees of every single package.
> > 
> > say given C library foo is updated but it´s not in the list of package, tho
> > it affects all services.. how do you plan to unfold all those dependencies?
> > 
> > IMHO, if there is an update, you want to take the safest best and do the
> > proper actions.
> > 
> > > 
> > > * Even better might be to do "yum update -y --downloadonly" and then use the
> > > transaction saved in /tmp with "yum load-transaction" so that we get exactly
> > > the versions we were expecting and can't accidentally pull in anything else
> > > from an even-newer version of a package that may have appeared.
> > 
> > As above, parsing data in this case is dangerous.
> > 
> > > 
> > > * For bonus points, we could use the "yum deplist command" to get the list
> > > of all packages which the openstack services are depending on, so that we
> > > could be sure to restart the service even if only a library changed.
> > > Unfortunately you have to do it recursively to get everything though, which
> > > sounds very slow.
> > 
> > That doesn´t work for all packages unfortunately. So be careful. Due to some
> > restrictions not all packages can express a dependency properly. I can bring
> > you a few examples if necessary (hence my comment above).
> 
> Well what we have now is we run yum, and whatever services get restarted by
> the %postupgrade scripts get restarted, though probably not in the correct
> order.

Nope, that's not completely correct. %postupgrade does a restart only if the service is already started and registered with systemd.

When a systemd service is managed by pacemaker, pacemaker adds some extra data to the unit file (in override mode) that will filter out those restarts since the service is managed externally.

> 
> What we were planning is to get Puppet involved, so that Puppet would do the
> restarts in the order that is required when packages it knows about demand
> that. This is strictly better.

If a service is running under pacemaker, have pacemaker handle the restart. If you start adding a 3rd layer for managing services (between systemd/puppet/pacemaker) it's going to be a recipe for disaster.

> 
> What I was suggesting here is to calculate the dependencies recursive using
> yum and restart the cluster member if any dependencies yum knows about have
> changed. This, while far from perfect, is strictly better than either of the
> previous two approaches. It may or may not be practical though - "yum
> deplist" is not recursive, so it could get messy very quickly.

As I explained before, yum dependencies can't always be expressed properly. There are packages out there that lack explicit dependencies. Take for example resource-agents: it cannot, by definition, pull in all dependencies for each agent that it can drive. Hence, you cannot assume that upgrading a package won't affect something else.

> 
> I don't really see RDO Manager/Director's role here as introducing new
> system administration tools, so much as knowing about the topology of the
> OpenStack deployment such that it can orchestrate the existing
> administration tools, like yum.

We don't need to replace yum, but we will need to issue update commands in a given order.

> A sysadmin is still going to need to decide
> when a yum update is sufficient and when they want to reboot all the nodes,
> which we will provide a separate command to orchestrate.

Agreed. That's why I mentioned in comment 5 "pcs cluster start || reboot" <- can't make that call automatically.

Comment 23 Steve Baker 2015-09-21 21:32:46 UTC
"cluster start is an async operation. there is no perfect way to block here because you don´t know exactly when all resources will complete their start operation. It is safe enough to loop on galera resource to be started tho, since it´s the resource that takes the longest and once it´s in Master state, all the other resources will start very fast right after."

Fabio, I'm not comfortable waiting for just galera; it seems like this will just make the potential race less frequent.

What is needed is a way to discover what services are managed by pacemaker so that the script can check their status until they are active. Do you have any suggestions?

Comment 25 Fabio Massimo Di Nitto 2015-09-22 02:51:47 UTC
(In reply to Steve Baker from comment #23)
> "cluster start is an async operation. there is no perfect way to block here
> because you don´t know exactly when all resources will complete their start
> operation. It is safe enough to loop on galera resource to be started tho,
> since it´s the resource that takes the longest and once it´s in Master
> state, all the other resources will start very fast right after."
> 
> Fabio, I'm not comfortable waiting for just galera, it seems like this will
> just make the potential race less frequent.
> 
> What is needed is a way to discover what services are managed by pacemaker
> so that the script can check their status until they are active. Do you have
> any suggestions?


pcs status

but the point is that there is no race. Once pacemaker is up (pcs cluster start), you can safely move to the next node. Waiting for galera is a good safety measure. There is no race either way.
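
A sketch of such a galera wait (the grep pattern assumes the default "pcs status" layout for the galera master/slave resource, and that the short hostname matches the Pacemaker node name):

timeout=600
until pcs status | grep -q "Masters:.*$(hostname -s)"; do
  sleep 5
  timeout=$((timeout - 5))
  if [ $timeout -le 0 ]; then
    echo "timed out waiting for galera to reach Master state on this node" >&2
    exit 1
  fi
done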

Comment 27 Zane Bitter 2015-09-23 23:51:12 UTC
(In reply to Fabio Massimo Di Nitto from comment #25)
> pcs status
> 
> but the point is that there is no race. Once pacemaker is up, (pcs cluster
> start) you can safely move to the next. waiting for galera is a good safety
> measure. there is no race either ways.

OK, I think I would separate this problem into three concerns:
- That we could move on and disable Pacemaker on the next node before it has fully started on the current node, and thus lose quorum. I think this is what you're referring to when you say there is no 'race'.
- That we could move on and disable Pacemaker on the next node and then the node after that before the services have finished starting on this node, with the result that we have a service down on all 3 nodes of a setup at the same time. This is what Steve meant when he talked about a 'race'.
- That something could fail to start after an update deployment had reported success. This is our last chance to detect any problems, and we don't want to miss it because the orchestration will go on and likely cause the same problem on the other nodes too.

We can all agree that monitoring "pcs status" clearly solves the first concern. Once Pacemaker has joined the cluster, the other members will recognise that and there will be no loss of quorum. BTW what specifically should we be looking for in that output?

It's not clear, but I think you're suggesting that waiting for Galera to start would help with the second concern. This will probably work, because Galera is likely to take long enough to start up that it's unlikely that we would have time for it to start up on the second node and for us to go ahead and disable the cluster on the third node before the rest of the services on the first node could start up. But it doesn't sound inherently safe, it's just a weird timing issue away from disaster. Can we get the information we need to prevent this from "pcs status"? If not, how can we get it? How are you proposing to check that Galera is active? Maybe we can do the same with the other services.

AFAICT, none of the ideas discussed so far help at all with the third concern. How can we be sure to wait long enough that we can detect any errors with services failing to start up so that we can report the results?

Comment 28 Steve Baker 2015-09-24 00:07:39 UTC
If all we care about is that the current node is back in the cluster, then something like the following can be run in a loop until there is a match:

    pcs status | grep "Online:.*$HOSTNAME.*"

This loop would need a timeout.
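
For example (a sketch; the retry count and interval are arbitrary):

for i in $(seq 1 60); do
  if pcs status | grep -q "Online:.*$HOSTNAME.*"; then
    break
  fi
  if [ $i -eq 60 ]; then
    echo "timed out waiting for $HOSTNAME to come back online" >&2
    exit 1
  fi
  sleep 5
done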

The command "pcs status xml" outputs structured data which includes the state of nodes and resources. I think it is worthwhile to eventually write a python script which blocks until there is an entry for every crm_mon.resources.resource[*].node[name=$HOSTNAME]. Until that script (or something based on another pcs command) exists there is a risk that a package upgrade will happily break resources on every node until HA is lost.

Comment 29 Steve Baker 2015-09-24 00:10:29 UTC
Comment 27 and comment 28 collided in mid-air. I think my "pcs status xml" suggestion would address Zane's 2nd and 3rd concerns.

Comment 30 Steve Baker 2015-09-24 02:22:40 UTC
I've pushed an updated change for review https://code.engineering.redhat.com/gerrit/#/c/58102/2

Comment 31 Fabio Massimo Di Nitto 2015-09-24 05:07:46 UTC
(In reply to Zane Bitter from comment #27)
> (In reply to Fabio Massimo Di Nitto from comment #25)
> > pcs status
> > 
> > but the point is that there is no race. Once pacemaker is up, (pcs cluster
> > start) you can safely move to the next. waiting for galera is a good safety
> > measure. there is no race either ways.
> 
> OK, I think I would separate this problem into three concerns:
> - That we could move on and disable Pacemaker on the next node before it has
> fully started on the current node, and thus lose quorum. I think this is
> what you're referring to when you say there is no 'race'.

As long as the pacemaker daemon is started and the node is online (as noted in the comment below), there is no loss of quorum.

> - That we could move on and disable Pacemaker on the next node and then the
> node after that before the services have finished starting on this node,
> with the result that we have a service down on all 3 nodes of a setup at the
> same time. This is what Steve meant when he talked about a 'race'.

That can't happen, because pacemaker would relocate the service to another node (there is the 3rd node, remember?).

> - That something could fail to start after an update deployment had reported
> success. This is our last chance to detect any problems, and we don't want
> to miss it because the orchestration will go on and likely cause the same
> problem on the other nodes too.

Ok, that's a different Pandora's box. This issue would have to be detected by testing upgrades before we send them to customers. But yes, I agree that parsing it all would be ideal. The complexity here is that you don't know ahead of time which services are supposed to be running on a given node, so you don't really know what to look for.

> 
> We can all agree that monitoring "pcs status" clearly solves the first
> concern. Once Pacemaker has joined the cluster, the other members will
> recognise that and there will be no loss of quorum. BTW what specifically
> should we be looking for in that output?
> 
> It's not clear, but I think you're suggesting that waiting for Galera to
> start would help with the second concern. This will probably work, because
> Galera is likely to take long enough to start up that it's unlikely that we
> would have time for it to start up on the second node and for us to go ahead
> and disable the cluster on the third node before the rest of the services on
> the first node could start up. But it doesn't sound inherently safe, it's
> just a weird timing issue away from disaster. Can we get the information we
> need to prevent this from "pcs status"? If not, how can we get it? How are
> you proposing to check that Galera is active? Maybe we can do the same with
> the other services.

Well, you shouldn't stop node 2 until node 1 has galera up and in Master state, and surely you shouldn't touch all 3 nodes at once, as we discussed before; otherwise it defeats the whole point of upgrading one node at a time. If you end up in a condition where node 1 has not started and you are stopping 2 and 3, then something else is wrong.

> 
> AFAICT, none of the ideas discussed so far help at all with the third
> concern. How can we be sure to wait long enough that we can detect any
> errors with services failing to start up so that we can report the results?

Well, you will need to make sure that there are no errors in pcs status before starting the upgrade; if there are, you would need to clear them. Then do the upgrade, wait say 5-10 minutes, and see if there are other errors, but it's very hairy. Also, even if a service starts (let's say nova-api), you still don't know if that daemon is actually working properly and responding to API requests.

From a pacemaker perspective, though, the service is started.
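
A rough sketch of that pre/post check (the grep on "Failed" is an assumption about the pcs status output; pcs resource cleanup is the usual way to clear stale failures):

# Before the update: bail out if the cluster already reports failed actions
if pcs status | grep -q "Failed"; then
  echo "cluster has pre-existing failures; run 'pcs resource cleanup' and re-check before updating" >&2
  exit 1
fi

# ... pcs cluster stop / yum -y update / pcs cluster start ...

# After the update: give resources time to settle, then re-check
sleep 600
if pcs status | grep -q "Failed"; then
  echo "failures detected after the update" >&2
  exit 1
fi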

Comment 33 Alexander Chuzhoy 2015-09-28 22:13:54 UTC
FailedQA:

Environment:
openstack-tripleo-heat-templates-0.8.6-69.el7ost.noarch

Failing based on comments:
https://bugzilla.redhat.com/show_bug.cgi?id=1264203#c17
https://bugzilla.redhat.com/show_bug.cgi?id=1264203#c19
https://bugzilla.redhat.com/show_bug.cgi?id=1264203#c20
https://bugzilla.redhat.com/show_bug.cgi?id=1264203#c23

Comment 35 Steve Baker 2015-09-30 21:24:27 UTC
This isn't working on a single non-HA, pacemaker-managed controller, because stopping the node loses the quorum of 1.

http://paste.openstack.org/show/474864/

I've proposed a fix for this case:
https://code.engineering.redhat.com/gerrit/#/c/58700/

Comment 36 Andrew Beekhof 2015-09-30 23:30:43 UTC
Fix looks sane to me

Comment 38 Alexander Chuzhoy 2015-10-01 20:57:07 UTC
Verified:

Environment:
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch


Below is the relevant output from pcs status on controllers during the update:

1) Before the yum update started.


[root@overcloud-controller-1 ~]# pcs status|tail -n 10

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


2) Shutting down the pacemaker

[root@overcloud-controller-1 ~]# pcs status|tail -n 10

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: deactivating/enabled
  pcsd: active/enabled           


3) The pacemaker is down
  
[root@overcloud-controller-1 ~]# pcs status|tail -n 10

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: deactivating/enabled
  pacemaker: inactive/enabled   
  pcsd: active/enabled          

4) Can't even contact the cluster
[root@overcloud-controller-1 ~]# pcs status|tail -n 10
Error: cluster is not currently running on this node 

5) UP
[root@overcloud-controller-1 ~]# pcs status|tail -n 10

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Comment 40 errata-xmlrpc 2015-10-08 12:18:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862