Bug 1285523

Summary: Update fence-compute to include mark host down and taggable instance support
Product: Red Hat Enterprise Linux 7 Reporter: Stephen Gordon <sgordon>
Component: fence-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA QA Contact: Asaf Hirshberg <ahirshbe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.3 CC: abeekhof, cluster-maint, fdinitto, jkurik
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: fence-agents-4.0.11-37.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1285524 1304329 Environment:
Last Closed: 2016-11-04 04:48:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1185030, 1285524, 1304329    
Attachments:
Description Flags
corosync log from controller-0 from the moment of fencing none

Description Stephen Gordon 2015-11-25 20:18:46 UTC
Description of problem:

In RHEL OpenStack Platform 8 we wish to include support in the fence compute (Nova) agent for:

- The new OpenStack mark host down API call, where available, for telling Nova the host is down.
- Only evacuating instances that are marked evacuable in their associated image properties or flavor extra specifications.
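
For illustration only, the mark host down call corresponds to the Nova compute API request `PUT /os-services/force-down`, which is available from API microversion 2.11. The sketch below only builds such a request and does not send it. The function name, the endpoint URL and the token are hypothetical placeholders, and the agent itself may well go through a client library rather than raw HTTP.

```python
import json


def build_force_down_request(compute_endpoint, token, host, down=True):
    """Build (but do not send) a Nova force-down request for one compute host.

    compute_endpoint and token are placeholders that a real caller would
    obtain from Keystone; they are assumptions made for this sketch.
    """
    url = compute_endpoint.rstrip("/") + "/os-services/force-down"
    headers = {
        "X-Auth-Token": token,
        "Content-Type": "application/json",
        # force-down only exists from compute API microversion 2.11 onwards
        "X-OpenStack-Nova-API-Version": "2.11",
    }
    body = json.dumps({
        "host": host,
        "binary": "nova-compute",
        "forced_down": down,
    })
    return {"method": "PUT", "url": url, "headers": headers, "body": body}
```

A caller that gets an error for this request on an older cloud would need to fall back to waiting for Nova to notice the dead host on its own, which is what "where available" implies.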

Initial upstream work has been done:

https://github.com/ClusterLabs/fence-agents/commit/e3d7ccd652edb5d0bd60c210b029ce73c5fa27e9

https://github.com/ClusterLabs/fence-agents/commit/d9f2f483611253ece7fb097a9119de20a22d9111

...but an additional change is required to ensure that *only* those instances marked evacuable are evacuated. Currently, if *no* instances are found to be marked evacuable, then *all* instances are considered evacuable, which is not the behaviour we want out of the box (for those who do want all instances evacuated, tagging them as such is a simple matter).
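
To make the requested change concrete, here is a minimal sketch of the selection rule, not the agent's actual code. Instances are modelled as plain dicts, and the key names (`name`, `image_properties`, `flavor_extra_specs`), the function names and the `fallback_to_all` switch are assumptions made for this example. With `fallback_to_all=True` it models the current behaviour described above; the default models the behaviour requested in this description, which is not necessarily what any given build does.

```python
def is_true(value):
    """Treat values such as 'true', 'True', 'yes' or '1' as boolean true."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def select_evacuable(instances, tag="evacuable", fallback_to_all=False):
    """Return the names of the instances that should be evacuated.

    An instance qualifies when its image properties or its flavor extra
    specs carry a true value for `tag`.
    """
    selected = []
    for inst in instances:
        image_props = inst.get("image_properties", {})
        flavor_specs = inst.get("flavor_extra_specs", {})
        if is_true(image_props.get(tag, False)) or is_true(flavor_specs.get(tag, False)):
            selected.append(inst["name"])

    if not selected and fallback_to_all:
        # Current behaviour: nothing tagged means everything is evacuable.
        return [inst["name"] for inst in instances]

    # Requested behaviour: nothing tagged means nothing is evacuated.
    return selected


instances = [
    {"name": "vm-tag", "image_properties": {"evacuable": "true"}},
    {"name": "vm-regular", "image_properties": {}},
]
print(select_evacuable(instances))
```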

Comment 8 Andrew Beekhof 2016-05-27 00:14:52 UTC
Additional patch: 

   https://github.com/beekhof/fence-agents/commit/06f592e

Comment 9 Andrew Beekhof 2016-05-27 03:03:22 UTC
Merged upstream:

   https://github.com/ClusterLabs/fence-agents/commit/e4599e4

7.2 packages available at:

   http://people.redhat.com/abeekhof/instance-ha/

Comment 10 Andrew Beekhof 2016-06-02 01:53:52 UTC
One last commit (to allow everything to be disabled):

   https://github.com/ClusterLabs/fence-agents/commit/cdbdd93


Marek: Could we get a build with these commits please?

Comment 11 Oyvind Albrigtsen 2016-06-02 11:51:28 UTC
New build with the last commits.

Comment 13 Asaf Hirshberg 2016-06-07 05:48:18 UTC
Failed using the tagged image method: I uploaded 2 cirros images, tagged one as "evacuable=true", then fenced one compute node. The corosync.log from the controller is attached.

[stack@puma33 ~]$ glance image-show ce2c9c9b-223f-4808-b0ce-cec536587b85
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:18Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/ce2c9c9b-223f-4808-b0ce-       |
|                  | cec536587b85/snap                                                                |
| disk_format      | qcow2                                                                            |
| evacuable        | true                                                                             |
| id               | ce2c9c9b-223f-4808-b0ce-cec536587b85                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros-tag                                                                       |
| owner            | 155fd4c559e842c98f037d9f3257e8c5                                                 |
| protected        | False                                                                            |
| size             | 9761280                                                                          |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2016-06-07T05:16:03Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | private                                                                          |
+------------------+----------------------------------------------------------------------------------+
[stack@puma33 ~]$ glance image-show bbb0b3d8-1dcf-4ab7-bd86-1139f213b7c7
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:08Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/bbb0b3d8-1dcf-                 |
|                  | 4ab7-bd86-1139f213b7c7/snap                                                      |
| disk_format      | qcow2                                                                            |
| id               | bbb0b3d8-1dcf-4ab7-bd86-1139f213b7c7                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros                                                                           |
| owner            | 155fd4c559e842c98f037d9f3257e8c5                                                 |
| protected        | False                                                                            |
| size             | 9761280                                                                          |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2016-06-07T05:15:11Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | private                                                                          |
+------------------+----------------------------------------------------------------------------------+
[stack@puma33 ~]$ 

from the controller:
[root@overcloud-controller-0 ~]# pcs stonith fence overcloud-novacompute-1
Node: overcloud-novacompute-1 fenced

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+------------+--------+-------------------------------------+
| ID                                   | Name       | Status | Host                                |
+--------------------------------------+------------+--------+-------------------------------------+
| 659e9c99-dedb-494b-a4b0-c81680ca2c22 | vm-regular | ACTIVE | overcloud-novacompute-0.localdomain |
| 595da4aa-5489-4eb7-99f2-eda275692423 | vm-tag     | ACTIVE | overcloud-novacompute-0.localdomain |
+--------------------------------------+------------+--------+-------------------------------------+

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+------------+--------+-------------------------------------+
| ID                                   | Name       | Status | Host                                |
+--------------------------------------+------------+--------+-------------------------------------+
| 659e9c99-dedb-494b-a4b0-c81680ca2c22 | vm-regular | ACTIVE | overcloud-novacompute-1.localdomain |
| 595da4aa-5489-4eb7-99f2-eda275692423 | vm-tag     | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+------------+--------+-------------------------------------+
[stack@puma33 ~]$

Comment 14 Asaf Hirshberg 2016-06-07 05:51:24 UTC
Created attachment 1165487 [details]
corosync log from controller-0 from the moment of fencing

Comment 15 Andrew Beekhof 2016-06-07 07:17:31 UTC
You say 'uploaded 2 cirros images and tagged one as "evacuable=true" ' but neither of the glance image-show commands indicates that a tag is set.

Both have:

tags             | [] 

This is what it looks like for me:

[stack@undercloud ~]$ glance image-show c99f990c-f05f-41ae-ac8c-934c8fa3e377
+----------------------+--------------------------------------+
| Property             | Value                                |
+----------------------+--------------------------------------+
| Property 'evacuable' | true                                 |


Also, we'd need more than just corosync.log.
Please do an sos report for all nodes.

Comment 16 Oyvind Albrigtsen 2016-06-07 09:30:10 UTC
New build with fix for indent issue:
https://github.com/ClusterLabs/fence-agents/pull/78

Comment 17 Asaf Hirshberg 2016-06-08 08:18:41 UTC
Andrew, I took the bug again because the version of fence-agents I tested was older than the "Fixed In Version", so my comment 13 is not really relevant.
But just for the record, I did tag it and showed it in comment 13:
[stack@puma33 ~]$ glance image-show ce2c9c9b-223f-4808-b0ce-cec536587b85
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:18Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/ce2c9c9b-223f-4808-b0ce-       |
|                  | cec536587b85/snap                                                                |
| disk_format      | qcow2                                                                            |
| evacuable        | true  

                
Which of the following packages are needed in order to test the bug? After updating only fence-agents-common and fence-agents-scsi (which was a dependency), the evacuation did not occur after the fencing action:
fence-agents-all-4.0.11-36.el7.x86_64.rpm
fence-agents-amt-ws-4.0.11-36.el7.x86_64.rpm
fence-agents-apc-4.0.11-36.el7.x86_64.rpm
fence-agents-apc-snmp-4.0.11-36.el7.x86_64.rpm
fence-agents-bladecenter-4.0.11-36.el7.x86_64.rpm
fence-agents-brocade-4.0.11-36.el7.x86_64.rpm
fence-agents-cisco-mds-4.0.11-36.el7.x86_64.rpm
fence-agents-cisco-ucs-4.0.11-36.el7.x86_64.rpm
fence-agents-common-4.0.11-36.el7.x86_64.rpm
fence-agents-compute-4.0.11-36.el7.x86_64.rpm
fence-agents-drac5-4.0.11-36.el7.x86_64.rpm
fence-agents-eaton-snmp-4.0.11-36.el7.x86_64.rpm
fence-agents-emerson-4.0.11-36.el7.x86_64.rpm
fence-agents-eps-4.0.11-36.el7.x86_64.rpm
fence-agents-hpblade-4.0.11-36.el7.x86_64.rpm
fence-agents-ibmblade-4.0.11-36.el7.x86_64.rpm
fence-agents-ifmib-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo2-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-moonshot-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-mp-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-ssh-4.0.11-36.el7.x86_64.rpm
fence-agents-intelmodular-4.0.11-36.el7.x86_64.rpm
fence-agents-ipdu-4.0.11-36.el7.x86_64.rpm
fence-agents-ipmilan-4.0.11-36.el7.x86_64.rpm
fence-agents-kdump-4.0.11-36.el7.x86_64.rpm
fence-agents-mpath-4.0.11-36.el7.x86_64.rpm
fence-agents-rhevm-4.0.11-36.el7.x86_64.rpm
fence-agents-rsa-4.0.11-36.el7.x86_64.rpm
fence-agents-rsb-4.0.11-36.el7.x86_64.rpm
fence-agents-scsi-4.0.11-36.el7.x86_64.rpm
fence-agents-virsh-4.0.11-36.el7.x86_64.rpm
fence-agents-vmware-soap-4.0.11-36.el7.x86_64.rpm
fence-agents-wti-4.0.11-36.el7.x86_64.rpm

Comment 18 Andrew Beekhof 2016-06-14 03:06:49 UTC
fence-agents-common-4.0.11-36.el7.x86_64.rpm
fence-agents-compute-4.0.11-36.el7.x86_64.rpm

but it will need the "tag and" fix we talked about last week

Comment 19 Andrew Beekhof 2016-06-14 10:46:27 UTC
Here's the final patch:

    https://github.com/ClusterLabs/fence-agents/commit/90dfc11

Oyvind: Can we get a new build please?

Comment 20 Oyvind Albrigtsen 2016-06-14 11:11:30 UTC
New build with the last patch.

Comment 22 Oyvind Albrigtsen 2016-06-15 11:46:06 UTC
*** Bug 1288312 has been marked as a duplicate of this bug. ***

Comment 23 Asaf Hirshberg 2016-06-16 04:57:50 UTC
Verified on RHEL-OSP director 9.0 puddle - 2016-06-03.1 using fence-agents-4.0.11-37.el7

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-0.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+

*** Fencing compute-0 
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+
[stack@puma33 ~]$ 

*** Fencing compute-0 again
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0

+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+

*** Deleting the tagged instance and the tagged image, then fencing compute-0 again
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-1.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+
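
For anyone repeating this verification, comparing the tables by eye can be replaced with a small script. The sketch below is only an illustration: the helper names are made up for this example, and the sample data is the first before/after pair from this comment with the ID column left out for brevity.

```python
def parse_table(text):
    """Parse an ASCII table as printed by the nova CLI into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skips the +----+ border lines
    ]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def moved_instances(before, after):
    """Return {name: (old_host, new_host)} for instances whose host changed."""
    old = {row["Name"]: row["Host"] for row in parse_table(before)}
    new = {row["Name"]: row["Host"] for row in parse_table(after)}
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


BEFORE = """
| Name    | Status | Host                                |
| vm-TAG  | ACTIVE | overcloud-novacompute-0.localdomain |
| vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
"""

AFTER = """
| Name    | Status | Host                                |
| vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
"""

# Only vm-TAG changes host after the first fencing of compute-0.
print(moved_instances(BEFORE, AFTER))
```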

Comment 25 errata-xmlrpc 2016-11-04 04:48:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2373.html