Bug 1285523 - Update fence-compute to include mark host down and taggable instance support
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fence-agents
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Oyvind Albrigtsen
QA Contact: Asaf Hirshberg
Keywords: ZStream
Duplicates: 1288312
Depends On:
Blocks: 1185030 1285524 1304329
Reported: 2015-11-25 15:18 EST by Stephen Gordon
Modified: 2016-11-04 00:48 EDT

See Also:
Fixed In Version: fence-agents-4.0.11-37.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1285524 1304329 (view as bug list)
Environment:
Last Closed: 2016-11-04 00:48:05 EDT
Type: Bug


Attachments
corosync log from controller-0 from the moment of fencing (157.35 KB, text/plain)
2016-06-07 01:51 EDT, Asaf Hirshberg
Description Stephen Gordon 2015-11-25 15:18:46 EST
Description of problem:

In RHEL OpenStack Platform 8 we wish to include support in the fence compute (Nova) agent for:

- The new OpenStack mark host down API call, where available, for telling Nova the host is down.
- Only evacuating instances that are marked evacuable in their associated image properties or flavor extra specifications.

Initial upstream work has been done:

https://github.com/ClusterLabs/fence-agents/commit/e3d7ccd652edb5d0bd60c210b029ce73c5fa27e9

https://github.com/ClusterLabs/fence-agents/commit/d9f2f483611253ece7fb097a9119de20a22d9111

...but an additional change is required to ensure *only* those instances that were marked evacuable are evacuated. Currently, if *no* instances are found to be marked evacuable, then *all* instances are considered evacuable, which is not the behaviour we want out of the box (for those who do want all instances evacuated, tagging them as such is a simple matter).
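
The selection rule requested above can be illustrated with a minimal sketch. This is not the fence-compute implementation: the function names, the plain-dict data model, and the `metadata` field are hypothetical, and the `evacuable` key is assumed to be the same for image properties and flavor extra specifications.

```python
# Hypothetical sketch of the requested behaviour: evacuate ONLY instances
# whose image or flavor is marked evacuable. Plain dicts stand in for the
# objects a real agent would fetch from the OpenStack APIs.

def tagged_ids(resources, key="evacuable"):
    """Return the IDs of images/flavors whose metadata sets key to true."""
    return {
        r["id"]
        for r in resources
        if str(r.get("metadata", {}).get(key, "")).lower() == "true"
    }


def select_evacuable(servers, images, flavors):
    """Return only the servers backed by an evacuable image or flavor.

    If nothing is tagged, the result is empty: there is no fallback to
    "evacuate everything".
    """
    evacuable_images = tagged_ids(images)
    evacuable_flavors = tagged_ids(flavors)
    return [
        s for s in servers
        if s["image_id"] in evacuable_images
        or s["flavor_id"] in evacuable_flavors
    ]


images = [
    {"id": "img-tag", "metadata": {"evacuable": "true"}},
    {"id": "img-plain", "metadata": {}},
]
flavors = [{"id": "m1.tiny", "metadata": {}}]
servers = [
    {"name": "vm-tag", "image_id": "img-tag", "flavor_id": "m1.tiny"},
    {"name": "vm-regular", "image_id": "img-plain", "flavor_id": "m1.tiny"},
]

print([s["name"] for s in select_evacuable(servers, images, flavors)])
```

With untagged images and flavors only, `select_evacuable` returns an empty list, which is the out-of-the-box behaviour this report asks for.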
Comment 8 Andrew Beekhof 2016-05-26 20:14:52 EDT
Additional patch: 

   https://github.com/beekhof/fence-agents/commit/06f592e
Comment 9 Andrew Beekhof 2016-05-26 23:03:22 EDT
Merged upstream:

   https://github.com/ClusterLabs/fence-agents/commit/e4599e4

7.2 packages available at:

   http://people.redhat.com/abeekhof/instance-ha/
Comment 10 Andrew Beekhof 2016-06-01 21:53:52 EDT
One last commit (to allow everything to be disabled):

   https://github.com/ClusterLabs/fence-agents/commit/cdbdd93


Marek: Could we get a build with these commits please?
Comment 11 Oyvind Albrigtsen 2016-06-02 07:51:28 EDT
New build with the last commits.
Comment 13 Asaf Hirshberg 2016-06-07 01:48:18 EDT
Failed using the tagged image method: uploaded 2 cirros images and tagged one as "evacuable=true", then fenced one compute node. The corosync.log from the controller is attached.

[stack@puma33 ~]$ glance image-show ce2c9c9b-223f-4808-b0ce-cec536587b85
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:18Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/ce2c9c9b-223f-4808-b0ce-       |
|                  | cec536587b85/snap                                                                |
| disk_format      | qcow2                                                                            |
| evacuable        | true                                                                             |
| id               | ce2c9c9b-223f-4808-b0ce-cec536587b85                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros-tag                                                                       |
| owner            | 155fd4c559e842c98f037d9f3257e8c5                                                 |
| protected        | False                                                                            |
| size             | 9761280                                                                          |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2016-06-07T05:16:03Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | private                                                                          |
+------------------+----------------------------------------------------------------------------------+
[stack@puma33 ~]$ glance image-show bbb0b3d8-1dcf-4ab7-bd86-1139f213b7c7
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:08Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/bbb0b3d8-1dcf-                 |
|                  | 4ab7-bd86-1139f213b7c7/snap                                                      |
| disk_format      | qcow2                                                                            |
| id               | bbb0b3d8-1dcf-4ab7-bd86-1139f213b7c7                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros                                                                           |
| owner            | 155fd4c559e842c98f037d9f3257e8c5                                                 |
| protected        | False                                                                            |
| size             | 9761280                                                                          |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2016-06-07T05:15:11Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | private                                                                          |
+------------------+----------------------------------------------------------------------------------+
[stack@puma33 ~]$ 

From the controller:
[root@overcloud-controller-0 ~]# pcs stonith fence overcloud-novacompute-1
Node: overcloud-novacompute-1 fenced

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+------------+--------+-------------------------------------+
| ID                                   | Name       | Status | Host                                |
+--------------------------------------+------------+--------+-------------------------------------+
| 659e9c99-dedb-494b-a4b0-c81680ca2c22 | vm-regular | ACTIVE | overcloud-novacompute-0.localdomain |
| 595da4aa-5489-4eb7-99f2-eda275692423 | vm-tag     | ACTIVE | overcloud-novacompute-0.localdomain |
+--------------------------------------+------------+--------+-------------------------------------+

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+------------+--------+-------------------------------------+
| ID                                   | Name       | Status | Host                                |
+--------------------------------------+------------+--------+-------------------------------------+
| 659e9c99-dedb-494b-a4b0-c81680ca2c22 | vm-regular | ACTIVE | overcloud-novacompute-1.localdomain |
| 595da4aa-5489-4eb7-99f2-eda275692423 | vm-tag     | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+------------+--------+-------------------------------------+
[stack@puma33 ~]$
Comment 14 Asaf Hirshberg 2016-06-07 01:51 EDT
Created attachment 1165487 [details]
corosync log from controller-0 from the moment of fencing
Comment 15 Andrew Beekhof 2016-06-07 03:17:31 EDT
You say 'uploaded 2 cirros images and tagged one as "evacuable=true"' but neither of the glance image-show outputs indicates that a tag is set.

Both have:

tags             | [] 

This is what it looks like for me:

[stack@undercloud ~]$ glance image-show c99f990c-f05f-41ae-ac8c-934c8fa3e377
+----------------------+--------------------------------------+
| Property             | Value                                |
+----------------------+--------------------------------------+
| Property 'evacuable' | true                                 |


Also, we'd need more than just corosync.log.
Please collect an sos report from all nodes.
Comment 16 Oyvind Albrigtsen 2016-06-07 05:30:10 EDT
New build with fix for indent issue:
https://github.com/ClusterLabs/fence-agents/pull/78
Comment 17 Asaf Hirshberg 2016-06-08 04:18:41 EDT
Andrew, I took the bug again because the version of fence-agents I tested was older than the "Fixed In Version", so my comment 13 is not really relevant.
But just for the record, I did tag it and showed it in comment 13:
[stack@puma33 ~]$ glance image-show ce2c9c9b-223f-4808-b0ce-cec536587b85
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:18Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/ce2c9c9b-223f-4808-b0ce-       |
|                  | cec536587b85/snap                                                                |
| disk_format      | qcow2                                                                            |
| evacuable        | true  

                
Which of the following packages are needed in order to test the bug? After updating just fence-agents-common and fence-agents-scsi (which was a dependency), the evacuation did not occur after the fencing action:
fence-agents-all-4.0.11-36.el7.x86_64.rpm
fence-agents-amt-ws-4.0.11-36.el7.x86_64.rpm
fence-agents-apc-4.0.11-36.el7.x86_64.rpm
fence-agents-apc-snmp-4.0.11-36.el7.x86_64.rpm
fence-agents-bladecenter-4.0.11-36.el7.x86_64.rpm
fence-agents-brocade-4.0.11-36.el7.x86_64.rpm
fence-agents-cisco-mds-4.0.11-36.el7.x86_64.rpm
fence-agents-cisco-ucs-4.0.11-36.el7.x86_64.rpm
fence-agents-common-4.0.11-36.el7.x86_64.rpm
fence-agents-compute-4.0.11-36.el7.x86_64.rpm
fence-agents-drac5-4.0.11-36.el7.x86_64.rpm
fence-agents-eaton-snmp-4.0.11-36.el7.x86_64.rpm
fence-agents-emerson-4.0.11-36.el7.x86_64.rpm
fence-agents-eps-4.0.11-36.el7.x86_64.rpm
fence-agents-hpblade-4.0.11-36.el7.x86_64.rpm
fence-agents-ibmblade-4.0.11-36.el7.x86_64.rpm
fence-agents-ifmib-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo2-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-moonshot-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-mp-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-ssh-4.0.11-36.el7.x86_64.rpm
fence-agents-intelmodular-4.0.11-36.el7.x86_64.rpm
fence-agents-ipdu-4.0.11-36.el7.x86_64.rpm
fence-agents-ipmilan-4.0.11-36.el7.x86_64.rpm
fence-agents-kdump-4.0.11-36.el7.x86_64.rpm
fence-agents-mpath-4.0.11-36.el7.x86_64.rpm
fence-agents-rhevm-4.0.11-36.el7.x86_64.rpm
fence-agents-rsa-4.0.11-36.el7.x86_64.rpm
fence-agents-rsb-4.0.11-36.el7.x86_64.rpm
fence-agents-scsi-4.0.11-36.el7.x86_64.rpm
fence-agents-virsh-4.0.11-36.el7.x86_64.rpm
fence-agents-vmware-soap-4.0.11-36.el7.x86_64.rpm
fence-agents-wti-4.0.11-36.el7.x86_64.rpm
Comment 18 Andrew Beekhof 2016-06-13 23:06:49 EDT
fence-agents-common-4.0.11-36.el7.x86_64.rpm
fence-agents-compute-4.0.11-36.el7.x86_64.rpm

but it will need the "tag and" fix we talked about last week
Comment 19 Andrew Beekhof 2016-06-14 06:46:27 EDT
Here's the final patch:

    https://github.com/ClusterLabs/fence-agents/commit/90dfc11

Oyvind: Can we get a new build please?
Comment 20 Oyvind Albrigtsen 2016-06-14 07:11:30 EDT
New build with the last patch.
Comment 22 Oyvind Albrigtsen 2016-06-15 07:46:06 EDT
*** Bug 1288312 has been marked as a duplicate of this bug. ***
Comment 23 Asaf Hirshberg 2016-06-16 00:57:50 EDT
Verified on RHEL-OSP director 9.0 puddle - 2016-06-03.1 using fence-agents-4.0.11-37.el7

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-0.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+

*** Fencing compute-0 
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+
[stack@puma33 ~]$ 

*** Fencing compute-0 again
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0

+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+

*** Deleting the tagged instance and the tagged image, then fencing compute-0 again
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-1.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+
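
For anyone repeating this verification, the before/after `nova list` tables can be compared mechanically. The sketch below is only an illustration: the `moved_instances` helper is hypothetical, and the host mappings are copied by hand from the first two tables in this comment rather than parsed from the CLI.

```python
# Hypothetical helper: report which instances changed host between two
# snapshots of `nova list --fields name,status,host`.

def moved_instances(before, after):
    """Return {name: (old_host, new_host)} for instances whose host changed."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Instance name -> host, transcribed from the tables above.
before = {
    "vm-TAG": "overcloud-novacompute-0.localdomain",
    "vm-reg1": "overcloud-novacompute-0.localdomain",
    "vm-reg2": "overcloud-novacompute-1.localdomain",
}
after_first_fence = {
    "vm-TAG": "overcloud-novacompute-1.localdomain",
    "vm-reg1": "overcloud-novacompute-0.localdomain",
    "vm-reg2": "overcloud-novacompute-1.localdomain",
}

# Only the tagged instance should have left the fenced host.
print(moved_instances(before, after_first_fence))
```

For the first fencing action this reports only vm-TAG as moved, from overcloud-novacompute-0 to overcloud-novacompute-1.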
Comment 25 errata-xmlrpc 2016-11-04 00:48:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2373.html
