Description of problem:
In RHEL OpenStack Platform 8 we wish to include support in the fence compute (Nova) agent for:
- The new OpenStack "mark host down" API call, where available, for telling Nova that the host is down.
- Evacuating only those instances that are marked evacuable in their associated image properties or flavor extra specifications.

Initial upstream work has been done:
https://github.com/ClusterLabs/fence-agents/commit/e3d7ccd652edb5d0bd60c210b029ce73c5fa27e9
https://github.com/ClusterLabs/fence-agents/commit/d9f2f483611253ece7fb097a9119de20a22d9111

...but an additional change is required to ensure that *only* instances marked evacuable are evacuated. Currently, if *no* instances are found to be marked evacuable, then *all* instances are considered evacuable. That is not the behaviour we want out of the box; anyone who does want all instances evacuated can simply tag them as such.
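The selection rule requested above can be sketched as follows. This is a minimal, hypothetical model in Python, not the fence_compute implementation: `Server`, `is_tagged` and `select_evacuable` are illustrative names, and image properties and flavor extra specs are assumed to be plain dicts already fetched from Glance and Nova. The `enabled` switch models the later "allow everything to be disabled" commit, and `evacuate_all_if_nothing_tagged` models the fallback seen in the final verification in this thread, where untagged instances were evacuated once no tagged image remained.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical, simplified stand-in for a Nova server record."""
    name: str
    image_id: str
    flavor_id: str


def is_tagged(metadata: dict) -> bool:
    """True if image properties or flavor extra specs carry evacuable=true."""
    return str(metadata.get("evacuable", "")).strip().lower() == "true"


def select_evacuable(servers, image_props, flavor_specs,
                     enabled=True, evacuate_all_if_nothing_tagged=False):
    """Pick the servers from a fenced host that should be evacuated.

    servers:      servers that were running on the fenced host
    image_props:  image id -> image properties (dict)
    flavor_specs: flavor id -> flavor extra specs (dict)
    """
    if not enabled:
        # Evacuation switched off entirely.
        return []
    tagged_images = {i for i, props in image_props.items() if is_tagged(props)}
    tagged_flavors = {f for f, specs in flavor_specs.items() if is_tagged(specs)}
    if not tagged_images and not tagged_flavors and evacuate_all_if_nothing_tagged:
        # Nothing is tagged anywhere: optionally treat every server as evacuable.
        return list(servers)
    # Strict rule: only servers whose image OR flavor is tagged evacuable.
    return [s for s in servers
            if s.image_id in tagged_images or s.flavor_id in tagged_flavors]


# Usage: one tagged image, one untagged image, untagged flavor.
servers = [Server("vm-tag", "cirros-tag", "m1.tiny"),
           Server("vm-regular", "cirros", "m1.tiny")]
images = {"cirros-tag": {"evacuable": "true"}, "cirros": {}}
flavors = {"m1.tiny": {}}
print([s.name for s in select_evacuable(servers, images, flavors)])  # ['vm-tag']
```

With the fallback left at its default (`False`), a host whose instances are all untagged gets nothing evacuated, which is the out-of-the-box behaviour this bug asks for.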
Additional patch: https://github.com/beekhof/fence-agents/commit/06f592e
Merged upstream: https://github.com/ClusterLabs/fence-agents/commit/e4599e4
7.2 packages are available at: http://people.redhat.com/abeekhof/instance-ha/
One last commit (to allow everything to be disabled): https://github.com/ClusterLabs/fence-agents/commit/cdbdd93

Marek: Could we get a build with these commits, please?
New build with the last commits.
Verification failed using the tagged-image method: I uploaded 2 cirros images, tagged one as "evacuable=true", and then fenced one compute node. corosync.log from the controller is attached.

[stack@puma33 ~]$ glance image-show ce2c9c9b-223f-4808-b0ce-cec536587b85
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:18Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/ce2c9c9b-223f-4808-b0ce-       |
|                  | cec536587b85/snap                                                                |
| disk_format      | qcow2                                                                            |
| evacuable        | true                                                                             |
| id               | ce2c9c9b-223f-4808-b0ce-cec536587b85                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros-tag                                                                       |
| owner            | 155fd4c559e842c98f037d9f3257e8c5                                                 |
| protected        | False                                                                            |
| size             | 9761280                                                                          |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2016-06-07T05:16:03Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | private                                                                          |
+------------------+----------------------------------------------------------------------------------+
[stack@puma33 ~]$ glance image-show bbb0b3d8-1dcf-4ab7-bd86-1139f213b7c7
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:08Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/bbb0b3d8-1dcf-                 |
|                  | 4ab7-bd86-1139f213b7c7/snap                                                      |
| disk_format      | qcow2                                                                            |
| id               | bbb0b3d8-1dcf-4ab7-bd86-1139f213b7c7                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros                                                                           |
| owner            | 155fd4c559e842c98f037d9f3257e8c5                                                 |
| protected        | False                                                                            |
| size             | 9761280                                                                          |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2016-06-07T05:15:11Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | private                                                                          |
+------------------+----------------------------------------------------------------------------------+
[stack@puma33 ~]$

From the controller:
[root@overcloud-controller-0 ~]# pcs stonith fence overcloud-novacompute-1
Node: overcloud-novacompute-1 fenced

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+------------+--------+-------------------------------------+
| ID                                   | Name       | Status | Host                                |
+--------------------------------------+------------+--------+-------------------------------------+
| 659e9c99-dedb-494b-a4b0-c81680ca2c22 | vm-regular | ACTIVE | overcloud-novacompute-0.localdomain |
| 595da4aa-5489-4eb7-99f2-eda275692423 | vm-tag     | ACTIVE | overcloud-novacompute-0.localdomain |
+--------------------------------------+------------+--------+-------------------------------------+
[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+------------+--------+-------------------------------------+
| ID                                   | Name       | Status | Host                                |
+--------------------------------------+------------+--------+-------------------------------------+
| 659e9c99-dedb-494b-a4b0-c81680ca2c22 | vm-regular | ACTIVE | overcloud-novacompute-1.localdomain |
| 595da4aa-5489-4eb7-99f2-eda275692423 | vm-tag     | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+------------+--------+-------------------------------------+
[stack@puma33 ~]$
Created attachment 1165487 [details] corosync log from controller-0 from the moment of fencing
You say 'uploaded 2 cirros images and tagged one as "evacuable=true"', but neither of the glance image-show outputs indicates that a tag is set. Both have:

tags | []

This is what it looks like for me:

[stack@undercloud ~]$ glance image-show c99f990c-f05f-41ae-ac8c-934c8fa3e377
+----------------------+--------------------------------------+
| Property             | Value                                |
+----------------------+--------------------------------------+
| Property 'evacuable' | true                                 |

Also, we'd need more than just corosync.log. Please collect an sosreport from all nodes.
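The two outputs above appear to come from different glance client output formats: the `Property 'evacuable'` row is how the older v1 client prints a custom image property, while the v2 client prints the same property as a top-level `evacuable` row. In both cases `evacuable` is an image property, which is separate from the image's `tags` list, so `tags | []` alone does not show that the property is missing. A hypothetical helper that reads either layout into a dict (and re-joins values the client wrapped across rows, such as `direct_url`) might look like this; it parses the CLI's printed text only and is not a Glance API.

```python
import re


def parse_image_show(text: str) -> dict:
    """Parse the table printed by `glance image-show` into {key: value}.

    - Skips "+----+" border lines and the "Property | Value" header.
    - Normalises v1-style keys such as "Property 'evacuable'" to "evacuable".
    - Appends continuation rows (empty key cell) to the previous value;
      this re-joins values wrapped at a hyphen, e.g. direct_url, but would
      drop a space if the client wrapped at whitespace.
    """
    result = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2:
            continue
        key, value = cells
        if (key, value) == ("Property", "Value"):
            continue
        if not key:
            if last_key is not None:
                result[last_key] += value
            continue
        match = re.fullmatch(r"Property '(.+)'", key)
        if match:
            key = match.group(1)
        result[key] = value
        last_key = key
    return result


def image_is_evacuable(props: dict) -> bool:
    # Checks the image property only; the "tags" list is not consulted.
    return props.get("evacuable", "").lower() == "true"
```

Run against either of the outputs quoted in this bug for a tagged image, `image_is_evacuable(parse_image_show(output))` should return `True` even though `tags` is `[]`.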
New build with fix for indent issue: https://github.com/ClusterLabs/fence-agents/pull/78
Andrew, I took the bug again because the installed version of fence-agents was older than the "Fixed In Version", so my comment 13 is not really relevant. Just for the record, though, I did tag it, as shown in comment 13:

[stack@puma33 ~]$ glance image-show ce2c9c9b-223f-4808-b0ce-cec536587b85
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 50bdc35edb03a38d91b1b071afb20a3c                                                 |
| container_format | bare                                                                             |
| created_at       | 2016-06-07T05:15:18Z                                                             |
| direct_url       | rbd://3e95f1ce-2c66-11e6-a8b8-009c02b08fc8/images/ce2c9c9b-223f-4808-b0ce-       |
|                  | cec536587b85/snap                                                                |
| disk_format      | qcow2                                                                            |
| evacuable        | true

Which of the following packages are needed in order to test the bug? After updating only fence-agents-common and fence-agents-scsi (which was a dependency), evacuation did not occur after the fencing action:

fence-agents-all-4.0.11-36.el7.x86_64.rpm
fence-agents-amt-ws-4.0.11-36.el7.x86_64.rpm
fence-agents-apc-4.0.11-36.el7.x86_64.rpm
fence-agents-apc-snmp-4.0.11-36.el7.x86_64.rpm
fence-agents-bladecenter-4.0.11-36.el7.x86_64.rpm
fence-agents-brocade-4.0.11-36.el7.x86_64.rpm
fence-agents-cisco-mds-4.0.11-36.el7.x86_64.rpm
fence-agents-cisco-ucs-4.0.11-36.el7.x86_64.rpm
fence-agents-common-4.0.11-36.el7.x86_64.rpm
fence-agents-compute-4.0.11-36.el7.x86_64.rpm
fence-agents-drac5-4.0.11-36.el7.x86_64.rpm
fence-agents-eaton-snmp-4.0.11-36.el7.x86_64.rpm
fence-agents-emerson-4.0.11-36.el7.x86_64.rpm
fence-agents-eps-4.0.11-36.el7.x86_64.rpm
fence-agents-hpblade-4.0.11-36.el7.x86_64.rpm
fence-agents-ibmblade-4.0.11-36.el7.x86_64.rpm
fence-agents-ifmib-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo2-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-moonshot-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-mp-4.0.11-36.el7.x86_64.rpm
fence-agents-ilo-ssh-4.0.11-36.el7.x86_64.rpm
fence-agents-intelmodular-4.0.11-36.el7.x86_64.rpm
fence-agents-ipdu-4.0.11-36.el7.x86_64.rpm
fence-agents-ipmilan-4.0.11-36.el7.x86_64.rpm
fence-agents-kdump-4.0.11-36.el7.x86_64.rpm
fence-agents-mpath-4.0.11-36.el7.x86_64.rpm
fence-agents-rhevm-4.0.11-36.el7.x86_64.rpm
fence-agents-rsa-4.0.11-36.el7.x86_64.rpm
fence-agents-rsb-4.0.11-36.el7.x86_64.rpm
fence-agents-scsi-4.0.11-36.el7.x86_64.rpm
fence-agents-virsh-4.0.11-36.el7.x86_64.rpm
fence-agents-vmware-soap-4.0.11-36.el7.x86_64.rpm
fence-agents-wti-4.0.11-36.el7.x86_64.rpm
fence-agents-common-4.0.11-36.el7.x86_64.rpm
fence-agents-compute-4.0.11-36.el7.x86_64.rpm

But it will need the "tag and" fix we talked about last week.
Here's the final patch: https://github.com/ClusterLabs/fence-agents/commit/90dfc11

Oyvind: Can we get a new build, please?
New build with the last patch.
*** Bug 1288312 has been marked as a duplicate of this bug. ***
Verified on RHEL-OSP director 9.0 puddle 2016-06-03.1 using fence-agents-4.0.11-37.el7.

[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-0.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+

*** Fencing compute-0
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0
[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+
[stack@puma33 ~]$

*** Fencing compute-0 again
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 5f5c1db1-e9db-414b-a27e-5f046ea8a5fc | vm-TAG  | ACTIVE | overcloud-novacompute-1.localdomain |
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-0.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+

*** Deleting the tagged instance and the tagged image, then fencing compute-0 again
[root@overcloud-controller-1 ~]# pcs stonith fence overcloud-novacompute-0
[stack@puma33 ~]$ nova list --fields name,status,host
+--------------------------------------+---------+--------+-------------------------------------+
| ID                                   | Name    | Status | Host                                |
+--------------------------------------+---------+--------+-------------------------------------+
| 0a571f41-c932-4486-a32f-da2ad7e56068 | vm-reg1 | ACTIVE | overcloud-novacompute-1.localdomain |
| 3a9f4957-1fa8-4fc8-a26f-1b5737dd513d | vm-reg2 | ACTIVE | overcloud-novacompute-1.localdomain |
+--------------------------------------+---------+--------+-------------------------------------+
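Comparing `nova list` snapshots by eye, as in the verification above, is error-prone. A hypothetical helper that parses two snapshots and reports which instances changed host might look like this. It reads only the CLI's printed table (not the Nova API) and assumes the four-column ID | Name | Status | Host layout used in this thread; `parse_nova_list` and `evacuated` are illustrative names.

```python
def parse_nova_list(text: str) -> dict:
    """Map instance ID -> (name, host) from `nova list --fields name,status,host`."""
    rows = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # borders, shell prompts and other commands
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "ID":
            continue  # header row or unexpected layout
        instance_id, name, _status, host = cells
        rows[instance_id] = (name, host)
    return rows


def evacuated(before: str, after: str) -> dict:
    """Return {name: (old_host, new_host)} for instances whose host changed.

    Instances are matched by ID, since names need not be unique in Nova.
    """
    old, new = parse_nova_list(before), parse_nova_list(after)
    return {old[i][0]: (old[i][1], new[i][1])
            for i in old.keys() & new.keys()
            if old[i][1] != new[i][1]}
```

Applied to the snapshots taken before and after the first fencing of compute-0 above, this should report only vm-TAG, moving from overcloud-novacompute-0.localdomain to overcloud-novacompute-1.localdomain.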
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2373.html