
Bug 1580448

Summary: NovaEvacuate Evacuate does not work.
Product: Red Hat Enterprise Linux 7
Reporter: marcio r m assis <marcio.miranda.assis>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED NOTABUG
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 7.7-Alt
CC: abeekhof, cluster-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-05-25 01:54:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Attachments:
  Controller01 message logs (flags: none)
  Controller02 message logs (flags: none)

Description marcio r m assis 2018-05-21 13:42:58 UTC
Hi,
I am facing a problem with the process of evacuating VMs in a cluster that involves OpenStack. The setup used is described below:

* Three hosts with pacemaker installed: haproxy, controller01, and controller02.
* Two hosts with the pacemaker-remote: compute01 and compute02.

The operating system used is CentOS 7, and the packages used are:
* pacemaker-remote 1.1.18-11.el7
* pcs              0.9.162-5.el7.centos.1
* resource-agents  3.9.5-124.el7
* pacemaker        1.1.18-11.el7
* fence-agents-all 4.0.11-86.el7
* corosync         2.4.3-2.el7_5.1

I have a test instance running on compute01.
# nova list --fields name,status,host
+--------------------------------------+-------------------+--------+-----------+
| ID                                   | Name              | Status | Host      |
+--------------------------------------+-------------------+--------+-----------+
| 06220246-34d0-4a81-9573-8e295bb1c9b0 | provider-instance | ACTIVE | compute01 |
+--------------------------------------+-------------------+--------+-----------+

The expected behavior would be to evacuate the instance to compute02. However, during the process I get the log message:
# Could not query value of evacuate: Reply had attribute but no host values.

What could the problem be?
Thank you, best regards.

The following is the configuration and logs:

###############################################################################
[root@controller01 ~]# pcs status
Cluster name: os-pacemaker-cluster
Stack: corosync
Current DC: controller01 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Wed May 16 10:55:09 2018
Last change: Wed May 16 10:44:34 2018 by root via cibadmin on controller01

5 nodes configured
29 resources configured

Online: [ controller01 controller02 haproxy ]
RemoteOnline: [ compute01 compute02 ]

Full list of resources:

 Clone Set: neutron-server-clone [neutron-server]
     Started: [ controller01 controller02 ]
     Stopped: [ compute01 compute02 haproxy ]
 nova-evacuate	(ocf::openstack:NovaEvacuate):	Started controller01
 Clone Set: neutron-linuxbridge-agent-clone [neutron-linuxbridge-agent]
     Started: [ compute01 compute02 ]
     Stopped: [ controller01 controller02 haproxy ]
 Clone Set: libvirtd-clone [libvirtd]
     Started: [ compute01 compute02 ]
     Stopped: [ controller01 controller02 haproxy ]
 Clone Set: nova-compute-checkevacuate-clone [nova-compute-checkevacuate]
     Started: [ compute01 compute02 ]
     Stopped: [ controller01 controller02 haproxy ]
 Clone Set: nova-compute-clone [nova-compute]
     Started: [ compute01 compute02 ]
     Stopped: [ controller01 controller02 haproxy ]
 fence-nova	(stonith:fence_compute):	Started controller02
 compute01	(ocf::pacemaker:remote):	Started controller01
 compute02	(ocf::pacemaker:remote):	Started controller02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled




###############################################################################
[root@controller01 ~]# pcs constraint
Location Constraints:
  Resource: fence-nova
    Constraint: location-fence-nova (resource-discovery=never)
      Rule: score=INFINITY
        Expression: ops_role eq controller
  Resource: libvirtd-clone
    Constraint: location-libvirtd-clone (resource-discovery=exclusive)
      Rule: score=INFINITY
        Expression: ops_role eq compute
  Resource: neutron-linuxbridge-agent-clone
    Constraint: location-neutron-linuxbridge-agent-clone (resource-discovery=exclusive)
      Rule: score=INFINITY
        Expression: ops_role eq compute
  Resource: neutron-server-clone
    Constraint: location-neutron-server-clone (resource-discovery=exclusive)
      Rule: score=INFINITY
        Expression: ops_role eq controller
  Resource: nova-compute-checkevacuate-clone
    Constraint: location-nova-compute-checkevacuate-clone (resource-discovery=exclusive)
      Rule: score=INFINITY
        Expression: ops_role eq compute
  Resource: nova-compute-clone
    Constraint: location-nova-compute-clone (resource-discovery=exclusive)
      Rule: score=INFINITY
        Expression: ops_role eq compute
  Resource: nova-evacuate
    Constraint: location-nova-evacuate (resource-discovery=never)
      Rule: score=INFINITY
        Expression: ops_role eq controller
Ordering Constraints:
  start neutron-server-clone then start neutron-linuxbridge-agent-clone (kind:Mandatory) (Options: require-all=false)
  start neutron-linuxbridge-agent-clone then start libvirtd-clone (kind:Mandatory)
  start nova-compute-checkevacuate-clone then start nova-compute-clone (kind:Mandatory) (Options: require-all=true)
  start nova-compute-clone then start nova-evacuate (kind:Mandatory) (Options: require-all=false)
  start libvirtd-clone then start nova-compute-clone (kind:Mandatory)
  start fence-nova then start nova-compute-clone (kind:Mandatory)
Colocation Constraints:
  libvirtd-clone with neutron-linuxbridge-agent-clone (score:INFINITY)
  nova-compute-clone with libvirtd-clone (score:INFINITY)
Ticket Constraints:


###############################################################################
[root@controller01 ~]# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: os-pacemaker-cluster
 cluster-recheck-interval: 1min
 dc-version: 1.1.18-11.el7-2b07d5c5a9
 have-watchdog: false
 stonith-enabled: true
Node Attributes:
 compute01: ops_role=compute stonith-enabled=false
 compute02: ops_role=compute stonith-enabled=false
 controller01: ops_role=controller stonith-enabled=false
 controller02: ops_role=controller stonith-enabled=false
 haproxy: stonith-enabled=false
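
(Editorial aside: the ops_role node attributes referenced by the location rules above are typically set with pcs. A sketch using the pcs 0.9 syntax from this setup; the node name is just an example:

  pcs property set --node compute01 ops_role=compute

Run once per node, this is what the rule expressions such as "ops_role eq compute" match against.)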

###############################################################################
[root@controller01 ~]# pcs resource show nova-evacuate
 Resource: nova-evacuate (class=ocf provider=openstack type=NovaEvacuate)
  Attributes: auth_url=http://10.15.28.200:35357/v3 no_shared_storage=1 password=Sair8go0ooWae8yi project_domain=default tenant_name=admin user_domain=default username=admin
  Operations: monitor interval=10 timeout=600 (nova-evacuate-monitor-interval-10)
              start interval=0s timeout=20 (nova-evacuate-start-interval-0s)
              stop interval=0s timeout=20 (nova-evacuate-stop-interval-0s)
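
(Editorial aside: for reference, a resource with the attributes shown above would have been created with something like the following sketch; the password is replaced with a placeholder:

  pcs resource create nova-evacuate ocf:openstack:NovaEvacuate \
      auth_url=http://10.15.28.200:35357/v3 username=admin password=REDACTED \
      tenant_name=admin project_domain=default user_domain=default \
      no_shared_storage=1 op monitor interval=10 timeout=600
)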

[root@controller01 ~]# pcs resource show nova-compute-checkevacuate-clone
 Clone: nova-compute-checkevacuate-clone
  Meta Attrs: interleave=true 
  Resource: nova-compute-checkevacuate (class=ocf provider=openstack type=nova-compute-wait)
   Attributes: auth_url=http://10.15.28.200:35357/v3 domain= no_shared_storage=1 password=Sair8go0ooWae8yi tenant_name=admin username=admin
   Operations: monitor interval=10 timeout=20 (nova-compute-checkevacuate-monitor-interval-10)
               start interval=0s timeout=600 (nova-compute-checkevacuate-start-interval-0s)
               stop interval=0s timeout=300 (nova-compute-checkevacuate-stop-interval-0s)

###############################################################################
# pcs stonith show fence-nova
 Resource: fence-nova (class=stonith type=fence_compute)
  Attributes: auth_url=http://10.15.28.200:35357/v3 login=admin passwd=Sair8go0ooWae8yi project-domain=default record_only=1 tenant_name=admin user-domain=default
  Operations: monitor interval=60s (fence-nova-monitor-interval-60s)
 Target: compute01
   Level 1 - fence-nova
 Target: compute02
   Level 1 - fence-nova

###############################################################################
[root@compute01 ~]# systemctl status openstack-nova-compute
● openstack-nova-compute.service - Cluster Controlled openstack-nova-compute
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/openstack-nova-compute.service.d
           └─50-pacemaker.conf, unfence-20.conf
   Active: active (running) since Qua 2018-05-16 10:44:52 -03; 22min ago
  Process: 10856 ExecStartPost=/sbin/fence_compute -k http://10.15.28.200:35357/v3 -l admin -p Sair8go0ooWae8yi -t admin --no-shared-storage -o on -n compute01 (code=exited, status=0/SUCCESS)
 Main PID: 10814 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           ├─10814 /usr/bin/python2 /usr/bin/nova-compute
           └─10906 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-compute.con...

Mai 16 10:44:38 compute01 systemd[1]: Starting Cluster Controlled openstack-nova-compute...
Mai 16 10:44:47 compute01 sudo[10893]:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/nova-rootwrap /etc/nova/rootwrap.conf privsep-helper --config-file /usr/share/...
Mai 16 10:44:52 compute01 systemd[1]: Started Cluster Controlled openstack-nova-compute.
Hint: Some lines were ellipsized, use -l to show in full.



[root@compute02 ~]# systemctl status openstack-nova-compute
● openstack-nova-compute.service - Cluster Controlled openstack-nova-compute
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/openstack-nova-compute.service.d
           └─50-pacemaker.conf, unfence-20.conf
   Active: active (running) since Qua 2018-05-16 10:44:52 -03; 21min ago
  Process: 5361 ExecStartPost=/sbin/fence_compute -k http://10.15.28.200:35357/v3 -l admin -p Sair8go0ooWae8yi -t admin --no-shared-storage -o on -n compute02 (code=exited, status=0/SUCCESS)
 Main PID: 5321 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─5321 /usr/bin/python2 /usr/bin/nova-compute

Mai 16 10:44:38 compute02 systemd[1]: Starting Cluster Controlled openstack-nova-compute...
Mai 16 10:44:52 compute02 systemd[1]: Started Cluster Controlled openstack-nova-compute.

Comment 2 Andrew Beekhof 2018-05-22 01:10:22 UTC
A single log message with zero context is insufficient for a bug report.
Please include sos reports from all controllers as well as the method of crashing the compute and the time it occurred.

Comment 3 marcio r m assis 2018-05-23 18:06:05 UTC
Hi,
sorry for the lack of information. To trigger the crash I used:
echo 'c' > /proc/sysrq-trigger

After this, the controller detected the event but could not execute the evacuation. In /var/log/messages on controller01 the following is logged:
[ Could not query value of evacuate: reply had attribute name but no host values ]

This only happens when I use pacemaker-remote. With pacemaker alone everything works fine.

Comment 4 marcio r m assis 2018-05-23 19:37:29 UTC
Created attachment 1440744 [details]
Controller01 message logs

Log from controller01 machine

Comment 5 marcio r m assis 2018-05-23 19:38:31 UTC
Created attachment 1440745 [details]
Controller02 message logs

Log from controller02 machine

Comment 6 Andrew Beekhof 2018-05-25 01:54:23 UTC
Looks like NovaEvacuate is working to me:

May 21 13:47:52 controller01 NovaEvacuate(nova-evacuate)[14644]: NOTICE: Initiating evacuation of compute01 with fence_evacuate
May 21 13:48:02 controller01 NovaEvacuate(nova-evacuate)[14644]: NOTICE: Completed evacuation of compute01

but later:

May 21 15:30:39 controller01 fence_evacuate: Evacuation of 06220246-34d0-4a81-9573-8e295bb1c9b0 on compute01 failed: Error while evacuating instance: Cannot 'evacuate' instance 06220246-34d0-4a81-9573-8e295bb1c9b0 while it is in task_state powering-off (HTTP 409) (Request-ID: req-bbfbf6ad-437d-4189-b438-21db70ff4edb)
May 21 15:30:39 controller01 fence_evacuate: Resurrection of instances from compute01 failed
May 21 15:30:39 controller01 NovaEvacuate(nova-evacuate)[12553]: WARNING: Evacuation of compute01 failed: 1

Ohhhh, I see now. This is wrong:

Target: compute01
   Level 1 - fence-nova
 Target: compute02
   Level 1 - fence-nova

You need to include power fencing as well.
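
(Editorial aside: for example, with a hypothetical IPMI device per compute node, power fencing can be combined with the record-only fence-nova agent at the same fencing level. A sketch; the device name, address, and credentials are placeholders:

  pcs stonith create fence-ipmi-compute01 fence_ipmilan \
      pcmk_host_list=compute01 ipaddr=192.0.2.11 login=admin passwd=SECRET
  pcs stonith level add 1 compute01 fence-ipmi-compute01,fence-nova

and similarly for compute02. This way the node is actually powered off before fence_compute records it as needing evacuation.)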