Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1356777

Summary: rhel-osp-director: scale down of computes fails after upgrade 8.0->9.0, some resources are unmanaged/stopped on controllers.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: Ben Nemec <bnemec>
Status: CLOSED WORKSFORME
QA Contact: Omri Hochman <ohochman>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 9.0 (Mitaka)
CC: bnemec, dbecker, jason.dobies, jcoufal, mburns, mcornea, morazi, rhel-osp-director-maint, sasha, sclewis, tvignaud
Target Milestone: ga
Keywords: Triaged
Target Release: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-02 13:19:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alexander Chuzhoy 2016-07-15 03:19:07 UTC
rhel-osp-director: scaling down the compute nodes fails after the 8.0 -> 9.0 upgrade.

Environment:
openstack-tripleo-heat-templates-2.0.0-15.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-15.el7ost.noarch
openstack-tripleo-heat-templates-kilo-2.0.0-15.el7ost.noarch
instack-undercloud-4.0.0-7.el7ost.noarch
openstack-puppet-modules-8.1.2-1.el7ost.noarch


Steps to reproduce:
1. Deploy 8.0 with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml

2. Upgrade to 9.0 (including updating the images for the overcloud nodes).

3. Try to scale down the computes with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml
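
To confirm that the scale-down command in step 3 differs from the original deployment in step 1 only by the compute count, the two command lines can be compared flag by flag. The following is a minimal sketch, not part of the original report; the helper names `parse_flags` and `diff_flags` are made up for illustration, and the parser assumes that no option value starts with a dash.

```python
import shlex


def parse_flags(command):
    """Return {flag: [values]} for a command line; repeated flags accumulate values."""
    flags = {}
    current = None
    for token in shlex.split(command):
        if token.startswith("-"):
            current = token
            flags.setdefault(current, [])
        elif current is not None:
            # Tokens before the first flag (the subcommand words) are skipped.
            flags[current].append(token)
    return flags


def diff_flags(before, after):
    """Return {flag: (values_before, values_after)} for flags whose values differ."""
    a, b = parse_flags(before), parse_flags(after)
    return {
        flag: (a.get(flag), b.get(flag))
        for flag in sorted(set(a) | set(b))
        if a.get(flag) != b.get(flag)
    }


# Both commands are rebuilt from steps 1 and 3 of this report.
HEAD = "openstack overcloud deploy --templates --control-scale 3 "
TAIL = (
    "--ceph-storage-scale 3 --neutron-network-type vxlan "
    "--neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 "
    "-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml "
    "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml "
    "-e network-environment.yaml"
)
deploy_cmd = HEAD + "--compute-scale 2 " + TAIL
scale_down_cmd = HEAD + "--compute-scale 1 " + TAIL

print(diff_flags(deploy_cmd, scale_down_cmd))
```

For these two commands the only differing flag is `--compute-scale` (2 before, 1 after), so the environment files and network options passed to the update match the initial deployment.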

Result:
2016-06-30 02:22:13 [overcloud-ComputeAllNodesValidationDeployment-rxtdysawxz72]: UPDATE_COMPLETE Stack UPDATE completed successfully                                                                   
2016-06-30 02:22:14 [ComputeAllNodesValidationDeployment]: UPDATE_COMPLETE state changed                                                                                                                
2016-06-30 02:53:15 [2]: SIGNAL_COMPLETE Unknown                                                                                                                                                        
2016-06-30 02:53:22 [1]: SIGNAL_COMPLETE Unknown                                                                                                                                                        
Stack overcloud UPDATE_FAILED                                                                                                                                                                           
Deployment failed:  Heat Stack update failed. 
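
The event timestamps above show a long pause before the update was reported as failed. As a rough sketch (not from the original report), the gap between consecutive events can be computed from the timestamped lines; the function names and the regular expression are assumptions based only on the line format visible in this output.

```python
import re
from datetime import datetime

# Matches lines such as "2016-06-30 02:53:15 [2]: SIGNAL_COMPLETE Unknown".
EVENT_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]+)\]: (\S+)")


def parse_events(log_text):
    """Return a list of (timestamp, resource, status) for timestamped event lines."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, m.group(2), m.group(3)))
    return events


def longest_gap(events):
    """Return (seconds, earlier_event, later_event), or None with fewer than two events."""
    if len(events) < 2:
        return None
    return max(
        ((later[0] - earlier[0]).total_seconds(), earlier, later)
        for earlier, later in zip(events, events[1:])
    )


HEAT_LOG = """\
2016-06-30 02:22:13 [overcloud-ComputeAllNodesValidationDeployment-rxtdysawxz72]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-06-30 02:22:14 [ComputeAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-06-30 02:53:15 [2]: SIGNAL_COMPLETE Unknown
2016-06-30 02:53:22 [1]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
"""

seconds, earlier, later = longest_gap(parse_events(HEAT_LOG))
print(seconds, earlier[1], "->", later[1])
```

On the lines quoted in this report, the longest pause is the 31 minutes and 1 second between the `ComputeAllNodesValidationDeployment` update and the first `SIGNAL_COMPLETE` event. The output alone does not say what happened during that interval.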

pcs status outputs the following:
[root@overcloud-controller-0 ~]# pcs status  
Cluster name: tripleo_cluster                
Last updated: Fri Jul 15 03:18:01 2016          Last change: Fri Jul 15 01:50:34 2016 by root via cibadmin on overcloud-controller-0
Stack: corosync                                                                                                                     
Current DC: overcloud-controller-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum                                      
3 nodes and 127 resources configured                                                                                                

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-192.168.200.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0 (unmanaged)
 ip-10.19.94.10 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1 (unmanaged)        
 ip-10.19.95.10 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2 (unmanaged)        
 Clone Set: haproxy-clone [haproxy] (unmanaged)                                                   
     haproxy    (systemd:haproxy):      Started overcloud-controller-0 (unmanaged)                
     haproxy    (systemd:haproxy):      Started overcloud-controller-2 (unmanaged)                
     haproxy    (systemd:haproxy):      Started overcloud-controller-1 (unmanaged)                
 ip-192.168.0.6 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0 (unmanaged)        
 Master/Slave Set: galera-master [galera] (unmanaged)                                             
     galera     (ocf::heartbeat:galera):        FAILED Master overcloud-controller-0 (unmanaged)  
     galera     (ocf::heartbeat:galera):        Started overcloud-controller-2 (unmanaged)        
     galera     (ocf::heartbeat:galera):        Started overcloud-controller-1 (unmanaged)        
 Clone Set: memcached-clone [memcached] (unmanaged)                                               
     memcached  (systemd:memcached):    Started overcloud-controller-0 (unmanaged)                
     memcached  (systemd:memcached):    Started overcloud-controller-2 (unmanaged)                
     memcached  (systemd:memcached):    Started overcloud-controller-1 (unmanaged)                
 ip-10.19.94.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1 (unmanaged)        
 ip-10.19.184.180       (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2 (unmanaged)
 Clone Set: rabbitmq-clone [rabbitmq] (unmanaged)                                                 
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-0 (unmanaged)
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-2 (unmanaged)
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-core-clone [openstack-core] (unmanaged)                                     
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]            
 Master/Slave Set: redis-master [redis] (unmanaged)                                               
     redis      (ocf::heartbeat:redis): Master overcloud-controller-0 (unmanaged)                 
     redis      (ocf::heartbeat:redis): Started overcloud-controller-2 (unmanaged)                
     redis      (ocf::heartbeat:redis): Started overcloud-controller-1 (unmanaged)                
 Clone Set: mongod-clone [mongod] (unmanaged)                                                     
     mongod     (systemd:mongod):       Started overcloud-controller-0 (unmanaged)                
     mongod     (systemd:mongod):       Started overcloud-controller-2 (unmanaged)                
     mongod     (systemd:mongod):       Started overcloud-controller-1 (unmanaged)                
 Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator] (unmanaged)                 
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]            
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler] (unmanaged)                 
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]            
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent] (unmanaged)                                 
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]            
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup] (unmanaged)                       
     neutron-netns-cleanup      (ocf::neutron:NetnsCleanup):    Started overcloud-controller-0 (unmanaged)
     neutron-netns-cleanup      (ocf::neutron:NetnsCleanup):    Started overcloud-controller-2 (unmanaged)
     neutron-netns-cleanup      (ocf::neutron:NetnsCleanup):    Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] (unmanaged)                                   
     neutron-ovs-cleanup        (ocf::neutron:OVSCleanup):      Started overcloud-controller-0 (unmanaged)
     neutron-ovs-cleanup        (ocf::neutron:OVSCleanup):      Started overcloud-controller-2 (unmanaged)
     neutron-ovs-cleanup        (ocf::neutron:OVSCleanup):      Started overcloud-controller-1 (unmanaged)
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Stopped (unmanaged)               
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] (unmanaged)                               
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api] (unmanaged)                         
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener] (unmanaged)                           
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] (unmanaged)                             
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd] (unmanaged)                       
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier] (unmanaged)                           
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-heat-api-clone [openstack-heat-api] (unmanaged)                                     
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] (unmanaged)             
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-glance-api-clone [openstack-glance-api] (unmanaged)                                 
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler] (unmanaged)                     
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-nova-api-clone [openstack-nova-api] (unmanaged)                                     
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth] (unmanaged)                     
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-sahara-api-clone [openstack-sahara-api] (unmanaged)                                 
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch] (unmanaged)               
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine] (unmanaged)                           
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry] (unmanaged)                       
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd] (unmanaged)                         
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]                    
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: delay-clone [delay] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: httpd-clone [httpd] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-server-clone [neutron-server] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* galera_promote_0 on overcloud-controller-0 'unknown error' (1): call=35, status=complete, exitreason='Failed initial monitor action',
    last-rc-change='Thu Jul 14 23:09:32 2016', queued=0ms, exec=8811ms
* openstack-nova-scheduler_start_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=102, status=Timed Out, exitreason='none',
    last-rc-change='Fri Jul 15 00:00:41 2016', queued=0ms, exec=199981ms
* openstack-nova-scheduler_start_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=101, status=Timed Out, exitreason='none',
    last-rc-change='Fri Jul 15 00:00:41 2016', queued=0ms, exec=199993ms
* openstack-nova-scheduler_start_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=102, status=Timed Out, exitreason='none',
    last-rc-change='Fri Jul 15 00:00:41 2016', queued=0ms, exec=199992ms


PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
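
With 127 resources configured, it is easier to see which clone sets are down by extracting them from the output than by reading it line by line. The following is a minimal sketch that assumes the `pcs status` text layout shown above (a `Clone Set:` or `Master/Slave Set:` header followed by an optional `Stopped: [ ... ]` line); the function name `stopped_sets` is hypothetical, and the excerpt is a handful of lines copied from this report.

```python
import re

# Header of a clone or master/slave set, e.g. " Clone Set: httpd-clone [httpd] (unmanaged)".
SET_RE = re.compile(r"^\s*(?:Clone Set|Master/Slave Set): (\S+) \[")
# Node list of stopped instances, e.g. "     Stopped: [ node-0 node-1 ]".
STOPPED_RE = re.compile(r"^\s*Stopped: \[ (.*) \]")


def stopped_sets(pcs_status_text):
    """Return {set_name: [nodes]} for every set that reports stopped instances."""
    result = {}
    current = None
    for line in pcs_status_text.splitlines():
        m = SET_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = STOPPED_RE.match(line)
        if m and current is not None:
            result[current] = m.group(1).split()
    return result


PCS_EXCERPT = """\
 Clone Set: haproxy-clone [haproxy] (unmanaged)
     haproxy    (systemd:haproxy):      Started overcloud-controller-0 (unmanaged)
 Clone Set: openstack-core-clone [openstack-core] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis] (unmanaged)
     redis      (ocf::heartbeat:redis): Master overcloud-controller-0 (unmanaged)
 Clone Set: httpd-clone [httpd] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
"""

for name, nodes in stopped_sets(PCS_EXCERPT).items():
    print(name, "stopped on", len(nodes), "nodes")
```

Primitive resources that are not part of a set, such as `openstack-cinder-volume` above, use a different line format and are not covered by this sketch.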

Comment 5 Ben Nemec 2016-07-22 22:25:57 UTC
Was this deployment in a sane state before the upgrade/scale attempt?  I'm seeing issues starting galera right away in the logs, which makes me think the initial deployment failed.  At that point I wouldn't expect anything else to work.
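
One way to check the ordering described in this comment is to sort the "Failed Actions" entries from the `pcs status` output by their `last-rc-change` timestamp. This is a minimal sketch, assuming the two-line entry format shown in the description; the function name `failed_actions` is hypothetical, and the weekday and month names are parsed with the default (English) locale.

```python
import re
from datetime import datetime

# One "Failed Actions" entry spans two lines, so the pattern is matched with DOTALL.
FAILED_RE = re.compile(
    r"\* (\S+) on (\S+) '([^']*)' \((\d+)\):"
    r".*?last-rc-change='([^']+)'.*?exec=(\d+)ms",
    re.DOTALL,
)


def failed_actions(pcs_status_text):
    """Return the failed actions as dicts, sorted by last-rc-change (oldest first)."""
    actions = []
    for m in FAILED_RE.finditer(pcs_status_text):
        actions.append({
            "operation": m.group(1),
            "node": m.group(2),
            "error": m.group(3),
            "rc": int(m.group(4)),
            "when": datetime.strptime(m.group(5), "%a %b %d %H:%M:%S %Y"),
            "exec_ms": int(m.group(6)),
        })
    return sorted(actions, key=lambda action: action["when"])


FAILED_EXCERPT = """\
* galera_promote_0 on overcloud-controller-0 'unknown error' (1): call=35, status=complete, exitreason='Failed initial monitor action',
    last-rc-change='Thu Jul 14 23:09:32 2016', queued=0ms, exec=8811ms
* openstack-nova-scheduler_start_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=102, status=Timed Out, exitreason='none',
    last-rc-change='Fri Jul 15 00:00:41 2016', queued=0ms, exec=199981ms
"""

for action in failed_actions(FAILED_EXCERPT):
    print(action["when"], action["operation"], action["node"], action["exec_ms"])
```

On the entries quoted in the description, the `galera_promote_0` failure (Thu Jul 14 23:09:32) precedes the `openstack-nova-scheduler_start_0` timeouts (Fri Jul 15 00:00:41) by roughly 51 minutes, which is consistent with galera having failed first.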

Comment 9 Jay Dobies 2016-08-02 13:19:24 UTC
Based on Ben's comment and the inability to reproduce, closing this out. Please feel free to reopen if this issue arises again.

Comment 10 Alexander Chuzhoy 2016-09-15 21:18:09 UTC
Clearing the needinfo for now.