Bug 1289260
| Field | Value |
|---|---|
| Summary | rhel-osp-director: 7.1 GA openstack-heat resources are down on 1 controller in HA deployment |
| Product | Red Hat OpenStack |
| Component | rhosp-director |
| Status | CLOSED WORKSFORME |
| Severity | unspecified |
| Priority | high |
| Version | 7.0 (Kilo) |
| Target Milestone | y2 |
| Target Release | 7.0 (Kilo) |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Marios Andreou <mandreou> |
| QA Contact | yeylon <yeylon> |
| CC | jcoufal, jslagle, mburns, rhel-osp-director-maint, sasha, srevivo |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2015-12-09 19:00:46 UTC |

Description
Alexander Chuzhoy
2015-12-07 18:52:11 UTC
Created attachment 1103338
/var/log dir from one controller.

Reproduced on the same env.

W/A:
[root@overcloud-controller-0 ~]# pcs resource cleanup openstack-heat-engine
[root@overcloud-controller-0 ~]# pcs resource cleanup openstack-heat-api
[root@overcloud-controller-0 ~]# pcs resource cleanup openstack-heat-cloudwatch

After that all resources appear as started on all controllers.

In anticipation of probably having to get the HA folks involved with this one, can you run a crm_report on each controller node, and either attach it here or upload it somewhere for review?

Besides the heat resources, are the neutron resources behaving OK? I.e., in the pcs status output, what are the neutron services/agents like? I see a lot of restarts for neutron-server in the attached messages and *I believe* this environment has the "older" neutron constraints, like:

pcs constraint order show | grep neutron

would include:

start neutron-server-clone then start neutron-ovs-cleanup-clone (kind:Mandatory)

I am getting ready to call it a day but will pick up tomorrow with any added info you can provide, thanks

Hi sasha... I just had a go at reproducing this and was somewhat successful in that I *did* see some services stopped in pcs status after 'Overcloud Deployed' was declared. However, on my (virt) environment I get a clean pcs status after a couple of minutes. Does pacemaker eventually manage to bring everything up on your env if you don't run the resource cleanup (for me it was in the order of 2-4 minutes)? More info on my env below.

I have a 7.1 environment... my packages seem to match what you have listed - my env is the 2015-10-05.1 puddle using the 2015-10-05.1 images.

I didn't see this on my initial deploy with just 3 control 2 compute.
When I added 3 ceph like:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 --libvirt-type qemu -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server "0.fedora.pool.ntp.org"

As soon as I saw the 'Overcloud Deployed', from controller 0 I see quite a few things stopped (including the heat resources you mention):

[root@overcloud-controller-0 heat-admin]# pcs status | grep -ni stop -C 3
26- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
27- Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
28- Started: [ overcloud-controller-1 overcloud-controller-2 ]
29: Stopped: [ overcloud-controller-0 ]
30- Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
31: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
32- Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
33: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
34- Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
35- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
36- Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
37- Started: [ overcloud-controller-1 overcloud-controller-2 ]
38: Stopped: [ overcloud-controller-0 ]
39- Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
40- Started: [ overcloud-controller-1 overcloud-controller-2 ]
41: Stopped: [ overcloud-controller-0 ]
42- Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
43- Started: [ overcloud-controller-1 overcloud-controller-2 ]
44: Stopped: [ overcloud-controller-0 ]
45- Clone Set: openstack-heat-api-clone [openstack-heat-api]
46: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
47- Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
48- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
49- Clone Set: openstack-nova-api-clone [openstack-nova-api]
50- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
51- Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
52: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
53- Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
54- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
55- Clone Set: openstack-keystone-clone [openstack-keystone]
--
59- Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
60- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
61- Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
62: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
63- Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
64- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
65- Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
66- Started: [ overcloud-controller-1 overcloud-controller-2 ]
67: Stopped: [ overcloud-controller-0 ]
68- Clone Set: openstack-glance-api-clone [openstack-glance-api]
69- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
70- Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
71- Started: [ overcloud-controller-1 overcloud-controller-2 ]
72: Stopped: [ overcloud-controller-0 ]
73- Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
74- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
75- Clone Set: delay-clone [delay]
--
82- Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
83- Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
84- Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
85: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
86- Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
87: Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
88- openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-1
89- Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
90- Started: [ overcloud-controller-0 overcloud-controller-2 ]
91: Stopped: [ overcloud-controller-1 ]
92-
93-Failed Actions:
94-* neutron-openvswitch-agent_monitor_60000 on overcloud-controller-0 'not running' (7): call=212, status=complete, exitreason='none',

but within about 2-3 minutes this cleared up and I got a 'green' pcs status.

My undercloud packages, fyi:

[stack@instack ~]$ rpm -qa | grep "heat\|instack"
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-engine-2015.1.1-5.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-5.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
instack-0.0.7-1.el7ost.noarch
openstack-heat-api-2015.1.1-5.el7ost.noarch
openstack-heat-common-2015.1.1-5.el7ost.noarch
instack-undercloud-2.1.2-29.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-api-cfn-2015.1.1-5.el7ost.noarch

thanks, marios

I just checked another deployment untouched for a few hours. No stopped resources.
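
For anyone retesting this: the question above is whether pacemaker settles on its own a few minutes after 'Overcloud Deployed', so it may help to list only the clone sets that still report stopped instances instead of reading the whole pcs status. Below is a minimal sketch, not anything taken from the director tooling. The function name stopped_clones is made up for illustration, and it assumes the "Clone Set: ... / Started: [ ... ] / Stopped: [ ... ]" text layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs or the director tooling).
# Reads `pcs status` text on stdin and prints "<clone-set>: <stopped nodes>"
# for every clone set that has a "Stopped: [ ... ]" line.
# Assumes the layout seen in this bug: a "Clone Set: <name> [<primitive>]"
# line followed by "Started:" and/or "Stopped:" lines.
stopped_clones() {
    awk '
        /Clone Set:/ {
            # The clone set name is the field right after "Set:".
            for (i = 1; i < NF; i++) if ($i == "Set:") clone = $(i + 1)
            next
        }
        /Stopped: \[/ && clone != "" {
            line = $0
            sub(/^.*Stopped: \[ */, "", line)   # drop everything before the node list
            sub(/ *\].*$/, "", line)            # drop the closing bracket
            print clone ": " line
        }
    '
}

# Example (assumed usage): poll for up to 5 minutes before resorting to
# `pcs resource cleanup`.
# for i in 1 2 3 4 5 6 7 8 9 10; do
#     [ -z "$(pcs status | stopped_clones)" ] && break
#     sleep 30
# done
```

An empty result would correspond to the 'green' pcs status described above. If the list is still non-empty after several minutes, that matches the case where the resource cleanup workaround was needed.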