Bug 1238117
Summary: Possible race condition causing neutron to have bad configuration state
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Reporter: Graeme Gillies <ggillies>
Assignee: Marios Andreou <mandreou>
QA Contact: Itzik Brown <itbrown>
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Version: 7.0 (Kilo)
CC: calfonso, gfidente, gkeegan, itbrown, jason.dobies, mandreou, mburns, mcornea, rbiba, rhel-osp-director-maint, rrosa, tfreger
Target Milestone: ga
Target Release: Director
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-rdomanager-oscplugin-0.0.8-37.el7ost openstack-tripleo-heat-templates-0.8.6-40.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, OpenStack used the NeutronScale puppet resource, which was enabled on controller nodes and tasked with rewriting the neutron agents' "host" entries to values like "neutron-n-0" on controller 0 or "neutron-n-1" on controller 1. This renaming happened toward the end of the deployment, when the corresponding neutron-scale resource was started by pacemaker, and could race with agents that had already started. The problem was mostly reported in VM environments. In some cases the error manifested as a message from Neutron that there were not enough L3 agents to provide L3 HA (the default minimum being 2), and the "neutron agent-list" command on the overcloud showed inconsistent agents; for example, duplicate entries for each agent, with both the original agent on host "overcloud-controller-1.localdomain" (typically with a dead alive status, shown as "xxx") and the "newer" agent on host "neutron-n-1" (alive status ":-)", at least eventually). In other cases, with only one controller, the agent renaming caused one of the neutron agents, openvswitch, to fail; because the remaining agents are chained after it, they also failed to start, leaving no L3, metadata, or DHCP agents.
This problem has been fixed by relying on neutron's native L3 High Availability and by enabling enough DHCP agents per network for it. The DHCP agent count is a needed addition because it was previously statically set to two in all cases; it is now a configurable parameter in the tripleo heat templates, with a default value of 3, and is wired up to deploy in the oscplugin. The NeutronScale resource itself has been removed from the overcloud controller puppet manifest in the tripleo heat templates. As a result, deployments made after this fix will not have the neutron-scale resource on controller nodes, which can be verified with the following commands:
1. On a controller node:
# pcs status | grep -n neutron -A 1
You should not see any "neutron-scale" clone set or resource definition.
2. On the undercloud:
$ source overcloudrc
$ neutron agent-list
All the neutron agents should be reported as being on a host with a name like "overcloud-controller-0.localdomain" or "overcloud-controller-2.localdomain" but not "neutron-n-0" or "neutron-n-2".
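As an illustration of step 2, here is a hedged sketch of the host-name check. The sample host list below is illustrative only; in practice you would feed the host column of real `neutron agent-list` output (for example via `neutron agent-list -f value -c host`, if your client supports those options) through the same filter.

```shell
# Hypothetical sample of agent "host" values after the fix; in practice,
# substitute the host column of real `neutron agent-list` output.
hosts="overcloud-controller-0.localdomain
overcloud-controller-2.localdomain"

# Flag any agent still reporting a NeutronScale-style host like neutron-n-0.
if printf '%s\n' "$hosts" | grep -q '^neutron-n-[0-9]'; then
  echo "stale NeutronScale host entries found"
else
  echo "agent host naming OK"
fi
```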
Story Points: ---
Last Closed: 2015-08-05 13:57:58 UTC
Type: Bug
Bug Blocks: 1236578, 1237144, 1238750
Description
Graeme Gillies
2015-07-01 08:58:41 UTC
NeutronScale is editing the neutron host setting, but it might happen after some of the agents were started (due to start constraints provisioned after the resource is created).

Giulio, if this is a case where neutron was started before pacemaker, how about the -> we added to get over the neutron-server startup race?

I don't think this is a bug. I mean, I don't think there is a problem here with the naming of the l3 agents in neutron agent-list or with (eventually) the state of overcloud neutron in general. If I am wrong there is a path forward, but I need that feedback asap; details below. Thanks!

We are using NeutronScale. As Graeme points out, its sole purpose is to change the host entry in the various neutron.conf/ini files on a given host [5]. I *think* as long as all agents have the same value there, you can pretty much set it to whatever you want [1][2] (it certainly doesn't have to be a fqdn; consider what we had before NeutronScale, like "overcloud-compute-0.localdomain" on a compute host, for example). If you waited a minute, the agents would all switch over and all is well again, like the example output below at [7] - note only the compute openvswitch agent retains the original host setting, since it isn't running NeutronScale.

NeutronScale is important not just for setting the agents' host entry but because of the enforced constraints in the relevant pacemaker manifest [3] - the chain goes keystone -> neutron-server -> neutron-scale -> neutron-ovs-cleanup -> neutron-netns-cleanup -> openvswitch-agent -> dhcp -> l3 -> metadata - so neutron-scale comes first after the server in the chain to start up all the agents (colocations etc).
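The start-order chain above can be modeled as a quick sanity check. This is only a sketch of the ordering described in the comment (resource names as listed there); it is not a pcs invocation.

```shell
# Pacemaker start-order chain as described above (sketch, not pcs syntax).
chain="keystone neutron-server neutron-scale neutron-ovs-cleanup \
neutron-netns-cleanup neutron-openvswitch-agent neutron-dhcp-agent \
neutron-l3-agent neutron-metadata-agent"

# Position of a resource in the chain (1-based).
pos() { printf '%s\n' $chain | grep -n "^$1\$" | cut -d: -f1; }

# neutron-scale must start before every agent it gates.
if [ "$(pos neutron-scale)" -lt "$(pos neutron-openvswitch-agent)" ]; then
  echo "neutron-scale precedes the agents"
fi
```

Removing NeutronScale, as proposed further down, amounts to deleting the neutron-scale link so that neutron-server orders directly against neutron-ovs-cleanup.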
As I said above, I *think* all is well, assuming you've waited long enough for the agents to settle down after deploy (and critically for NeutronScale to start up on all the controllers and then the agents), usually within a minute (see for example the output at the related bug [6], comments #11 and #13; I think that is showing the inconsistent state the agents find themselves in whilst neutron-scale starts up). I believe once that happens, overcloud neutron is functioning correctly. I haven't poked too much (I was able to do basic operations); somebody *please correct me* if this is not the case.

If using the NeutronScale-given host names for agents does not cause any problems, then perhaps the fix at [4] will help anyway; that sleeps a bit to avoid a (possibly) related bug [6]. At least at the point when we declare Overcloud Deployed (and postconfig) we can have a homogeneous list in the neutron agents' host entries (we can even improve that patch to grep for the NeutronScale-specific naming like 'neutron-n-0' before initialising neutron).

Alternatively, we stop using NeutronScale. In fact we don't really need it, since we already have very good control over the hostnames and they should be safe enough for scaling ("overcloud-controller-0.localdomain", "overcloud-controller-1.localdomain"). Note that this would involve changing the startup constraints (probably neutron-server -> neutron-ovs-cleanup, just skip scale); mentioning this since we should only do it at this stage if we really need to (I guess this should be considered a significant change). Speaking of which, I still can't work out where we are pulling this particular resource agent in from, so I have pasted it in full at [5] (from one of my controllers) for reference if you are interested.
Grateful for feedback and thoughts, thanks

[1] http://docs.openstack.org/kilo/config-reference/content/section_networking-options-reference.html # "Hostname to be used by the neutron server, agents and services running on this machine. All the agents and services running on this machine must use the same host value."
[2] ICEHOUSE http://docs.openstack.org/icehouse/config-reference/content/section_neutron.conf.html # "# host = myhost.com"
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/manifests/overcloud_controller_pacemaker.pp#L884
[4] https://review.gerrithub.io/#/c/238320/2
[5] http://pastebin.test.redhat.com/294389
[6] https://bugzilla.redhat.com/show_bug.cgi?id=1236578
[7] [stack@instack ~]$ neutron agent-list
+--------------------------------------+--------------------+---------------------------------+-------+----------------+---------------------------+
| id | agent_type | host | alive | admin_state_up | binary |
+--------------------------------------+--------------------+---------------------------------+-------+----------------+---------------------------+
| 0227a273-fa2b-4cdb-86d9-e523cc63c0e7 | Metadata agent | neutron-n-1 | :-) | True | neutron-metadata-agent |
| 03bf3be1-cf8f-4815-be0e-ea604b777581 | Open vSwitch agent | neutron-n-0 | :-) | True | neutron-openvswitch-agent |
| 1cc1ed4c-0aa3-45d6-898b-c444d9f5de4e | Open vSwitch agent | neutron-n-1 | :-) | True | neutron-openvswitch-agent |
| 23fb7353-ffcb-410a-b979-c40a416227c0 | DHCP agent | neutron-n-2 | :-) | True | neutron-dhcp-agent |
| 2ae1d115-2160-41bd-b1c1-543f06dcadd2 | Metadata agent | neutron-n-2 | :-) | True | neutron-metadata-agent |
| 5c6df943-46e3-4311-95ac-aea39e2406e5 | Open vSwitch agent | neutron-n-2 | :-) | True | neutron-openvswitch-agent |
| 5fa8ac50-6fcf-41e1-9785-099d3eb7ee3b | L3 agent | neutron-n-0 | :-) | True | neutron-l3-agent |
| 6bf6b59d-6c3b-4745-a056-d78503e6f5c6 | DHCP agent | neutron-n-0 | :-) | True | neutron-dhcp-agent |
| 7f62c11e-9ab1-426b-886d-c9568b62eb66 | DHCP agent | neutron-n-1 | :-) | True | neutron-dhcp-agent |
| 8382f230-4d95-4707-bad7-fc992a99ad6e | L3 agent | neutron-n-2 | :-) | True | neutron-l3-agent |
| ba9bcfd1-2373-440a-bdb9-5cf167f6c936 | Metadata agent | neutron-n-0 | :-) | True | neutron-metadata-agent |
| bcc09edc-38fd-4c57-aef0-f6eed6706052 | Open vSwitch agent | overcloud-compute-0.localdomain | :-) | True | neutron-openvswitch-agent |
| e70945e8-3b6b-40b0-8ccb-c76de718d0cb | L3 agent | neutron-n-1 | :-) | True | neutron-l3-agent |
+--------------------------------------+--------------------+---------------------------------+-------+----------------+---------------------------+

Marios, I agree with you; it looks like we could try dropping NeutronScale. We should check with the Neutron team.

thanks gfidente, to which end I am about to get a review out in case we need it (for the templates; I want to get rid of scale and deploy it to make sure we have the exact syntax etc. ready to go).

So the review at https://review.openstack.org/#/c/198016 "Removes the NeutronScale resource from controller pcmk manifest" does what it claims to. I tested this locally, and things did not explode... so if we can and do want to go with removing NeutronScale, then that is the way to do it.
I applied it to the current downstream tripleo heat templates and got the overcloud deployed (no postconfig complaints wrt 1236578, though that doesn't happen every time) and:

[root@overcloud-controller-0 ~]# pcs status | grep neutron -A 2
Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
--
Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-heat-api-clone [openstack-heat-api]
--
Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-glance-api-clone [openstack-glance-api]
--
Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
--
Clone Set: neutron-server-clone [neutron-server]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: httpd-clone [httpd]

[stack@instack ~]$ . overcloudrc
[stack@instack ~]$ neutron agent-list
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| id | agent_type | host | alive | admin_state_up | binary |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| 0463b2c7-4ab0-40cf-a105-10c96629265a | L3 agent | overcloud-controller-0.localdomain | :-) | True | neutron-l3-agent |
| 203505b4-6284-485d-8aec-ca2d89c75033 | DHCP agent | overcloud-controller-0.localdomain | :-) | True | neutron-dhcp-agent |
| 25d7f078-99ee-4cbd-b4a3-61f216738b5b | Open vSwitch agent | overcloud-compute-0.localdomain | :-) | True | neutron-openvswitch-agent |
| 2e7d891d-f91f-436d-a37a-2e559ca8de3b | L3 agent | overcloud-controller-2.localdomain | :-) | True | neutron-l3-agent |
| 54d5082b-a6b8-40ce-a9ee-b3013337f298 | Open vSwitch agent | overcloud-controller-2.localdomain | :-) | True | neutron-openvswitch-agent |
| 6b6a4496-c81d-40b2-9bde-23706740f59e | Metadata agent | overcloud-controller-2.localdomain | :-) | True | neutron-metadata-agent |
| 7048b446-2684-463e-b72d-4b4f7c1bfbe9 | Metadata agent | overcloud-controller-1.localdomain | :-) | True | neutron-metadata-agent |
| 791e1269-ecc1-4af1-9c74-fbb929f2c00b | Open vSwitch agent | overcloud-controller-0.localdomain | :-) | True | neutron-openvswitch-agent |
| 8c3347ac-c0b3-43f2-a123-ca83a2e3fdb0 | Open vSwitch agent | overcloud-controller-1.localdomain | :-) | True | neutron-openvswitch-agent |
| a27ed414-42a8-4020-9f9b-d6884485569c | L3 agent | overcloud-controller-1.localdomain | :-) | True | neutron-l3-agent |
| a2ddb44a-705f-47ba-a353-64fb4fee7b5a | DHCP agent | overcloud-controller-2.localdomain | :-) | True | neutron-dhcp-agent |
| aaf0f96a-c921-4f74-ae7e-d9a31dba9b45 | DHCP agent | overcloud-controller-1.localdomain | :-) | True | neutron-dhcp-agent |
| e40b635c-ddd3-4f9d-a21b-ddcaf492d63a | Metadata agent | overcloud-controller-0.localdomain | :-) | True | neutron-metadata-agent |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+

Hi Marios,

I would make sure to check with the HA/Pacemaker team first before making this change. The reason, iirc, that the NeutronScale resource agent exists is that the hostname on all controller nodes needs to be the same in neutron. That way, when we fail over an agent, it thinks it's the same l2/l3 agent and not a new one. In your example above, each l3 agent has a different name, which I believe will be problematic when it comes time for failover (though I am not an expert).

It's worth noting that in my environment things didn't fix themselves either (it wasn't a case of not having finished settling). I had time to upload an image into my overcloud, boot an instance from it, notice the network wasn't working, and then I discovered the agents were wrong.

However, one thing that might help: when I was doing this testing I had no ntp enabled on the environment, and I did notice the clock on my controller was a bit off. Not sure if this is related, but it could be something causing funnyness with the heat orchestration steps, maybe?

Regards,
Graeme

hi Graeme, I think it does the contrary: it ensures every cloned instance has a unique id (eg. neutron-n-{X,Y,Z}) in order to uniquely identify the instance; this is why using hostnames seemed safe (we get overcloud-controller-{X,Y,Z}). There could be other reasons why we can't rely on hostnames though; conversation is ongoing with the HA and Neutron teams.

(In reply to Giulio Fidente from comment #11)
> hi Graeme, I think it does the contrary, it ensures every cloned instance
> has a unique id (eg. neutron-n-{X,Y,Z}) in order to uniquely identify the
> instance; this is why using hostnames seemed safe (we get
> overcloud-controller-{X,Y,Z}).
>
> There could be other reasons why we can't rely on hostnames though,
> conversation is ongoing with HA and Neutron teams.

Oh ok, my understanding must be out of date. Thank you for the correction.

(In reply to Graeme Gillies from comment #12)
> Oh ok my understanding must be out of date. Thank you for the correction

thanks Graeme & Giulio; sorry for not responding Graeme, I just agreed with what you said about checking with the folks that wrote NeutronScale and used it in the first place (jayg reached out last night, waiting to hear back).

If we continue to use NeutronScale, then the fixup at https://review.gerrithub.io/#/c/238320/6 (which is meant to address the related https://bugzilla.redhat.com/show_bug.cgi?id=1236578) should help to alleviate some of the pain - the idea is to get at least two l3 agents with hosts that match the 'neutron-n-?' pattern - though given Graeme's comment we may want to up the sleep time (currently 2 mins ish total; doesn't factor in time to invoke the neutron client and get a response).

Grateful for any review of the proposed fixup for this @ https://review.gerrithub.io/#/c/238320/8 - in particular we can tweak the various parameters there if we continue to see this issue with the fix applied. thanks

Created attachment 1050706 [details]
email conversation (about neutron scale, our setup, and if we can remove it) for context
The root cause, and the best fix here, really is to remove NeutronScale, since we have (neutron) native HA - the bit we were missing was control over dhcp_agents_per_network (and setting a minimum of 3), but we have reviews for that @ [1][2]. Once those land we can then safely land [3], which removes NeutronScale. HOWEVER, before we land that, and since we landed [4] as a fix here in the meantime, we need [5] to revert it; otherwise we will time out grepping for 'neutron-n-?' (but there is no NeutronScale so...). Thanks very much to Assaf for his advice; I copy/paste the email conversation (about neutron scale, our setup, and if we can remove it) for context at [6] as an attachment to this bug.

[1] https://review.openstack.org/#/c/199102/ Adds the NeutronDhcpAgentsPerNetwork parameter, oscplugin
[2] https://review.gerrithub.io/238893 Wires up NeutronDhcpAgentsPerNetwork parameter to deploy
[3] https://review.openstack.org/#/c/198016/ Removes the NeutronScale resource from controller pcmk manifest
[4] https://review.gerrithub.io/#/c/238320/8 Increase the sleep time while trying to get neutron l3 agents
[5] https://review.gerrithub.io/239450 Remove search for l3_agent name since NeutronScale is gone
[6] https://bugzilla.redhat.com/attachment.cgi?id=1050706

I just finished a run-through with all the reviews from comment 18 applied (fixup the dhcp_agents_per_network, remove neutron scale, remove explicit grep for neutron-n-? in l3 agents).
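To make the dhcp_agents_per_network piece concrete, here is a hedged sketch of overriding the NeutronDhcpAgentsPerNetwork parameter (the parameter name comes from reviews [1][2] above) in a heat environment file. The file name and the deploy-command wiring are illustrative assumptions; the exact flags depend on the oscplugin version.

```shell
# Hypothetical environment file overriding the DHCP-agents-per-network
# default of 3 that the reviews above introduce.
cat > dhcp_agents.yaml <<'EOF'
parameter_defaults:
  NeutronDhcpAgentsPerNetwork: 3
EOF

# A deploy would then pass it along, roughly (sketch only):
#   openstack overcloud deploy --templates -e dhcp_agents.yaml
grep 'NeutronDhcpAgentsPerNetwork' dhcp_agents.yaml
```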
Deployed OK, and:

[root@overcloud-controller-1 ~]# grep -ni 'dhcp_agents' /etc/neutron/*
grep: /etc/neutron/conf.d: Is a directory
/etc/neutron/neutron.conf:242:# dhcp_agents_per_network = 1
/etc/neutron/neutron.conf:243:dhcp_agents_per_network = 3
grep: /etc/neutron/plugins: Is a directory

[root@overcloud-controller-1 ~]# pcs status | grep neutron -A 1
Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
Clone Set: neutron-server-clone [neutron-server]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

[stack@instack ~]$ . overcloudrc
[stack@instack ~]$ neutron agent-list
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| id | agent_type | host | alive | admin_state_up | binary |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| 0295688d-8fae-4174-9e9c-5082d4c713e4 | Open vSwitch agent | overcloud-controller-0.localdomain | :-) | True | neutron-openvswitch-agent |
| 17cece0a-c3e9-444a-be8a-322da0927fb7 | L3 agent | overcloud-controller-2.localdomain | :-) | True | neutron-l3-agent |
| 1ff418ea-7acb-4327-b426-8258fa45ee83 | L3 agent | overcloud-controller-1.localdomain | :-) | True | neutron-l3-agent |
| 2c786688-d4f4-47cd-ac4b-5ebe5eae30d3 | Metadata agent | overcloud-controller-0.localdomain | :-) | True | neutron-metadata-agent |
| 36a2e098-283e-49e0-931a-93be248a47d2 | Open vSwitch agent | overcloud-compute-0.localdomain | :-) | True | neutron-openvswitch-agent |
| 52c60e25-05da-46fa-972b-28c7d47b1016 | DHCP agent | overcloud-controller-2.localdomain | :-) | True | neutron-dhcp-agent |
| 63247b37-df3f-44ed-9f1a-79e8b428ead0 | DHCP agent | overcloud-controller-0.localdomain | :-) | True | neutron-dhcp-agent |
| a5724a0f-55ba-4909-925f-f5e850acac7c | Metadata agent | overcloud-controller-2.localdomain | :-) | True | neutron-metadata-agent |
| a5ceb4cd-cd48-45fc-96ea-e58ed3d9968d | Open vSwitch agent | overcloud-controller-1.localdomain | :-) | True | neutron-openvswitch-agent |
| bb3f4d92-d685-491b-b0ad-508825338f86 | DHCP agent | overcloud-controller-1.localdomain | :-) | True | neutron-dhcp-agent |
| bf94215a-a197-48fe-a91b-6bc1ab282da8 | Metadata agent | overcloud-controller-1.localdomain | :-) | True | neutron-metadata-agent |
| c51b72c9-48b1-4e1d-bba5-f4b01ac60ca4 | Open vSwitch agent | overcloud-controller-2.localdomain | :-) | True | neutron-openvswitch-agent |
| e928210c-8ea9-41c9-a1ec-f01f245d4d2e | L3 agent | overcloud-controller-0.localdomain | :-) | True | neutron-l3-agent |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+

As an update to comment #18: since we now aren't doing overcloud network postconfig [1], we don't need to revert the sleep in the oscplugin; I -1'd the relevant review @ https://review.gerrithub.io/#/c/239450/

[1] https://review.gerrithub.io/#/c/239833/1

With python-rdomanager-oscplugin-0.0.8-32.el7ost.noarch I get the following:

+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| id | agent_type | host | alive | admin_state_up | binary |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| 208b3dfa-331a-4bc3-b9b2-dc6e8ade5eae | DHCP agent | overcloud-controller-0.localdomain | xxx | True | neutron-dhcp-agent |
| 86562501-06f4-4555-9b76-56c537e8c999 | DHCP agent | neutron-n-0 | :-) | True | neutron-dhcp-agent |
| 96a0a04c-a336-4e0a-91bf-2fda613fe417 | L3 agent | neutron-n-0 | :-) | True | neutron-l3-agent |
| 9c5e7886-afce-4eed-8591-306b89f9099f | Metadata agent | neutron-n-0 | xxx | True | neutron-metadata-agent |
| a821979e-ae00-4a52-923a-c881673d6a7f | L3 agent | overcloud-controller-0.localdomain | xxx | True | neutron-l3-agent |
| b50e6df3-2098-4461-bffd-c2294eedbed5 | Open vSwitch agent | neutron-n-0 | :-) | True | neutron-openvswitch-agent |
| cf46086c-0183-4132-b9da-0c5eab82900b | Open vSwitch agent | overcloud-compute-0.localdomain | :-) | True | neutron-openvswitch-agent |
| fd31250b-a517-453f-a542-06bb3bd183e1 | Open vSwitch agent | overcloud-compute-1.localdomain | :-) | True | neutron-openvswitch-agent |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+

thanks Itzik, that looks like what I get from today's poodle setup, and it looks OK to me - except the metadata agent (not sure why xxx there? did the stack create complete?). In any case, the real fix remains removal of neutronscale, as discussed above in comment 18 and also more recently in the dependent bug @ https://bugzilla.redhat.com/show_bug.cgi?id=1236578#c23

After talking with marios and hewbrocca: waiting for openstack-tripleo-heat-templates-0.8.6-40.el7ost and python-rdomanager-oscplugin-0.0.8-37.el7ost to verify.

thanks Itzik - I am still deploying today's poodle, but fyi I get:

[root@instack rdomanager_oscplugin]# rpm -qa | grep rdomanager
python-rdomanager-oscplugin-0.0.9-dev11.el7.centos.noarch
[root@instack rdomanager_oscplugin]# rpm -qa | grep tripleo-heat
openstack-tripleo-heat-templates-0.8.6-41.el7ost.noarch

The osc-plugin version above (0.0.9) actually doesn't have the dhcp agent change yet (it landed at https://review.gerrithub.io/#/c/238893/ ), so I'm not sure what mburns had in mind with the "python-rdomanager-oscplugin-0.0.8-37.el7ost" requirement (perhaps that is where we dropped the temp sleep fix). In any case, the good news is that the heat templates version above does have the required removal of neutronscale, so that alone should be enough to get a clear run here and the dependent bugs.
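For triaging mixed-state output like the agent list above, a small sketch that counts rows whose "alive" column is "xxx". The inline rows are abridged copies of the output above (UUIDs shortened); in practice you would pipe saved `neutron agent-list` output into the same awk filter.

```shell
# Count dead agents (alive column "xxx") in agent-list style output.
# Sample rows mirror the mixed neutron-n-0 / controller-0 state above.
printf '%s\n' \
  '| 208b3dfa | DHCP agent | overcloud-controller-0.localdomain | xxx | True | neutron-dhcp-agent |' \
  '| 86562501 | DHCP agent | neutron-n-0 | :-) | True | neutron-dhcp-agent |' \
  '| 9c5e7886 | Metadata agent | neutron-n-0 | xxx | True | neutron-metadata-agent |' \
  | awk -F'|' '$5 ~ /xxx/ { dead++ } END { print dead+0, "dead agent(s)" }'
# -> 2 dead agent(s)
```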
I expect the final bit (making dhcp agents per network default to 3) should appear soon enough in a build; will ping mburns later. thanks

(In reply to marios from comment #24)
> thanks Itzik - I am still deploying today's poodle, but fyi I get:
>
> [root@instack rdomanager_oscplugin]# rpm -qa | grep rdomanager
> python-rdomanager-oscplugin-0.0.9-dev11.el7.centos.noarch
>
> [root@instack rdomanager_oscplugin]# rpm -qa | grep tripleo-heat
> openstack-tripleo-heat-templates-0.8.6-41.el7ost.noarch
>
> The osc-plugin version above (0.0.9) actually doesn't have the dhcp agent
> change yet (it landed at https://review.gerrithub.io/#/c/238893/ ) so not
> sure what mburns had in mind with the
> "python-rdomanager-oscplugin-0.0.8-37.el7ost" requirement (perhaps that is
> where we dropped the temp sleep fix).
>
> In any case, the good news is that the heat templates version above does
> have the required removal of neutronscale, so that alone should be enough to
> get a clear run here and the dependent bugs. I expect the final bit (making
> dhcp agents per network default to 3) should appear soon enough in a build,
> will ping mburns later
>
> thanks

The patch is included in the most recent builds. As long as you have 0.0.8-37 or newer, it's in there. The .centos build is not valid.

yeah, thanks Mike, I was mistakenly enabling the extra repos... once I did it properly I got python-rdomanager-oscplugin-0.0.8-38.el7ost.noarch and confirm it has all the things:

$ neutron agent-list
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| id | agent_type | host | alive | admin_state_up | binary |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+
| 29c89c83-1788-4bcc-8e4b-5802c2c6b524 | Metadata agent | overcloud-controller-0.localdomain | :-) | True | neutron-metadata-agent |
| 2efeee7c-a860-4b6a-8c34-d5fb553da0ec | DHCP agent | overcloud-controller-0.localdomain | :-) | True | neutron-dhcp-agent |
| 3195a1d3-1fee-458c-95c6-3e47a84b6b1d | L3 agent | overcloud-controller-0.localdomain | :-) | True | neutron-l3-agent |
| 4c600774-e509-4124-b314-4a185e67f900 | Open vSwitch agent | overcloud-controller-0.localdomain | :-) | True | neutron-openvswitch-agent |
| 93dcfea5-188f-4c15-946d-dbac76452c8d | Open vSwitch agent | overcloud-compute-0.localdomain | :-) | True | neutron-openvswitch-agent |
| c901a51a-84d7-48a1-9cdb-552f612041e9 | Open vSwitch agent | overcloud-compute-1.localdomain | :-) | True | neutron-openvswitch-agent |
+--------------------------------------+--------------------+------------------------------------+-------+----------------+---------------------------+

Checked with:
python-rdomanager-oscplugin-0.0.8-41.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-44.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
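When checking whether an installed build contains the fix, a hedged sketch of a version-floor check using `sort -V` (GNU version sort). The "have" value below is an example; in practice it would come from something like `rpm -q --qf '%{VERSION}-%{RELEASE}\n' python-rdomanager-oscplugin`.

```shell
# Fixed-in floor from this bug's "Fixed In Version" field; "have" is an
# illustrative installed version, not queried from rpm here.
fixed="0.0.8-37"
have="0.0.8-38"

# sort -V puts the lower version string first.
lowest=$(printf '%s\n%s\n' "$fixed" "$have" | sort -V | head -n 1)
if [ "$lowest" = "$fixed" ]; then
  echo "installed build ($have) is at or above the fixed version"
fi
```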