Bug 1556686
Description
jamo luhrsen
2018-03-15 04:03:39 UTC
This issue should be analyzed deeper to understand where the problem really lies. The leak could be such that it will worsen over time and eventually cause real issues. It could be a bug in ODL, tempest, neutron, etc. Additionally, this bug causes headaches in CI environments that have expectations of the environment. Can you please attach logs when this happened, so that we can further analyse the bug? (In reply to Mike Kolesnik from comment #2) > Can you please attach logs when this happened, so that we can further > analyse the bug? there are some details in the upstream opendaylight jira that I gave a link to. what logs do you want? below is a reproduction and I've attached the neutron server log, dhcp-agent log and odl karaf.log reproduction (it takes a few iterations to hit the problem with this smaller tempest regex): Mon Mar 19 17:07:21 EDT 2018 (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list (overcloud) [stack@undercloud-0 tempest-dir]$ sudo tempest run --conf /etc/tempest/tempest.conf --regex tempest.api.network.admin.*snat running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --list running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --load-list /tmp/tmpewb2dI running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --load-list /tmp/tmpWVaYrj {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_create_router_with_default_snat_value [3.352111s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_create_router_with_default_snat_value [3.675526s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_create_router_with_snat_explicit [5.881535s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_create_router_with_snat_explicit [6.039762s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_reset_gateway_without_snat [4.106241s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_reset_gateway_without_snat [3.959537s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_set_gateway_with_snat_explicit [3.090799s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_set_gateway_with_snat_explicit [3.249783s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_set_gateway_without_snat [3.781734s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_set_gateway_without_snat [3.575197s] ... ok ====== Totals ====== Ran: 10 tests in 30.0000 sec. - Passed: 10 - Skipped: 0 - Expected Fail: 0 - Unexpected Success: 0 - Failed: 0 Sum of execute time for each test: 40.7122 sec. ============== Worker Balance ============== - Worker 0 (5 tests) => 0:00:20.502541 - Worker 1 (5 tests) => 0:00:20.214550 (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list (overcloud) [stack@undercloud-0 tempest-dir]$ date Mon Mar 19 17:08:10 EDT 2018 (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list ^[[A^[[A^[[A^[[A (overcloud) [stack@undercloud-0 tempest-dir]$ sudo tempest run --conf /etc/tempest/tempest.conf --regex tempest.api.network.admin.*snat running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --list running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --load-list /tmp/tmpY36MZP running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --load-list /tmp/tmpV566i5 {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_create_router_with_default_snat_value [4.249675s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_create_router_with_default_snat_value [4.377853s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_create_router_with_snat_explicit [6.316407s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_create_router_with_snat_explicit [7.588991s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_reset_gateway_without_snat [4.324835s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_reset_gateway_without_snat [4.410701s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_set_gateway_with_snat_explicit [4.006442s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_set_gateway_with_snat_explicit [3.591074s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_set_gateway_without_snat [3.553873s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_set_gateway_without_snat [3.163889s] ... ok ====== Totals ====== Ran: 10 tests in 32.0000 sec. - Passed: 10 - Skipped: 0 - Expected Fail: 0 - Unexpected Success: 0 - Failed: 0 Sum of execute time for each test: 45.5837 sec. ============== Worker Balance ============== - Worker 0 (5 tests) => 0:00:23.007013 - Worker 1 (5 tests) => 0:00:22.581773 (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list +--------------------------------------+--------------------------------+--------+-------+-------------+-------+----------------------------------+ | ID | Name | Status | State | Distributed | HA | Project | +--------------------------------------+--------------------------------+--------+-------+-------------+-------+----------------------------------+ | ee5f0f97-a8f9-407d-b912-8a3e8ed024d7 | tempest-snat-router-1366717702 | ACTIVE | UP | False | False | 6016d798aca9407295f66349f337c307 | +--------------------------------------+--------------------------------+--------+-------+-------------+-------+----------------------------------+ (overcloud) [stack@undercloud-0 tempest-dir]$ date Mon Mar 19 17:08:57 EDT 2018 Created attachment 1410156 [details]
dhcp-agent
Created attachment 1410157 [details]
opendaylight karaf.log
Created attachment 1410158 [details]
neutron server log
some extra info from an IRC chat: look at this: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-carbon/460/robot-plugin/log_05_tempest.html.gz#s1-k2-k1-k5-k1-k4 that is carbon w/ ocata. that is the output of 'router list' after all our tempest tests are done. and... carbon w/ pike which has a bunch of leftover routers: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-pike-upstream-stateful-carbon/244/robot-plugin/log_05_tempest.html.gz#s1-k2-k1-k5-k1-k4 (In reply to jamo luhrsen from comment #3) > (In reply to Mike Kolesnik from comment #2) > > Can you please attach logs when this happened, so that we can further > > analyse the bug? > > there are some details in the upstream opendaylight jira that I gave a link > to. what logs do you want? below is a reproduction and I've attached the > neutron server log, dhcp-agent log and odl karaf.log > > reproduction (it takes a few iterations to hit the problem with this smaller > tempest regex): > > Mon Mar 19 17:07:21 EDT 2018 > (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list > > (overcloud) [stack@undercloud-0 tempest-dir]$ sudo tempest run --conf > /etc/tempest/tempest.conf --regex tempest.api.network.admin.*snat ... snip ... > ============== > Worker Balance > ============== > - Worker 0 (5 tests) => 0:00:23.007013 > - Worker 1 (5 tests) => 0:00:22.581773 > (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list > +--------------------------------------+--------------------------------+---- > ----+-------+-------------+-------+----------------------------------+ > | ID | Name | > Status | State | Distributed | HA | Project | > +--------------------------------------+--------------------------------+---- > ----+-------+-------------+-------+----------------------------------+ > | ee5f0f97-a8f9-407d-b912-8a3e8ed024d7 | tempest-snat-router-1366717702 | > ACTIVE | UP | False | False | 6016d798aca9407295f66349f337c307 | > +--------------------------------------+--------------------------------+---- > ----+-------+-------------+-------+----------------------------------+ > (overcloud) [stack@undercloud-0 tempest-dir]$ date > Mon Mar 19 17:08:57 EDT 2018 This is weird, when I search for this router by name I see that it has the ID 092757a0-8e79-4471-9198-53927439abb6 and not that ID which is listed here. The ID ee5f0f97-a8f9-407d-b912-8a3e8ed024d7 is only listed as the device_id for a router gateway port, but I see no mention of it being created or accessed or deleted.. Also, I see that the router with the id 092757a0-8e79-4471-9198-53927439abb6 was deleted: 2018-03-19 21:08:32.090 108283 INFO neutron.wsgi [req-6949247f-e09d-403e-8b43-a946c9026335 f1e3d25d45a9482c899960ba39cd6b76 6016d798aca9407295f66349f337c307 - default default] 172.17.1.16 "DELETE /v2.0/routers/092757a0-8e79-4471-9198-53927439abb6 HTTP/1.1" status: 204 len: 168 time: 1.5183730 2018-03-19 21:08:32.102 108283 INFO networking_odl.journal.journal [req-6a1b0cc9-ebce-453a-a88f-7605aa601b82 - - - - -] Completed (Entry ID: 9256) - delete router 092757a0-8e79-4471-9198-53927439abb6 Perhaps there's two routers with the same name (Though I thought it's a random name)? Jamo, can you please attach a neutron & karaf log from when the server started and that has the reproduced behavior? Created attachment 1420519 [details]
4/11 repro karaf.log
Created attachment 1420520 [details]
4/11 repro server.log
(In reply to Mike Kolesnik from comment #11) > (In reply to jamo luhrsen from comment #3) > > (In reply to Mike Kolesnik from comment #2) > > > Can you please attach logs when this happened, so that we can further > > > analyse the bug? > > > > there are some details in the upstream opendaylight jira that I gave a link > > to. what logs do you want? below is a reproduction and I've attached the > > neutron server log, dhcp-agent log and odl karaf.log > > > > reproduction (it takes a few iterations to hit the problem with this smaller > > tempest regex): > > > > Mon Mar 19 17:07:21 EDT 2018 > > (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list > > > > (overcloud) [stack@undercloud-0 tempest-dir]$ sudo tempest run --conf > > /etc/tempest/tempest.conf --regex tempest.api.network.admin.*snat > ... snip ... > > ============== > > Worker Balance > > ============== > > - Worker 0 (5 tests) => 0:00:23.007013 > > - Worker 1 (5 tests) => 0:00:22.581773 > > (overcloud) [stack@undercloud-0 tempest-dir]$ openstack router list > > +--------------------------------------+--------------------------------+---- > > ----+-------+-------------+-------+----------------------------------+ > > | ID | Name | > > Status | State | Distributed | HA | Project | > > +--------------------------------------+--------------------------------+---- > > ----+-------+-------------+-------+----------------------------------+ > > | ee5f0f97-a8f9-407d-b912-8a3e8ed024d7 | tempest-snat-router-1366717702 | > > ACTIVE | UP | False | False | 6016d798aca9407295f66349f337c307 | > > +--------------------------------------+--------------------------------+---- > > ----+-------+-------------+-------+----------------------------------+ > > (overcloud) [stack@undercloud-0 tempest-dir]$ date > > Mon Mar 19 17:08:57 EDT 2018 > > This is weird, when I search for this router by name I see that it has the > ID 092757a0-8e79-4471-9198-53927439abb6 and not that ID which is listed here. > The ID ee5f0f97-a8f9-407d-b912-8a3e8ed024d7 is only listed as the device_id > for a router gateway port, but I see no mention of it being created or > accessed or deleted.. > > Also, I see that the router with the id 092757a0-8e79-4471-9198-53927439abb6 > was deleted: > > 2018-03-19 21:08:32.090 108283 INFO neutron.wsgi > [req-6949247f-e09d-403e-8b43-a946c9026335 f1e3d25d45a9482c899960ba39cd6b76 > 6016d798aca9407295f66349f337c307 - default default] 172.17.1.16 "DELETE > /v2.0/routers/092757a0-8e79-4471-9198-53927439abb6 HTTP/1.1" status: 204 > len: 168 time: 1.5183730 > 2018-03-19 21:08:32.102 108283 INFO networking_odl.journal.journal > [req-6a1b0cc9-ebce-453a-a88f-7605aa601b82 - - - - -] Completed (Entry ID: > 9256) - delete router 092757a0-8e79-4471-9198-53927439abb6 > > Perhaps there's two routers with the same name (Though I thought it's a > random name)? > > Jamo, can you please attach a neutron & karaf log from when the server > started and that has the reproduced behavior? Mike, attached a new server.log and karaf.log files for another reproduction (4/11 repro in attachment names). here's the command line output: (overcloud) [stack@undercloud-0 ~]$ tempest run --conf /etc/tempest/tempest.conf --regex tempest.api.network.admin.*snat running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --list running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --load-list /tmp/tmpf_1wkz running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \ ${PYTHON:-python} -m subunit.run discover -t /usr/lib/python2.7/site-packages/tempest /usr/lib/python2.7/site-packages/tempest/test_discover --load-list /tmp/tmpGjw8Mn {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_create_router_with_default_snat_value [4.264322s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_create_router_with_default_snat_value [4.437107s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_create_router_with_snat_explicit [6.570603s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_create_router_with_snat_explicit [7.621135s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_reset_gateway_without_snat [4.224652s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_reset_gateway_without_snat [4.536168s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_set_gateway_with_snat_explicit [4.679894s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_set_gateway_with_snat_explicit [4.522345s] ... ok {0} tempest.api.network.admin.test_routers.RoutersAdminTest.test_update_router_set_gateway_without_snat [4.244492s] ... ok {1} tempest.api.network.admin.test_routers.RoutersIpV6AdminTest.test_update_router_set_gateway_without_snat [4.016432s] ... ok ====== Totals ====== Ran: 10 tests in 33.0000 sec. - Passed: 10 - Skipped: 0 - Expected Fail: 0 - Unexpected Success: 0 - Failed: 0 Sum of execute time for each test: 49.1172 sec. ============== Worker Balance ============== - Worker 0 (5 tests) => 0:00:24.159222 - Worker 1 (5 tests) => 0:00:24.963881 (overcloud) [stack@undercloud-0 ~]$ openstack router list +--------------------------------------+-----------------------------+--------+-------+-------------+-------+----------------------------------+ | ID | Name | Status | State | Distributed | HA | Project | +--------------------------------------+-----------------------------+--------+-------+-------------+-------+----------------------------------+ | a1dd6ef4-a9a9-4747-bcff-3a3f80b4abb3 | tempest-snat-router-9379766 | ACTIVE | UP | False | False | ee0b40bcd52e4747a888f82e5107e122 | +--------------------------------------+-----------------------------+--------+-------+-------------+-------+----------------------------------+ Also, note that I stuffed a "JAMO: TEST START" and "JAMO: TEST END" string in the log files to show right before and after I ran the above tempest command. I'm sorry but the new log still doesn't contain the information.. For reference here's the JAMO lines: [mkolesni@localhost ~]$ grep -A1 JAMO /tmp/server.log JAMO: TEST START 2018-04-11 21:49:25.327 125358 DEBUG neutron.wsgi [-] (125358) accepted ('172.17.1.17', 55186) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883 -- JAMO: TEST END 2018-04-11 21:50:03.378 125359 DEBUG neutron.wsgi [-] (125359) accepted ('172.17.1.17', 57274) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883 As you can see if you try to grep the router ID it's only mentioned once (the time frame is indeed within the time of the test): [mkolesni@localhost ~]$ grep a1dd6ef4-a9a9-4747-bcff-3a3f80b4abb3 /tmp/server.log 2018-04-11 21:49:39.464 125359 DEBUG neutron.api.rpc.handlers.resources_rpc [req-80539b19-186d-4d2c-a341-2ba6af54e882 3713c00dea3247bca8359c5cb812fbde ee0b40bcd52e4747a888f82e5107e122 - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPushRpcApi method push called with arguments (<neutron_lib.context.Context object at 0x89f4d50>, [Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,binding_levels=[],created_at=2018-04-11T21:49:39Z,data_plane_status=<?>,description='',device_id='a1dd6ef4-a9a9-4747-bcff-3a3f80b4abb3',device_owner='network:router_gateway',dhcp_options=[],distributed_binding=None,dns=None,fixed_ips=[IPAllocation],id=aa8b6068-35e6-4a9a-94f5-866da44201ae,mac_address=fa:16:3e:7f:a9:9d,name='',network_id=103184c3-d46d-4028-978e-50ff00c97dad,project_id='',qos_policy_id=None,revision_number=3,security=PortSecurity(aa8b6068-35e6-4a9a-94f5-866da44201ae),security_group_ids=set([]),status='DOWN',updated_at=2018-04-11T21:49:39Z)], 'updated') {} wrapper /usr/lib/python2.7/site-packages/oslo_log/helpers.py:66 "device_id": "a1dd6ef4-a9a9-4747-bcff-3a3f80b4abb3" "device_id" : "a1dd6ef4-a9a9-4747-bcff-3a3f80b4abb3", [mkolesni@localhost ~]$ So there's no mention of the router being created nor deleted.. If we try to grep for the router name we get several matches: [mkolesni@localhost ~]$ grep tempest-snat-router-9379766 /tmp/server.log 2018-04-11 21:49:38.590 125359 DEBUG neutron.api.v2.base [req-80539b19-186d-4d2c-a341-2ba6af54e882 3713c00dea3247bca8359c5cb812fbde ee0b40bcd52e4747a888f82e5107e122 - default default] Request body: {u'router': {u'external_gateway_info': {u'network_id': u'103184c3-d46d-4028-978e-50ff00c97dad', u'enable_snat': False}, u'name': u'tempest-snat-router-9379766'}} prepare_request_body /usr/lib/python2.7/site-packages/neutron/api/v2/base.py:695 2018-04-11 21:49:39.861 125359 DEBUG neutron.api.v2.base [req-80539b19-186d-4d2c-a341-2ba6af54e882 3713c00dea3247bca8359c5cb812fbde ee0b40bcd52e4747a888f82e5107e122 - default default] Request body: {u'router': {u'external_gateway_info': {u'network_id': u'103184c3-d46d-4028-978e-50ff00c97dad', u'enable_snat': False}, u'name': u'tempest-snat-router-9379766'}} prepare_request_body /usr/lib/python2.7/site-packages/neutron/api/v2/base.py:695 "name": "tempest-snat-router-9379766" "name" : "tempest-snat-router-9379766", 2018-04-11 21:49:42.067 125357 DEBUG neutron.api.v2.base [req-af49b3b0-28e5-408f-af36-fed0b6085c5d 3713c00dea3247bca8359c5cb812fbde ee0b40bcd52e4747a888f82e5107e122 - default default] Request body: {u'router': {u'external_gateway_info': {u'network_id': u'103184c3-d46d-4028-978e-50ff00c97dad', u'enable_snat': True}, u'name': u'tempest-snat-router-9379766'}} prepare_request_body /usr/lib/python2.7/site-packages/neutron/api/v2/base.py:695 "name": "tempest-snat-router-9379766" "name" : "tempest-snat-router-9379766", [mkolesni@localhost ~]$ The first match is a create attempt that failed, the second is a successful attemp that created the router with ID 35fd120a-937b-4cd4-8aba-d9135775b233, and the third created the router with same name (not a unique property in neutron anyway) with the ID 408594a8-2ebe-4f41-93aa-3271883268e2. Mind you, both these routers were deleted and that can also be seen in the log. It's impossible to see when this "leftover" router was created or deleted or where it even came from, but it's not the other ones with the same name.. You could also probably see the router name in the tempest log and see it's reusing the name for the various routers (again the name doesn't matter, only thing that matters is the ID). So if you have any logs that actually capture the creation and attempted deletion of the router that would be great, otherwise it's impossible to understand where it comes from.. Also if you can please attach the tempest log as well so that we can perhaps get a better understanding of what's going on? there are random tempest tests failing with errors like: RBAC policy on object d5d7ca23-dd42-4ec9-a78e-64460f985bd6 cannot be removed because other objects depend on it.\nDetails: There are routers attached to this network that depend on this policy for access. or Details: {u'message': u'Unable to complete operation on network 0e9548e4-f181-4cd8-8ea7-d2287f5f3229. There are one or more ports still in use on the network.', u'type': u'NetworkInUse', u'detail': u''} I'm attaching datastore dumps and sosreports from the run I saw both of the above messages happening (https://url.corp.redhat.com/75d5ae3), sosreports clearly show multiple routers and security groups created by not removed by tempest tests (karaf and neutron logs inside the same package) NOTE: the above run #68 was done using Vishal's fix (from downstream gerrit: https://url.corp.redhat.com/7d6a080) for https://bugzilla.redhat.com/show_bug.cgi?id=1567507 and _no_ FIP issues were spotted in 2 consecutive runs (#68 and #66 downstream) with that fix and the left over resources may be more apparent since the FIP issue not occurring Created attachment 1435839 [details]
20180513_043751-test_delete_policies_while_tenant_attached_to_net-sosreport-controller-0
Created attachment 1435840 [details]
20180513_043751-test_delete_policies_while_tenant_attached_to_net-odltools-undercloud-0.redhat.local
Created attachment 1435841 [details]
20180513_043751-test_delete_policies_while_tenant_attached_to_net-sosreport-compute-0
Created attachment 1435842 [details]
20180513_043751-test_delete_policies_while_tenant_attached_to_net-sosreport-compute-1
Mike, what else do you need? Also, just double checking that you noticed the semi-reasonable reproduction steps. finally, this is happening in upstream ODL (last time I checked) and we keep a lot of logs for those jobs. We can sift through those as well. Thanks, JamO *** This bug has been marked as a duplicate of bug 1519783 *** After debugging this it seems this is the same issue seen in bug 1519783, where a router gets created on Neutron side but never synced to ODL. The main reason for this is that the recording to journal fails (probably due to a DB failure) and gets retried, so a new router gets created and the old router is left dangling on Neutron (in bug 1519783 the scale test tries to delete it and fails since it's not on ODL). This should be fixed by L3 flavors which should be available in OSP 14. |