Created attachment 1382568
Reproducer script and logs.

Description of problem:
When multiple host agents rapidly call report_state for the first time, we get StaleDataError and _update_segment_host_mapping_for_agent does not complete for all hosts.

Attached is a file with logs, the reproducer script, and instructions on how to set up a devstack environment similar to the one I am using.

Version-Release number of selected component (if applicable):

How reproducible:
Every time, using the reproducer script, which calls report_state for 3 hosts.

Steps to Reproduce:
Run the reproducer script with the delay, time.sleep(10), commented out.

__NOTE__: The reproducer script is included in the attachment, together with the logs and devstack instructions for testing against upstream OpenStack.

Actual results:
Results:
 * 2x StaleDataError
 * Only 1 attempt to add a host to placement/host-aggregate.

MariaDB [neutron]> SELECT * FROM segmenthostmappings;
+--------------------------------------+---------------------------------+
| segment_id                           | host                            |
+--------------------------------------+---------------------------------+
| a974ae4c-1389-4e41-9ab9-820165c26acd | host2                           |
| a974ae4c-1389-4e41-9ab9-820165c26acd | routed-devstack.lab.example.com |
| bc626d3d-5503-4875-9db8-e1bcfad35979 | host2                           |
| bc626d3d-5503-4875-9db8-e1bcfad35979 | routed-devstack.lab.example.com |
| ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | host2                           |
| ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | routed-devstack.lab.example.com |
+--------------------------------------+---------------------------------+

Conclusions:
 * 2x StaleDataError
 * 1x successful _update_segment_host_mapping after_create.

Expected results:
We should see 3 attempts to add a host to placement/host-aggregate, one for each host agent, and all 3 hosts should have entries in the segmenthostmappings table in the database.

Additional info:
When the reproducer script is run with a delay of 10 seconds between each agent update, there is no issue.
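The actual reproducer is in the attachment and is not shown here. As a rough sketch of the shape described above (one first-time report per host, with an optional pause between reports), something like the following could be used. The name report_all, the report_state callable, and the payload are hypothetical stand-ins, not the neutron agent API or the attached script.

```python
import time
from typing import Callable, Iterable, List


def report_all(hosts: Iterable[str],
               report_state: Callable[[dict], None],
               delay: float = 0.0) -> List[str]:
    """Send one state report per host, optionally pausing between reports.

    `report_state` is a hypothetical callable standing in for whatever
    the real script uses to deliver an agent state report.
    """
    reported = []
    for host in hosts:
        # Hypothetical minimal payload; a real agent state has more fields.
        report_state({"host": host})
        reported.append(host)
        if delay:
            # delay=10 corresponds to the run with time.sleep(10) enabled.
            time.sleep(delay)
    return reported
```

With delay=0 the three reports arrive back to back, which matches the failing case; with delay=10 they are spaced out, which matches the passing case.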
------------------------------------------------------------------------------------------------------------
Run the script with the delay, time.sleep(10), enabled.

Results:
 * No StaleDataError
 * 3 attempts to add a host to placement/host-aggregate.

MariaDB [neutron]> SELECT * FROM segmenthostmappings;
+--------------------------------------+---------------------------------+
| segment_id                           | host                            |
+--------------------------------------+---------------------------------+
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | host0                           |
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | host1                           |
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | host2                           |
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | routed-devstack.lab.example.com |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host0                           |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host1                           |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host2                           |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | routed-devstack.lab.example.com |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host0                           |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host1                           |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host2                           |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | routed-devstack.lab.example.com |
+--------------------------------------+---------------------------------+

Conclusion:
 * 3x successful _update_segment_host_mapping after_create.

**NOTE:** The RESP BODY: {"itemNotFound": {"message": "Compute host host1 could not be found.", "code": 404}} errors in the logs are expected, because the fake hosts are not in Nova.
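To make the difference between the two runs easier to check, here is a small sketch that takes the (segment_id, host) rows from a segmenthostmappings query and lists which expected hosts are missing per segment. The helper name missing_hosts is made up for illustration, and the expected host names are taken from the passing run.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def missing_hosts(rows: Iterable[Tuple[str, str]],
                  expected: Set[str]) -> Dict[str, List[str]]:
    """Return, per segment, the expected hosts that have no mapping row."""
    seen = defaultdict(set)
    for segment_id, host in rows:
        seen[segment_id].add(host)
    return {seg: sorted(expected - hosts)
            for seg, hosts in seen.items()
            if expected - hosts}


# Rows copied from the failing run (delay commented out).
failing_rows = [
    ("a974ae4c-1389-4e41-9ab9-820165c26acd", "host2"),
    ("a974ae4c-1389-4e41-9ab9-820165c26acd", "routed-devstack.lab.example.com"),
    ("bc626d3d-5503-4875-9db8-e1bcfad35979", "host2"),
    ("bc626d3d-5503-4875-9db8-e1bcfad35979", "routed-devstack.lab.example.com"),
    ("ec7717dd-8533-464f-a3c8-4ecc7dc08d10", "host2"),
    ("ec7717dd-8533-464f-a3c8-4ecc7dc08d10", "routed-devstack.lab.example.com"),
]

result = missing_hosts(failing_rows, {"host0", "host1", "host2"})
# Every segment in the failing run lacks host0 and host1.
for segment_id, hosts in result.items():
    print(segment_id, hosts)
```

Run against the rows from the passing run, the same helper returns an empty dict, since host0, host1 and host2 are all mapped to every segment.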
Fix proposed here: https://review.openstack.org/#/c/534449/
Merged upstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086