Discovered via the tempest test tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesUnderV243Test.test_add_remove_fixed_ip.

This test adds a SECOND fixed IP to an existing VM and then waits for it to appear in the list of the VM's IPs (the ".../servers/uuid/ips" request). The issue seems to be that it only waits for the count of IPs to increase (by exactly 1) before it considers the action done. But the IP that appears there may be a floating IP (the one added during resource preparation at the beginning of the test; see the side note at the bottom).

Appears at least in OSP15:

> openstack-tempest.noarch 1:21.0.0-0.20190726140456.702b21c.el8ost @rhelosp-15.0-trunk
> python3-tempest.noarch 1:21.0.0-0.20190726140456.702b21c.el8ost @rhelosp-15.0-trunk

There seems to be an active wait-until loop which checks the count of IPs and, in a rare case, tries to eliminate the floating IPs from the count (which could potentially prevent the issue). The problem is that it eliminates floating IPs from the count only in that "rare" case, when more than one IP appeared during the check.

> wait_until_+1_ip_is_there:
>     get_list_of_ips
>     if count not changed:
>         return false
>     elif count is +1:   # this gets fulfilled even by a floating IP appearing
>         return true     # and exits here
>     filter_out_floating_ips_block_here   # never reached when "just" the floating IP popped up, and not both fixed and floating at once
>     return count is +1

To me it seems this could be addressed by a simple fix: just remove the first 'count is +1 => return true' part. That way it will always go into the block that eliminates floating IPs from the count.

Given that accounting for floating IPs needs more expensive requests than just listing the IPs, it could be slightly optimized by also keeping a seen_ips_counter, updated every iteration before eliminating floating IPs, and doing a quick exit 'return False' if count == seen_ips_counter. That way it should do the expensive request / filtering only when a new IP pops in. (Or keep a set() of actual IPs and re-filter/evaluate when it differs from the set(IPs) obtained by the first /servers/uuid/ips list.)

(Side note: it can take about 30-40 seconds from the original server creation at the beginning of the test before that floating IP appears in the '/servers/uuid/ips' list. Here that happened in the middle of this test case, which confused it.)
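The set()-based variant of the proposal could look roughly like the Python sketch below. This is not the tempest code and not the patch that eventually landed. The function name wait_for_new_fixed_ip and the two callables list_server_ips (the cheap /servers/uuid/ips listing) and list_floating_ips (the more expensive floating IP lookup) are hypothetical placeholders, assumed here only for illustration.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_new_fixed_ip(
    list_server_ips: Callable[[], Iterable[str]],    # hypothetical: cheap IP listing
    list_floating_ips: Callable[[], Iterable[str]],  # hypothetical: expensive lookup
    original_fixed_count: int,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once exactly one additional non-floating IP is listed."""
    seen: Set[str] = set()
    deadline = time.monotonic() + timeout
    while True:
        current = set(list_server_ips())
        # Quick path: skip the expensive filtering unless the listing changed.
        if current != seen:
            seen = current
            # Always filter out floating IPs before counting, so a floating
            # IP that shows up late cannot satisfy the "+1" condition.
            fixed = current - set(list_floating_ips())
            if len(fixed) == original_fixed_count + 1:
                return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

With this shape, a floating IP appearing on its own changes the listing and triggers one filtering pass, but the filtered count stays the same, so the loop keeps waiting for the fixed IP.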
Created attachment 1615516: log of test_add_remove_fixed_ip failing
The build listed in "Fixed in Version" won't make it to rhos-15. The issue is fixed in tempest 24 in later rhos versions, e.g. the 16.1 build openstack-tempest-24.0.0-0.20200615163500.c73e6b1.el8ost
From a code-logic perspective, the patch should have resolved the race condition in the test. Moreover, the issue has not recurred lately, and it did not occur during my testing either. The fix is part of the package listed in "Fixed in Version", which is available in the rhos-16.1 repo via the latest symlink.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413