Bug 2218148
| Summary: | Designate DNS - "test_create_and_update_zone" scenario test fails on IPV6 setup | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Arkady Shtempler <ashtempl> |
| Component: | openstack-designate | Assignee: | Brent Eagles <beagles> |
| Status: | ASSIGNED --- | QA Contact: | Bruna Bonguardo <bbonguar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 17.1 (Wallaby) | CC: | beagles, michjohn, scohen |
| Target Milestone: | z2 | Keywords: | Triaged |
| Target Release: | 17.1 | Flags: | ifrangs:
needinfo?
(beagles) |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I found the relevant worker logs: 2023-06-28 09:28:41.632 7 DEBUG designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Found serial 1687944380 on 2620:52:0:13b8:5054:ff:fe3e:a9 for zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test. __call__ /usr/lib/python3.9/site-packages/designate/worker/tasks/zone.py:334 2023-06-28 09:28:41.634 7 DEBUG designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Found serial 1687944343 on 2620:52:0:13b8:5054:ff:fe3e:41 for zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test. __call__ /usr/lib/python3.9/site-packages/designate/worker/tasks/zone.py:334 2023-06-28 09:28:41.635 7 DEBUG designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Found serial 1687944380 on 2620:52:0:13b8:5054:ff:fe3e:9e for zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test. __call__ /usr/lib/python3.9/site-packages/designate/worker/tasks/zone.py:334 2023-06-28 09:28:41.635 7 DEBUG designate.worker.processing [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Finished Tasks ['PollForZone', 'PollForZone', 'PollForZone'] in 0.007219s run /usr/lib/python3.9/site-packages/designate/worker/processing.py:86 2023-06-28 09:28:41.636 7 DEBUG designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Results for polling rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.-1687944380: DNSQueryResult(positives=2, no_zones=0, consensus_serial=1687944380, results=[1687944343, 1687944380, 1687944380]) parse_query_results /usr/lib/python3.9/site-packages/designate/worker/tasks/zone.py:299 2023-06-28 09:28:41.636 7 DEBUG designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Unsuccessful poll for rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test. on attempt 10 _do_poll /usr/lib/python3.9/site-packages/designate/worker/tasks/zone.py:407 2023-06-28 09:28:56.652 7 DEBUG designate.storage.impl_sqlalchemy [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Fetched zone <Zone id:'e974ba81-c2fd-410e-8677-076f49faaebe' type:'PRIMARY' name:'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1687944380' action:'UPDATE' status:'PENDING'> _find_zones /usr/lib/python3.9/site-packages/designate/storage/impl_sqlalchemy/__init__.py:252 2023-06-28 09:28:56.653 7 INFO designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Could not find 1687944380 for rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test. on enough nameservers. 2023-06-28 09:28:56.653 7 DEBUG designate.worker.tasks.zone [req-1bd117da-016f-4b68-b48a-5f6e748d5e54 - - - - -] Updating status for rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test. to ERROR:UPDATE __call__ /usr/lib/python3.9/site-packages/designate/worker/tasks/zone.py:517 It looks like one out of the three BIND instances has not had a successful zone transfer to get serial 1687944380, this is why the zone was marked into ERROR and the test failed. Next I will dig in the mdns logs to see if there is a hint as to why that one BIND instance is not properly AXFR. Ok, this seems to be a source address issue.
So, in the log above you can see that BIND instance on 2620:52:0:13b8:5054:ff:fe3e:41 still has the old serial number: 1687944343 where the other two instances have the correct, updated serial number: 1687944380
Looking in the BIND logs I see the zone definition:
28-Jun-2023 09:25:44.379 general: info: received control channel command 'addzone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test { type slave; masters { 2620:52:0:13b8:5054:ff:fe3e:8d port 16000; 2620:52:0:13b8:5054:ff:fe3e:8d port 16002; 2620:52:0:13b8:5054:ff:fe3e:8d port 16001;}; file "slave.rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.e974ba81-c2fd-410e-8677-076f49faaebe"; };'
It lists the zone masters (Designate mini-DNS) as having IPs: 2620:52:0:13b8:5054:ff:fe3e:8d port 16000; 2620:52:0:13b8:5054:ff:fe3e:8d port 16002; 2620:52:0:13b8:5054:ff:fe3e:8d port 16001;
However, when the worker/mini-dns is sending the notify messages the source IP is different:
28-Jun-2023 09:26:21.393 notify: info: client @0x7f6b68044b58 2620:52:0:13b8:5054:ff:fe3e:9e#45612: received notify for zone 'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test'
28-Jun-2023 09:26:21.393 general: info: zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN: refused notify from non-master: 2620:52:0:13b8:5054:ff:fe3e:9e#45612
This causes the notify to be rejected and BIND to not trigger an AXFR.
On controller-0, one that did have the correct serial number, the notify also got rejected BUT the zone refresh timer happened to also expire in time:
28-Jun-2023 09:26:21.307 notify: info: client @0x7f5e100104c8 2620:52:0:13b8:5054:ff:fe3e:9e#54837: received notify for zone 'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test'
28-Jun-2023 09:26:21.307 general: info: zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN: refused notify from non-master: 2620:52:0:13b8:5054:ff:fe3e:9e#54837
Then the refresh interval expired:
28-Jun-2023 09:26:51.214 general: info: zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN: refresh: retry limit for master 2620:52:0:13b8:5054:ff:fe3e:8d#16002 exceeded (source ::#0)
28-Jun-2023 09:26:51.214 xfer-in: info: zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN: Transfer started.
28-Jun-2023 09:26:51.214 xfer-in: info: transfer of 'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN' from 2620:52:0:13b8:5054:ff:fe3e:8d#16002: connected using 2620:52:0:13b8:5054:ff:fe3e:a9#49965
28-Jun-2023 09:26:51.239 xfer-in: info: zone rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN: transferred serial 1687944380
28-Jun-2023 09:26:51.239 xfer-in: info: transfer of 'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test/IN' from 2620:52:0:13b8:5054:ff:fe3e:8d#16002: Transfer status: success
So, there is a mis-match with the expected IP addresses and those that are being used for the notify messages.
So, looking at this deeper, I see the following:
The allow-notify list on controller-0 has:
allow-notify { 2620:52:0:13b8:5054:ff:fe3e:a9; 2620:52:0:13b8:5054:ff:fe3e:1; 2620:52:0:13b8:5054:ff:fe3e:8b; 2620:52:0:13b8:5054:ff:fe3e:94; };
But if you look at controller-2, the br-ex has two IPs:
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 52:54:00:83:75:a8 brd ff:ff:ff:ff:ff:ff
inet6 2620:52:0:13b8:5054:ff:fe3e:41/64 scope global
valid_lft forever preferred_lft forever
inet6 2620:52:0:13b8:5054:ff:fe3e:94/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe83:75a8/64 scope link
valid_lft forever preferred_lft forever
The worker sent the notify from the :41 address, where the :94 address is in the allow-notify list.
|
It looks like we have a problem to Update zone on IPV6 setup. When I tried to run the scenario tests on IPV6 setup (RHOS-17.1-RHEL-9-20230621.n.1), I've found that: "designate_tempest_plugin.tests.scenario.v2.test_zones.ZonesTest.test_create_and_update_zone" is permanently fails with the same error (Zone gets into the ERROR status) designate_tempest_plugin.tests.scenario.v2.test_zones.ZonesTest.test_create_and_update_zone[id-cabd6334-ba37-11ec-9d8c-201e8823901f,slow] ----------------------------------------------------------------------------------------------------------------------------------------- Captured traceback: ~~~~~~~~~~~~~~~~~~~ Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/designate_tempest_plugin/tests/scenario/v2/test_zones.py", line 118, in test_create_and_update_zone self.client.update_zone( File "/usr/lib/python3.9/site-packages/designate_tempest_plugin/services/dns/json/base.py", line 39, in wrapper return f(*args, **kwargs) File "/usr/lib/python3.9/site-packages/designate_tempest_plugin/services/dns/v2/json/zones_client.py", line 184, in update_zone waiters.wait_for_zone_status(self, body['id'], wait_until) File "/usr/lib/python3.9/site-packages/designate_tempest_plugin/common/waiters.py", line 77, in wait_for_zone_status raise exceptions.InvalidStatusError('Zone', zone_id, designate_tempest_plugin.common.exceptions.InvalidStatusError: Zone with ID e974ba81-c2fd-410e-8677-076f49faaebe returned unexpected status ERROR I saw ERROR messages in Designate's logs, but I couldn't find the "root cause" ### Controller-0 ### /var/log/containers/designate/central.log 261:2023-06-28 09:28:56.671 9 DEBUG designate.central.service [req-fb348bed-15ec-4353-8362-7a8e44331ff8 64013221ce53447cb28b377321350fc1 80e9b6b7b4f54c0caf83b1dde3eec080 - - -] Setting zone e974ba81-c2fd-410e-8677-076f49faaebe, serial 1687944380: action UPDATE, status ERROR _update_zone_status /usr/lib/python3.9/site-packages/designate/central/service.py:2676 262:2023-06-28 09:28:56.671 9 DEBUG designate.storage.impl_sqlalchemy [req-fb348bed-15ec-4353-8362-7a8e44331ff8 64013221ce53447cb28b377321350fc1 80e9b6b7b4f54c0caf83b1dde3eec080 - - -] Updating zone <Zone id:'e974ba81-c2fd-410e-8677-076f49faaebe' type:'PRIMARY' name:'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1687944380' action:'UPDATE' status:'ERROR'> update_zone /usr/lib/python3.9/site-packages/designate/storage/impl_sqlalchemy/__init__.py:300 269:2023-06-28 09:28:57.171 8 DEBUG designate.storage.impl_sqlalchemy [req-7c64f21b-9bc3-440f-abb0-ce2a38f5d287 64013221ce53447cb28b377321350fc1 80e9b6b7b4f54c0caf83b1dde3eec080 - - -] Fetched zone <Zone id:'e974ba81-c2fd-410e-8677-076f49faaebe' type:'PRIMARY' name:'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1687944380' action:'UPDATE' status:'ERROR'> _find_zones /usr/lib/python3.9/site-packages/designate/storage/impl_sqlalchemy/__init__.py:252 ### Controller-1 ### /var/log/containers/designate/designate-api.log 70:2023-06-28 09:28:57.138 6 INFO designate.api.v2.controllers.zones [req-02b71775-ff6b-4c5b-9fc1-7d6cf018d945 64013221ce53447cb28b377321350fc1 80e9b6b7b4f54c0caf83b1dde3eec080 - - -] Retrieved <Zone id:'e974ba81-c2fd-410e-8677-076f49faaebe' type:'PRIMARY' name:'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1687944380' action:'UPDATE' status:'ERROR'> ### Controller-2 ### /var/log/containers/designate/central.log 197:2023-06-28 09:28:57.132 9 DEBUG designate.storage.impl_sqlalchemy [req-02b71775-ff6b-4c5b-9fc1-7d6cf018d945 64013221ce53447cb28b377321350fc1 80e9b6b7b4f54c0caf83b1dde3eec080 - - -] Fetched zone <Zone id:'e974ba81-c2fd-410e-8677-076f49faaebe' type:'PRIMARY' name:'rand-create_and_update_zone-1127437962.rand-ZonesTest-1775534532.test.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1687944380' action:'UPDATE' status:'ERROR'> _find_zones /usr/lib/python3.9/site-packages/designate/storage/impl_sqlalchemy/__init__.py:252 ### Notes ### 1) "Create" and "Delete" (with propagation check) are PASS: [stack@undercloud-0 tempest-dir]$ stestr run designate_tempest_plugin.tests.scenario.v2.test_zones.ZonesTest.test_zone_create_propagates_to_nameservers {0} designate_tempest_plugin.tests.scenario.v2.test_zones.ZonesTest.test_zone_create_propagates_to_nameservers [44.555598s] ... ok [stack@undercloud-0 tempest-dir]$ stestr run designate_tempest_plugin.tests.scenario.v2.test_zones.ZonesTest.test_zone_delete_propagates_to_nameservers {0} designate_tempest_plugin.tests.scenario.v2.test_zones.ZonesTest.test_zone_delete_propagates_to_nameservers [45.515440s] ... ok 2) To be able to run the tests you'll need Lilach's fix: https://review.opendev.org/c/openstack/designate-tempest-plugin/+/886006