Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2273947

Summary: [17.1] Ephemeral heat fails resolving rabbitmq without using fqdn
Product: Red Hat OpenStack
Reporter: Martin Schuppert <mschuppe>
Component: osp-director-operator-container
Assignee: Martin Schuppert <mschuppe>
Status: CLOSED ERRATA
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 17.1 (Wallaby)
CC: dhughes, jschluet, mariel, mburns, pkomarov
Target Milestone: z4
Keywords: Triaged
Target Release: 17.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: osp-director-operator-container-1.3.1-14
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2273948 2273950
Environment:
Last Closed: 2024-05-29 19:51:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2273948, 2273950

Description Martin Schuppert 2024-04-08 10:09:38 UTC
Description of problem:

When ephemeral heat receives a DNS REFUSED response, it stops resolution and does not continue with the search domains.
One way to work around this is to use the full DNS name of the service, rabbitmq-default.openstack.svc.cluster.local., in the messaging transport_url (i.e. add .cluster.local to make the name fully qualified, so that resolution does not depend on the recursive resolver returning NXDOMAIN).
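
The workaround described above amounts to appending the cluster domain to the service host in transport_url. The following is only a minimal sketch of that rewrite, not the actual operator change: the function name, the example credentials, and the assumption that the cluster domain is cluster.local are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the cluster uses the default Kubernetes domain "cluster.local".
CLUSTER_DOMAIN = "cluster.local"


def qualify_transport_url(url: str, cluster_domain: str = CLUSTER_DOMAIN) -> str:
    """Append the cluster domain to every host that ends in '.svc'.

    Hypothetical helper for illustration. It assumes that neither the
    user name nor the password contains a comma.
    """
    parts = urlsplit(url)
    qualified = []
    # oslo.messaging allows several comma-separated hosts in one URL.
    for entry in parts.netloc.split(","):
        userinfo, at, hostport = entry.rpartition("@")
        host, colon, port = hostport.partition(":")
        if host.endswith(".svc"):
            host = f"{host}.{cluster_domain}"
        qualified.append(f"{userinfo}{at}{host}{colon}{port}")
    return urlunsplit(parts._replace(netloc=",".join(qualified)))


if __name__ == "__main__":
    # Example credentials are placeholders, not values from this bug.
    print(qualify_transport_url(
        "rabbit://guest:secret@rabbitmq-default.openstack.svc:5672/?ssl=0"
    ))
```

Hosts that are already fully qualified, or that do not end in .svc, are left unchanged, so applying the rewrite twice is harmless.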

Version-Release number of selected component (if applicable):
17.1

Comment 1 Martin Schuppert 2024-04-08 10:17:24 UTC
Some more details:

Logs from heat when the issue is seen:
~~~
2024-03-19 12:05:15.963 98 ERROR oslo.messaging._drivers.impl_rabbit [req-dc61c685-e410-4453-8ae9-639605d7882c admin admin - - -] Connection failed: [Errno -2] Name or service not known (retrying in 1.0 seconds): socket.gaierror: [Errno -2] Name or service not known
2024-03-19 12:05:16.977 98 ERROR oslo.messaging._drivers.impl_rabbit [req-dc61c685-e410-4453-8ae9-639605d7882c admin admin - - -] Connection failed: [Errno -2] Name or service not known (retrying in 3.0 seconds): socket.gaierror: [Errno -2] Name or service not known
~~~
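
The socket.gaierror in these logs is raised by the standard resolver call, so the short and fully qualified names can be compared directly from inside the affected container. This is a rough diagnostic sketch, assuming Python is available there; the helper name try_resolve is hypothetical, and the resolver argument exists only so the function can be exercised without a network.

```python
import socket


def try_resolve(name, port=5672, resolver=socket.getaddrinfo):
    """Return a one-line summary of resolving `name` (hypothetical helper)."""
    try:
        infos = resolver(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # Same exception type that oslo.messaging reports in the heat log.
        return f"{name}: FAILED ({exc})"
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return f"{name}: {', '.join(addresses)}"


if __name__ == "__main__":
    for candidate in (
        "rabbitmq-default.openstack.svc",
        "rabbitmq-default.openstack.svc.cluster.local",
    ):
        print(try_resolve(candidate))
```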

Using tcpdump we see:
~~~
20:47:29.197760 IP 10.131.1.95.41893 > 172.30.0.10.53: 3477+ AAAA? rabbitmq-default.openstack.svc. (48)
20:47:29.202793 IP 172.30.0.10.53 > 10.131.1.95.41893: 3477 Refused- 0/0/0 (48)
20:47:29.203645 IP 10.131.1.95.54084 > 172.30.0.10.53: 44480+ A? rabbitmq-default.openstack.svc. (48)
20:47:29.204775 IP 172.30.0.10.53 > 10.131.1.95.54084: 44480 Refused- 0/0/0 (48)
20:47:32.720715 IP 10.131.1.95.52489 > 172.30.0.10.53: 41196+ AAAA? rabbitmq-default.openstack.svc. (48)
20:47:32.721715 IP 172.30.0.10.53 > 10.131.1.95.52489: 41196 Refused- 0/0/0 (48)
20:47:32.722360 IP 10.131.1.95.50224 > 172.30.0.10.53: 33821+ A? rabbitmq-default.openstack.svc. (48)
20:47:32.723457 IP 172.30.0.10.53 > 10.131.1.95.50224: 33821 Refused- 0/0/0 (48)
~~~
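
For longer captures, the query and reply lines can be paired by DNS transaction ID to list which names were refused. The sketch below is an illustration only: the regular expressions are written against the line format shown in this capture, they only recognise replies that carry an error code word such as Refused, and the function name is hypothetical.

```python
import re

# Query lines look like:  ...53: 3477+ AAAA? rabbitmq-default.openstack.svc. (48)
QUERY_RE = re.compile(r": (?P<id>\d+)\+? (?P<qtype>[A-Z]+)\? (?P<qname>\S+) ")
# Error replies look like: ...41893: 3477 Refused- 0/0/0 (48)
REPLY_RE = re.compile(r": (?P<id>\d+) (?P<rcode>[A-Za-z]+)[-*|]* \d+/\d+/\d+")


def pair_queries_with_rcodes(lines):
    """Return (qtype, qname, rcode) tuples for queries that got an error reply."""
    pending = {}
    results = []
    for line in lines:
        query = QUERY_RE.search(line)
        if query:
            pending[query["id"]] = (query["qtype"], query["qname"])
            continue
        reply = REPLY_RE.search(line)
        if reply and reply["id"] in pending:
            qtype, qname = pending.pop(reply["id"])
            results.append((qtype, qname, reply["rcode"]))
    return results


if __name__ == "__main__":
    capture = [
        "20:47:29.197760 IP 10.131.1.95.41893 > 172.30.0.10.53: 3477+ AAAA? rabbitmq-default.openstack.svc. (48)",
        "20:47:29.202793 IP 172.30.0.10.53 > 10.131.1.95.41893: 3477 Refused- 0/0/0 (48)",
    ]
    for row in pair_queries_with_rcodes(capture):
        print(row)
```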


/etc/resolv.conf is correctly configured with all search domains. Manual testing also works:
~~~
sh-5.1$ curl -vv http://rabbitmq-default.openstack.svc:5672/
*   Trying 172.30.237.99:5672...
* Connected to rabbitmq-default.openstack.svc (172.30.237.99) port 5672 (#0)
> GET / HTTP/1.1
> Host: rabbitmq-default.openstack.svc:5672
> User-Agent: curl/7.76.1
> Accept: */*
>
* Received HTTP/0.9 when not allowed
* Closing connection 0
curl: (1) Received HTTP/0.9 when not allowed
~~~

Comment 13 errata-xmlrpc 2024-05-29 19:51:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenStack Platform 17.1 director Operator container images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2728