Bug 1241131 - DNS server is not accessible by different overcloud hosts
Summary: DNS server is not accessible by different overcloud hosts
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: Director
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ga
: Director
Assignee: James Slagle
QA Contact: Ofer Blaut
URL:
Whiteboard:
Keywords: Triaged
Depends On:
Blocks: 1191185 1243520
Reported: 2015-07-08 14:11 UTC by Ofer Blaut
Modified: 2015-09-09 13:54 UTC

Nodes would end up out of synchronization due to a lack of access to NTP servers. This was because not all nodes routed access to the required servers (NTP, DNS, etc). This fix sets the Undercloud as a gateway for non-Controller nodes. This provides non-Controller nodes with access to external services such as DNS and NTP, which aids synchronization.
Clone Of:
Last Closed: 2015-08-05 13:58:48 UTC




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2015:1549 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform director Release | 2015-08-05 17:49:10 UTC
Gerrithub.io | 239966 | None | None | None | Never

Description Ofer Blaut 2015-07-08 14:11:03 UTC
Description of problem:

Found while trying to validate the NTP server bug: https://bugzilla.redhat.com/show_bug.cgi?id=1233916#c17

It seems that if the DNS server address is located on the external or internal network, some overcloud hosts cannot reach it, since not every host is attached to the external or internal network.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. From the Ceph/Compute nodes, ping the DNS server that was configured during cloud deployment; the ping fails.
2.
3.
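
A minimal sketch of how the check in step 1 could be scripted on a node. It is not part of the original report. It swaps ICMP ping for a TCP connection attempt to port 53, because sending raw ICMP from Python needs root privileges, and it assumes the DNS server also answers on TCP. The function name `can_reach` and its parameters are hypothetical.

```python
import socket


def can_reach(host, port=53, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # Pass an IP literal as host so the check does not itself
        # depend on working name resolution.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

Calling `can_reach("192.0.2.53")` on a Ceph or Compute node, with the placeholder replaced by the DNS address used at deploy time, would be expected to return False in the failing setup described here.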

Actual results:


Expected results:


Additional info:

Comment 1 Mike Burns 2015-07-08 17:20:19 UTC
Dan, any ideas for how to resolve this?

Comment 3 Dan Sneddon 2015-07-09 21:06:20 UTC
One possible fix for this is a system management network. I'm tracking that in BZ 1240395.

There is an upstream patch for that bug which is in review: https://review.openstack.org/#/c/199800/

That probably won't make it into GA, but there is a slim chance.

Comment 4 Omri Hochman 2015-07-14 17:08:54 UTC
One of the bad outcomes of this is that ntpd on the nodes cannot sync correctly, so each node ends up with a different time.

Comment 5 Dan Sneddon 2015-07-14 18:13:25 UTC
The latest on this is that Dan Prince is working on a patch which would allow the ctlplane to be used with static IPs and an external gateway. That would provide a way for nodes to reach NTP without having to rely on the undercloud as a SPOF.

Comment 6 Dan Sneddon 2015-07-14 19:29:01 UTC
(In reply to Dan Sneddon from comment #5)
> The latest on this is that Dan Prince is working on a patch which would
> allow the ctlplane to be used with static IPs and an external gateway. That
> would provide a way for nodes to reach NTP without having to rely on the
> undercloud as a SPOF.

Should read "reach DNS", but the same approach will also provide access to NTP.

Comment 7 Dan Prince 2015-07-15 13:48:33 UTC
When using network isolation we have the default gateway (which is likely used to access the DNS server) on different networks. For compute/ceph roles the default gateway is going to be on the ctlplane network. For the controller node it would be on the external network (public traffic).

So long as your router (either the undercloud, or a real router) can route traffic from the ctlplane to the external network (where the external DNS server resides) I think this should work fine.

If however you put your DNS server on one of the isolated networks (internal_api, storage, etc) then it would only be accessible by select roles, unless again you've gone and put routes in place on your ctlplane router to handle this.

Could we just treat this as a missing route issue? As in there needs to be a route added somewhere (either the undercloud or the gateway router) to handle this traffic?
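
The reasoning in this comment can be sketched as a small reachability model. This is an illustration only: the subnets are documentation and private-range placeholders, the set of networks attached to each role is an assumption, and `role_can_reach_dns` and `gateway_routes` are hypothetical names. Only the placement of each role's default gateway is taken from the comment.

```python
import ipaddress

# Placeholder subnets; none of these come from the bug report.
NETWORKS = {
    "ctlplane": ipaddress.ip_network("192.0.2.0/24"),
    "external": ipaddress.ip_network("198.51.100.0/24"),
    "internal_api": ipaddress.ip_network("172.16.2.0/24"),
    "storage": ipaddress.ip_network("172.16.1.0/24"),
}

# Attached networks are assumed for illustration; the gateway placement
# (ctlplane for compute/ceph, external for controller) follows the comment.
ROLES = {
    "controller": {
        "attached": {"ctlplane", "external", "internal_api", "storage"},
        "gateway_on": "external",
    },
    "compute": {
        "attached": {"ctlplane", "internal_api", "storage"},
        "gateway_on": "ctlplane",
    },
    "ceph": {
        "attached": {"ctlplane", "storage"},
        "gateway_on": "ctlplane",
    },
}


def network_of(address):
    """Return the name of the modelled network containing address, or None."""
    addr = ipaddress.ip_address(address)
    for name, net in NETWORKS.items():
        if addr in net:
            return name
    return None


def role_can_reach_dns(role, dns_address, gateway_routes=None):
    """Return True if the role can reach the DNS server.

    gateway_routes maps the network a gateway sits on to the set of
    networks that gateway forwards traffic to.
    """
    gateway_routes = gateway_routes or {}
    dns_net = network_of(dns_address)
    info = ROLES[role]
    if dns_net in info["attached"]:
        return True  # directly connected, no routing needed
    # Otherwise the role's default gateway must route to the DNS network.
    return dns_net in gateway_routes.get(info["gateway_on"], set())
```

With the DNS server at 198.51.100.10 on the external network, `role_can_reach_dns("compute", "198.51.100.10")` is False until a route is modelled with `gateway_routes={"ctlplane": {"external"}}`, which is the missing-route case described above.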

Comment 8 Hugh Brock 2015-07-15 14:10:42 UTC
So the problem with comment 7 is:

* it requires a router, of some kind

* the router needs to not serve DHCP because neutron is going to do that

* there needs to be nothing else on the subnet behind the router, because neutron is going to be serving dhcp on it

I'm having a hard time imagining how we will get our *own* IT to let us set that up in the lab, much less explaining to a customer that that's what we need.

I think a better solution is probably to get all nodes onto a network which has external connectivity, but where we do not serve DHCP. What's wrong with just using the external API network for this? The controllers already use that network as a gateway if I'm not mistaken.

Alternatively we could define a gateway for the internal API network, but that seems a little screwy.

Comment 9 Hugh Brock 2015-07-15 14:29:42 UTC
OK, having said comment 8 ...

Is the least invasive solution here not simply setting the undercloud up as the external gateway for everything but the controllers? It's already routing traffic on that subnet and serving dhcp... I don't think we should leave it this way but I think we could ship with it.

Comment 10 James Slagle 2015-07-15 16:37:39 UTC
OK, I believe the only patch needed to set this up is to enable IP forwarding on the undercloud. I've linked Ben's patches for that.
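
A minimal sketch for verifying the resulting state on the undercloud. It is not the linked patch and does not change any setting. It only reads the Linux kernel's IPv4 forwarding flag from /proc/sys/net/ipv4/ip_forward, the file behind the net.ipv4.ip_forward sysctl. The function name `ip_forwarding_enabled` is hypothetical.

```python
from pathlib import Path

# Kernel interface for the net.ipv4.ip_forward sysctl on Linux.
IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def ip_forwarding_enabled(path=IP_FORWARD_PATH):
    """Return True if the file at path reports IPv4 forwarding as enabled."""
    # The kernel writes "1" when forwarding is on and "0" when it is off.
    return Path(path).read_text().strip() == "1"
```

Calling `ip_forwarding_enabled()` on the undercloud should return True once forwarding is on. The `path` parameter exists only so the function can be pointed at a test file.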

Comment 14 errata-xmlrpc 2015-08-05 13:58:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

