Bug 2088526
| Summary: | DistributedComputeScaleOut node fails to deploy on a DCN site with storage | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marian Krcmarik <mkrcmari> | ||||
| Component: | openstack-tripleo-heat-templates | Assignee: | Grzegorz Grasza <ggrasza> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Joe H. Rahme <jhakimra> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 17.0 (Wallaby) | CC: | abishop, dwilde, ggrasza, jkreger, johfulto, jschluet, jslagle, mburns, oblaut, ramishra, sbaker | ||||
| Target Milestone: | beta | Keywords: | Regression, Triaged | ||||
| Target Release: | 17.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-tripleo-heat-templates-14.3.1-0.20220719171711.feca772.el9ost | Doc Type: | No Doc Update | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-09-21 12:21:38 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Marian Krcmarik
2022-05-19 15:50:23 UTC
Looks like dcn-storage.yaml has ManageNetworks: False. Maybe the networks were not in network_data of the main stack or missing some fixes like https://review.opendev.org/c/openstack/tripleo-heat-templates/+/781572.

In baremetal-deployment.yaml it looks like the DistributedComputeScaleOut networks list is missing the storage_mgmt network. Feel free to close this if that fixes the issue.

(In reply to Steve Baker from comment #2)
> In baremetal-deployment.yaml it looks like the DistributedComputeScaleOut
> networks list is missing the storage_mgmt network. Feel free to close this
> if that fixes the issue.

That's intentional since the predefined role in tht does not have storage_mgmt network defined:
https://github.com/openstack/tripleo-heat-templates/blob/master/roles/DistributedComputeScaleOut.yaml
So my assumption is that there is no need for storage_mgmt network to be deployed on DistributedComputeScaleOut node.

I'm not an expert in that specific architecture configuration, but wouldn't you still need storage network so you don't burden your primary network interfaces with the storage networking traffic?

(In reply to Julia Kreger from comment #4)
> I'm not an expert in that specific architecture configuration, but wouldn't
> you still need storage network so you don't burden your primary network
> interfaces with the storage networking traffic?

Both roles have the "storage network" to unburden the primary interface. We're talking about which roles should have the storage_mgmt network, which is another network for storage.

(In reply to Marian Krcmarik from comment #3)
> (In reply to Steve Baker from comment #2)
> > In baremetal-deployment.yaml it looks like the DistributedComputeScaleOut
> > networks list is missing the storage_mgmt network. Feel free to close this
> > if that fixes the issue.
>
> That's intentional since the predefined role in tht does not have
> storage_mgmt network defined:
> https://github.com/openstack/tripleo-heat-templates/blob/master/roles/DistributedComputeScaleOut.yaml
> So my assumption is that there is no need for storage_mgmt network to be
> deployed on DistributedComputeScaleOut node.

- DistributedComputeScaleOut has the storage network [1]
- DistributedComputeHCIScaleOut has the storage network and the storage_mgmt network [2]
- In Ceph terms, the "storage network" is the Ceph public_network and the storage_mgmt network is the Ceph cluster_network [3]
- You only need the storage_mgmt network if you are hosting OSDs, so that they can replicate.
- The main difference between DistributedComputeScaleOut and DistributedComputeHCIScaleOut is that the HCI one hosts OSDs.
- Since DistributedComputeScaleOut does not host OSDs, it shouldn't need the storage_mgmt network, unless the storage_mgmt network is being used for something other than OSDs. The ServiceNetMap in the config-download directory could confirm which services are using which networks.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/roles/DistributedComputeScaleOut.yaml#L15-L16
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/roles/DistributedComputeHCIScaleOut.yaml#L16-L17
[3] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/

Marian confirmed the following:

- If the deployment is run without DistributedComputeScaleOut, then it doesn't hit this bug.
- If the DistributedComputeScaleOut node has the storage_mgmt network added, then it doesn't hit this bug (this can be a workaround until this BZ is closed).

Also, we see TLS-e provided a certificate for each network, e.g. /etc/pki/tls/certs/haproxy/overcloud-haproxy-storage_mgmt.pem

The script that failed runs on the DistributedComputeScaleOut node for each network in HAProxyNetworks [2], even if that network is not configured on the node.
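The failure mode described above can be modeled in a few lines of Python (3.9+). This is a minimal sketch under stated assumptions, not the tripleo-heat-templates code: the real loop lives in a Jinja2/Ansible template, and `HAPROXY_NETWORKS`, `SCALEOUT_NODE_NETWORKS`, `request_cert`, `certs_naive`, and `certs_filtered` are all hypothetical names. The network sets loosely follow the role definitions linked earlier (DistributedComputeScaleOut has storage but not storage_mgmt). It contrasts a loop that requests a certificate for every HAProxy network with one that first intersects that list with the networks configured on the node, which is the change the thread goes on to suggest.

```python
"""Hypothetical model of per-network certificate generation on a DCN node.

Assumption: all names and data here are illustrative only; they are not the
actual tripleo-heat-templates, certmonger, or HAProxyNetworks interfaces.
"""

# Hypothetical list of networks HAProxy certificates are generated for.
HAPROXY_NETWORKS = ["internal_api", "storage", "storage_mgmt"]

# Hypothetical networks configured on a DistributedComputeScaleOut node:
# it has the storage network but not storage_mgmt.
SCALEOUT_NODE_NETWORKS = {"ctlplane", "internal_api", "storage", "tenant"}


def request_cert(network: str, node_networks: set[str]) -> str:
    """Stand-in for a certificate request.

    Fails when the node has no address on the network, mimicking the
    reported failure on the storage_mgmt network.
    """
    if network not in node_networks:
        raise RuntimeError(f"no address on network {network!r}")
    return f"overcloud-haproxy-{network}.pem"


def certs_naive(node_networks: set[str]) -> list[str]:
    # Loops over every HAProxy network, even ones the node lacks.
    return [request_cert(net, node_networks) for net in HAPROXY_NETWORKS]


def certs_filtered(node_networks: set[str]) -> list[str]:
    # Intersect with the node's networks first, keeping the list order.
    wanted = [net for net in HAPROXY_NETWORKS if net in node_networks]
    return [request_cert(net, node_networks) for net in wanted]


if __name__ == "__main__":
    try:
        certs_naive(SCALEOUT_NODE_NETWORKS)
    except RuntimeError as err:
        # prints: naive loop failed: no address on network 'storage_mgmt'
        print("naive loop failed:", err)
    # prints: ['overcloud-haproxy-internal_api.pem', 'overcloud-haproxy-storage.pem']
    print(certs_filtered(SCALEOUT_NODE_NETWORKS))
```

In this model, adding storage_mgmt to the node's network set also makes the naive loop succeed, which mirrors the workaround confirmed above; filtering avoids requiring the role to carry a network it does not otherwise use.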
Maybe the line that provides the list for the loop [2] needs to be intersected with the list of networks actually configured on the running node.

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/haproxy/haproxy-internal-tls-certmonger.j2.yaml#L172

I think John has explained the root cause. Assigning to DFG:Security to look at changing "Certificate generation" to skip networks which are not configured on that node.

This is similar to, but not a duplicate of, BZ 2081698, which was solved by https://review.opendev.org/c/openstack/tripleo-heat-templates/+/841930/1/deployment/apache/apache-baremetal-puppet.j2.yaml

Any updates to this BZ? The upstream Gerrit change has been merged for quite some time, but this BZ is still in NEW despite its high severity.

Upstream patch has merged on stable/wallaby.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543