Bug 1788536
| Summary: | spine/leaf DCN deployments require quoted storage network overrides | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Yuri Obshansky <yobshans> | ||||
| Component: | Ceph-Ansible | Assignee: | Guillaume Abrioux <gabrioux> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | Vasishta <vashastr> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 4.0 | CC: | aschoen, ceph-eng-bugs, emacchi, gabrioux, gfidente, gmeno, johfulto, mburns, nthomas, pasik, pgrist, slinaber, ykaul | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | 5.* | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-03-23 16:06:37 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1760354, 1802774 | ||||||
| Attachments: |
Description
Yuri Obshansky
2020-01-07 13:04:56 UTC
You had the following in your parameters for ceph-ansible:

    cluster_network: 172.18.1.0/24,172.18.2.0/24
    public_network: 172.23.1.0/24,172.23.2.0/24
    monitor_address_block: 172.23.1.0/24,172.23.2.0/24

For example:

    [stack@site-undercloud-0 dcn1]$ sudo grep monitor_address_block /var/lib/mistral/config-download-latest/ceph-ansible/group_vars/all.yml
    monitor_address_block: 172.23.1.0/24,172.23.2.0/24
    [stack@site-undercloud-0 dcn1]$

As per the docs [1], these values need to be overridden through CephAnsibleExtraConfig and quoted. I added the following to your internal.yaml:

    CephAnsibleExtraConfig:
      cluster_network: '172.18.1.0/24,172.18.2.0/24'
      public_network: '172.23.1.0/24,172.23.2.0/24'
      monitor_address_block: '172.23.1.0/24,172.23.2.0/24'

You had put CephAnsibleExtraConfig in nodes_data.yaml, but you may only use this parameter once, and it was already in your internal.yaml to set 'is_hci: true', so that's where I put it. I then ran a stack update.

Your overcloud then failed with a new error message, because the error in the bug you reported was no longer happening [2]. The new error happened because your host doesn't have the desired '172.23' or '172.18' IPs on it [3]. This, however, is not a ceph-ansible bug. It's a problem you're having with assigning the correct IPs to your hosts. When you determine what the correct IP should be on your host, quote that IP and override it as I have described above. It also looks like we need a doc bug for getting that in.

Harold, who worked on bug 1740283, modified ceph-ansible during the 16 cycle so it would support these quoted values [4]; you just need to quote them once you correctly configure your deployment to assign them.
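The address check behind the second failure can be sketched in a few lines of Python. This is only an illustration and not ceph-ansible's code: the function below is a hypothetical stand-in named after the `ips_in_ranges` filter that appears in the error output [2], and it assumes that filter keeps the host addresses falling inside any of the comma-separated CIDR ranges. The host addresses are the IPv4 addresses from the interface listing [3].

```python
import ipaddress


def ips_in_ranges(addresses, ranges):
    """Hypothetical stand-in: keep the addresses that fall in any CIDR range."""
    networks = [ipaddress.ip_network(r.strip()) for r in ranges]
    return [
        addr for addr in addresses
        if any(ipaddress.ip_address(addr) in net for net in networks)
    ]


# IPv4 addresses present on the host, taken from the listing in [3].
host_ips = ["127.0.0.1", "192.168.34.89", "172.16.20.66", "10.0.20.69"]

# The quoted override is a single string, so it can be split on commas.
monitor_address_block = "172.23.1.0/24,172.23.2.0/24"

matches = ips_in_ranges(host_ips, monitor_address_block.split(","))
print(matches)  # empty: no host address is inside either range
```

Under these assumptions the result is empty, so there is no monitor address to pick for the host, which matches the explanation that the host has no '172.23' address assigned.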
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/spine_leaf_networking/index#assigning-routes-for-roles

[2]
    "ok: [dcn1-computehci1-0] => (item=dcn1-computehci1-0) => changed=false ",
    " _monitor_addresses: '[{''name'': ''dcn1-computehci1-0'', ''addr'': AnsibleUndefined}]'",
    " item: dcn1-computehci1-0",
    "ok: [dcn1-computehci1-1] => (item=dcn1-computehci1-0) => changed=false ",
    "fatal: [dcn1-computehci1-0]: FAILED! => ",
    " msg: 'Unexpected templating type error occurred on ({{ _monitor_addresses | default([]) + [{ ''name'': item, ''addr'': hostvars[item][''ansible_all_ipv4_addresses''] | ips_in_ranges(hostvars[item][''monitor_address_block''].split('','')) | first }] }}): must be str, not list'",
    "ok: [dcn1-computehci1-2] => (item=dcn1-computehci1-0) => changed=false ",
    "fatal: [dcn1-computehci1-1]: FAILED! => ",

[3]
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:8b:2e:71 brd ff:ff:ff:ff:ff:ff
        inet 192.168.34.89/24 brd 192.168.34.255 scope global dynamic noprefixroute ens3
           valid_lft 78942sec preferred_lft 78942sec
        inet6 fe80::5054:ff:fe8b:2e71/64 scope link
           valid_lft forever preferred_lft forever
    3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:94:50:e1 brd ff:ff:ff:ff:ff:ff
        inet 172.16.20.66/24 brd 172.16.20.255 scope global dynamic noprefixroute ens4
           valid_lft 2503sec preferred_lft 2503sec
        inet6 fe80::7beb:692b:fc54:fdd4/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
    4: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:83:ef:3c brd ff:ff:ff:ff:ff:ff
        inet 10.0.20.69/24 brd 10.0.20.255 scope global dynamic noprefixroute ens5
           valid_lft 2759sec preferred_lft 2759sec
        inet6 2620:52:0:13b8::fe:63/128 scope global dynamic noprefixroute
           valid_lft 1985sec preferred_lft 1985sec
        inet6 fe80::b5b1:adc4:16af:f585/64 scope link noprefixroute
           valid_lft forever preferred_lft forever

[4] https://github.com/ceph/ceph-ansible/commit/e695efcaf79909e2237197fd473117930e8d83e5#diff-d53302523567dc01b57c06bb371f1e3d

New Summary after RCA:

The Storage and StorageMgmt networks passed to ceph-ansible in spine/leaf deployments are passed as a list:

    public_network: 172.23.1.0/24,172.23.2.0/24

As per the error message in #1, ceph-ansible cannot parse the above. The workaround is to determine the appropriate network ceph-ansible should use and then pass it as an override, using quotes:

    CephAnsibleExtraConfig:
      public_network: '172.23.1.0/24,172.23.2.0/24'

Though quoting was the recommended and documented method in the past, it should no longer be necessary in OSP16. The goal of this bug is to either modify ceph-ansible so it can manage the non-quoted value [1] or for TripleO to quote the data before it is passed to ceph-ansible. The next step is for the ceph-ansible team to provide input on which of the above options we should pursue (hence the needinfo to gabrioux).

Resetting product, as it's ceph-ansible which requires the quotes.

We documented the workaround on the openstack side for now in chapter 2:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/deploying_distributed_compute_nodes_with_separate_heat_stacks/index#proc_designing-your-separate-heat-stacks-deployment
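As a rough illustration of the second option (quoting the data before it reaches ceph-ansible), the sketch below joins a list of subnets into one single-quoted, comma-separated value and renders the CephAnsibleExtraConfig fragment shown above. The helper names `quoted_override` and `render_extra_config` are hypothetical and are not part of TripleO or ceph-ansible; the subnets are the ones from this report.

```python
import ipaddress


def quoted_override(cidrs):
    """Hypothetical helper: join CIDRs into one single-quoted YAML scalar."""
    for cidr in cidrs:
        # Raises ValueError if an entry is not a valid network.
        ipaddress.ip_network(cidr)
    return "'" + ",".join(cidrs) + "'"


def render_extra_config(overrides):
    """Hypothetical helper: render the CephAnsibleExtraConfig fragment."""
    lines = ["CephAnsibleExtraConfig:"]
    for key, cidrs in overrides.items():
        lines.append(f"  {key}: {quoted_override(cidrs)}")
    return "\n".join(lines)


rendered = render_extra_config({
    "cluster_network": ["172.18.1.0/24", "172.18.2.0/24"],
    "public_network": ["172.23.1.0/24", "172.23.2.0/24"],
    "monitor_address_block": ["172.23.1.0/24", "172.23.2.0/24"],
})
print(rendered)
```

The printed fragment still has to be placed in the one environment file that already sets CephAnsibleExtraConfig, since the parameter may only be used once.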