Bug 2016496

Summary: OSDs do not start during cephadm deployment with IPv6 unless user does workaround
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: John Fulton <johfulto>
Component: Cephadm Assignee: Sebastian Wagner <sewagner>
Status: CLOSED WORKSFORME QA Contact: Sunil Kumar Nagaraju <sunnagar>
Severity: low Docs Contact: Karen Norteman <knortema>
Priority: unspecified    
Version: 5.0 CC: gfidente, nojha
Target Milestone: ---   
Target Release: 5.2   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-17 12:06:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1820257    
Attachments:
Description Flags
output of "journalctl -u ceph-3ac0e06a-470c-11ec-babe-52540007205b@osd.0" none

Description John Fulton 2021-10-21 18:08:34 UTC
When using IPv6 for my public and cluster networks, my mon is able to bootstrap (if I use the fix from BZ 2002639). However, I end up with 0 OSDs when I follow the standard procedure to add an OSD host by running a `ceph orch` command. The journalctl output for the OSD's systemd unit shows it looking for an IPv4 address in an IPv6 network:

"unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces"

I can work around this by running the following before adding any OSD hosts:

 ceph config set osd ms_bind_ipv4 false

Though we have a workaround, the need to do the above breaks automated deployments. Either cephadm should take care of the above for the user or pick_address.cc should detect that it has an IPv6 address and do the right thing without requiring the user to set ms_bind_ipv4 false.
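
The kind of detection requested above could look roughly like the following. This is only an illustrative sketch in Python using the standard `ipaddress` module; it is not the logic in pick_address.cc or cephadm, and the helper name `suggest_ms_bind` and its return shape are hypothetical.

```python
import ipaddress


def suggest_ms_bind(networks):
    """Suggest ms_bind_ipv4 / ms_bind_ipv6 values from a list of CIDR strings.

    Hypothetical helper for illustration only; not part of Ceph or cephadm.
    """
    if not networks:
        raise ValueError("at least one network is required")
    # Collect the address families (4 and/or 6) present in the networks.
    versions = {
        ipaddress.ip_network(net.strip(), strict=False).version
        for net in networks
    }
    # Enable a family only when a network of that family is configured,
    # so a dual stack setup keeps both options enabled.
    return {
        "ms_bind_ipv4": 4 in versions,
        "ms_bind_ipv6": 6 in versions,
    }


if __name__ == "__main__":
    # The IPv6-only network from the error message in this report.
    print(suggest_ms_bind(["fd00:fd00:fd00:3000::/64"]))
```

For the network in the error message this suggests the same value as the manual workaround (ms_bind_ipv4 false), while a list containing both an IPv4 and an IPv6 network leaves both options enabled.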

Comment 1 John Fulton 2021-10-21 18:15:18 UTC
This bug has broken OSP17 IPv6 CI jobs with Ceph.

We have a workaround on the tripleo/director side [1], but we'd rather not merge a workaround into TripleO; we request that Ceph fix this issue instead.

[1] https://review.opendev.org/c/openstack/tripleo-ansible/+/814064

Comment 2 Sebastian Wagner 2021-11-04 13:29:02 UTC
> unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces ''

comes from pick_address.cc. Neha, do you think we can fix this there?

Comment 7 John Fulton 2021-11-16 19:02:40 UTC
Created attachment 1842147
output of "journalctl -u ceph-3ac0e06a-470c-11ec-babe-52540007205b"

Comment 8 John Fulton 2021-11-16 20:06:14 UTC
Setting the ms_bind options automatically (in pick_address.cc, cephadm, or tripleo) based on detection of the environment is non-trivial and, if done incorrectly, could break dual stack support [1]. TripleO [2] with ceph-ansible [3] didn't have dual stack support for Ceph (the user had to pick either v4 or v6). cephadm shouldn't break dual stack support just to go back to how it was. Instead, tripleo, like cephadm, shouldn't block dual stack and should let the user pick the v4 and v6 binding options too.

[1] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv4-ipv6-dual-stack-mode
[2] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ceph-ansible/ceph-base.yaml#L598-L602
[3] https://github.com/ceph/ceph-ansible/blob/83a8dd5a6a1f9ffe43b2a75d7b49775e34c58f24/roles/ceph-config/templates/ceph.conf.j2#L10-L13
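
If the user picks the binding options, a deployment tool could still check the choice against the configured networks before adding OSD hosts, without overriding it. The sketch below is one way to do that in Python; the helper `check_ms_bind` is hypothetical, and the rule it applies (an enabled family needs at least one network of that family) is an assumption inferred from the "unable to find any IPv4 address" error in this report, not taken from the Ceph source.

```python
import ipaddress


def check_ms_bind(networks, ms_bind_ipv4, ms_bind_ipv6):
    """Return a list of problems; an empty list means no mismatch was found.

    Hypothetical helper for illustration only; not part of Ceph or cephadm.
    """
    versions = {
        ipaddress.ip_network(net.strip(), strict=False).version
        for net in networks
    }
    problems = []
    if not (ms_bind_ipv4 or ms_bind_ipv6):
        problems.append("both ms_bind_ipv4 and ms_bind_ipv6 are false")
    # Assumed rule: every enabled family must have a matching network.
    for enabled, version, option in (
        (ms_bind_ipv4, 4, "ms_bind_ipv4"),
        (ms_bind_ipv6, 6, "ms_bind_ipv6"),
    ):
        if enabled and version not in versions:
            problems.append(
                f"{option} is true but no IPv{version} network is configured"
            )
    return problems


if __name__ == "__main__":
    # IPv6-only network with ms_bind_ipv4 left enabled, as in this report.
    for problem in check_ms_bind(["fd00:fd00:fd00:3000::/64"], True, True):
        print(problem)
```

A check like this only reports a mismatch and leaves the decision to the user, so a dual stack configuration with both options enabled and both families present passes unchanged.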

Comment 9 John Fulton 2021-11-17 12:06:43 UTC
If you want to make pick_address.cc set the ms_bind options automatically based on detection of the environment, then feel free to re-open this bug. Otherwise OpenStack users can deploy OSDs in v6 (only) by using the following:

parameter_defaults:
  CephConfigOverrides:
    global:
      ms_bind_ipv4: false
      ms_bind_ipv6: true