Bug 2016496 - OSDs do not start during cephadm deployment with IPv6 unless user does workaround
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: Unspecified
OS: Linux
Target Milestone: ---
Target Release: 5.2
Assignee: Sebastian Wagner
QA Contact: Sunil Kumar Nagaraju
Docs Contact: Karen Norteman
Depends On:
Blocks: 1820257
Reported: 2021-10-21 18:08 UTC by John Fulton
Modified: 2021-11-17 12:06 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-11-17 12:06:43 UTC

Attachments:
output of "journalctl -u ceph-3ac0e06a-470c-11ec-babe-52540007205b@osd.0" (27.95 KB, text/plain)
2021-11-16 19:02 UTC, John Fulton

System                              ID           Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker            52867        0        None      None    None     2021-10-21 18:08:33 UTC
Red Hat Issue Tracker               RHCEPH-2078  0        None      None    None     2021-10-21 18:09:51 UTC
Red Hat Knowledge Base (Solution)   6427121      0        None      None    None     2021-10-21 18:08:33 UTC

Description John Fulton 2021-10-21 18:08:34 UTC
When I use IPv6 for my public and cluster networks, my mon is able to bootstrap (if I use the fix from BZ 2002639). However, I end up with 0 OSDs when I follow the standard procedure to add an OSD host by running a `ceph orch` command. Running journalctl on the OSD's systemd unit shows it looking for an IPv4 address in an IPv6 network:

"unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces"

I can work around this by running the following before adding any OSD hosts:

 ceph config set osd ms_bind_ipv4 false

Though we have a workaround, the need to do the above breaks automated deployments. Either cephadm should take care of the above for the user or pick_address.cc should detect that it has an IPv6 address and do the right thing without requiring the user to set ms_bind_ipv4 false.
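
To illustrate the kind of detection requested above, here is a minimal sketch, not the logic in pick_address.cc or cephadm. It assumes the public/cluster networks are available as a comma-separated string of CIDRs and naively derives the two bind options from the address families it sees. The function name `ms_bind_options` is hypothetical.

```python
import ipaddress


def ms_bind_options(networks: str) -> dict:
    """Guess ms_bind_* values from a comma-separated list of CIDR networks.

    Hypothetical helper for illustration only; Ceph does not ship this.
    """
    # Collect the IP version (4 or 6) of every network in the list.
    versions = {
        ipaddress.ip_network(net.strip(), strict=False).version
        for net in networks.split(",")
        if net.strip()
    }
    if not versions:
        raise ValueError("no networks given")
    # Enable binding only for the address families that are present.
    return {"ms_bind_ipv4": 4 in versions, "ms_bind_ipv6": 6 in versions}


print(ms_bind_options("fd00:fd00:fd00:3000::/64"))
# {'ms_bind_ipv4': False, 'ms_bind_ipv6': True}
```

A list that mixes IPv4 and IPv6 networks yields both options set to true, which is only a guess at what a dual stack deployment would want.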

Comment 1 John Fulton 2021-10-21 18:15:18 UTC
This bug has broken OSP17 IPv6 CI jobs with Ceph.

We have a workaround on the TripleO/director side [1], but we would rather not merge a workaround into TripleO, so we request that Ceph fix this issue.

[1] https://review.opendev.org/c/openstack/tripleo-ansible/+/814064

Comment 2 Sebastian Wagner 2021-11-04 13:29:02 UTC
> unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces ''

comes from pick_address.cc. Neha, do you think we can fix this there?

Comment 7 John Fulton 2021-11-16 19:02:40 UTC
Created attachment 1842147
output of "journalctl -u ceph-3ac0e06a-470c-11ec-babe-52540007205b"

Comment 8 John Fulton 2021-11-16 20:06:14 UTC
Setting the ms_bind options automatically (in pick_address.cc, cephadm, or TripleO) based on detection of the environment is non-trivial and, if done incorrectly, could break dual stack support [1]. TripleO [2] with ceph-ansible [3] did not support dual stack for Ceph (the user picked either v4 or v6). cephadm should not break dual stack support just to restore the earlier behavior. Instead, TripleO, like cephadm, should not block dual stack and should let the user pick the v4 and v6 binding options too.

[1] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv4-ipv6-dual-stack-mode

Comment 9 John Fulton 2021-11-17 12:06:43 UTC
If you want to make pick_address.cc set the ms_bind options automatically based on detection of the environment, then feel free to re-open this bug. Otherwise OpenStack users can deploy OSDs in v6 (only) by using the following:

      ms_bind_ipv4: false
      ms_bind_ipv6: true
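
For deployments driven by a script rather than TripleO, the same two settings can be expressed as `ceph config set` commands. The sketch below only builds the command strings and does not run them. The helper name `ceph_config_commands` is hypothetical, and targeting the `osd` section is an assumption taken from the workaround in the description; the comment above does not say which section the options belong to.

```python
def ceph_config_commands(who: str, options: dict) -> list:
    """Build `ceph config set` command strings for boolean options.

    Hypothetical helper for illustration; it does not execute anything.
    """
    commands = []
    for name, value in options.items():
        # Ceph accepts the literals "true" and "false" for boolean options.
        literal = "true" if value else "false"
        commands.append(f"ceph config set {who} {name} {literal}")
    return commands


# IPv6-only binding, matching the two options listed above.
v6_only = {"ms_bind_ipv4": False, "ms_bind_ipv6": True}
for cmd in ceph_config_commands("osd", v6_only):
    print(cmd)
```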
