Bug 1301701 - parsing of bind address from mon_host fails when using ipv6
Summary: parsing of bind address from mon_host fails when using ipv6
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assignee: Brad Hubbard
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1302593 1309822
 
Reported: 2016-01-25 17:54 UTC by Giulio Fidente
Modified: 2022-02-21 18:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-20 20:56:40 UTC
Embargoed:


Attachments
ceph.conf (505 bytes, text/plain), 2016-02-15 16:32 UTC, Giulio Fidente
ceph.conf.ipv4 (471 bytes, text/plain), 2016-02-15 16:56 UTC, Giulio Fidente


Links
Red Hat Issue Tracker RHCEPH-3310 (last updated 2022-02-21 18:16:54 UTC)

Description Giulio Fidente 2016-01-25 17:54:11 UTC
Description of problem:
When using IPv6, the IP addresses of the monitors set in mon_host should be surrounded by brackets, but parsing seems to fail when a single IP is provided, unless a trailing comma is deliberately added.

the following config setting will work:
  mon_host = [fd00:fd00:fd00:3000::15],[fd00:fd00:fd00:3000::13],[fd00:fd00:fd00:3000::14]

the following config setting will also work:
  mon_host = [fd00:fd00:fd00:3000::13],

while the following *will not* work:
  mon_host = [fd00:fd00:fd00:3000::13]

This causes any 'ceph' command to fail with lines like the following when trying to reach the ceph-mon process, even though it is listening on the IPv6 socket on port 6789 as intended:

  11:28:34.057317 7f7643d77700  0 -- :/1019138 >> [fd00:fd00:fd00:3000::13]:6789/0 pipe(0x7f7638008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7638012910).fault
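
For illustration, here is a minimal C++ sketch (not Ceph's actual parser) of a bracket-aware mon_host splitter; all three forms above should reduce to the same usable address list, which is what makes the failure on the single bracketed address surprising:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split a mon_host value on commas and strip the IPv6 brackets from
// each entry; empty entries (e.g. from a trailing comma) are ignored.
static std::vector<std::string> parse_mon_host(const std::string &val) {
    std::vector<std::string> out;
    std::istringstream ss(val);
    std::string entry;
    while (std::getline(ss, entry, ',')) {
        if (entry.empty())
            continue;  // tolerate a trailing comma
        if (entry.size() >= 2 && entry.front() == '[' && entry.back() == ']')
            entry = entry.substr(1, entry.size() - 2);  // strip brackets
        out.push_back(entry);
    }
    return out;
}

int main() {
    // The three forms from this report; each should yield the same address.
    for (const std::string val : {
             "[fd00:fd00:fd00:3000::15],[fd00:fd00:fd00:3000::13]",
             "[fd00:fd00:fd00:3000::13],",
             "[fd00:fd00:fd00:3000::13]"}) {
        std::cout << val << " ->";
        for (const auto &a : parse_mon_host(val))
            std::cout << ' ' << a;
        std::cout << '\n';
    }
    return 0;
}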


Version-Release number of selected component (if applicable):
ceph-0.94.1-13.el7cp.x86_64


How reproducible:
Configure a ceph-mon with a single bracketed IPv6 address in mon_host, start the ceph-mon process, and try to query it from the CLI.

Comment 2 Giulio Fidente 2016-01-26 12:58:35 UTC
I have to amend my initial comment: adding a comma at the end of the config value does not break parsing, but it does not make a single IPv6 mon_host parse correctly either.

Comment 3 Brad Hubbard 2016-02-04 06:30:19 UTC
I tried to look at this today, but there were some problems with the repos and I couldn't install a machine.

Giulio, could you let me know what commands are being run, and in what sequence, to reproduce this so I can investigate further?

Comment 4 Brad Hubbard 2016-02-05 03:36:47 UTC
Everything on the local host.

$ rpm -q ceph
ceph-0.94.3-3.el7cp.x86_64

$ cat /etc/ceph/ceph.conf
[global]
fsid = 2dde4672-72cb-49b2-b00f-ef911d01e645
ms bind ipv6 = true
mon initial members = dell-per320-07
mon host = [2620:52:0:42d4:46a8:42ff:fe3c:8393]

$ sudo ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
$ sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
$ sudo ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
$ monmaptool --create --add dell-per320-07 [2620:52:0:42d4:46a8:42ff:fe3c:8393] --fsid 2dde4672-72cb-49b2-b00f-ef911d01e645 /tmp/monmap
$ sudo mkdir /var/lib/ceph/mon/ceph-dell-per320-07
$ sudo ceph-mon --mkfs -i dell-per320-07 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
$ sudo touch /var/lib/ceph/mon/ceph-dell-per320-07/done
$ sudo ceph-mon -i dell-per320-07 -d --debug_ms 2 --debug_mon 20 --pid-file /var/run/ceph/mon.dell-per320-07.pid -c /etc/ceph/ceph.conf --cluster ceph

# netstat -tlpn|grep ceph-mon
tcp6       0      0 2620:52:0:42d4:46a:6789 :::*                    LISTEN      30111/ceph-mon
# ceph -s
    cluster 2dde4672-72cb-49b2-b00f-ef911d01e645
     health HEALTH_ERR
            64 pgs stuck inactive
            64 pgs stuck unclean
            no osds
     monmap e1: 1 mons at {dell-per320-07=[2620:52:0:42d4:46a8:42ff:fe3c:8393]:6789/0}
            election epoch 2, quorum 0 dell-per320-07
     osdmap e1: 0 osds: 0 up, 0 in
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating

It seems I can deploy a mon manually without issue using a single IP in "mon host".

Can you give me the steps to reproduce your issue, Giulio?

Comment 5 Giulio Fidente 2016-02-15 16:28:11 UTC
The problem is that ceph-mon won't find itself in the list of hosts given in 'mon host', so it binds to the first local IP on the public network instead of the address we pass in 'mon host'.

This shows up in the ceph-mon logs:

  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 6053
  0 mon.overcloud-controller-0 does not exist in monmap, will attempt to join an existing cluster
  0 using public_addr [fd00:fd00:fd00:3000::10]:0/0 -> [fd00:fd00:fd00:3000::10]:6789/0

while 'mon_host' in ceph.conf had:

mon_host = [fd00:fd00:fd00:3000::14]


This happens when deploying via puppet-ceph, which does not create a monmap file; instead, after the keys are created, the following commands are run to start the monitor:

# ceph-mon --mkfs --id overcloud-controller-0 --keyring /tmp/ceph-mon-keyring-overcloud-controller-0
# touch /var/lib/ceph/mon/ceph-overcloud-controller-0/done
# /etc/init.d/ceph start mon.overcloud-controller-0

which results in the following process being spawned:

/usr/bin/ceph-mon -i overcloud-controller-0 --pid-file /var/run/ceph/mon.overcloud-controller-0.pid -c /etc/ceph/ceph.conf --cluster ceph -f
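
To illustrate the kind of check that appears to go wrong, here is a hypothetical C++ sketch (not Ceph's actual code) that walks the local interfaces and compares their numeric addresses against the mon_host entries; if an entry is compared with its brackets still attached, an IPv6 match can never succeed, and the daemon has to fall back to choosing a local IP itself:

#include <ifaddrs.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <iostream>
#include <string>
#include <vector>

// Return true if any local interface address (as a numeric string)
// matches one of the given mon_host entries.
static bool local_addr_matches(const std::vector<std::string> &mon_hosts) {
    struct ifaddrs *ifa_list = nullptr;
    if (getifaddrs(&ifa_list) != 0)
        return false;
    bool found = false;
    for (struct ifaddrs *ifa = ifa_list; ifa && !found; ifa = ifa->ifa_next) {
        if (!ifa->ifa_addr)
            continue;
        int family = ifa->ifa_addr->sa_family;
        if (family != AF_INET && family != AF_INET6)
            continue;
        socklen_t len = (family == AF_INET) ? sizeof(struct sockaddr_in)
                                            : sizeof(struct sockaddr_in6);
        char host[NI_MAXHOST];
        if (getnameinfo(ifa->ifa_addr, len, host, sizeof(host),
                        nullptr, 0, NI_NUMERICHOST) != 0)
            continue;
        for (const auto &mh : mon_hosts)
            if (mh == host)  // a bracketed entry like "[fd00::14]" never matches
                found = true;
    }
    freeifaddrs(ifa_list);
    return found;
}

int main() {
    // Prints 0 (no match) for the bracketed form; the stripped form can
    // match if the address is actually configured on a local interface.
    std::cout << local_addr_matches({"[fd00:fd00:fd00:3000::14]"}) << "\n";
    std::cout << local_addr_matches({"fd00:fd00:fd00:3000::14"}) << "\n";
    return 0;
}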

Comment 6 Giulio Fidente 2016-02-15 16:32:38 UTC
Created attachment 1127337 [details]
ceph.conf

Attaching the sample ceph.conf file used to trigger the error described in comment #5.

Comment 7 Giulio Fidente 2016-02-15 16:56:44 UTC
Created attachment 1127358 [details]
ceph.conf.ipv4

A similar ceph.conf file, using IPv4 addresses, behaves differently; the ceph-mon logs will just report:

0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 9649
0 starting mon.overcloud-controller-0 rank 0 at 172.16.21.13:6789/0 mon_data /var/lib/ceph/mon/ceph-overcloud-controller-0 fsid 38fa466e-d402-11e5-9bca-525400be2cb1

Comment 8 Brad Hubbard 2016-02-17 05:34:39 UTC
Giulio,

Do you think a feature request for puppet-ceph to be able to pre-create a monmap and pass it to the ceph-mon process might be appropriate here? Should we open this up for discussion at https://bugs.launchpad.net/puppet-ceph? Or are you convinced this is a Ceph bug?

Comment 9 Giulio Fidente 2016-02-17 08:11:08 UTC
hi Brad,

I've submitted a change for puppet-ceph to make it possible to set the 'public_addr' key in the monitor stanza [1]; we're using this feature in TripleO now and it works as expected: the mon daemon binds to the IP set by 'public_addr'.
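
For reference, the resulting monitor stanza looks roughly like the following (the host name and address are illustrative, taken from this report):

[mon.overcloud-controller-0]
public_addr = [fd00:fd00:fd00:3000::14]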

What seems to be a problem in Ceph instead is that the same configuration behaves differently with IPv4 vs IPv6; the matching of the monitor's own local IP address against the list in 'mon_host' appears to work only with IPv4, and in that case the use of 'public_addr' is not required, since the bind address will be the one found in the 'mon_host' list.

Maybe this is unwanted functionality or some side effect, and the use of 'public_addr' should always be preferred?

1. https://review.openstack.org/#/c/280351/

Comment 10 Brad Hubbard 2016-02-17 23:07:32 UTC
(In reply to Giulio Fidente from comment #9)
> hi Brad,
> 
> I've submitted a change for puppet-ceph to make it possible to set the
> 'public_addr' key in the monitor stanza [1]; we're using this feature in
> TripleO now and it works as expected: the mon daemon binds to the IP set
> by 'public_addr'.

Understood.

> 
> What seems to be a problem in Ceph instead is that the same configuration
> behaves differently with IPv4 vs IPv6; the matching of the monitor's own
> local IP address against the list in 'mon_host' appears to work only with
> IPv4, and in that case the use of 'public_addr' is not required, since the
> bind address will be the one found in the 'mon_host' list.

Let me investigate this aspect and report back.

> 
> Maybe this is unwanted functionality or some side effect, and the use of
> 'public_addr' should always be preferred?

Let's find out.

