Bug 2002639 - OSP17 IPV6 ceph job is failing on task "Run cephadm bootstrap" with stderr Error EINVAL: Failed to connect to controller-0
Summary: OSP17 IPV6 ceph job is failing on task "Run cephadm bootstrap" with stderr Er...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0z1
Assignee: Sebastian Wagner
QA Contact: Alfredo
Docs Contact: Karen Norteman
URL:
Whiteboard:
Depends On:
Blocks: 1820257
 
Reported: 2021-09-09 12:12 UTC by Sandeep Yadav
Modified: 2021-11-02 16:39 UTC
CC List: 10 users

Fixed In Version: ceph-16.2.0-130.el8cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-02 16:39:21 UTC
Embargoed:




Links:
GitHub ceph/ceph pull 43029 (last updated 2021-09-09 15:07:37 UTC)
Red Hat Issue Tracker RHCEPH-1644 (last updated 2021-09-09 15:09:38 UTC)
Red Hat Product Errata RHBA-2021:4105 (last updated 2021-11-02 16:39:47 UTC)

Description Sandeep Yadav 2021-09-09 12:12:06 UTC
Description of problem:

OSP17 IPV6 ceph job is failing on task "Run cephadm bootstrap" with stderr Error EINVAL: Failed to connect to controller-0

Version-Release number of selected component (if applicable):

17

How reproducible:

Every time in the OSP17 integration pipeline.


Steps to Reproduce:
1. Deploy an environment with Ceph + IPv6


Actual results:

Deployment fails on the task "Run cephadm bootstrap" with stderr "Error EINVAL: Failed to connect to controller-0".


Expected results:

Deployment should complete successfully.


Additional info:

Logs attached in private comments.

Comment 3 Sebastian Wagner 2021-09-09 15:00:28 UTC
more readable: 

➜  foo jq -r '.[]' j
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 3ba21fbf-2232-44b2-a8be-e37b61273af5
Verifying IP [fd00:fd00:fd00:3000::269] port 3300 ...
Verifying IP [fd00:fd00:fd00:3000::269] port 6789 ...
Mon IP [fd00:fd00:fd00:3000::269] is in CIDR network fd00:fd00:fd00:3000::/64
- internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-12...
Ceph version: ceph version 16.2.0-72.el8cp (1e802193e0b4084ffcdb2338dd09f08bbea54a1a) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to fd00:fd00:fd00:3000::/64
Enabling IPv6 (ms_bind_ipv6) binding
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Using provided ssh keys...
Adding host controller-0...
Non-zero exit code 22 from /bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-12 -e NODE_NAME=controller-0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/3ba21fbf-2232-44b2-a8be-e37b61273af5:/var/log/ceph:z -v /tmp/ceph-tmp71lj62nz:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp70z5o_6n:/etc/ceph/ceph.conf:z undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-12 orch host add controller-0 [fd00:fd00:fd00:3000::269]
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to controller-0 ([fd00:fd00:fd00:3000::269]).
/usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr 
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub ceph-admin@[fd00:fd00:fd00:3000::269]
/usr/bin/ceph: stderr 
/usr/bin/ceph: stderr To check that the host is reachable:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key ceph-admin@[fd00:fd00:fd00:3000::269]
ERROR: Failed to add host <controller-0>: Failed command: /bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-12 -e NODE_NAME=controller-0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/3ba21fbf-2232-44b2-a8be-e37b61273af5:/var/log/ceph:z -v /tmp/ceph-tmp71lj62nz:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp70z5o_6n:/etc/ceph/ceph.conf:z undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-12 orch host add controller-0 [fd00:fd00:fd00:3000::269]

This is going to get fixed by https://github.com/ceph/ceph/pull/43029
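
For anyone hitting the same symptom, a minimal sketch of the kind of address handling involved, assuming (my assumption, not a statement of what the PR actually changes) that the bracketed IPv6 literal seen in "orch host add controller-0 [fd00:fd00:fd00:3000::269]" reaches the SSH layer verbatim. normalize_host_addr is a hypothetical helper, not cephadm's API.

import ipaddress

def normalize_host_addr(addr: str) -> str:
    # Strip surrounding brackets from an IPv6 literal ("[fd00::1]" -> "fd00::1");
    # hostnames and IPv4 addresses are returned unchanged.
    candidate = addr[1:-1] if addr.startswith("[") and addr.endswith("]") else addr
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Not an IP literal at all; assume it is a plain hostname.
        return addr

assert normalize_host_addr("[fd00:fd00:fd00:3000::269]") == "fd00:fd00:fd00:3000::269"
assert normalize_host_addr("controller-0") == "controller-0"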

Comment 8 John Fulton 2021-10-07 21:06:46 UTC
I tested the fix for this bug and it let me get my mon running with IPv6, but unfortunately I hit a new IPv6 bug for the OSD. I reported it upstream here:

 https://tracker.ceph.com/issues/52867

Comment 9 Sebastian Wagner 2021-10-12 10:07:23 UTC
cc Daniel

Comment 11 Sebastian Wagner 2021-10-13 13:01:15 UTC
Just set requires_doc_text to "-", as this was caused by an internal CI issue.

Comment 12 John Fulton 2021-10-14 20:42:24 UTC
We have a workaround on the tripleo/director side [1] but we'd rather not merge a workaround into TripleO. We request that Ceph have a fix for the upstream issue [2] either in pick_address.cc or cephadm or whatever you like. Should I open a downstream BZ to track this other issue?

[1] https://review.opendev.org/c/openstack/tripleo-ansible/+/814064
[2] https://tracker.ceph.com/issues/52867
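
For context only, a minimal sketch of the address-family reasoning under discussion; this is my illustration, neither the tripleo-ansible workaround [1] nor the Ceph fix being requested. ms_bind_ipv6 and ms_bind_ipv4 are existing Ceph options (the bootstrap log above shows ms_bind_ipv6 being enabled), but tying them to the OSD issue in [2] is an assumption, and apply_ipv6_bind_options is a hypothetical helper.

import ipaddress
import subprocess

def apply_ipv6_bind_options(public_network: str) -> None:
    # For an IPv6 public_network (e.g. fd00:fd00:fd00:3000::/64 from the
    # bootstrap log above), bind the messenger to IPv6 only.
    net = ipaddress.ip_network(public_network)
    if net.version == 6:
        subprocess.run(["ceph", "config", "set", "global", "ms_bind_ipv6", "true"], check=True)
        subprocess.run(["ceph", "config", "set", "global", "ms_bind_ipv4", "false"], check=True)

# Example: apply_ipv6_bind_options("fd00:fd00:fd00:3000::/64")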

Comment 13 John Fulton 2021-10-21 18:17:04 UTC
(In reply to John Fulton from comment #12)
> We have a workaround on the tripleo/director side [1] but we'd rather not
> merge a workaround into TripleO. We request that Ceph have a fix for the
> upstream issue [2] either in pick_address.cc or cephadm or whatever you
> like. Should I open a downstream BZ to track this other issue?
> 
> [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/814064
> [2] https://tracker.ceph.com/issues/52867

 https://bugzilla.redhat.com/show_bug.cgi?id=2016496

Comment 15 errata-xmlrpc 2021-11-02 16:39:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4105

