Description of problem:
OSP15 deployment with 3cont_3comp_3ceph-ipv6-geneve + Undercloud SSL + Overcloud SSL fails.
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-deployment-15-virthost-3cont_3comp_3ceph-yes_UC_SSL-yes_OC_SSL-ceph-ipv6-geneve-RHELOSP-31824/5/artifact/

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack overcloud \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
  -e /home/stack/virt/network/network-environment-v6.yaml \
  -e /home/stack/virt/network/dvr-override.yaml \
  -e /home/stack/virt/enable-tls.yaml \
  -e /home/stack/virt/inject-trust-anchor.yaml \
  -e /home/stack/virt/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/virt/debug.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e ~/containers-prepare-parameter.yaml \
  --log-file overcloud_deployment_72.log

Ceph container image parameters:
ceph_image: rhceph-4.0-rhel8
ceph_namespace: docker-registry.upshift.redhat.com/ceph
ceph_tag: latest
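The three ceph_* values above come from ~/containers-prepare-parameter.yaml. In OSP15 they normally sit under the "set" section of ContainerImagePrepare; a minimal sketch of that part of the file (reconstructed for illustration, not copied from this job):

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true   # assumed: the ceph nodes later pull from the undercloud registry 192.168.24.1:8787
    set:
      ceph_namespace: docker-registry.upshift.redhat.com/ceph
      ceph_image: rhceph-4.0-rhel8
      ceph_tag: latest

The deployment fails in the ceph-ansible run with the following task: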
TASK [ceph-osd : wait for all osd to be up] ************************************
task path: /usr/share/ceph-ansible/roles/ceph-osd/tasks/openstack_config.yml:2
Wednesday 15 May 2019  09:23:02 +0000 (0:00:00.346)       0:06:42.075 *********
FAILED - RETRYING: wait for all osd to be up (60 retries left).
FAILED - RETRYING: wait for all osd to be up (59 retries left).
[... the same retry message repeats while the counter goes down ...]
FAILED - RETRYING: wait for all osd to be up (2 retries left).
FAILED - RETRYING: wait for all osd to be up (1 retries left).
fatal: [ceph-2]: FAILED! => changed=false
  attempts: 60
  test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
  delta: '0:00:01.895705'
  end: '2019-05-15 09:35:07.407103'
  rc: 1
  start: '2019-05-15 09:35:05.511398'

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
ceph-0       : ok=102  changed=10  unreachable=0  failed=0  skipped=184  rescued=0  ignored=0
ceph-1       : ok=100  changed=10  unreachable=0  failed=0  skipped=179  rescued=0  ignored=0
ceph-2       : ok=101  changed=10  unreachable=0  failed=1  skipped=178  rescued=0  ignored=0
compute-0    : ok=31   changed=0   unreachable=0  failed=0  skipped=86   rescued=0  ignored=0
compute-1    : ok=31   changed=0   unreachable=0  failed=0  skipped=86   rescued=0  ignored=0
compute-2    : ok=31   changed=0   unreachable=0  failed=0  skipped=86   rescued=0  ignored=0
controller-0 : ok=188  changed=21  unreachable=0  failed=0  skipped=296  rescued=0  ignored=0
controller-1 : ok=174  changed=17  unreachable=0  failed=0  skipped=275  rescued=0  ignored=0
controller-2 : ok=175  changed=18  unreachable=0  failed=0  skipped=286  rescued=0  ignored=0

Running the same check by hand on ceph-2:

[heat-admin@ceph-2 ~]$ test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
unable to exec into ceph-mon-controller-0: no container with name or ID ceph-mon-controller-0 found: no such container
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
-bash: test: : integer expression expected
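The condition that task keeps retrying boils down to "at least one OSD is registered in the osdmap and every registered OSD is up". The hand-run copy above fails for a different reason: the ceph-mon-controller-0 container only exists on controller-0, so podman on ceph-2 reports "no such container" and the JSON parse then fails on empty input. A readable equivalent of the check, meant to be run on controller-0 (a sketch of the same podman/python pipeline, not copied from the job logs):

# Query the cluster status once via the mon container on controller-0.
STATUS_JSON=$(sudo podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json)

# Pull the OSD counts out of the osdmap section of the JSON status.
NUM_OSDS=$(echo "$STATUS_JSON" | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')
NUM_UP=$(echo "$STATUS_JSON" | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')

# The ansible task only succeeds when at least one OSD exists and all of them are up.
test "$NUM_OSDS" -gt 0 && test "$NUM_OSDS" = "$NUM_UP" \
  && echo "all $NUM_OSDS OSDs up" \
  || echo "only $NUM_UP of $NUM_OSDS OSDs up"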
Checking the ceph nodes by hand shows the OSD containers never stay up: successive "sudo podman ps -a" calls a few seconds apart list different container IDs, uptimes of only a few seconds, and sometimes no OSD container at all.

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
a85fa3b88122  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-13

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
eed582a29ac8  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-8
a23b70cf6ad7  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-11

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
a23b70cf6ad7  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  4 seconds ago           Up 3 seconds ago                  ceph-osd-11

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
8b1c4e672354  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  Less than a second ago  Created                           ceph-osd-5

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
d8c5cfe99b77  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  1 second ago            Up 1 second ago                   ceph-osd-13
716253a17f3d  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-5
093c3b1fafe6  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  6 seconds ago           Up 6 seconds ago                  ceph-osd-11

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
f3676598efe8  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  Less than a second ago  Up Less than a second ago         ceph-osd-2

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
f3676598efe8  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  1 second ago            Up 1 second ago                   ceph-osd-2

[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
f3676598efe8  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-2

[heat-admin@ceph-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
3ce8395e4569  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  3 seconds ago           Up 3 seconds ago                  ceph-osd-1
923e91c3d554  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  3 seconds ago           Up 3 seconds ago                  ceph-osd-12

[heat-admin@ceph-0 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
458f4641c5cd  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  1 second ago            Up 1 second ago                   ceph-osd-4
b8c939e1fa41  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  1 second ago            Up 1 second ago                   ceph-osd-6
3ce8395e4569  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  6 seconds ago           Up 6 seconds ago                  ceph-osd-1

[heat-admin@ceph-0 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES

[heat-admin@ceph-0 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
2d6c0e8b0fcc  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  Less than a second ago  Up Less than a second ago         ceph-osd-9

[heat-admin@ceph-1 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
8d9a0550c409  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  Less than a second ago  Up Less than a second ago         ceph-osd-7
529d79386a07  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-0

[heat-admin@ceph-1 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
8d9a0550c409  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  1 second ago            Up 1 second ago                   ceph-osd-7
529d79386a07  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  3 seconds ago           Up 3 seconds ago                  ceph-osd-0

[heat-admin@ceph-1 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
8d9a0550c409  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  2 seconds ago           Up 2 seconds ago                  ceph-osd-7

[heat-admin@ceph-1 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                           COMMAND               CREATED                 STATUS                     PORTS  NAMES
8d9a0550c409  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest  /opt/ceph-contain...  3 seconds ago           Up 3 seconds ago                  ceph-osd-7
[heat-admin@ceph-1 ~]$ sudo systemctl list-units |grep ceph
ceph-osd                  loaded activating auto-restart  Ceph OSD
ceph-osd                  loaded active     running       Ceph OSD
ceph-osd                  loaded activating auto-restart  Ceph OSD
ceph-osd                  loaded active     running       Ceph OSD
ceph-osd                  loaded activating auto-restart  Ceph OSD
system-ceph\x2dosd.slice  loaded active     active        system-ceph\x2dosd.slice
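The unit names look truncated in the listing, but the state column is the telling part: the ceph-osd services keep cycling through "activating (auto-restart)", i.e. systemd restarts OSD processes that exit shortly after starting, which matches the short-lived containers above. A way to watch one unit while it bounces (the templated ceph-osd@<id> unit names are an assumption here, not taken from the listing):

sudo systemctl list-units 'ceph-osd@*' --all                      # full, untruncated unit names and states
sudo journalctl -u ceph-osd@10.service --since "10 minutes ago"   # why OSD 10 keeps restarting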
[heat-admin@ceph-1 ~]$ sudo podman logs ceph-osd-10
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-10
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-10
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-ba66ccd3-a3d9-4751-accc-89d84af36e9e/osd-data-fb5916b7-3095-4956-85c9-5b01cca7c4ba --path /var/lib/ceph/osd/ceph-10 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-ba66ccd3-a3d9-4751-accc-89d84af36e9e/osd-data-fb5916b7-3095-4956-85c9-5b01cca7c4ba /var/lib/ceph/osd/ceph-10/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-10/block
Running command: /bin/chown -R ceph:ceph /dev/mapper/ceph--ba66ccd3--a3d9--4751--accc--89d84af36e9e-osd--data--fb5916b7--3095--4956--85c9--5b01cca7c4ba
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
--> ceph-volume lvm activate successful for osd ID: 10
2019-05-15 10:08:09  /opt/ceph-container/bin/entrypoint.sh: SUCCESS
exec: PID 727845: spawning /usr/bin/ceph-osd --cluster ceph --setuser ceph --setgroup ceph -d -i 10
exec: Waiting 727845 to quit
2019-05-15 10:08:09.899 7f5951f4f080  0 set uid:gid to 167:167 (ceph:ceph)
2019-05-15 10:08:09.899 7f5951f4f080  0 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process ceph-osd, pid 727845
2019-05-15 10:08:09.899 7f5951f4f080  0 pidfile_write: ignore empty --pid-file
2019-05-15 10:08:09.960 7f5951f4f080  1 bdev create path /var/lib/ceph/osd/ceph-10/block type kernel
2019-05-15 10:08:09.960 7f5951f4f080  1 bdev(0x56544dafa700 /var/lib/ceph/osd/ceph-10/block) open path /var/lib/ceph/osd/ceph-10/block
2019-05-15 10:08:09.960 7f5951f4f080  1 bdev(0x56544dafa700 /var/lib/ceph/osd/ceph-10/block) open size 10737418240 (0x280000000, 10 GiB) block_size 4096 (4 KiB) rotational discard not supported
2019-05-15 10:08:09.961 7f5951f4f080  1 bluestore(/var/lib/ceph/osd/ceph-10) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2019-05-15 10:08:09.961 7f5951f4f080  1 bdev create path /var/lib/ceph/osd/ceph-10/block type kernel
2019-05-15 10:08:09.961 7f5951f4f080  1 bdev(0x56544dafae00 /var/lib/ceph/osd/ceph-10/block) open path /var/lib/ceph/osd/ceph-10/block
2019-05-15 10:08:09.961 7f5951f4f080  1 bdev(0x56544dafae00 /var/lib/ceph/osd/ceph-10/block) open size 10737418240 (0x280000000, 10 GiB) block_size 4096 (4 KiB) rotational discard not supported
2019-05-15 10:08:09.961 7f5951f4f080  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-10/block size 10 GiB
2019-05-15 10:08:09.961 7f5951f4f080  1 bdev(0x56544dafae00 /var/lib/ceph/osd/ceph-10/block) close
2019-05-15 10:08:09.982 7f5951f4f080  1 bdev(0x56544dafa700 /var/lib/ceph/osd/ceph-10/block) close

The container log shows ceph-volume activating OSD 10 and spawning ceph-osd, then simply stops after the bdev close messages; no obvious error appears in what was captured before the container exits.

Version-Release number of selected component (if applicable):
core_puddle: RHOS_TRUNK-15.0-RHEL-8-20190509.n.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP15 with 3 controllers + 3 computes + 3 ceph nodes, IPv6, geneve, and SSL for both the undercloud and the overcloud.

Actual results:
The deployment fails on the ceph-ansible "wait for all osd to be up" check; the OSDs never come up.

Expected results:
Deployment passes.

Additional info:
https://www.spinics.net/lists/ceph-users/msg51954.html
ROOT CAUSE:

If you start the OSD directly, using the command embedded in /usr/share/ceph-osd-run.sh, you can see why the OSD fails to start:

2019-05-21 13:02:46.115 7ff739a92080 -1 unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces ''
2019-05-21 13:02:46.115 7ff739a92080 -1 Failed to pick public address.

If you then add "ms bind ipv4 = false" under the [global] section of every /etc/ceph/ceph.conf, as suggested in [1], the OSD starts without any problem.

ceph-ansible will be updated to deal with this.

[1] https://www.spinics.net/lists/ceph-users/msg51954.html
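In other words, the manual fix is one extra line in the [global] section of /etc/ceph/ceph.conf on every node; a minimal illustration (only the relevant lines shown):

[global]
# The public/cluster networks are IPv6-only (fd00:fd00:fd00:3000::/64),
# so tell the Ceph messenger not to look for an IPv4 address to bind to.
ms bind ipv4 = false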
Hi John,
do you know when the downstream package will be available?
WORKAROUND:

Deploy your overcloud with 'openstack overcloud deploy ... -e foo.yaml ...', where foo.yaml contains the following:

parameter_defaults:
  CephConfigOverrides:
    ms_bind_ipv4: false
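After redeploying with that extra environment file, the override should be visible in the rendered configuration on the overcloud nodes. A quick sanity check (a sketch; depending on how the override is rendered the key may appear with underscores or with spaces, so the pattern below matches either):

sudo grep -i 'ms.bind.ipv4' /etc/ceph/ceph.conf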
(In reply to Artem Hrechanychenko from comment #8)
> Hi John,
> do you know when the downstream package will be available?

I'll ask the assignee when the change will be backported to 4.0.z. We'll then need to get it packaged.
Guillaume, I see the fix merged upstream [1] three weeks ago, but it has not yet been backported to 4.0.z. For example, the latest tag I can see (v4.0.0rc8) is still missing the fixing line [2].

Do you know when you'll have the backport done?

Thanks,
John

[1] https://github.com/ceph/ceph-ansible/pull/4014
[2] https://github.com/ceph/ceph-ansible/blob/v4.0.0rc8/roles/ceph-config/templates/ceph.conf.j2#L13
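For anyone tracking the backport, one way to tell whether an installed ceph-ansible build already carries the change is to look for the IPv4 bind handling in the ceph.conf template it ships (a sketch; the exact template text may differ between versions):

rpm -q ceph-ansible
grep -n 'ms bind ipv4' /usr/share/ceph-ansible/roles/ceph-config/templates/ceph.conf.j2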
Updating the QA Contact to Hemant. Hemant will reroute this to the appropriate QE associate.

Regards,
Giri
Verified using:
ceph-ansible-4.0.6-1.el8cp.noarch
ceph version 14.2.4-85.el8cp
ansible-2.8.7-1.el8ae.noarch

Moving to "VERIFIED" state.
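With the fixed ceph-ansible the deployment gets past the "wait for all osd to be up" task; the state that task waits for can also be confirmed by hand on a controller (a sketch of the expected check, not a capture from the verification run):

sudo podman exec ceph-mon-controller-0 ceph -s
# The "osd:" line in the services section should report every OSD as "up" and "in".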
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0312