1491027 – ceph: All OSDs are down upon successful deployment with ceph.

Summary: ceph: All OSDs are down upon successful deployment with ceph.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	Upstream M2
Target Release:	12.0 (Pike)
Assignee:	Giulio Fidente
QA Contact:	Yogev Rabl
Docs Contact:	Derek
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-09-12 19:44 UTC by Alexander Chuzhoy
Modified:	2018-02-05 19:12 UTC
CC List:	10 users
Fixed In Version:	openstack-tripleo-heat-templates-7.0.0-0.20170913050524.0rc2.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-13 22:08:57 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	501983	0	None	None	None	2017-09-15 01:52:26 UTC
Red Hat Product Errata	RHEA-2017:3462	0	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 12.0 Enhancement Advisory	2018-02-16 01:43:25 UTC

Description Alexander Chuzhoy 2017-09-12 19:44:46 UTC

ceph: All OSDs are down after a successful deployment with Ceph.

Environment:
instack-undercloud-7.3.1-0.20170830213703.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost.noarch
openstack-puppet-modules-11.0.0-0.20170828113154.el7ost.noarch

python-cephfs-10.2.7-32.el7cp.x86_64
libcephfs1-10.2.7-32.el7cp.x86_64
ceph-common-10.2.7-32.el7cp.x86_64
ceph-mon-10.2.7-32.el7cp.x86_64
ceph-radosgw-10.2.7-32.el7cp.x86_64
puppet-ceph-2.4.1-0.20170831071705.df3ed30.el7ost.noarch
ceph-selinux-10.2.7-32.el7cp.x86_64
ceph-mds-10.2.7-32.el7cp.x86_64
ceph-base-10.2.7-32.el7cp.x86_64

Steps to reproduce:
Deploy the overcloud with:
openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/rhos12.yaml

Check the status of Ceph:
[root@overcloud-cephstorage-0 ~]# ceph status
    cluster 6e4576e4-97e3-11e7-80f3-5254004f3ff4
     health HEALTH_ERR
            224 pgs are stuck inactive for more than 300 seconds
            224 pgs stuck inactive
            224 pgs stuck unclean
            no osds
     monmap e2: 3 mons at {overcloud-controller-0=172.17.3.22:6789/0,overcloud-controller-1=172.17.3.17:6789/0,overcloud-controller-2=172.17.3.23:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e6: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds
      pgmap v7: 224 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 224 creating
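
The symptom is visible in the osdmap line of the output above (0 osds: 0 up, 0 in). As a rough sketch only, and not part of the original report, a post-deployment check could parse that line. The function names osd_counts and osds_healthy are made up for illustration, and the pattern assumes the plain-text `ceph status` format shown here, which differs in other Ceph releases.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the plain-text osdmap line shown above:
#   osdmap e6: 0 osds: 0 up, 0 in

# Read `ceph status` text on stdin and print "total up in".
osd_counts() {
    sed -n 's/^ *osdmap e[0-9]*: \([0-9]*\) osds: \([0-9]*\) up, \([0-9]*\) in.*/\1 \2 \3/p'
}

# Succeed only when at least one OSD exists and every OSD is both up and in.
osds_healthy() {
    # Word-split the three counts into $1 $2 $3 (empty input leaves $# at 0).
    set -- $(osd_counts)
    [ "$#" -eq 3 ] && [ "$1" -gt 0 ] && [ "$1" -eq "$2" ] && [ "$1" -eq "$3" ]
}
```

It could be used as `ceph status | osds_healthy || echo "OSDs missing or down"`, which would have flagged both deployments in this report.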

Note: the issue also reproduces with this deployment command (IPv6, no SSL):
openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/virt/network/network-environment-v6.yaml \
-e /home/stack/rhos12.yaml

[root@overcloud-cephstorage-0 ~]# ceph status
    cluster 3c8f8400-97e3-11e7-a649-525400b54f0e
     health HEALTH_ERR
            224 pgs are stuck inactive for more than 300 seconds
            224 pgs stuck inactive
            224 pgs stuck unclean
            no osds
     monmap e1: 1 mons at {overcloud-controller-0=[fd00:fd00:fd00:3000::16]:6789/0}
            election epoch 3, quorum 0 overcloud-controller-0
     osdmap e6: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds
      pgmap v7: 224 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 224 creating

Comment 2 Red Hat Bugzilla Rules Engine 2017-09-14 18:01:46 UTC

This bugzilla has been removed from the release since it has not been triaged, and needs to be reviewed for targeting another release.

Comment 5 Alexander Chuzhoy 2017-10-05 23:21:40 UTC

Environment:
openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost.noarch
ceph-ansible-3.0.0-0.1.rc15.el7cp.noarch

The issue reproduces:

[root@overcloud-cephstorage-0 ~]# ceph status
    cluster 478b4720-aa0f-11e7-b94c-52540033fa46
     health HEALTH_ERR
            704 pgs are stuck inactive for more than 300 seconds
            704 pgs stuck inactive
            704 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {overcloud-controller-0=172.17.3.18:6789/0,overcloud-controller-1=172.17.3.13:6789/0,overcloud-controller-2=172.17.3.25:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e6: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds,recovery_deletes
      pgmap v7: 704 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 704 creating


[root@overcloud-cephstorage-0 ~]# docker ps -a
CONTAINER ID        IMAGE                                                                             COMMAND                  CREATED             STATUS                         PORTS               NAMES
43f293c4f3e8        docker-registry.engineering.redhat.com/rhosp12/openstack-cron-docker:20171004.1   "kolla_start"            About an hour ago   Up About an hour                                   logrotate_crond
50105246a1cc        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest                 "/entrypoint.sh"         About an hour ago   Exited (0) About an hour ago                       ceph-osd-prepare-overcloud-cephstorage-0-vdb
8ee231330dc9        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest                 "/usr/bin/ceph --vers"   About an hour ago   Exited (0) About an hour ago                       amazing_hypatia
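
In the listing above, only the one-shot ceph-osd-prepare container is present and it has exited; no long-running OSD container appears. A minimal sketch of how that observation might be automated follows. It assumes that OSD containers carry names beginning with ceph-osd- and that preparation containers begin with ceph-osd-prepare-, which is inferred from the listing rather than confirmed by the report. The function name running_osd_containers is hypothetical.

```shell
#!/bin/sh
# Read "NAME STATUS..." lines on stdin, for example produced by
#   docker ps -a --format '{{.Names}} {{.Status}}'
# and print the names of OSD containers whose status starts with "Up".
# Assumption: OSD containers are named ceph-osd-<...> and one-shot
# preparation containers are named ceph-osd-prepare-<...>.
running_osd_containers() {
    awk '$1 ~ /^ceph-osd-/ && $1 !~ /^ceph-osd-prepare-/ && $2 == "Up" { print $1 }'
}
```

Fed the three containers listed above, the function prints nothing, which matches the "no osds" state reported by ceph status.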

Comment 6 Alexander Chuzhoy 2017-10-06 16:38:38 UTC

A deployment has since succeeded with:
openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost.noarch
ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch
puppet-ceph-2.4.2-0.20170927195215.718a5ff.el7ost.noarch
instack-undercloud-7.4.1-0.20170925172804.el7ost.noarch


Moving back to on_qa.

Comment 7 Yogev Rabl 2017-11-15 19:02:51 UTC

Verified on openstack-tripleo-heat-templates-7.0.3-0.20171024200825.el7ost.noarch

Comment 10 errata-xmlrpc 2017-12-13 22:08:57 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
