Bug 1491027

Summary: ceph: All OSDs are down upon successful deployment with ceph.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: high
Docs Contact: Derek <dcadzow>
Priority: high
Version: 12.0 (Pike)
CC: gfidente, jdurgin, johfulto, jomurphy, jschluet, lhh, mburns, nlevine, rhel-osp-director-maint, srevivo
Target Milestone: Upstream M2
Keywords: AutomationBlocker, Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170913050524.0rc2.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2017-12-13 22:08:57 UTC
Type: Bug

Description Alexander Chuzhoy 2017-09-12 19:44:46 UTC
ceph: All OSDs are down after an otherwise successful deployment with Ceph.

Environment:
instack-undercloud-7.3.1-0.20170830213703.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost.noarch
openstack-puppet-modules-11.0.0-0.20170828113154.el7ost.noarch

python-cephfs-10.2.7-32.el7cp.x86_64
libcephfs1-10.2.7-32.el7cp.x86_64
ceph-common-10.2.7-32.el7cp.x86_64
ceph-mon-10.2.7-32.el7cp.x86_64
ceph-radosgw-10.2.7-32.el7cp.x86_64
puppet-ceph-2.4.1-0.20170831071705.df3ed30.el7ost.noarch
ceph-selinux-10.2.7-32.el7cp.x86_64
ceph-mds-10.2.7-32.el7cp.x86_64
ceph-base-10.2.7-32.el7cp.x86_64

Steps to reproduce:
Deploy the overcloud with:
openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/rhos12.yaml

Check the Ceph cluster status:
[root@overcloud-cephstorage-0 ~]# ceph status
    cluster 6e4576e4-97e3-11e7-80f3-5254004f3ff4
     health HEALTH_ERR
            224 pgs are stuck inactive for more than 300 seconds
            224 pgs stuck inactive
            224 pgs stuck unclean
            no osds
     monmap e2: 3 mons at {overcloud-controller-0=172.17.3.22:6789/0,overcloud-controller-1=172.17.3.17:6789/0,overcloud-controller-2=172.17.3.23:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e6: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds
      pgmap v7: 224 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 224 creating
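The key line in the output above is the osdmap summary: `0 osds: 0 up, 0 in` (together with `no osds` under health) means no OSD ever registered with the monitors, not merely that registered OSDs went down. A minimal Python sketch for catching this in automation, assuming the plain-text layout that Jewel (10.2.x) `ceph status` prints above; the helper names are hypothetical, and parsing the machine-readable `--format json` output would be more robust than a regex:

```python
import re
from typing import NamedTuple, Optional


class OsdCounts(NamedTuple):
    total: int
    up: int
    in_: int


# Matches lines such as "osdmap e6: 0 osds: 0 up, 0 in" (Jewel-era text format; an assumption).
_OSDMAP_RE = re.compile(r"osdmap e\d+: (\d+) osds: (\d+) up, (\d+) in")


def parse_osd_counts(status_text: str) -> Optional[OsdCounts]:
    """Return OSD counts from `ceph status` text, or None if no osdmap line is found."""
    match = _OSDMAP_RE.search(status_text)
    if match is None:
        return None
    total, up, in_ = (int(g) for g in match.groups())
    return OsdCounts(total, up, in_)


def osds_healthy(status_text: str) -> bool:
    """True only if at least one OSD exists and every OSD is both up and in."""
    counts = parse_osd_counts(status_text)
    if counts is None:
        return False
    return counts.total > 0 and counts.up == counts.total and counts.in_ == counts.total


if __name__ == "__main__":
    sample = "     osdmap e6: 0 osds: 0 up, 0 in\n            flags sortbitwise,require_jewel_osds\n"
    print(parse_osd_counts(sample))  # OsdCounts(total=0, up=0, in_=0)
    print(osds_healthy(sample))      # False
```

Requiring `total > 0` matters here: a naive "all OSDs up" check would pass vacuously on this cluster, because zero of zero OSDs are down.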

Note: the issue also reproduces with the following deployment command (IPv6, no SSL):
openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/virt/network/network-environment-v6.yaml \
-e /home/stack/rhos12.yaml

[root@overcloud-cephstorage-0 ~]# ceph status
    cluster 3c8f8400-97e3-11e7-a649-525400b54f0e
     health HEALTH_ERR
            224 pgs are stuck inactive for more than 300 seconds
            224 pgs stuck inactive
            224 pgs stuck unclean
            no osds
     monmap e1: 1 mons at {overcloud-controller-0=[fd00:fd00:fd00:3000::16]:6789/0}
            election epoch 3, quorum 0 overcloud-controller-0
     osdmap e6: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds
      pgmap v7: 224 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 224 creating
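Besides the identical OSD failure, this IPv6 run shows a single monitor in its monmap instead of three. A similar hypothetical helper can pull monitor names and addresses out of the `monmap` line for comparison across runs. The only subtlety is bracketed IPv6 hosts, so the port is split off at the last colon. As before, this is a sketch that assumes the Jewel-era text format shown in this report:

```python
import re
from typing import Dict

# Matches "monmap eN: K mons at {name=addr:port/nonce,...}" (Jewel-era text format; an assumption).
_MONMAP_RE = re.compile(r"monmap e\d+: (\d+) mons at \{([^}]*)\}")


def parse_monmap(status_text: str) -> Dict[str, str]:
    """Map each monitor name to its host address, with port and nonce stripped."""
    match = _MONMAP_RE.search(status_text)
    if match is None:
        return {}
    mons: Dict[str, str] = {}
    for entry in match.group(2).split(","):
        name, _, addr = entry.partition("=")
        # Drop the trailing "/nonce", then split host from port on the LAST colon,
        # so bracketed IPv6 hosts such as [fd00::16] keep their internal colons.
        host_port = addr.split("/", 1)[0]
        host = host_port.rsplit(":", 1)[0].strip("[]")
        mons[name] = host
    return mons
```

For the IPv6 output above this yields `{"overcloud-controller-0": "fd00:fd00:fd00:3000::16"}`, while the IPv4 run yields three entries.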

Comment 2 Red Hat Bugzilla Rules Engine 2017-09-14 18:01:46 UTC
This bugzilla has been removed from the release since it has not been triaged, and needs to be reviewed for targeting another release.

Comment 5 Alexander Chuzhoy 2017-10-05 23:21:40 UTC
Environment:
openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost.noarch
ceph-ansible-3.0.0-0.1.rc15.el7cp.noarch

The issue still reproduces:

[root@overcloud-cephstorage-0 ~]# ceph status
    cluster 478b4720-aa0f-11e7-b94c-52540033fa46
     health HEALTH_ERR
            704 pgs are stuck inactive for more than 300 seconds
            704 pgs stuck inactive
            704 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {overcloud-controller-0=172.17.3.18:6789/0,overcloud-controller-1=172.17.3.13:6789/0,overcloud-controller-2=172.17.3.25:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e6: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds,recovery_deletes
      pgmap v7: 704 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 704 creating


[root@overcloud-cephstorage-0 ~]# docker ps -a
CONTAINER ID        IMAGE                                                                             COMMAND                  CREATED             STATUS                         PORTS               NAMES
43f293c4f3e8        docker-registry.engineering.redhat.com/rhosp12/openstack-cron-docker:20171004.1   "kolla_start"            About an hour ago   Up About an hour                                   logrotate_crond
50105246a1cc        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest                 "/entrypoint.sh"         About an hour ago   Exited (0) About an hour ago                       ceph-osd-prepare-overcloud-cephstorage-0-vdb
8ee231330dc9        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest                 "/usr/bin/ceph --vers"   About an hour ago   Exited (0) About an hour ago                       amazing_hypatia
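In this `docker ps -a` listing, the one-shot `ceph-osd-prepare-overcloud-cephstorage-0-vdb` container exited with status 0, yet no long-running OSD daemon container exists on the node. A hypothetical check for that condition is sketched below. It assumes the output was captured with `docker ps -a --format '{{.Names}}|{{.Status}}'` (pipe-separated name and status) rather than the default table, and it assumes, without confirmation from this report, that OSD daemon containers are named `ceph-osd-<host>-<device>`:

```python
from typing import List, Tuple


def parse_ps(output: str) -> List[Tuple[str, str]]:
    """Parse pipe-separated 'name|status' lines into (name, status) pairs."""
    rows: List[Tuple[str, str]] = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("|")
        rows.append((name.strip(), status.strip()))
    return rows


def running_osd_daemons(rows: List[Tuple[str, str]]) -> List[str]:
    """Names of running containers that look like OSD daemons.

    ASSUMPTION: daemon containers are named 'ceph-osd-<host>-<device>' and the
    one-shot disk preparation containers 'ceph-osd-prepare-<host>-<device>'.
    """
    return [
        name
        for name, status in rows
        if name.startswith("ceph-osd-")
        and not name.startswith("ceph-osd-prepare-")
        and status.startswith("Up")  # docker reports running containers as "Up ..."
    ]


if __name__ == "__main__":
    sample = (
        "logrotate_crond|Up About an hour\n"
        "ceph-osd-prepare-overcloud-cephstorage-0-vdb|Exited (0) About an hour ago\n"
        "amazing_hypatia|Exited (0) About an hour ago\n"
    )
    print(running_osd_daemons(parse_ps(sample)))  # []
```

An empty result on a storage node whose prepare container exited cleanly matches the symptom in this report: disks were prepared, but no OSD daemon was started.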

Comment 6 Alexander Chuzhoy 2017-10-06 16:38:38 UTC
Have since had a successful deployment with:
openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost.noarch
ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch
puppet-ceph-2.4.2-0.20170927195215.718a5ff.el7ost.noarch
instack-undercloud-7.4.1-0.20170925172804.el7ost.noarch


Moving back to ON_QA.

Comment 7 Yogev Rabl 2017-11-15 19:02:51 UTC
Verified on openstack-tripleo-heat-templates-7.0.3-0.20171024200825.el7ost.noarch

Comment 10 errata-xmlrpc 2017-12-13 22:08:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462