Bug 1572533

Summary: [UPGRADES] CephMgr rules not created
Product: Red Hat OpenStack
Component: puppet-tripleo
Version: 13.0 (Queens)
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Yurii Prokulevych <yprokule>
Assignee: RHOS Maint <rhos-maint>
QA Contact: nlevinki <nlevinki>
CC: augol, ccamacho, emacchi, gfidente, jjoyce, jschluet, jstransk, sgolovat, slinaber, tvignaud
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2018-05-08 11:06:06 UTC

Description Yurii Prokulevych 2018-04-27 09:07:42 UTC
Description of problem:
-----------------------
After the Ceph upgrade, `ceph status` reports a HEALTH_WARN state:
[root@controller-0 ~]# ceph status 
  cluster:
    id:     767c7d96-47c7-11e8-b115-52540022799b
    health: HEALTH_WARN
            Reduced data availability: 224 pgs inactive
 
  services:
    mon: 3 daemons, quorum controller-2,controller-1,controller-0
    mgr: controller-0(active), standbys: controller-2, controller-1
    osd: 3 osds: 3 up, 3 in
 
  data:
    pools:   6 pools, 224 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             224 unknown

The issue seems to be due to missing iptables rules for the CephMgr service. The expected rule from the tripleo hiera data is:

    "tripleo.ceph_mgr.firewall_rules": {
        "113 ceph_mgr": {
            "dport": [
                "6800-7300"
            ]
        }

After adding this rule on the controller nodes:

[root@controller-0 ~]# ceph status
  cluster:
    id:     767c7d96-47c7-11e8-b115-52540022799b
    health: HEALTH_WARN
            too many PGs per OSD (224 > max 200)
 
  services:
    mon: 3 daemons, quorum controller-2,controller-1,controller-0
    mgr: controller-0(active), standbys: controller-2, controller-1
    osd: 3 osds: 3 up, 3 in
 
  data:
    pools:   6 pools, 224 pgs
    objects: 374 objects, 57628 kB
    usage:   541 MB used, 117 GB / 118 GB avail
    pgs:     224 active+clean
 
  io:
    client:   2984 B/s rd, 17907 B/s wr, 5 op/s rd, 32 op/s wr
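
The remaining HEALTH_WARN, "too many PGs per OSD (224 > max 200)", is a separate pool-sizing issue rather than a firewall one. For context, the PG counts behind it can be inspected with standard Ceph commands, e.g.:

    ceph osd pool ls detail    # pg_num/pgp_num configured per pool
    ceph osd df                # PGs mapped to each OSD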


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-tripleo-8.3.2-0.20180416191414.cb114de.el7ost.noarch


Steps to Reproduce:
-------------------
1. Perform an upgrade from RHOS-12 to RHOS-13
2. Start the Ceph upgrade:
    openstack overcloud ceph-upgrade run \
            --stack overcloud \
            --templates \
            --container-registry-file /home/stack/composable_roles/docker-images.yaml \
            -e /home/stack/composable_roles/roles/nodes.yaml \
            -e /home/stack/composable_roles/internal.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
            -e /home/stack/composable_roles/network/network-environment.yaml \
            -e /home/stack/composable_roles/enable-tls.yaml \
            -e /home/stack/composable_roles/inject-trust-anchor.yaml \
            -e /home/stack/composable_roles/public_vip.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
            -e /home/stack/composable_roles/hostnames.yaml \
            -e /home/stack/composable_roles/debug.yaml \
            -e /home/stack/composable_roles/config_heat.yaml \
            -e /home/stack/composable_roles/docker-images.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/lifecycle/upgrade-converge.yaml \
            --roles-file /home/stack/composable_roles/roles/roles_data.yaml 2>&1

3. Check Ceph's status after the upgrade finishes (see the sketch below)
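
A quick way to run that check from the undercloud (a sketch; the controller host name and the heat-admin user are assumptions based on a typical TripleO deployment):

    # Cluster health as seen from a controller
    ssh heat-admin@controller-0 "sudo ceph status"

    # Look for the CephMgr firewall rule puppet-tripleo should have created
    ssh heat-admin@controller-0 "sudo iptables -S | grep -i ceph_mgr"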

Comment 1 Giulio Fidente 2018-05-08 11:06:06 UTC

*** This bug has been marked as a duplicate of bug 1574424 ***