Bug 1576955

Summary: OSP 12 deployments or upgrades fail with Ceph 2.5-x containers and CephAnsibleExtraConfig: mon_use_fqdn: true
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Matt Flusche <mflusche>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED EOL
QA Contact: Vasishta <vashastr>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.5
CC: adeza, anharris, aschoen, aschultz, ceph-eng-bugs, gabrioux, gfidente, gmeno, mburns, msufiyan, nthomas, sankarshan
Target Milestone: rc
Target Release: 2.*
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-27 05:01:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                       Flags
docker logs ceph-mon-overcloud-controller output  none
ceph-install-workflow log                         none

Description Matt Flusche 2018-05-10 20:16:00 UTC
Description of problem:

I was upgrading a functioning containerized OSP 12 environment that was running Ceph 2.4 containers. The update deployment failed and Ceph broke: the mon containers on the controller were running but unresponsive.

The deployment times out at WorkflowTasks_Step2_Execution.

I traced the issue down to the following deployment parameters:

parameter_defaults:
  CephAnsibleExtraConfig:
    mon_use_fqdn: true

With this parameter set, Ceph 2.5-x mon containers do not seem to function. I tested container versions 2.5-3 and 2.5-4. Ceph 2.4 containers work fine with this parameter; downgrading to the 2.4 containers fixes the deployment.
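
For context, a minimal sketch of what the parameter changes, assuming that ceph-ansible's mon_use_fqdn option makes the monitor name the node's FQDN instead of its short hostname. The helper mon_name and the host name overcloud-controller-0 are hypothetical and used only for illustration (example.com is the CloudDomain from the test environment in this report); this is not the ceph-ansible implementation, and it does not identify the root cause of the failure.

```shell
# Hypothetical helper: print the monitor name that would be used.
# $1 = value of mon_use_fqdn ("true" or "false")
# $2 = fully qualified domain name of the monitor node
mon_name() {
  use_fqdn="$1"
  fqdn="$2"
  if [ "$use_fqdn" = "true" ]; then
    # Assumption: with mon_use_fqdn enabled, the full FQDN is the mon name.
    printf '%s\n' "$fqdn"
  else
    # Otherwise the short hostname (everything before the first dot).
    printf '%s\n' "${fqdn%%.*}"
  fi
}

# Assumed example host name; compare the two resulting monitor names.
mon_name false overcloud-controller-0.example.com
mon_name true overcloud-controller-0.example.com
```

If the assumption holds, any component that still refers to the monitor by its short name would not match a monitor registered under the FQDN, which may be worth checking against the attached mon container log.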

Version-Release number of selected component (if applicable):
ceph 2.5-3 and 2.5-4 containers

How reproducible:
100%

Steps to Reproduce:
1. Deploy current OSP 12 with ceph-ansible and the following parameters:
parameter_defaults:
  CephAnsibleExtraConfig:
    mon_use_fqdn: true

Test deployment to reproduce:

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--ntp-server 192.168.0.10 \
--timeout 120 \
-e /home/stack/templates/env.yaml \
-e /home/stack/templates/overcloud_images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml \
--log-file /tmp/deploy.log 

=======
env.yaml
=======
parameter_defaults:
  OvercloudCephStorageFlavor: ceph
  CephStorageCount: 1
  ControllerCount: 1
  ComputeCount: 1
  CloudDomain: example.com
  DockerInsecureRegistryAddress: 172.16.5.1:8787
  CephPoolDefaultSize: 1
  CephAnsibleDisksConfig:
    devices:
      - /dev/vdb
  CephAnsibleExtraConfig:
    mon_use_fqdn: true
=======


Removing "mon_use_fqdn: true" will result in a successful deployment.


I'll attach the ceph container log and ceph-ansible log from a new test deployment.

Comment 1 Matt Flusche 2018-05-10 20:22:06 UTC
Created attachment 1434550
docker logs ceph-mon-overcloud-controller output

Comment 2 Matt Flusche 2018-05-10 20:23:02 UTC
Created attachment 1434551
ceph-install-workflow log

Comment 7 Guillaume Abrioux 2018-06-04 11:23:54 UTC
I'm trying to reproduce this in an environment with RHEL VMs. I'll update this BZ accordingly.

Comment 8 Giulio Fidente 2018-06-04 11:37:03 UTC
*** Bug 1581593 has been marked as a duplicate of this bug. ***

Comment 9 Giridhar Ramaraju 2019-08-05 13:06:47 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 10 Giridhar Ramaraju 2019-08-05 13:09:24 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri