Bug 1613155

Summary: [ceph-ansible] Do not allow ceph cluster creation when mon_use_fqdn and mds_use_fqdn set to true
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Sidhant Agrawal <sagrawal>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Sidhant Agrawal <sagrawal>
Severity: high
Docs Contact: Aron Gunn <agunn>
Priority: unspecified
Version: 3.1
CC: agunn, aschoen, asriram, ceph-eng-bugs, edonnell, gabrioux, gmeno, hnallurv, kdreyer, nthomas, sankarshan, shan, tserlin
Target Milestone: rc   
Target Release: 3.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc18 Ubuntu: ceph-ansible_3.1.0~rc18-2redhat1
Doc Type: Bug Fix
Doc Text:
.Setting the `mon_use_fqdn` or the `mds_use_fqdn` option to `true` fails the Ceph Ansible playbook

Starting with {product} 3.1, Red Hat no longer supports deployments with fully qualified domain names. If either the `mon_use_fqdn` or the `mds_use_fqdn` option is set to `true`, then the Ceph Ansible playbook will fail. If the storage cluster is already configured with fully qualified domain names, then you must set the `use_fqdn_yes_i_am_sure` option to `true` in the `group_vars/all.yml` file.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-26 18:23:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1584264    
Attachments:
Description Flags
ansible-playbook logs none

Description Sidhant Agrawal 2018-08-07 07:01:38 UTC
Created attachment 1473856
ansible-playbook logs

Description of problem:
The Ansible playbook fails during installation when mon_use_fqdn is set to true. The failure occurs in the following task:
TASK [ceph-mon : create ceph mgr keyring(s) when mon is containerized] *********************************************************************************************************
2018-08-05 05:38:25,749 p=30613 u=ubuntu |  task path: /usr/share/ceph-ansible/roles/ceph-mon/tasks/docker/main.yml:97
2018-08-05 05:38:25,749 p=30613 u=ubuntu |  Sunday 05 August 2018  05:38:25 +0000 (0:00:00.031)       0:10:58.337 ********* 
2018-08-05 05:38:25,821 p=30613 u=ubuntu |   [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{ groups.get(mgr_group_name, []) | length > 0 }}

2018-08-05 05:38:25,890 p=30613 u=ubuntu |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
2018-08-05 05:43:26,360 p=30613 u=ubuntu |  failed: [magna044] (item=magna006) => {
    "changed": false, 
    "cmd": [
        "docker", 
        "exec", 
        "ceph-mon-magna044", 
        "ceph", 
        "--cluster", 
        "ceph", 
        "auth", 
        "get-or-create", 
        "mgr.magna006", 
        "mon", 
        "allow profile mgr", 
        "osd", 
        "allow *", 
        "mds", 
        "allow *", 
        "-o", 
        "/etc/ceph/ceph.mgr.magna006.keyring"
    ], 
    "delta": "0:05:00.249756", 
    "end": "2018-08-05 05:43:26.338793", 
    "invocation": {
        "module_args": {
            "_raw_params": "docker exec ceph-mon-magna044 ceph --cluster ceph auth get-or-create mgr.magna006 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /etc/ceph/ceph.mgr.magna006.keyring", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": "/etc/ceph/ceph.mgr.magna006.keyring", 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "item": "magna006", 
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-08-05 05:38:26.089037", 
    "stderr": "2018-08-05 05:43:26.301639 7fa9e165b700  0 monclient(hunting): authenticate timed out after 300\n2018-08-05 05:43:26.301680 7fa9e165b700  0 librados: client.admin authentication error (110) Connection timed out\n[errno 110] error connecting to the cluster", 
    "stderr_lines": [
        "2018-08-05 05:43:26.301639 7fa9e165b700  0 monclient(hunting): authenticate timed out after 300", 
        "2018-08-05 05:43:26.301680 7fa9e165b700  0 librados: client.admin authentication error (110) Connection timed out", 
        "[errno 110] error connecting to the cluster"
    ], 
    "stdout": "", 
    "stdout_lines": []
}
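
The failure record above can be condensed to the few fields that matter for triage. The following is only a rough sketch: the function name `summarize_failure` is hypothetical, and it assumes the result has the same keys as the JSON shown above (`item`, `rc`, `stderr_lines`).

```python
import re


def summarize_failure(result):
    """Condense an Ansible task failure dict into a short summary.

    Assumes the keys seen in the log above; missing keys yield None/False.
    """
    errno = None
    timed_out = False
    for line in result.get("stderr_lines", []):
        match = re.search(r"\[errno (\d+)\]", line)
        if match:
            errno = int(match.group(1))
        if "timed out" in line:
            timed_out = True
    return {
        "item": result.get("item"),
        "rc": result.get("rc"),
        "errno": errno,
        "timed_out": timed_out,
    }


# Values copied from the failure shown in this report.
sample = {
    "item": "magna006",
    "rc": 1,
    "stderr_lines": [
        "2018-08-05 05:43:26.301639 7fa9e165b700  0 monclient(hunting): authenticate timed out after 300",
        "2018-08-05 05:43:26.301680 7fa9e165b700  0 librados: client.admin authentication error (110) Connection timed out",
        "[errno 110] error connecting to the cluster",
    ],
}
summary = summarize_failure(sample)
```

For this record the summary points at a connection timeout (errno 110) while creating the mgr keyring for magna006, not at a problem with the keyring command itself.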

Version-Release number of selected component (if applicable):

ceph-ansible-3.1.0-0.1.rc12.el7cp.noarch

How reproducible:
Always

Steps to Reproduce:
1. Follow the documentation to deploy a containerised Ceph cluster with mon_use_fqdn: true in all.yml
2. Run playbook

Actual results:
Deployment of containerised ceph cluster with mon_use_fqdn: true in all.yml fails.

Expected results:
Deployment of containerised ceph cluster with mon_use_fqdn: true in all.yml should succeed.

Additional info:

Comment 3 Sébastien Han 2018-08-07 14:23:56 UTC
This feature is not supported anymore, we only keep it alive for existing clusters. So it is not encouraged to use it on new deployment.

We need to reflect this on the doc, I don't see any bug here.
Thanks.

Comment 4 Harish NV Rao 2018-08-09 13:29:01 UTC
(In reply to leseb from comment #3)
> This feature is not supported anymore, we only keep it alive for existing
> clusters. So it is not encouraged to use it on new deployment.
> 
> We need to reflect this on the doc, I don't see any bug here.
> Thanks.

Based on the above comment, I am changing the component to Documentation.

Comment 5 Harish NV Rao 2018-08-09 13:30:46 UTC
(In reply to Harish NV Rao from comment #4)
> (In reply to leseb from comment #3)
> > This feature is not supported anymore, we only keep it alive for existing
> > clusters. So it is not encouraged to use it on new deployment.
> > 
> > We need to reflect this on the doc, I don't see any bug here.
> > Thanks.
> 
> Based on the above comment, I am changing the component to Documentation.

Sébastien, I saw your previous update about attaching a PR to this BZ. Will this be fixed in 3.1 as part of ceph-ansible?

Comment 6 Sébastien Han 2018-08-09 13:37:37 UTC
It'll be fixed in the sense that we don't allow this kind of deployment anymore. So doc is still the right component.

Comment 7 Harish NV Rao 2018-08-09 13:48:37 UTC
In 3.1, we were able to deploy a bare-metal, RHEL-based Ceph cluster with 'mon_use_fqdn: true'. Is this option now going to be blocked for both bare-metal and container deployments?

Comment 8 Sébastien Han 2018-08-09 13:56:33 UTC
Yes Harish, this option is going to be blocked as of 3.1 for both container and non-container deployments.

Comment 9 Anjana Suparna Sriram 2018-08-13 11:31:39 UTC
(In reply to leseb from comment #8)
> Yes Harish, this option is going to be blocked as of 3.1 for both container
> and non-container deployments.

Wouldn't this be part of the known issues for the 3.1 release? If yes, kindly change the doc type and provide the relevant doc text for this bug.

Comment 10 Sébastien Han 2018-08-14 14:24:19 UTC
Sure, just did.

Comment 15 Harish NV Rao 2018-08-29 08:03:15 UTC
(In reply to leseb from comment #8)
> Yes Harish, this option is going to be blocked as of 3.1 for both container
> and non-container deployments.

Based on the above, the following needs to be done with this BZ:
1) Change the summary to "[ceph-ansible] Do not allow ceph cluster creation when mon_use_fqdn and mds_use_fqdn set to true"
2) QE to verify the BZ by making sure that cluster creation fails when `mon_use_fqdn` and `mds_use_fqdn` are set to true.
3) Doc team to move this bug in the release notes from the Known Issues section to the section that lists fixed issues.

Comment 16 Sidhant Agrawal 2018-08-29 12:04:00 UTC
Verified with ceph-ansible-3.1.0-0.1.rc21.el7cp

The Ceph Ansible playbook fails if either the 'mon_use_fqdn' or 'mds_use_fqdn' options are set to 'true' in all.yml.

Moving the BZ to Verified.
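
The behaviour verified here, combined with the `use_fqdn_yes_i_am_sure` override described in the Doc Text, can be modelled roughly as follows. This is a minimal sketch in Python, not the actual ceph-ansible task. The function `check_fqdn_options` and the exception `FqdnDeploymentError` are hypothetical names, and treating unset options as false is an assumption.

```python
class FqdnDeploymentError(Exception):
    """Raised when an FQDN deployment is requested without the override."""


def check_fqdn_options(group_vars):
    """Model the guard on a dict of variables from group_vars/all.yml.

    Assumption of this sketch: options that are not set count as false.
    """
    uses_fqdn = bool(group_vars.get("mon_use_fqdn", False)) or bool(
        group_vars.get("mds_use_fqdn", False)
    )
    override = bool(group_vars.get("use_fqdn_yes_i_am_sure", False))
    if uses_fqdn and not override:
        raise FqdnDeploymentError(
            "FQDN deployments are not supported; existing FQDN clusters "
            "must set use_fqdn_yes_i_am_sure to true"
        )
    return True


# A new deployment with either option enabled is rejected.
try:
    check_fqdn_options({"mon_use_fqdn": True})
    rejected = False
except FqdnDeploymentError:
    rejected = True

# An existing FQDN cluster that sets the override is allowed through.
allowed = check_fqdn_options(
    {"mds_use_fqdn": True, "use_fqdn_yes_i_am_sure": True}
)
```

In this model the override does nothing unless one of the two FQDN options is enabled, which matches the Doc Text: it exists only so that clusters already configured with fully qualified domain names keep working.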

Comment 19 errata-xmlrpc 2018-09-26 18:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819