Bug 1673687

Summary: Failure creating ceph.conf for mon - No first item, sequence was empty.
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Ian Pilcher <ipilcher>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: medium
Priority: medium
Version: 3.2
CC: agunn, anharris, aschoen, ceph-eng-bugs, dsavinea, gabrioux, gfidente, gmeno, ipilcher, johfulto, nthomas, sankarshan, tchandra, tserlin
Target Milestone: z2
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.2.10-1.el7cp; Ubuntu: ceph-ansible_3.2.10-2redhat1
Last Closed: 2019-04-30 15:56:46 UTC
Type: Bug
Bug Blocks: 1578730    
Attachments:
  ansible.log
  Deploy script
  containers-prepare-parameter.yaml
  overcloud-config.yaml
  ceph-config.yaml
  /var/lib/mistral
  ceph-ansible run directory

Description Ian Pilcher 2019-02-07 19:07:08 UTC
Attempting to deploy OSP 14 with Ceph and receiving the following error:

"TASK [ceph-config : generate ceph.conf configuration file] *********************", 
"Thursday 07 February 2019  12:34:20 -0600 (0:00:00.310)       0:01:17.616 ***** ", 
"fatal: [overcloud-controller-2]: FAILED! => {\"msg\": \"No first item, sequence was empty.\"}", 
"NO MORE HOSTS LEFT *************************************************************",

Comment 1 Ian Pilcher 2019-02-07 19:08:00 UTC
Created attachment 1527884 [details]
ansible.log

Comment 2 Ian Pilcher 2019-02-07 19:09:34 UTC
Created attachment 1527885 [details]
Deploy script

Comment 3 Ian Pilcher 2019-02-07 19:10:30 UTC
Created attachment 1527886 [details]
containers-prepare-parameter.yaml

Comment 4 Ian Pilcher 2019-02-07 19:12:58 UTC
Created attachment 1527887 [details]
overcloud-config.yaml

Comment 5 Ian Pilcher 2019-02-07 19:14:02 UTC
Created attachment 1527888 [details]
ceph-config.yaml

Comment 6 John Fulton 2019-02-07 19:29:25 UTC
Can you attach a sosreport from your undercloud and a tarball of /var/lib/mistral/?

Comment 7 Ian Pilcher 2019-02-07 19:38:43 UTC
Tarball of /var/lib/mistral is at http://www.penurio.us/pub/var_lib_mistral.tar

Comment 8 John Fulton 2019-02-07 19:45:26 UTC
What version of ceph-ansible are you using?

That would have been in the sosreport, which would be better, but I need to know the ceph-ansible version.

Comment 9 Ian Pilcher 2019-02-07 19:49:27 UTC
Created attachment 1527901 [details]
/var/lib/mistral

Comment 10 Ian Pilcher 2019-02-07 19:49:56 UTC
(In reply to John Fulton from comment #8)
> What version of ceph-ansible are you using?
> 
> That would have been in the sosreport, which would be better, but I need to
> know the ceph-ansible version.

sosreport is coming.

Comment 11 Ian Pilcher 2019-02-07 19:52:44 UTC
sosreport is at http://www.penurio.us/pub/sosreport-undercloud-2019-02-07-jlllimi.tar.xz

Comment 12 John Fulton 2019-02-13 14:40:10 UTC
Created attachment 1534426 [details]
ceph-ansible run directory

Comment 13 John Fulton 2019-02-13 14:47:02 UTC
Guillaume,

This bug happened with ceph-ansible-3.2.5-1.el7cp.noarch. You can see the ceph-ansible run directory by downloading it from:

 https://bugzilla.redhat.com/attachment.cgi?id=1534426

It contains the following after you untar it:

[fultonj@skagra ceph-ansible{master}]$ ls | sort 
ceph_ansible_command.log
extra_vars.yml
fetch_dir
group_vars
host_vars
inventory.yml
nodes_uuid_command.log
nodes_uuid_data.json
nodes_uuid_playbook.yml
[fultonj@skagra ceph-ansible{master}]$

Comment 16 Dimitri Savineau 2019-03-12 21:49:26 UTC
The issue comes from a TripleO misconfiguration.

The Ansible error refers to https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-config/templates/ceph.conf.j2#L83

Because TripleO uses the monitor_address_block variable to determine the mon IP address to bind to, ceph-ansible tries to find an IP address in the Ansible IPv4 address fact (ansible_all_ipv4_addresses) that is part of the network defined in monitor_address_block.
If there is no match, the ipaddr filter returns an empty list and the first filter then fails with 'The error was: No first item, sequence was empty.' (a minimal reproduction is sketched after the grep output below).

----
$ grep monitor_address_block group_vars/all.yml 
monitor_address_block: 192.168.24.0/24
----
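
For reference, a minimal sketch that reproduces the failure outside of TripleO (the playbook and the hard-coded address list are hypothetical; in the real template the list comes from the ansible_all_ipv4_addresses fact):

----
# repro-ipaddr-first.yml -- hypothetical minimal reproduction
# (the ipaddr filter needs the netaddr Python library)
- hosts: localhost
  gather_facts: false
  vars:
    # simulates a node whose addresses are all in 192.168.19.0/24
    node_ipv4_addresses: ['192.168.19.125']
    monitor_address_block: 192.168.24.0/24
  tasks:
    - name: pick the first address inside monitor_address_block
      debug:
        # ipaddr() returns an empty list because no address falls inside
        # 192.168.24.0/24, so 'first' fails with
        # "No first item, sequence was empty."
        msg: "{{ node_ipv4_addresses | ipaddr(monitor_address_block) | first }}"
----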

cluster_network and public_network use that network too.

But all overcloud nodes are configured on the 192.168.19.0/24 network CIDR (the ctlplane network actually configured for this deployment).

----
{
  "overcloud-cephstorage-0": "[192.168.19.104]*,[overcloud-cephstorage-0.localdomain]*,[overcloud-cephstorage-0]*,[192.168.19.104]*,[overcloud-cephstorage-0.storage.localdomain]*,[overcloud-cephstorage-0.storage]*,[192.168.19.104]*,[overcloud-cephstorage-0.storagemgmt.localdomain]*,[overcloud-cephstorage-0.storagemgmt]*,[192.168.19.104]*,[overcloud-cephstorage-0.internalapi.localdomain]*,[overcloud-cephstorage-0.internalapi]*,[192.168.19.104]*,[overcloud-cephstorage-0.tenant.localdomain]*,[overcloud-cephstorage-0.tenant]*,[192.168.19.104]*,[overcloud-cephstorage-0.external.localdomain]*,[overcloud-cephstorage-0.external]*,[192.168.19.104]*,[overcloud-cephstorage-0.management.localdomain]*,[overcloud-cephstorage-0.management]*,[192.168.19.104]*,[overcloud-cephstorage-0.ctlplane.localdomain]*,[overcloud-cephstorage-0.ctlplane]*",
  "overcloud-cephstorage-1": "[192.168.19.107]*,[overcloud-cephstorage-1.localdomain]*,[overcloud-cephstorage-1]*,[192.168.19.107]*,[overcloud-cephstorage-1.storage.localdomain]*,[overcloud-cephstorage-1.storage]*,[192.168.19.107]*,[overcloud-cephstorage-1.storagemgmt.localdomain]*,[overcloud-cephstorage-1.storagemgmt]*,[192.168.19.107]*,[overcloud-cephstorage-1.internalapi.localdomain]*,[overcloud-cephstorage-1.internalapi]*,[192.168.19.107]*,[overcloud-cephstorage-1.tenant.localdomain]*,[overcloud-cephstorage-1.tenant]*,[192.168.19.107]*,[overcloud-cephstorage-1.external.localdomain]*,[overcloud-cephstorage-1.external]*,[192.168.19.107]*,[overcloud-cephstorage-1.management.localdomain]*,[overcloud-cephstorage-1.management]*,[192.168.19.107]*,[overcloud-cephstorage-1.ctlplane.localdomain]*,[overcloud-cephstorage-1.ctlplane]*",
  "overcloud-cephstorage-2": "[192.168.19.105]*,[overcloud-cephstorage-2.localdomain]*,[overcloud-cephstorage-2]*,[192.168.19.105]*,[overcloud-cephstorage-2.storage.localdomain]*,[overcloud-cephstorage-2.storage]*,[192.168.19.105]*,[overcloud-cephstorage-2.storagemgmt.localdomain]*,[overcloud-cephstorage-2.storagemgmt]*,[192.168.19.105]*,[overcloud-cephstorage-2.internalapi.localdomain]*,[overcloud-cephstorage-2.internalapi]*,[192.168.19.105]*,[overcloud-cephstorage-2.tenant.localdomain]*,[overcloud-cephstorage-2.tenant]*,[192.168.19.105]*,[overcloud-cephstorage-2.external.localdomain]*,[overcloud-cephstorage-2.external]*,[192.168.19.105]*,[overcloud-cephstorage-2.management.localdomain]*,[overcloud-cephstorage-2.management]*,[192.168.19.105]*,[overcloud-cephstorage-2.ctlplane.localdomain]*,[overcloud-cephstorage-2.ctlplane]*",
  "overcloud-compute-0": "[192.168.19.113]*,[overcloud-compute-0.localdomain]*,[overcloud-compute-0]*,[192.168.19.113]*,[overcloud-compute-0.storage.localdomain]*,[overcloud-compute-0.storage]*,[192.168.19.113]*,[overcloud-compute-0.storagemgmt.localdomain]*,[overcloud-compute-0.storagemgmt]*,[192.168.19.113]*,[overcloud-compute-0.internalapi.localdomain]*,[overcloud-compute-0.internalapi]*,[192.168.19.113]*,[overcloud-compute-0.tenant.localdomain]*,[overcloud-compute-0.tenant]*,[192.168.19.113]*,[overcloud-compute-0.external.localdomain]*,[overcloud-compute-0.external]*,[192.168.19.113]*,[overcloud-compute-0.management.localdomain]*,[overcloud-compute-0.management]*,[192.168.19.113]*,[overcloud-compute-0.ctlplane.localdomain]*,[overcloud-compute-0.ctlplane]*",
  "overcloud-compute-1": "[192.168.19.110]*,[overcloud-compute-1.localdomain]*,[overcloud-compute-1]*,[192.168.19.110]*,[overcloud-compute-1.storage.localdomain]*,[overcloud-compute-1.storage]*,[192.168.19.110]*,[overcloud-compute-1.storagemgmt.localdomain]*,[overcloud-compute-1.storagemgmt]*,[192.168.19.110]*,[overcloud-compute-1.internalapi.localdomain]*,[overcloud-compute-1.internalapi]*,[192.168.19.110]*,[overcloud-compute-1.tenant.localdomain]*,[overcloud-compute-1.tenant]*,[192.168.19.110]*,[overcloud-compute-1.external.localdomain]*,[overcloud-compute-1.external]*,[192.168.19.110]*,[overcloud-compute-1.management.localdomain]*,[overcloud-compute-1.management]*,[192.168.19.110]*,[overcloud-compute-1.ctlplane.localdomain]*,[overcloud-compute-1.ctlplane]*",
  "overcloud-controller-0": "[192.168.19.130]*,[overcloud-controller-0.localdomain]*,[overcloud-controller-0]*,[192.168.19.130]*,[overcloud-controller-0.storage.localdomain]*,[overcloud-controller-0.storage]*,[192.168.19.130]*,[overcloud-controller-0.storagemgmt.localdomain]*,[overcloud-controller-0.storagemgmt]*,[192.168.19.130]*,[overcloud-controller-0.internalapi.localdomain]*,[overcloud-controller-0.internalapi]*,[192.168.19.130]*,[overcloud-controller-0.tenant.localdomain]*,[overcloud-controller-0.tenant]*,[192.168.19.130]*,[overcloud-controller-0.external.localdomain]*,[overcloud-controller-0.external]*,[192.168.19.130]*,[overcloud-controller-0.management.localdomain]*,[overcloud-controller-0.management]*,[192.168.19.130]*,[overcloud-controller-0.ctlplane.localdomain]*,[overcloud-controller-0.ctlplane]*",
  "overcloud-controller-1": "[192.168.19.109]*,[overcloud-controller-1.localdomain]*,[overcloud-controller-1]*,[192.168.19.109]*,[overcloud-controller-1.storage.localdomain]*,[overcloud-controller-1.storage]*,[192.168.19.109]*,[overcloud-controller-1.storagemgmt.localdomain]*,[overcloud-controller-1.storagemgmt]*,[192.168.19.109]*,[overcloud-controller-1.internalapi.localdomain]*,[overcloud-controller-1.internalapi]*,[192.168.19.109]*,[overcloud-controller-1.tenant.localdomain]*,[overcloud-controller-1.tenant]*,[192.168.19.109]*,[overcloud-controller-1.external.localdomain]*,[overcloud-controller-1.external]*,[192.168.19.109]*,[overcloud-controller-1.management.localdomain]*,[overcloud-controller-1.management]*,[192.168.19.109]*,[overcloud-controller-1.ctlplane.localdomain]*,[overcloud-controller-1.ctlplane]*",
  "overcloud-controller-2": "[192.168.19.125]*,[overcloud-controller-2.localdomain]*,[overcloud-controller-2]*,[192.168.19.125]*,[overcloud-controller-2.storage.localdomain]*,[overcloud-controller-2.storage]*,[192.168.19.125]*,[overcloud-controller-2.storagemgmt.localdomain]*,[overcloud-controller-2.storagemgmt]*,[192.168.19.125]*,[overcloud-controller-2.internalapi.localdomain]*,[overcloud-controller-2.internalapi]*,[192.168.19.125]*,[overcloud-controller-2.tenant.localdomain]*,[overcloud-controller-2.tenant]*,[192.168.19.125]*,[overcloud-controller-2.external.localdomain]*,[overcloud-controller-2.external]*,[192.168.19.125]*,[overcloud-controller-2.management.localdomain]*,[overcloud-controller-2.management]*,[192.168.19.125]*,[overcloud-controller-2.ctlplane.localdomain]*,[overcloud-controller-2.ctlplane]*"
}
----

Only overcloud-controller-2 node fails because ceph-ansible deploys mons in container sequentially https://github.com/ceph/ceph-ansible/blob/stable-3.2/site-docker.yml.sample#L101

We probably need to modify the ceph-validate role to add a check for this.
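
A sketch of what such a check might look like (the task wording, message, and conditions are assumptions for illustration, not the change that actually shipped):

----
# hypothetical early check for the ceph-validate role
- name: fail if no IP address matches monitor_address_block
  fail:
    msg: >-
      None of the IPv4 addresses on {{ inventory_hostname }} fall inside
      monitor_address_block ({{ monitor_address_block }})
  when:
    - monitor_address_block is defined
    - ansible_all_ipv4_addresses | ipaddr(monitor_address_block) | length == 0
----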

Comment 17 Ian Pilcher 2019-03-15 15:40:27 UTC
(In reply to Dimitri Savineau from comment #16)
> The issue comes from a TripleO misconfiguration

Do you mean that there's an error in the templates (very possible) or a bug in TripleO?

Comment 18 Dimitri Savineau 2019-03-15 16:00:08 UTC
> Do you mean that there's an error in the templates (very possible) or a bug in TripleO?
Probably a bug in TripleO.

Your ctlplane network was configured with 192.168.19.0/24 (I assume you can find this value in the undercloud.conf file), and the overcloud networks are reusing that network according to the Ansible log.
But the network CIDR value generated by TripleO (via Mistral, I guess) as input for ceph-ansible is wrong.
The generated value is still the default ctlplane network value (192.168.24.0/24) for public_network, cluster_network, and monitor_address_block, and it does not reflect the value actually configured (expected values are sketched below).
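
In other words, for this environment the variables handed to ceph-ansible in group_vars/all.yml would be expected to carry the real ctlplane CIDR rather than the default (values inferred from the logs above; the actual fix belongs on the TripleO side that generates them):

----
# group_vars/all.yml -- expected values for this deployment (inferred)
monitor_address_block: 192.168.19.0/24
public_network: 192.168.19.0/24
cluster_network: 192.168.19.0/24
----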

Comment 23 errata-xmlrpc 2019-04-30 15:56:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911