.Installing and upgrading containerized Ceph fails
Using a Fully Qualified Domain Name (FQDN) in the `/etc/hostname` file causes installing and upgrading containerized Ceph deployments to fail.
When using the `ceph-ansible` playbook to install Ceph, the installation will fail with the following error message:
----
"msg": "The task includes an option with an undefined variable. The error was: 'osd_pool_default_pg_num' is undefined
----
To work around the installation failure, change the FQDN in the `/etc/hostname` file to the short host name on all nodes in the storage cluster. Next, rerun the `ceph-ansible` playbook to install Ceph.
When upgrading Ceph with the `rolling_update` playbook, the upgrade will fail with the following error message:
----
"FAILED - RETRYING: container | waiting for the containerized monitor to join the quorum"
----
To work around the upgrade failure, change the FQDN in the `/etc/hostname` file to the short host name on all nodes in the storage cluster. Next, restart the corresponding Ceph daemons running on each node in the storage cluster, then rerun the `rolling_update` playbook to upgrade Ceph.
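The workaround above comes down to replacing the FQDN in `/etc/hostname` with the part before the first dot. Below is a minimal Python sketch of that derivation. The helper names `short_hostname` and `is_fqdn` are hypothetical, and treating any dotted name as an FQDN is a simplifying assumption. The sample value is one of the monitor names from this report. The sketch only computes the value and does not modify any file.

```python
def short_hostname(name: str) -> str:
    """Return the part of a host name before the first dot."""
    return name.strip().split(".", 1)[0]


def is_fqdn(name: str) -> bool:
    """Assumption for this sketch: any host name containing a dot is an FQDN."""
    return "." in name.strip()


if __name__ == "__main__":
    # Example content of /etc/hostname, using a monitor name from this report.
    current = "magna082.ceph.redhat.com\n"
    if is_fqdn(current):
        print(f"replace {current.strip()!r} with {short_hostname(current)!r}")
```

Running the check on every node before starting the playbook would show which hosts still need the change.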
Description Ramakrishnan Periyasamy
2018-02-16 12:01:23 UTC
Created attachment 1396996
ansible-playbook logs.
Description of problem:
ceph-ansible fails while upgrading a containerized cluster from 2.4 to 2.5: it keeps waiting for the monitor to join the quorum even though the quorum has already formed.
Thanks to Guillaume for debugging this.
Failure message: "FAILED - RETRYING: container | waiting for the containerized monitor to join the quorum"
2018-02-16 11:40:10,566 p=26854 u=ubuntu | FAILED - RETRYING: container | waiting for the containerized monitor to join the quorum... (5 retries left).Result was: {
"attempts": 1,
"changed": true,
"cmd": [
"docker",
"exec",
"ceph-mon-magna082",
"ceph",
"--cluster",
"slave",
"-s",
"--format",
"json"
],
"delta": "0:00:00.305037",
"end": "2018-02-16 11:40:10.527565",
"invocation": {
"module_args": {
"_raw_params": "docker exec ceph-mon-magna082 ceph --cluster \"slave\" -s --format json",
"_uses_shell": false,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"rc": 0,
"retries": 6,
"start": "2018-02-16 11:40:10.222528",
"stderr": "",
"stderr_lines": [],
"stdout": "\n{\"health\":{\"health\":{\"health_services\":[{\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4429200,\"kb_avail\":908013800,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:42.396679\",\"store_stats\":{\"bytes_total\":57521079,\"bytes_sst\":55423911,\"bytes_log\":2031616,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4276336,\"kb_avail\":908166664,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:34.496080\",\"store_stats\":{\"bytes_total\":34111166,\"bytes_sst\":30965422,\"bytes_log\":3080192,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4287284,\"kb_avail\":908155716,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:48.694097\",\"store_stats\":{\"bytes_total\":35160310,\"bytes_sst\":30965990,\"bytes_log\":4128768,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"}]}]},\"timechecks\":{\"epoch\":56,\"round\":2,\"round_status\":\"finished\",\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.000000,\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.012728,\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.042653,\"health\":\"HEALTH_OK\"}]},\"summary\":[],\"overall_status\":\"HEALTH_OK\",\"detail\":[]},\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"election_epoch\":56,\"quorum\":[0,1,2],\"quorum_names\":[\"magna069.ceph.redhat.com\",\"magna072.ceph.redhat.com\",\"magna082.ceph.redhat.com\"],\"monmap\":{\"epoch\":5,\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"modified\":\"2018-02-16 09:42:21.176234\",\"created\":\"2018-02-13 
10:38:27.533432\",\"mons\":[{\"rank\":0,\"name\":\"magna069.ceph.redhat.com\",\"addr\":\"10.8.128.69:6789\\/0\"},{\"rank\":1,\"name\":\"magna072.ceph.redhat.com\",\"addr\":\"10.8.128.72:6789\\/0\"},{\"rank\":2,\"name\":\"magna082.ceph.redhat.com\",\"addr\":\"10.8.128.82:6789\\/0\"}]},\"osdmap\":{\"osdmap\":{\"epoch\":5261,\"num_osds\":8,\"num_up_osds\":8,\"num_in_osds\":8,\"full\":false,\"nearfull\":false,\"num_remapped_pgs\":0}},\"pgmap\":{\"pgs_by_state\":[{\"state_name\":\"active+clean\",\"count\":288}],\"version\":176347,\"num_pgs\":288,\"data_bytes\":5921356152,\"bytes_used\":18343018496,\"bytes_avail\":7947121070080,\"bytes_total\":7965464088576},\"fsmap\":{\"epoch\":5,\"id\":1,\"up\":1,\"in\":1,\"max\":1,\"by_rank\":[{\"filesystem_id\":1,\"rank\":0,\"name\":\"magna118\",\"status\":\"up:active\"}]}}",
"stdout_lines": [
"",
"{\"health\":{\"health\":{\"health_services\":[{\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4429200,\"kb_avail\":908013800,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:42.396679\",\"store_stats\":{\"bytes_total\":57521079,\"bytes_sst\":55423911,\"bytes_log\":2031616,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4276336,\"kb_avail\":908166664,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:34.496080\",\"store_stats\":{\"bytes_total\":34111166,\"bytes_sst\":30965422,\"bytes_log\":3080192,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4287284,\"kb_avail\":908155716,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:48.694097\",\"store_stats\":{\"bytes_total\":35160310,\"bytes_sst\":30965990,\"bytes_log\":4128768,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"}]}]},\"timechecks\":{\"epoch\":56,\"round\":2,\"round_status\":\"finished\",\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.000000,\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.012728,\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.042653,\"health\":\"HEALTH_OK\"}]},\"summary\":[],\"overall_status\":\"HEALTH_OK\",\"detail\":[]},\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"election_epoch\":56,\"quorum\":[0,1,2],\"quorum_names\":[\"magna069.ceph.redhat.com\",\"magna072.ceph.redhat.com\",\"magna082.ceph.redhat.com\"],\"monmap\":{\"epoch\":5,\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"modified\":\"2018-02-16 09:42:21.176234\",\"created\":\"2018-02-13 
10:38:27.533432\",\"mons\":[{\"rank\":0,\"name\":\"magna069.ceph.redhat.com\",\"addr\":\"10.8.128.69:6789\\/0\"},{\"rank\":1,\"name\":\"magna072.ceph.redhat.com\",\"addr\":\"10.8.128.72:6789\\/0\"},{\"rank\":2,\"name\":\"magna082.ceph.redhat.com\",\"addr\":\"10.8.128.82:6789\\/0\"}]},\"osdmap\":{\"osdmap\":{\"epoch\":5261,\"num_osds\":8,\"num_up_osds\":8,\"num_in_osds\":8,\"full\":false,\"nearfull\":false,\"num_remapped_pgs\":0}},\"pgmap\":{\"pgs_by_state\":[{\"state_name\":\"active+clean\",\"count\":288}],\"version\":176347,\"num_pgs\":288,\"data_bytes\":5921356152,\"bytes_used\":18343018496,\"bytes_avail\":7947121070080,\"bytes_total\":7965464088576},\"fsmap\":{\"epoch\":5,\"id\":1,\"up\":1,\"in\":1,\"max\":1,\"by_rank\":[{\"filesystem_id\":1,\"rank\":0,\"name\":\"magna118\",\"status\":\"up:active\"}]}}"
]
}
2018-02-16 11:40:17,013 p=26854 u=ubuntu | [ERROR]: User interrupted execution
Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch
ansible-2.4.2.0-2.el7.noarch
How reproducible:
10/10
Steps to Reproduce:
1. Configure a 2.4 cluster.
2. Update to the 2.5 ansible packages.
3. Upgrade using ceph-ansible, following the official documentation.
Actual results:
Even though the quorum is present, Ansible fails with an unable-to-form-quorum error.
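One way to read the log above: `quorum_names` lists the monitors by FQDN (for example `magna082.ceph.redhat.com`), while the container is addressed by its short name (`ceph-mon-magna082`). The sketch below is a hedged illustration in Python. It assumes the failing check compares the short host name against `quorum_names`; the report does not show the actual task condition, so this is an assumption. The function names are hypothetical and the sample JSON is trimmed to the relevant keys from the log.

```python
import json


def monitor_in_quorum(status_json: str, hostname: str) -> bool:
    """Exact-match check: true only if hostname appears verbatim in quorum_names."""
    status = json.loads(status_json)
    return hostname in status.get("quorum_names", [])


def monitor_in_quorum_any_form(status_json: str, hostname: str) -> bool:
    """Tolerant check: compare only the part before the first dot on both sides."""
    short = hostname.split(".", 1)[0]
    names = json.loads(status_json).get("quorum_names", [])
    return any(name.split(".", 1)[0] == short for name in names)


# Trimmed sample of the `ceph -s --format json` output shown in the log.
SAMPLE = json.dumps({
    "quorum": [0, 1, 2],
    "quorum_names": [
        "magna069.ceph.redhat.com",
        "magna072.ceph.redhat.com",
        "magna082.ceph.redhat.com",
    ],
})

if __name__ == "__main__":
    print(monitor_in_quorum(SAMPLE, "magna082"))
    print(monitor_in_quorum_any_form(SAMPLE, "magna082"))
```

Under that assumption, the exact-match check never succeeds for a short host name, which would match the retries seen here despite `HEALTH_OK` and a full quorum.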
Expected results:
NA
Additional info:
NA
Observed a similar failure during installation [kernel updated].
After changing the hostname from the FQDN to the short host name in /etc/hostname, the installation completed successfully.
Comment 26 Guillaume Abrioux
2018-02-19 17:19:05 UTC
v3.0.26 should fix this issue
Comment 40 Guillaume Abrioux
2018-02-20 13:42:04 UTC
Hi Ramakrishnan,
the initial error reported in this BZ is fixed in ceph-ansible v3.0.26
the error reported in c28 is fixed in the container image ceph-2-rhel-7-docker-candidate-62031-20180220125431
Comment 43 Ramakrishnan Periyasamy
2018-02-20 14:01:13 UTC
(In reply to Guillaume Abrioux from comment #40)
> Hi Ramakrishnan,
>
> the initial error reported in this BZ is fixed in ceph-ansible v3.0.26
> the error reported in c28 is fixed in the container image
> ceph-2-rhel-7-docker-candidate-62031-20180220125431
Thanks for the update, Guillaume, and thanks for taking the time to troubleshoot the issue and explain the problem to me.
Comment 54 Ramakrishnan Periyasamy
2018-02-21 06:00:08 UTC
Hi Aron,
Provided doc text for release notes, clearing the needinfo tag.
Regards,
Ramakrishnan