Created attachment 1396996 [details]
ansible-playbook logs.

Description of problem:
ceph-ansible fails during the upgrade of a containerized cluster from 2.4 to 2.5: it keeps waiting for the monitor to join the quorum even though the quorum has already formed.
Thanks to Guillaume for debugging this.

Failure message:
"FAILED - RETRYING: container | waiting for the containerized monitor to join the quorum"

2018-02-16 11:40:10,566 p=26854 u=ubuntu | FAILED - RETRYING: container | waiting for the containerized monitor to join the quorum... (5 retries left).Result was: { "attempts": 1, "changed": true, "cmd": [ "docker", "exec", "ceph-mon-magna082", "ceph", "--cluster", "slave", "-s", "--format", "json" ], "delta": "0:00:00.305037", "end": "2018-02-16 11:40:10.527565", "invocation": { "module_args": { "_raw_params": "docker exec ceph-mon-magna082 ceph --cluster \"slave\" -s --format json", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true } }, "rc": 0, "retries": 6, "start": "2018-02-16 11:40:10.222528", "stderr": "", "stderr_lines": [], "stdout": "\n{\"health\":{\"health\":{\"health_services\":[{\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4429200,\"kb_avail\":908013800,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:42.396679\",\"store_stats\":{\"bytes_total\":57521079,\"bytes_sst\":55423911,\"bytes_log\":2031616,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4276336,\"kb_avail\":908166664,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:34.496080\",\"store_stats\":{\"bytes_total\":34111166,\"bytes_sst\":30965422,\"bytes_log\":3080192,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4287284,\"kb_avail\":908155716,\"avail_percent\":94,\"last_updated\":\"2018-02-16 
11:39:48.694097\",\"store_stats\":{\"bytes_total\":35160310,\"bytes_sst\":30965990,\"bytes_log\":4128768,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"}]}]},\"timechecks\":{\"epoch\":56,\"round\":2,\"round_status\":\"finished\",\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.000000,\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.012728,\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.042653,\"health\":\"HEALTH_OK\"}]},\"summary\":[],\"overall_status\":\"HEALTH_OK\",\"detail\":[]},\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"election_epoch\":56,\"quorum\":[0,1,2],\"quorum_names\":[\"magna069.ceph.redhat.com\",\"magna072.ceph.redhat.com\",\"magna082.ceph.redhat.com\"],\"monmap\":{\"epoch\":5,\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"modified\":\"2018-02-16 09:42:21.176234\",\"created\":\"2018-02-13 10:38:27.533432\",\"mons\":[{\"rank\":0,\"name\":\"magna069.ceph.redhat.com\",\"addr\":\"10.8.128.69:6789\\/0\"},{\"rank\":1,\"name\":\"magna072.ceph.redhat.com\",\"addr\":\"10.8.128.72:6789\\/0\"},{\"rank\":2,\"name\":\"magna082.ceph.redhat.com\",\"addr\":\"10.8.128.82:6789\\/0\"}]},\"osdmap\":{\"osdmap\":{\"epoch\":5261,\"num_osds\":8,\"num_up_osds\":8,\"num_in_osds\":8,\"full\":false,\"nearfull\":false,\"num_remapped_pgs\":0}},\"pgmap\":{\"pgs_by_state\":[{\"state_name\":\"active+clean\",\"count\":288}],\"version\":176347,\"num_pgs\":288,\"data_bytes\":5921356152,\"bytes_used\":18343018496,\"bytes_avail\":7947121070080,\"bytes_total\":7965464088576},\"fsmap\":{\"epoch\":5,\"id\":1,\"up\":1,\"in\":1,\"max\":1,\"by_rank\":[{\"filesystem_id\":1,\"rank\":0,\"name\":\"magna118\",\"status\":\"up:active\"}]}}", "stdout_lines": [ "", 
"{\"health\":{\"health\":{\"health_services\":[{\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4429200,\"kb_avail\":908013800,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:42.396679\",\"store_stats\":{\"bytes_total\":57521079,\"bytes_sst\":55423911,\"bytes_log\":2031616,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4276336,\"kb_avail\":908166664,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:34.496080\",\"store_stats\":{\"bytes_total\":34111166,\"bytes_sst\":30965422,\"bytes_log\":3080192,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"kb_total\":961297424,\"kb_used\":4287284,\"kb_avail\":908155716,\"avail_percent\":94,\"last_updated\":\"2018-02-16 11:39:48.694097\",\"store_stats\":{\"bytes_total\":35160310,\"bytes_sst\":30965990,\"bytes_log\":4128768,\"bytes_misc\":65552,\"last_updated\":\"0.000000\"},\"health\":\"HEALTH_OK\"}]}]},\"timechecks\":{\"epoch\":56,\"round\":2,\"round_status\":\"finished\",\"mons\":[{\"name\":\"magna069.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.000000,\"health\":\"HEALTH_OK\"},{\"name\":\"magna072.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.012728,\"health\":\"HEALTH_OK\"},{\"name\":\"magna082.ceph.redhat.com\",\"skew\":0.000000,\"latency\":0.042653,\"health\":\"HEALTH_OK\"}]},\"summary\":[],\"overall_status\":\"HEALTH_OK\",\"detail\":[]},\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"election_epoch\":56,\"quorum\":[0,1,2],\"quorum_names\":[\"magna069.ceph.redhat.com\",\"magna072.ceph.redhat.com\",\"magna082.ceph.redhat.com\"],\"monmap\":{\"epoch\":5,\"fsid\":\"5362be02-bf26-4b66-ac09-7496cadcd801\",\"modified\":\"2018-02-16 09:42:21.176234\",\"created\":\"2018-02-13 
10:38:27.533432\",\"mons\":[{\"rank\":0,\"name\":\"magna069.ceph.redhat.com\",\"addr\":\"10.8.128.69:6789\\/0\"},{\"rank\":1,\"name\":\"magna072.ceph.redhat.com\",\"addr\":\"10.8.128.72:6789\\/0\"},{\"rank\":2,\"name\":\"magna082.ceph.redhat.com\",\"addr\":\"10.8.128.82:6789\\/0\"}]},\"osdmap\":{\"osdmap\":{\"epoch\":5261,\"num_osds\":8,\"num_up_osds\":8,\"num_in_osds\":8,\"full\":false,\"nearfull\":false,\"num_remapped_pgs\":0}},\"pgmap\":{\"pgs_by_state\":[{\"state_name\":\"active+clean\",\"count\":288}],\"version\":176347,\"num_pgs\":288,\"data_bytes\":5921356152,\"bytes_used\":18343018496,\"bytes_avail\":7947121070080,\"bytes_total\":7965464088576},\"fsmap\":{\"epoch\":5,\"id\":1,\"up\":1,\"in\":1,\"max\":1,\"by_rank\":[{\"filesystem_id\":1,\"rank\":0,\"name\":\"magna118\",\"status\":\"up:active\"}]}}" ] }
2018-02-16 11:40:17,013 p=26854 u=ubuntu | [ERROR]: User interrupted execution

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch
ansible-2.4.2.0-2.el7.noarch

How reproducible:
10/10

Steps to Reproduce:
1. Configure a 2.4 cluster.
2. Update to the 2.5 ansible packages.
3. Upgrade using ceph-ansible, following the official documentation.

Actual results:
Even though the quorum has formed, ansible fails with an "unable to form quorum" error.

Expected results:
NA

Additional info:
NA
Observed a similar failure during installation [kernel updated]. After changing the hostname from the FQDN to the short hostname in /etc/hostname, the installation completed successfully.
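A minimal sketch of the kind of name mismatch this workaround points at. It assumes (nothing in this BZ confirms it) that the quorum check looks for the monitor's short hostname in "quorum_names", while the log in the description lists the monitors by FQDN. The helper mon_in_quorum is hypothetical and is not the ceph-ansible task; the sample JSON is trimmed to the one field taken from that log.

```python
import json


def mon_in_quorum(status_json: str, hostname: str) -> bool:
    """Return True if hostname matches a quorum member by FQDN or by short name.

    Hypothetical helper for illustration only; not the ceph-ansible check.
    """
    status = json.loads(status_json)
    quorum_names = status.get("quorum_names", [])
    short = hostname.split(".")[0]
    return any(
        name == hostname or name.split(".")[0] == short
        for name in quorum_names
    )


# Only the "quorum_names" field from the `ceph -s --format json` output
# shown in the description; every other field is omitted.
sample = json.dumps({
    "quorum_names": [
        "magna069.ceph.redhat.com",
        "magna072.ceph.redhat.com",
        "magna082.ceph.redhat.com",
    ]
})

# Exact match on the short name (assumed behaviour) misses the FQDN entry.
strict = "magna082" in json.loads(sample)["quorum_names"]
# Comparing short names on both sides finds the monitor.
tolerant = mon_in_quorum(sample, "magna082")
print(strict, tolerant)
```

With FQDN entries in "quorum_names", the exact short-name lookup never succeeds, which would match both the endless retries and the effect of switching /etc/hostname to the short hostname.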
v3.0.26 should fix this issue
Hi Ramakrishnan,

The initial error reported in this BZ is fixed in ceph-ansible v3.0.26.
The error reported in c28 is fixed in the container image ceph-2-rhel-7-docker-candidate-62031-20180220125431.
(In reply to Guillaume Abrioux from comment #40)
> Hi Ramakrishnan,
>
> the initial error reported in this BZ is fixed in ceph-ansible v3.0.26
> the error reported in c28 is fixed in the container image
> ceph-2-rhel-7-docker-candidate-62031-20180220125431

Thanks for the update, Guillaume, and thanks for taking the time to troubleshoot the issue and explain the problem to me.
Hi Aron,

I have provided the doc text for the release notes and am clearing the needinfo tag.

Regards,
Ramakrishnan