Description of problem:
While debugging some ansible test failures, I noticed that ceph_keys fails because the monitor is not in quorum. In such cases it is better to stop the rest of the playbook from running, since the errors that follow are confusing and hard to debug.

Full logs: http://magna002.ceph.redhat.com/vasu-2017-09-21_19:20:48-smoke-luminous---basic-multi/274653/teuthology.log

2017-09-21T23:18:25.920 INFO:teuthology.orchestra.run.pluto007.stdout:
2017-09-21T23:18:25.921 INFO:teuthology.orchestra.run.pluto007.stdout:TASK [ceph-mon : include ceph_keys.yml] ****************************************
2017-09-21T23:18:25.921 INFO:teuthology.orchestra.run.pluto007.stdout:task path: /home/ubuntu/ceph-ansible/roles/ceph-mon/tasks/main.yml:13
2017-09-21T23:18:25.921 INFO:teuthology.orchestra.run.pluto007.stdout:included: /home/ubuntu/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml for pluto004.ceph.redhat.com, pluto005.ceph.redhat.com, pluto007.ceph.redhat.com
2017-09-21T23:18:25.921 INFO:teuthology.orchestra.run.pluto007.stdout:
2017-09-21T23:18:25.921 INFO:teuthology.orchestra.run.pluto007.stdout:TASK [ceph-mon : collect admin and bootstrap keys] *****************************
2017-09-21T23:18:25.921 INFO:teuthology.orchestra.run.pluto007.stdout:task path: /home/ubuntu/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout:ok: [pluto004.ceph.redhat.com] => {
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout: "changed": false,
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout: "cmd": [
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout: "ceph-create-keys",
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout: "--cluster",
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout: "ceph",
2017-09-21T23:18:25.922 INFO:teuthology.orchestra.run.pluto007.stdout: "-i",
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: "pluto004"
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: ],
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: "delta": "0:00:02.422280",
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: "end": "2017-09-22 03:20:13.828327",
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: "failed": false,
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: "failed_when_result": false,
2017-09-21T23:18:25.923 INFO:teuthology.orchestra.run.pluto007.stdout: "rc": 0,
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout: "start": "2017-09-22 03:20:11.406047"
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout:}
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout:
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout:STDERR:
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout:
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout:INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
2017-09-21T23:18:25.924 INFO:teuthology.orchestra.run.pluto007.stdout:INFO:ceph-create-keys:Talking to monitor...
2017-09-21T23:18:25.925 INFO:teuthology.orchestra.run.pluto007.stdout:Error ENOENT: failed to find client.admin in keyring
2017-09-21T23:18:25.925 INFO:teuthology.orchestra.run.pluto007.stdout:INFO:ceph-create-keys:Talking to monitor...
2017-09-21T23:18:25.925 INFO:teuthology.orchestra.run.pluto007.stdout:INFO:ceph-create-keys:Talking to monitor...
2017-09-21T23:18:25.925 INFO:teuthology.orchestra.run.pluto007.stdout:INFO:ceph-create-keys:Talking to monitor...
2017-09-21T23:18:25.925 INFO:teuthology.orchestra.run.pluto007.stdout:INFO:ceph-create-keys:Talking to monitor...
Do you know why the mons were not in quorum? Is there a bug in ceph-ansible or are you suggesting that we should check if mons are not in quorum and then fail? Thanks!
Sebastien, yes, that is what I am suggesting: fail during the ceph_keys playbook when the mons are not in quorum. Otherwise it keeps waiting for quorum (e.g. "create-keys:Talking to monitor...") and eventually times out after 10 minutes. I am not sure why the mons were not in quorum, but failing fatally during the earlier checks would help, since the failures that show up later are not very useful for debugging.
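To illustrate the kind of early check being suggested, here is a minimal sketch in Python, not the ceph-ansible implementation. It assumes the monitor's admin socket can be queried with `ceph --cluster <name> daemon mon.<id> mon_status` and that the JSON reply carries a "state" field whose values "leader" and "peon" mean the mon is in quorum (the log above shows "probing" for a mon that is not). The function names, retry count and delay are hypothetical.

```python
import json
import subprocess
import time

# Assumption: a monitor that has joined quorum reports one of these states.
QUORUM_STATES = {"leader", "peon"}


def fetch_mon_status(mon_id, cluster="ceph"):
    """Query the local monitor's admin socket (needs a running ceph-mon)."""
    out = subprocess.check_output(
        ["ceph", "--cluster", cluster, "daemon", "mon." + mon_id, "mon_status"]
    )
    return json.loads(out)


def is_in_quorum(status):
    """Return True if the mon_status dict reports a quorum state."""
    return status.get("state") in QUORUM_STATES


def wait_for_quorum(fetch, retries=5, delay=2.0, sleep=time.sleep):
    """Poll fetch() up to `retries` times; raise if quorum is never reached.

    `fetch` is any callable returning a mon_status-like dict, which keeps
    the check testable without a live cluster.
    """
    state = None
    for attempt in range(retries):
        status = fetch()
        state = status.get("state")
        if is_in_quorum(status):
            return status
        if attempt < retries - 1:
            sleep(delay)
    raise RuntimeError(
        "monitor not in quorum after %d checks (last state: %r)" % (retries, state)
    )
```

On a monitor host this could be called as `wait_for_quorum(lambda: fetch_mon_status("pluto004"))` before key collection, so that a mon stuck in "probing" stops the run with a clear message instead of a 10-minute timeout.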
This is hard to recreate, and I haven't seen it in the past couple of sanity runs, so I will close this as sanity-only verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387