Description of problem: Smoke is broken for a few ceph-ansible tests, although ceph-ansible hasn't changed in this async build. Note these tests worked fine in 10.2.7-27.

2017-07-07T14:26:18.207 INFO:teuthology.orchestra.run.clara011.stdout:
2017-07-07T14:26:18.207 INFO:teuthology.orchestra.run.clara011.stdout:ceph version 10.2.7-28.el7cp (216cda64fd9a9b43c4b0c2f8c402d36753ee35f7)
2017-07-07T14:26:18.207 INFO:teuthology.orchestra.run.clara011.stdout:
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:TASK [ceph.ceph-common : is ceph running already?] *****************************
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:task path: /home/ubuntu/ceph-ansible/roles/ceph-common/tasks/facts.yml:11
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:ok: [clara010.ceph.redhat.com -> clara011.ceph.redhat.com] => {
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:    "changed": false,
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:    "cmd": [
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:        "ceph",
2017-07-07T14:26:18.208 INFO:teuthology.orchestra.run.clara011.stdout:        "--connect-timeout",
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:        "3",
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:        "--cluster",
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:        "ceph",
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:        "fsid"
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:    ],
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:    "delta": "0:00:00.167942",
2017-07-07T14:26:18.209 INFO:teuthology.orchestra.run.clara011.stdout:    "end": "2017-07-07 18:26:04.504175",
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:    "failed": false,
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:    "failed_when_result": false,
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:    "rc": 1,
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:    "start": "2017-07-07 18:26:04.336233",
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:    "warnings": []
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:}
2017-07-07T14:26:18.210 INFO:teuthology.orchestra.run.clara011.stdout:
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:STDERR:
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:2017-07-07 18:26:04.464323 7fbf1f66b700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:2017-07-07 18:26:04.465808 7fbf1f66b700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:2017-07-07 18:26:04.465821 7fbf1f66b700 0 librados: client.admin authentication error (95) Operation not supported
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:Error connecting to cluster: Error
2017-07-07T14:26:18.211 INFO:teuthology.orchestra.run.clara011.stdout:ok: [pluto008.ceph.redhat.com -> clara011.ceph.redhat.com] => {
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:    "changed": false,
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:    "cmd": [
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:        "ceph",
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:        "--connect-timeout",
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:        "3",
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:        "--cluster",
2017-07-07T14:26:18.212 INFO:teuthology.orchestra.run.clara011.stdout:        "ceph",
2017-07-07T14:26:18.213 INFO:teuthology.orchestra.run.clara011.stdout:        "fsid"
2017-07-07T14:26:18.213 INFO:teuthology.orchestra.run.clara011.stdout:    ],
2017-07-07T14:26:18.213 INFO:teuthology.orchestra.run.clara011.stdout:    "delta": "0:00:03.107497",
2017-07-07T14:26:18.213 INFO:teuthology.orchestra.run.clara011.stdout:    "end": "2017-07-07 18:26:07.443729",
2017-07-07T14:26:18.213 INFO:teuthology.orchestra.run.clara011.stdout:    "failed": false,
2017-07-07T14:26:18.213 INFO:teuthology.orchestra.run.clara011.stdout:    "failed_when_result": false,
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:    "rc": 1,
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:    "start": "2017-07-07 18:26:04.336232",
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:    "warnings": []
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:}
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:STDERR:
2017-07-07T14:26:18.214 INFO:teuthology.orchestra.run.clara011.stdout:
2017-07-07T14:26:18.215 INFO:teuthology.orchestra.run.clara011.stdout:2017-07-07 18:26:04.464324 7fd7a3a98700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2017-07-07T14:26:18.215 INFO:teuthology.orchestra.run.clara011.stdout:2017-07-07 18:26:04.464922 7fd7a0320700 0 -- :/2038272039 >> 10.8.129.11:6789/0 pipe(0x7fd79c05dd40 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fd79c05f000).fault
2017-07-07T14:26:18.215 INFO:teuthology.orchestra.run.clara011.stdout:Traceback (most recent call last):
2017-07-07T14:26:18.215 INFO:teuthology.orchestra.run.clara011.stdout:  File "/bin/ceph", line 948, in <module>
2017-07-07T14:26:18.215 INFO:teuthology.orchestra.run.clara011.stdout:    retval = main()
2017-07-07T14:26:18.215 INFO:teuthology.orchestra.run.clara011.stdout:  File "/bin/ceph", line 852, in main
2017-07-07T14:26:18.216 INFO:teuthology.orchestra.run.clara011.stdout:    prefix='get_command_descriptions')
2017-07-07T14:26:18.216 INFO:teuthology.orchestra.run.clara011.stdout:  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 1300, in json_command
2017-07-07T14:26:18.216 INFO:teuthology.orchestra.run.clara011.stdout:    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
2017-07-07T14:26:18.216 INFO:teuthology.orchestra.run.clara011.stdout:RuntimeError: "None": exception "['{"prefix": "get_command_descriptions"}']": exception You cannot perform that operation on a Rados object in state configuring.

Full logs at: http://magna002.ceph.redhat.com/vasu-2017-07-07_13:17:54-smoke-jewel---basic-multi/270347/teuthology.log
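For context on the log above: the "is ceph running already?" task runs `ceph --connect-timeout 3 --cluster ceph fsid` and deliberately tolerates a non-zero exit, which is why the output shows "rc": 1 alongside "failed_when_result": false. A minimal Python sketch of that check (an illustration, not ceph-ansible's actual implementation, which is an Ansible task in facts.yml):

```python
import subprocess

def cluster_fsid(cluster="ceph", timeout_s=3):
    """Return the cluster fsid, or None if the cluster is not up yet.

    Mirrors the shape of ceph-ansible's check: a non-zero return code
    from `ceph fsid` is treated as "ceph not running already", not as
    a hard failure.
    """
    try:
        result = subprocess.run(
            ["ceph", "--connect-timeout", str(timeout_s),
             "--cluster", cluster, "fsid"],
            capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no ceph binary installed on this node at all
    if result.returncode != 0:
        return None  # ceph present but the cluster is not reachable
    return result.stdout.strip()
```

On a node before deployment (no keyring, no mons reachable) this returns None, matching the rc=1 results recorded in the teuthology log.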
Steps for someone who wants to recreate:

a) Inventory

[clients]
pluto008.ceph.redhat.com devices='["/dev/sdb", "/dev/sdc", "/dev/sdd"]' monitor_interface='eno1' public_network='10.8.128.0/21'

[mons]
clara011.ceph.redhat.com devices='[]' monitor_interface='eno1' public_network='10.8.128.0/21'
clara012.ceph.redhat.com devices='[]' monitor_interface='eno1' public_network='10.8.128.0/21'
pluto009.ceph.redhat.com devices='[]' monitor_interface='eno1' public_network='10.8.128.0/21'

[osds]
clara010.ceph.redhat.com devices='["/dev/sdb", "/dev/sdc", "/dev/sdd"]' monitor_interface='eno1' public_network='10.8.128.0/21'
pluto008.ceph.redhat.com devices='["/dev/sdb", "/dev/sdc", "/dev/sdd"]' monitor_interface='eno1' public_network='10.8.128.0/21'

b) group_vars/all

ceph_conf_overrides:
  global:
    osd_default_pool_size: 2
    osd_pool_default_pg_num: 128
    osd_pool_default_pgp_num: 128
ceph_origin: distro
ceph_stable: true
ceph_stable_rh_storage: true
ceph_test: true
journal_collocation: true
journal_size: 1024
osd_auto_discovery: false

c) ansible-playbook -vv -i inven.yml site.yml
A different traceback, where it failed to start the radosgw instance: https://paste.fedoraproject.org/paste/262EwD6cXG8Io9Awl58kMQ/raw
Another instance where it failed to start mon: https://paste.fedoraproject.org/paste/3IY9D5y5bOA-Nn7U2Bk9WA/raw
Seb is out on PTO for the next two weeks - Andrew, please take a look.
Upstream PR https://github.com/ceph/ceph-ansible/pull/1666
This looks like you have incorrectly configured the network addresses to be

clara010.ceph.redhat.com devices='["/dev/sdb", "/dev/sdc", "/dev/sdd"]' monitor_interface='eno1' public_network='10.8.128.0/21'
pluto008.ceph.redhat.com devices='["/dev/sdb", "/dev/sdc", "/dev/sdd"]' monitor_interface='eno1' public_network='10.8.128.0/21'

instead of what they actually seem to be, 10.8.129.0/21. Would you please re-test with the correct config?

[ubuntu@pluto009 ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 0c:c4:7a:6e:db:58 brd ff:ff:ff:ff:ff:ff
    inet 10.8.129.109/21 brd 10.8.135.255 scope global dynamic eno1
       valid_lft 27538sec preferred_lft 27538sec
    inet6 2620:52:0:880:ec4:7aff:fe6e:db58/64 scope global noprefixroute dynamic
       valid_lft 2591572sec preferred_lft 604372sec
    inet6 fe80::ec4:7aff:fe6e:db58/64 scope link
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 0c:c4:7a:6e:db:59 brd ff:ff:ff:ff:ff:ff
[ubuntu@pluto009 ~]$ logout
Connection to pluto009 closed.
gmeno@magna002:~$ ssh ubuntu@clara010
Warning: Permanently added 'clara010,10.8.129.10' (ECDSA) to the list of known hosts.
Last login: Tue Jul 11 12:48:27 2017 from pluto010.ceph.redhat.com
[ubuntu@clara010 ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 0c:c4:7a:6c:69:1c brd ff:ff:ff:ff:ff:ff
    inet 10.8.129.10/21 brd 10.8.135.255 scope global dynamic eno1
       valid_lft 38060sec preferred_lft 38060sec
    inet6 2620:52:0:880:ec4:7aff:fe6c:691c/64 scope global noprefixroute dynamic
       valid_lft 2591955sec preferred_lft 604755sec
    inet6 fe80::ec4:7aff:fe6c:691c/64 scope link
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 0c:c4:7a:6c:69:1d brd ff:ff:ff:ff:ff:ff
Gregory, the /21 pointed out by David does correctly address the IP range in this network: 10.8.128.0/21 spans 10.8.128.0 through 10.8.135.255 (usable hosts 10.8.128.1-10.8.135.254), which includes the 10.8.129.x addresses seen on the nodes. So the inventory file is correct.
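The subnet arithmetic is easy to double-check with Python's stdlib ipaddress module: the addresses observed in `ip addr` on the nodes (10.8.129.109 on pluto009, 10.8.129.10 on clara010) fall inside 10.8.128.0/21, and 10.8.129.0/21 would not even be a valid network boundary, since a /21 network address must fall on an 8-address-block multiple of the third octet:

```python
import ipaddress

# The /21 leaves 11 host bits: the block runs 10.8.128.0 - 10.8.135.255.
net = ipaddress.ip_network("10.8.128.0/21")
print(net.network_address, "-", net.broadcast_address)

# Addresses seen on the nodes are inside the configured public_network.
for host in ("10.8.129.109", "10.8.129.10"):
    assert ipaddress.ip_address(host) in net

# 10.8.129.0/21 is rejected: 10.8.129.0 has host bits set for a /21 mask.
try:
    ipaddress.ip_network("10.8.129.0/21")
except ValueError as err:
    print(err)
```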
Guillaume, please investigate this as top priority tomorrow. Thank you.
I tried several times to reproduce your issue with a similar environment, but I couldn't:

PLAY RECAP *****************************************************************
clara010.ceph.redhat.com : ok=56 changed=14 unreachable=0 failed=0
clara011.ceph.redhat.com : ok=57 changed=16 unreachable=0 failed=0
clara012.ceph.redhat.com : ok=52 changed=16 unreachable=0 failed=0
pluto008.ceph.redhat.com : ok=93 changed=16 unreachable=0 failed=0
pluto009.ceph.redhat.com : ok=53 changed=17 unreachable=0 failed=0

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Using the repo you provided at https://paste.fedoraproject.org/paste/reZNMtQ7Dl8NUkGPq8yIwg/raw

# ceph --version
ceph version 10.2.7-28.el7cp (216cda64fd9a9b43c4b0c2f8c402d36753ee35f7)

# rpm -qi ceph-ansible
Name    : ceph-ansible
Version : 2.2.11
Release : 1.el7scon

# cat group_vars/all.yml
ceph_conf_overrides:
  global:
    osd_default_pool_size: 2
    osd_pool_default_pg_num: 128
    osd_pool_default_pgp_num: 128
ceph_origin: distro
ceph_stable: true
ceph_stable_rh_storage: true
ceph_test: true
journal_collocation: true
journal_size: 1024
osd_auto_discovery: false

# cat hosts
[clients]
pluto008.ceph.redhat.com ansible_ssh_host='192.168.121.113' ansible_ssh_user='vagrant' devices='["/dev/sdb", "/dev/sdc", "/dev/sdd"]' monitor_interface='eth1' public_network='192.168.91.0/24'

[mons]
clara011.ceph.redhat.com ansible_ssh_host='192.168.121.112' ansible_ssh_user='vagrant' devices='[]' monitor_interface='eth1' public_network='192.168.91.0/24'
clara012.ceph.redhat.com ansible_ssh_host='192.168.121.38' ansible_ssh_user='vagrant' devices='[]' monitor_interface='eth1' public_network='192.168.91.0/24'
pluto009.ceph.redhat.com ansible_ssh_host='192.168.121.226' ansible_ssh_user='vagrant' devices='[]' monitor_interface='eth1' public_network='192.168.91.0/24'

[osds]
clara010.ceph.redhat.com ansible_ssh_host='192.168.121.17' ansible_ssh_user='vagrant' devices='["/dev/sda", "/dev/sdb", "/dev/sdc"]' monitor_interface='eth1' public_network='192.168.91.0/24'
pluto008.ceph.redhat.com ansible_ssh_host='192.168.121.113' ansible_ssh_user='vagrant' devices='["/dev/sda", "/dev/sdb", "/dev/sdc"]' monitor_interface='eth1' public_network='192.168.91.0/24'

You can find the playbook log attached. Are you hitting this issue for every deployment you try with all these parameters, or does it happen 'randomly'?
Thanks, Guillaume, for trying the exact steps. I will look in detail at what is causing this in the regression runs; I now suspect something stale from another test is probably causing this :(