Even though admin_socket is not set in ceph.conf, we are not seeing this issue when deploying with ceph-ansible-3.2.5-1.el7cp.noarch (undercloud) and ceph-common-12.2.8-76.el7cp.x86_64 (overcloud). It might be useful to run "ceph -s" inside the ceph-mon docker container instead of running it from the node hosting the container ... but the ceph.conf contents should be the same. Can you try again with the latest version of ceph-ansible, and also report which version of ceph-common is installed in the overcloud image?
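For example, something like this should collect both versions (assuming ceph-ansible is installed on the undercloud node and ceph-common directly on the overcloud controllers):

On the undercloud:
# rpm -q ceph-ansible

On an overcloud controller (and, if useful, inside the mon container):
# rpm -q ceph-common
# docker exec <monitor docker container> rpm -q ceph-common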
Please use the following command:

docker exec <monitor docker container> ceph -s
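To find the monitor container name or ID (just an example; ceph-ansible typically names it ceph-mon-<hostname>, but the exact name on your deployment may differ):

# docker ps | grep ceph-mon
# docker exec ceph-mon-overcloud-controller-0 ceph -s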
(In reply to Yogev Rabl from comment #2)
> Please use the following command:
> docker exec <monitor docker container> ceph -s

# docker exec 2a7817fa6bba ceph -s
2019-04-11 07:48:37.734233 7f0bf064c700 -1 asok(0x7f0be8000fe0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
  cluster:
    id:     b54035fc-2469-11e9-a332-5254004fb0be
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
    mgr: overcloud-controller-1(active), standbys: overcloud-controller-0, overcloud-controller-2
    osd: 9 osds: 9 up, 9 in
    rgw: 3 daemons active

  data:
    pools:   11 pools, 1408 pgs
    objects: 1.68k objects, 2.17GiB
    usage:   7.99GiB used, 397GiB / 405GiB avail
    pgs:     1408 active+clean

The warning is still happening, and the customer confirmed that running ceph -w reproduces the issue.

Do you think we need to provide a way to add the following configuration when deploying with director?
+++++
[client]
admin socket = /var/run/ceph/$name.$pid.asok
+++++

Best Regards,
Meiyan
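P.S. For reference, a rough sketch of how that override might be carried through ceph-ansible's ceph_conf_overrides variable (the director-side parameter that would expose it, e.g. CephConfigOverrides or CephAnsibleExtraConfig, and whether it accepts a per-section [client] key in this release, would still need to be confirmed):

+++++
ceph_conf_overrides:
  client:
    admin socket: /var/run/ceph/$name.$pid.asok
+++++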
Can you check that:

* /var/run/ceph/ceph-client.admin.asok is not in use before issuing the ceph command
* if it is in use, by which process
* please show the ceph.conf

Thanks
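For example, assuming ss and/or lsof are available on the controller, something like this should show whether any process is holding the socket:

# ss -xp | grep ceph-client.admin.asok
# lsof -U | grep ceph-client.admin.asok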
(In reply to leseb from comment #4)
> Can you check that:

Thanks for your quick response!

> * /var/run/ceph/ceph-client.admin.asok is not in use before issuing the ceph command
A: /var/run/ceph/ceph-client.admin.asok does not exist before running ceph -w; after running ceph -w it is created, owned by root:root:
srwxr-xr-x. 1 root root 0 Apr 11 07:48 ceph-client.admin.asok

> * if it is in use, by which process
A: it is used by the ceph -w process.

> * please show the ceph.conf
A: ceph.conf is generated by director during deployment. Here is the ceph.conf:
++++
# cat /etc/ceph/ceph.conf
[client.rgw.overcloud-controller-0]
host = overcloud-controller-0
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-0/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-0.log
rgw frontends = civetweb port=172.16.1.15:8080 num_threads=100

[client.rgw.overcloud-controller-1]
host = overcloud-controller-1
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-1/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-1.log
rgw frontends = civetweb port=172.16.1.4:8080 num_threads=100

[client.rgw.overcloud-controller-2]
host = overcloud-controller-2
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-2/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-2.log
rgw frontends = civetweb port=172.16.1.30:8080 num_threads=100

# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
# let's force the admin socket the way it was so we can properly check for existing instances
# also the line $cluster-$name.$pid.$cctid.asok is only needed when running multiple instances
# of the same daemon, thing ceph-ansible cannot do at the time of writing
admin socket = "$run_dir/$cluster-$name.asok"
cluster network = 172.16.3.0/24
filestore_max_sync_interval = 10
fsid = b54035fc-2469-11e9-a332-5254004fb0be
log file = /dev/null
mon cluster log file = /dev/null
mon host = 172.16.1.30,172.16.1.4,172.16.1.15
mon initial members = overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
mon_max_pg_per_osd = 3072
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 3
public network = 172.16.1.0/24
rgw_keystone_accepted_roles = Member, admin
rgw_keystone_admin_domain = default
rgw_keystone_admin_password = kHpZsZf2KZWgeQEWGmN24GQun
rgw_keystone_admin_project = service
rgw_keystone_admin_user = swift
rgw_keystone_api_version = 3
rgw_keystone_implicit_tenants = true
rgw_keystone_revocation_interval = 0
rgw_keystone_url = http://172.16.2.5:5000
rgw_s3_auth_use_keystone = true
# enable rgw usage log
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
rgw_enable_ops_log = true
rgw_log_http_headers = http_x_forwarded_for, http_expect, http_content_md5
++++

I can reproduce this in my test env. Please let me know if you need further information.

Best Regards,
Meiyan
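P.S. As a quick manual check (only a sketch: /etc/ceph/ceph.conf is managed by Ansible and will be overwritten on the next run), the proposed override could be appended on one controller to confirm it avoids the bind error:

++++
[client]
admin socket = /var/run/ceph/$name.$pid.asok
++++

With $pid in the path, each client process should get its own socket (e.g. ceph-client.admin.<pid>.asok) instead of contending for ceph-client.admin.asok.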
Level-setting the severity of this defect to "High" with a bulk update. Please refine it to a more accurate value, as defined by the severity definitions in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity