Bug 1672025
Summary: | "failed to bind the UNIX domain socket" warning happens on 1 of 3 controller nodes after deployed Ceph with RHOSP13 Director | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Meiyan Zheng <mzheng> |
Component: | Ceph-Ansible | Assignee: | Sébastien Han <shan> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Vasishta <vashastr> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.1 | CC: | aschoen, ceph-eng-bugs, gabrioux, gfidente, gmeno, mzheng, nthomas, sankarshan, shan, tpetr, yrabl |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | 3.* | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-26 17:08:42 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1578730 |
Comment 1
Giulio Fidente
2019-02-06 16:03:00 UTC
Please use the following command

docker exec <monitor docker container> ceph -s

(In reply to Yogev Rabl from comment #2)
> Please use the following command
> docker exec <monitor docker container> ceph -s

# docker exec 2a7817fa6bba ceph -s
2019-04-11 07:48:37.734233 7f0bf064c700 -1 asok(0x7f0be8000fe0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
  cluster:
    id:     b54035fc-2469-11e9-a332-5254004fb0be
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
    mgr: overcloud-controller-1(active), standbys: overcloud-controller-0, overcloud-controller-2
    osd: 9 osds: 9 up, 9 in
    rgw: 3 daemons active

  data:
    pools:   11 pools, 1408 pgs
    objects: 1.68k objects, 2.17GiB
    usage:   7.99GiB used, 397GiB / 405GiB avail
    pgs:     1408 active+clean

The warning is still happening, and the customer confirmed that running ceph -w reproduces the issue. Do you think we need to provide a method to add the following configuration when deploying with director?

+++++
[client]
admin socket = /var/run/ceph/$name.$pid.asok
+++++

Best Regards,
Meiyan

Can you check that:

* /var/run/ceph/ceph-client.admin.asok is not in use before issuing the ceph command
* if used, then by which process
* please show the ceph.conf

Thanks

(In reply to leseb from comment #4)
> Can you check that:

Thanks for your quick response!

> * /var/run/ceph/ceph-client.admin.asok is not in use before issuing the ceph command
A: /var/run/ceph/ceph-client.admin.asok is not created before running ceph -w; after running ceph -w, it is created as root:root:

srwxr-xr-x. 1 root root 0 Apr 11 07:48 ceph-client.admin.asok

> * if used, then by which process
A: with ceph -w

> * please show the ceph.conf
A: ceph.conf is created when deploying with director.
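For the "is the socket in use, and by which process" check, a minimal sketch of how it could be done, assuming lsof and ss are available on the controller (or inside the monitor container, depending on how /var/run/ceph is mounted; neither is confirmed in this report):

++++
# Does the socket file already exist, and who owns it?
ls -l /var/run/ceph/ceph-client.admin.asok

# Which process, if any, has the socket open (run as root to see the PID)?
lsof /var/run/ceph/ceph-client.admin.asok

# Alternative: list listening UNIX sockets together with the owning process
ss -xlp | grep ceph-client.admin.asok
++++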
Here is the ceph.conf:

++++
# cat /etc/ceph/ceph.conf
[client.rgw.overcloud-controller-0]
host = overcloud-controller-0
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-0/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-0.log
rgw frontends = civetweb port=172.16.1.15:8080 num_threads=100

[client.rgw.overcloud-controller-1]
host = overcloud-controller-1
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-1/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-1.log
rgw frontends = civetweb port=172.16.1.4:8080 num_threads=100

[client.rgw.overcloud-controller-2]
host = overcloud-controller-2
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-2/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-2.log
rgw frontends = civetweb port=172.16.1.30:8080 num_threads=100

# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
# let's force the admin socket the way it was so we can properly check for existing instances
# also the line $cluster-$name.$pid.$cctid.asok is only needed when running multiple instances
# of the same daemon, thing ceph-ansible cannot do at the time of writing
admin socket = "$run_dir/$cluster-$name.asok"
cluster network = 172.16.3.0/24
filestore_max_sync_interval = 10
fsid = b54035fc-2469-11e9-a332-5254004fb0be
log file = /dev/null
mon cluster log file = /dev/null
mon host = 172.16.1.30,172.16.1.4,172.16.1.15
mon initial members = overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
mon_max_pg_per_osd = 3072
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 3
public network = 172.16.1.0/24
rgw_keystone_accepted_roles = Member, admin
rgw_keystone_admin_domain = default
rgw_keystone_admin_password = kHpZsZf2KZWgeQEWGmN24GQun
rgw_keystone_admin_project = service
rgw_keystone_admin_user = swift
rgw_keystone_api_version = 3
rgw_keystone_implicit_tenants = true
rgw_keystone_revocation_interval = 0
rgw_keystone_url = http://172.16.2.5:5000
rgw_s3_auth_use_keystone = true

# enable rgw usage log
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
rgw_enable_ops_log = true
rgw_log_http_headers = http_x_forwarded_for, http_expect, http_content_md5
++++

I can reproduce this in my test environment. Please let me know if you need further information.

Best Regards,
Meiyan

Level setting the severity of this defect to "High" with a bulk update. Please refine it to a more accurate value, as defined by the severity definition in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity
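On the earlier question about providing a way to add the [client] admin socket override when deploying with director: ceph-ansible does expose a ceph_conf_overrides variable for extra ceph.conf sections, and director can feed extra variables to ceph-ansible. A sketch only, assuming the CephAnsibleExtraConfig parameter in the OSP13 ceph-ansible integration is the right plumbing; the parameter name and file path are assumptions, not taken from this bug:

++++
# Illustrative environment file: hand ceph-ansible a per-PID admin socket for
# client processes so interactive "ceph" commands no longer collide on
# /var/run/ceph/ceph-client.admin.asok.
# (CephAnsibleExtraConfig and the file path are assumptions for illustration.)
cat > /home/stack/ceph-client-asok.yaml <<'EOF'
parameter_defaults:
  CephAnsibleExtraConfig:
    ceph_conf_overrides:
      client:
        admin socket: /var/run/ceph/$name.$pid.asok
EOF

# Then include it in the existing deploy command, for example:
#   openstack overcloud deploy ... -e /home/stack/ceph-client-asok.yaml
++++

The idea behind the per-PID path is that each client process binds its own socket, so the "(17) File exists" collision on the shared ceph-client.admin.asok cannot occur; the trade-off is that stale .asok files may be left behind if a client exits uncleanly.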