Bug 1323232
| Field | Value |
| --- | --- |
| Summary | [ceph-ansible] old settings overriding jewel defaults in ceph.conf |
| Product | [Red Hat Storage] Red Hat Storage Console |
| Component | ceph-ansible |
| Version | 2 |
| Target Release | 2 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | unspecified |
| Reporter | shylesh <shmohan> |
| Assignee | Andrew Schoen <aschoen> |
| QA Contact | Martin Kudlej <mkudlej> |
| CC | adeza, amaredia, aschoen, ceph-eng-bugs, dzafman, hnallurv, jdurgin, kchai, kdreyer, mkudlej, nthomas, poelstra, sankarshan, sds-qe-bugs, shmohan |
| Fixed In Version | ceph-ansible-1.0.5-10.el7scon |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-08-23 19:49:06 UTC |
| Attachments | mon logs (attachment 1142620) |
What is the contents of ceph.conf on these nodes? Did you change the 'admin socket' setting there, or was it set up by ceph-ansible?

Does the /var/run/ceph/rbd-clients/ directory exist?

If it was set up automatically, we need to adjust the packaging to create that directory and label it with the proper SELinux settings.

(In reply to Josh Durgin from comment #2)
> What is the contents of ceph.conf on these nodes? Did you change the 'admin
> socket' setting there, or was it set up by ceph-ansible?

Everything was set up by ceph-ansible. I didn't change anything in ceph.conf.

> Does the /var/run/ceph/rbd-clients/ directory exist?

[root@magna105 ~]# ll /var/run/ceph/rbd-clients/
ls: cannot access /var/run/ceph/rbd-clients/: No such file or directory

> If it was set up automatically, we need to adjust the packaging to create
> that directory and label it with the proper selinux settings.

ceph.conf
==============
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
cephx require signatures = True # Kernel RBD does NOT support signatures!
cephx cluster require signatures = True
cephx service require signatures = False
fsid = e547f093-99ba-44e5-b63a-317efa3ecfb3
max open files = 131072
osd pool default pg num = 128
osd pool default pgp num = 128
osd pool default size = 2
osd pool default min size = 1
osd pool default crush rule = 0
# Disable in-memory logs
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 20/0
debug_paxos = 0/0
debug_rgw = 0/0

[client]
rbd cache = true
rbd cache writethrough until flush = true
rbd concurrent management ops = 20
admin socket = /var/run/ceph/rbd-clients/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/rbd-clients/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
rbd default map options = rw
rbd default features = 3 # sum features digits
rbd default format = 2

[mon]
mon osd down out interval = 600
mon osd min down reporters = 7
mon clock drift allowed = 0.15
mon clock drift warn backoff = 30
mon osd full ratio = 0.95
mon osd nearfull ratio = 0.85
mon osd report timeout = 300
mon pg warn max per osd = 0
mon osd allow primary affinity = true
mon pg warn max object skew = 10

[mon.magna105]
host = magna105
mon addr = 10.8.128.105

[mon.magna107]
host = magna107
mon addr = 10.8.128.107

[mon.magna108]
host = magna108
mon addr = 10.8.128.108

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 10000
cluster_network = 10.8.128.0/21
public_network = 10.8.128.0/21
osd mon heartbeat interval = 30
# Performance tuning
filestore merge threshold = 40
filestore split multiple = 8
osd op threads = 8
filestore op threads = 8
filestore max sync interval = 5
osd max scrubs = 1
osd scrub begin hour = 0
osd scrub end hour = 24
# Recovery tuning
osd recovery max active = 5
osd max backfills = 2
osd recovery op priority = 2
osd recovery max chunk = 1048576
osd recovery threads = 1
osd objectstore = filestore
osd crush update on start = true
# Deep scrub impact
osd scrub sleep = 0.1
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 0
osd scrub chunk max = 5
osd deep scrub stride = 1048576

OK, those are a bunch of settings that we mostly don't want in ceph.conf, since the defaults in ceph itself are more appropriate for jewel. Added a ceph-ansible ticket here: https://github.com/ceph/ceph-ansible/issues/693 and commented here for the SELinux issue: https://github.com/ceph/ceph-ansible/issues/687

There appear to be other ansible BZs assigned to gmeno, piling on. Andrew, would you please send status on where we are with this upstream?

A pull request for this has been opened upstream here: https://github.com/ceph/ceph-ansible/pull/694

Looks like PR 694 needs a rebase.

The PR for this has been merged upstream: https://github.com/ceph/ceph-ansible/pull/694

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1754
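The upstream fix removes these hardcoded tunables from the role's template so that jewel's built-in defaults apply. If an operator still needs a handful of non-default values, the sketch below shows one way to pin them explicitly from the inventory side instead; it assumes a ceph-ansible version that supports the `ceph_conf_overrides` group variable (not verified against the ceph-ansible-1.0.5-10.el7scon build named in this bug), and the inventory and playbook paths are illustrative only.

```sh
# Hypothetical illustration: pin only the settings that are really needed in
# group_vars instead of relying on values hardcoded in the role's template.
# In a real deployment this would be merged into the existing group_vars file,
# and ceph_conf_overrides support must be confirmed for the installed version.
cat > group_vars/all <<'EOF'
ceph_conf_overrides:
  global:
    osd pool default size: 2
    osd pool default min size: 1
EOF

# Re-run the playbook so the regenerated ceph.conf carries only these explicit
# overrides on top of the jewel defaults.
ansible-playbook -i hosts site.yml
```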
Created attachment 1142620 [details]: mon logs

Description of problem:
I created a cluster using the ansible installation for 2.0. Processes were launched as the ceph:ceph user and group. But after putting SELinux into enforcing mode, ceph commands started throwing errors and the monitors are experiencing disconnects.

Version-Release number of selected component (if applicable):
[root@magna105 ~]# rpm -qa | grep ceph
ceph-common-10.0.4-2.el7cp.x86_64
ceph-mon-10.0.4-2.el7cp.x86_64
mod_fastcgi-2.4.7-1.ceph.el7.x86_64
iozone-3.424-2_ceph.el7.x86_64
python-cephfs-10.0.4-2.el7cp.x86_64
ceph-base-10.0.4-2.el7cp.x86_64
ceph-10.0.4-2.el7cp.x86_64
ceph-osd-10.0.4-2.el7cp.x86_64
libcephfs1-10.0.4-2.el7cp.x86_64
ceph-mds-10.0.4-2.el7cp.x86_64
ceph-selinux-10.0.4-2.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2.0 cluster using ceph-ansible.
2. All the processes (mons and OSDs) are launched as the ceph:ceph user and group.
3. Enable SELinux enforcing mode on each of the nodes and restart the nodes.
4. After a node comes back, try to execute ceph commands, e.g. ceph -s.

Actual results:
[ubuntu@magna107 ~]$ sudo ceph -s
2016-04-01 05:47:49.182183 7fa124fc9700 -1 asok(0x7fa120000f80) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-clients/ceph-client.admin.4314.140330003337376.asok': (2) No such file or directory
    cluster e547f093-99ba-44e5-b63a-317efa3ecfb3
     health HEALTH_WARN
            clock skew detected on mon.magna107, mon.magna108
            too few PGs per OSD (14 < min 30)
            Monitor clock skew detected
     monmap e1: 3 mons at {magna105=10.8.128.105:6789/0,magna107=10.8.128.107:6789/0,magna108=10.8.128.108:6789/0}
            election epoch 104, quorum 0,1,2 magna105,magna107,magna108
     osdmap e1624: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v4255: 64 pgs, 1 pools, 0 bytes data, 0 objects
            415 MB used, 8291 GB / 8291 GB avail
                  64 active+clean

Expected results:

Additional info:
Earlier, before enabling SELinux enforcing mode, the commands were working fine.
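Until the role or packaging creates the admin-socket directory with the right SELinux label, the bind failure shown above can be worked around by creating it manually. The following is only a rough sketch of such a workaround, not the fix shipped in the advisory; the ownership, mode, and tmpfiles path are assumptions.

```sh
# Create the directory the [client] admin socket path points at and hand it to
# the ceph user; restorecon re-applies whatever SELinux context the loaded
# policy defines for it, so the socket bind can succeed in enforcing mode.
mkdir -p /var/run/ceph/rbd-clients
chown ceph:ceph /var/run/ceph/rbd-clients
restorecon -Rv /var/run/ceph/rbd-clients

# /var/run is tmpfs, so recreate the directory on every boot via tmpfiles.d
# (mode and ownership here are assumptions, not the packaged fix).
echo 'd /run/ceph/rbd-clients 0770 ceph ceph -' > /etc/tmpfiles.d/ceph-rbd-clients.conf
systemd-tmpfiles --create /etc/tmpfiles.d/ceph-rbd-clients.conf
```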
Cluster info
==========
[mons]
magna105  <- SELinux enforcing mode is enabled on these nodes
magna107  <- SELinux enforcing mode is enabled on these nodes
magna108  <- on this node SELinux is permissive, so I don't see any errors here

[osds]
magna109
magna110
magna116

Log snippets
=============
From magna105:
2016-03-31 17:12:46.511286 7fcdd2976480 0 starting mon.magna105 rank 0 at 10.8.128.105:6789/0 mon_data /var/lib/ceph/mon/ceph-magna105 fsid e547f093-99ba-44e5-b63a-317efa3ecfb3
2016-03-31 17:12:46.512612 7fcdd2976480 0 mon.magna105@-1(probing) e0 my rank is now 0 (was -1)
2016-03-31 17:12:46.512959 7fcdc978d700 0 -- 10.8.128.105:6789/0 >> 10.8.128.108:6789/0 pipe(0x7fcdddaa9000 sd=12 :0 s=1 pgs=0 cs=0 l=0 c=0x7fcddd90a580).fault
2016-03-31 17:12:46.512986 7fcdd2962700 0 -- 10.8.128.105:6789/0 >> 10.8.128.107:6789/0 pipe(0x7fcdddaa4000 sd=11 :0 s=1 pgs=0 cs=0 l=0 c=0x7fcddd90a400).fault
2016-03-31 17:12:46.557905 7fcdc8e8b700 0 -- 10.8.128.105:6789/0 >> 10.8.128.107:6789/0 pipe(0x7fcdddab6000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x7fcddd90b180).accept connect_seq 0 vs existing 0 state connecting
2016-03-31 17:12:46.559146 7fcdcaf90700 0 log_channel(cluster) log [INF] : mon.magna105 calling new monitor election
2016-03-31 17:12:46.561287 7fcdc8d8a700 0 -- 10.8.128.105:6789/0 >> 10.8.128.108:6789/0 pipe(0x7fcdddab6000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x7fcddd90b600).accept connect_seq 0 vs existing 0 state connecting
2016-03-31 17:12:46.718938 7fcdcdaff700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
2016-03-31 17:12:46.719020 7fcdcdaff700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
2016-03-31 17:12:47.519777 7fcdcdaff700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
...skipping...

Attaching the full logs.
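When enforcing mode causes failures like the disconnects seen on the enforcing mons above, the audit log records exactly which accesses were denied. This is a generic diagnostic sketch using the standard audit tools, not a step taken from this bug's resolution:

```sh
# List recent AVC (access vector) denials recorded by auditd on an affected
# mon node such as magna105.
ausearch -m avc -ts recent

# Explain each denial in plain language; this helps decide whether the
# ceph-selinux policy or the packaging needs a fix.
ausearch -m avc -ts recent | audit2allow -w
```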