Bug 1276186

Summary: Use of syslog results in all log messages at priority "emerg"
Summary: Use of syslog results in all log messages at priority "emerg"
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RADOS
Version: 1.3.1
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Peter Portante <pportant>
Assignee: Brad Hubbard <bhubbard>
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
CC: amaredia, bhubbard, ceph-eng-bugs, charcrou, cstpierr, dmick, dzafman, flucifre, hnallurv, kchai, kdreyer, perfbz, pportant, shmohan, sweil, tganguly, vakulkar
Target Milestone: rc
Target Release: 1.3.3
Keywords: Patch
Hardware: x86_64
OS: Linux
Fixed In Version: RHEL: ceph-0.94.5-11.el7cp Ubuntu: ceph_0.94.5-6redhat1trusty
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-05-06 18:39:49 UTC
Attachments: Patch to address syslog priority issue

Description Peter Portante 2015-10-29 02:30:46 UTC
We are working with an 8-node Ceph deployment here that a co-worker set up, and we tried to configure Ceph to use syslog.  It seems to work, but all log messages are sent at priority "emerg", program name "ceph-osd", facility "user".

Here is a Kibana page showing the logs: http://ki-perf44.perf.lab.eng.bos.redhat.com/?#/discover/Ceph-on-BAGL

We are using:

Installed Packages
ceph.x86_64                    1:0.94.1-13.el7cp             @ceph-mon
ceph-common.x86_64             1:0.94.1-13.el7cp             @ceph-mon
ceph-mon.x86_64                1:0.94.1-13.el7cp             @ceph-mon
ceph-osd.x86_64                1:0.94.1-13.el7cp             @ceph-osd

-----

Sample ceph.conf

[global]
fsid = 58996142-c833-4cfa-8de7-533154f3617b
mon_initial_members = overcloud-controller-0 overcloud-controller-1 overcloud-controller-2
mon_host = 172.18.0.17 172.18.0.15 172.18.0.20
#mon_host = 10.16.159.202 10.16.159.201 10.16.159.203
auth cluster required = none
auth service required = none
auth client required = none
filestore_xattr_use_omap = true
public network = 172.18.0.0/24
#public network = 10.16.159.0/24
cluster network = 172.18.0.0/24
log to syslog = true
err to syslog = true

[client]
#admin socket = /var/log/qemu/rbd-client-$pid.asok
#admin socket = /var/log/qemu/cluster-$type.$id.$pid.$cctid.asok
log_file = /var/log/qemu/ceph-rbd.log
rbd cache = true
rbd cache writethrough until flush = true

[osd]
osd backfill full ratio = 0.90

[mon]
mon cluster log to syslog = true

[mon.overcloud-controller-0]
        host = overcloud-controller-0
        addr = 172.18.0.17:6789
#        addr = 10.16.159.202:6789

[mon.overcloud-controller-1]
        host = overcloud-controller-1
        addr = 172.18.0.15:6789
#        addr = 10.16.159.201:6789

[mon.overcloud-controller-2]
        host = overcloud-controller-2
        addr = 172.18.0.20:6789
#        addr = 10.16.159.203:6789

Comment 2 Brad Hubbard 2015-10-29 02:39:00 UTC
I'm looking into this.

Comment 3 Sage Weil 2015-10-29 02:43:24 UTC
This is the call site responsible for those messages:

https://github.com/ceph/ceph/blob/master/src/log/Log.cc#L244

1- We're not specifying a priority.  Perhaps if you don't, it defaults to EMERG?  That would be an easy fix.

2- We're not calling openlog()... not sure that matters, but I just noticed it on the man page.

Comment 4 Dan Mick 2015-10-29 03:37:30 UTC
syslog is a stupid, stupid interface.  Indeed, the first arg "priority" is *really* "priority | facility", and since priority is "the low three bits" and facility is "the next three bits", LOG_USER alone implies LOG_USER | LOG_EMERG.

Also, it seems like we might be interested in encoding the actual priority in the log based on the ceph log level, but it's not available at Log::_flush().  That's not really this bug, but it might be good to fix both at the same time.

Comment 5 Brad Hubbard 2015-10-29 03:48:14 UTC
(In reply to Dan Mick from comment #4)
> syslog is a stupid, stupid interface.  Indeed, the first arg "priority" is
> *really* "priority | facility", and since priority is "the low three bits"
> and facility is "the next three bits", LOG_USER alone implies LOG_USER |
> LOG_EMERG.

Right.

Comment 6 Peter Portante 2015-10-29 15:44:10 UTC
Would you also want to fix the reference here:

  https://github.com/ceph/ceph/blob/master/src/log/Log.cc#L266

All of those messages would also show up as "emerg", priority 0, so perhaps line 266 would want LOG_USER|LOG_DEBUG?

Comment 7 Brad Hubbard 2015-10-29 21:40:02 UTC
(In reply to Peter Portante from comment #6)
> Would you also want to fix the reference here:
> 
>   https://github.com/ceph/ceph/blob/master/src/log/Log.cc#L266
> 
> All of those messages would also show up as "emerg", priority 0, so perhaps
> line 266 would want LOG_USER|LOG_DEBUG?

Sure Peter,

I'm still reviewing the code and behavior but we will certainly try to catch all instances.

Comment 8 Brad Hubbard 2015-11-03 04:40:54 UTC
Peter,

If I create a test package that includes a patch for this, would you want to test it out? What version would you want it based on?

0.94.1-13.el7cp ?

Comment 9 Peter Portante 2015-11-03 19:37:51 UTC
Thanks, Brad.  Yes, I would like to test that package if you create it, and 0.94.1-13.el7cp would be fine.

Comment 10 Brad Hubbard 2015-11-09 11:36:06 UTC
Sorry about the delay here, I was traveling last week. I should have this built and tested tomorrow.

Comment 12 Brad Hubbard 2015-11-10 02:37:55 UTC
Created attachment 1092042 [details]
Patch to address syslog priority issue

Comment 13 Brad Hubbard 2015-11-10 02:38:37 UTC
Once I have your feedback, I'll open a tracker upstream and submit my patch.

Comment 14 Brad Hubbard 2015-11-19 02:17:28 UTC
Did you manage to test with the package I created, Peter?

Comment 15 Peter Portante 2015-11-19 02:28:15 UTC
Sorry Brad, I have not had a chance to test the changes yet.  Soon, hopefully, probably by the end of the second week of December, if all goes well.

Comment 16 Brad Hubbard 2015-11-19 02:40:40 UTC
Okay, no problem thanks.

Comment 17 Brad Hubbard 2015-12-07 01:45:56 UTC
Created http://tracker.ceph.com/issues/13993 and submitted a PR for this to save time.

Comment 18 Ken Dreyer (Red Hat) 2015-12-11 22:29:57 UTC
The timing is sticky because upstream's test lab is still down, so upstream hasn't been able to verify the fix and merge it.

Comment 19 Brad Hubbard 2015-12-12 00:01:51 UTC
(In reply to Ken Dreyer (Red Hat) from comment #18)
> The timing is sticky because upstream's test lab is still down, so upstream
> hasn't been able to verify the fix and merge it.

Not urgent at the moment Ken.

Comment 20 Peter Portante 2016-01-04 20:50:46 UTC
Could I get a set of bits deployable on Fedora 23 for testing?

Comment 21 Brad Hubbard 2016-01-04 21:56:01 UTC
(In reply to Peter Portante from comment #20)
> Could I get a set of bits deployable on Fedora 23 for testing?

This bug is for RHCS; we don't ship RHCS for Fedora 23, so we can't provide you with RHCS bits for F23.

Comment 22 Dan Mick 2016-01-04 22:23:12 UTC
Right, but there's an upstream bug/fix/package production process as well...

Comment 23 Brad Hubbard 2016-01-04 22:49:06 UTC
(In reply to Dan Mick from comment #22)
> Right, but there's an upstream bug/fix/package production process as well...

Dan is right, of course. The most appropriate place for a discussion about Fedora binaries is a Fedora bug. I can build a test package for F23, but I don't believe this is the appropriate place to discuss or provide it, and doing so does not progress this Bugzilla at all.

Comment 24 Peter Portante 2016-01-04 22:57:15 UTC
Can we verify upstream first then?  Did this fix make it all the way out to master yet?  If so, can I just install that on F23 first?  "ceph-deploy --dev master"?

If it is fixed there, then I can build a RHEL 7.2 box and try out that RHCS version explicitly after.

Does that sound reasonable?

Comment 25 Brad Hubbard 2016-01-04 23:18:56 UTC
(In reply to Peter Portante from comment #24)
> Can we verify upstream first then?  Did this fix make it all the way out to
> master yet?  If so, can I just install that on F23 first?  "ceph-deploy
> --dev master"?

Upstream tracker is http://tracker.ceph.com/issues/13993 and the PR https://github.com/ceph/ceph/pull/6815 was merged into Jewel. There are PRs for backports to Hammer and Infernalis, but they are not merged yet, possibly due to the holiday season.

$ git show 0a4b7ab20c2979f1de97ac4b0d8bc5a78c5bce16
commit 0a4b7ab20c2979f1de97ac4b0d8bc5a78c5bce16
Merge: d7581cd 8e93f3f
Author: Sage Weil <sage>
Date:   Sat Dec 19 13:58:37 2015 -0500

    Merge pull request #6815 from badone/wip-13993
    
    common: log: Assign LOG_DEBUG priority to syslog calls
    
    Reviewed-by: Sage Weil <sage>

$ git branch -r --contains 0a4b7ab20c2979f1de97ac4b0d8bc5a78c5bce16
  origin/HEAD -> origin/master
  origin/jewel
  origin/master
  origin/wip-cmake-reorg
  origin/wip-cmake-rocksdb
  origin/wip-doc-style
  origin/wip-rgw-new-multisite
  origin/wip-sage-testing

$ git branch
* master

$ git blame src/log/Log.cc|grep syslog\(
8e93f3f4 (Brad Hubbard     2015-12-07 11:31:28 +1000 259)         syslog(LOG_USER|LOG_DEBUG, "%s", buf);
8e93f3f4 (Brad Hubbard     2015-12-07 11:31:28 +1000 287)     syslog(LOG_USER|LOG_DEBUG, "%s", s);

> 
> If it is fixed there, then I can build a RHEL 7.2 box and try out that RHCS
> version explicitly after.
> 
> Does that sound reasonable?

Sure. Let me know via the upstream tracker if you need help with the upstream part.

Comment 26 Peter Portante 2016-01-05 03:10:50 UTC
Okay, I was able to hack and slash my way through an F22 install of upstream master to confirm the logging fix worked there.

So I can build a RHEL 7.2 box to try out RHCS packages.  I'll do that now, and then if you could post the proper install instructions to try this fix out there, that would be great.

Comment 28 Peter Portante 2016-01-05 05:00:51 UTC
Sorry to be a bother, but for #11, it seems to assume I already have RHCS installed.

I'd like to use something very simple and straightforward to get an RHCS install up and running, like https://www.berrange.com/posts/2015/12/21/ceph-single-node-deployment-on-fedora-23/

Can I use ceph-deploy to install from your provided packages with the fix?

Comment 29 Brad Hubbard 2016-01-05 05:21:31 UTC
(In reply to Peter Portante from comment #28)
> Sorry to be a bother, but for #11, it seems to assume I already have RHCS
> installed.
> 
> I'd like to use something very simple and straight-forward to get an RHCS
> install up and running, like
> https://www.berrange.com/posts/2015/12/21/ceph-single-node-deployment-on-
> fedora-23/
> 
> Can I use ceph-deploy to install from your provided packages with the fix?

Just run the yum install command on each machine you want in your cluster and skip the "ceph-deploy install" step.

Comment 31 Peter Portante 2016-01-05 13:04:24 UTC
Verified.

I do not see any logging at emerg level with these installed.

Thanks!

Comment 32 Ken Dreyer (Red Hat) 2016-01-05 15:47:19 UTC
dev-ack'ing since it's merged upstream; let's get this patch into RHCS 1.3.2 if we can.

Comment 35 Ken Dreyer (Red Hat) 2016-02-25 21:32:58 UTC
(In reply to Ken Dreyer (Red Hat) from comment #32)
> dev-ack'ing since it's merged upstream; let's get this patch into RHCS 1.3.2
> if we can.

Unfortunately this missed the 1.3.2 cut-off :(

Comment 36 Brad Hubbard 2016-02-25 22:35:13 UTC
*** Bug 1312144 has been marked as a duplicate of this bug. ***

Comment 37 Charles Crouch 2016-03-15 17:17:28 UTC
Can we get a hotfix for this? Thanks

Comment 40 Mike McCune 2016-03-28 22:20:08 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 52 Tanay Ganguly 2016-04-28 09:39:45 UTC
This issue appears to be resolved, as we are not seeing any "emerg" messages in syslog.

Marking this BUG as verified.

Comment 57 errata-xmlrpc 2016-05-06 18:39:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0721.html