Red Hat Bugzilla – Bug 1276186
Use of syslog results in all log messages at priority "emerg"
Last modified: 2017-07-30 11:22:34 EDT
We are working with an 8-node Ceph deployment that a co-worker set up, and we tried to configure Ceph to log to syslog. It seems to work, but every log message arrives with priority "emerg", program name "ceph-osd", and facility "user".
Here is a Kibana page showing the logs: http://ki-perf44.perf.lab.eng.bos.redhat.com/?#/discover/Ceph-on-BAGL
We are using:
ceph.x86_64 1:0.94.1-13.el7cp @ceph-mon
ceph-common.x86_64 1:0.94.1-13.el7cp @ceph-mon
ceph-mon.x86_64 1:0.94.1-13.el7cp @ceph-mon
ceph-osd.x86_64 1:0.94.1-13.el7cp @ceph-osd
fsid = 58996142-c833-4cfa-8de7-533154f3617b
mon_initial_members = overcloud-controller-0 overcloud-controller-1 overcloud-controller-2
mon_host = 172.18.0.17 172.18.0.15 172.18.0.20
#mon_host = 10.16.159.202 10.16.159.201 10.16.159.203
auth cluster required = none
auth service required = none
auth client required = none
filestore_xattr_use_omap = true
public network = 172.18.0.0/24
#public network = 10.16.159.0/24
cluster network = 172.18.0.0/24
log to syslog = true
err to syslog = true
#admin socket = /var/log/qemu/rbd-client-$pid.asok
#admin socket = /var/log/qemu/cluster-$type.$id.$pid.$cctid.asok
log_file = /var/log/qemu/ceph-rbd.log
rbd cache = true
rbd cache writethrough until flush = true
osd backfill full ratio = 0.90
mon cluster log to syslog = true
host = overcloud-controller-0
addr = 172.18.0.17:6789
# addr = 10.16.159.202:6789
host = overcloud-controller-1
addr = 172.18.0.15:6789
# addr = 10.16.159.201:6789
host = overcloud-controller-2
addr = 172.18.0.20:6789
# addr = 10.16.159.203:6789
I'm looking into this.
This is the call site responsible for those messages:
1- We're not specifying a priority. Perhaps if you don't, it defaults to EMERG? That would be an easy fix.
2- We're not calling openlog()... not sure that matters, but I just noticed it on the man page.
syslog is a stupid, stupid interface. Indeed, the first arg "priority" is *really* "priority | facility", and since priority is "the low three bits" and facility is encoded in the bits above them, LOG_USER alone implies LOG_USER | LOG_EMERG.
We might also want to derive the syslog priority of each message from its Ceph log level, but that level is not available in Log::_flush(). That's not really this bug, but it might be good to fix both at the same time.
(In reply to Dan Mick from comment #4)
> syslog is a stupid, stupid interface. Indeed, the first arg "priority" is
> *really* "priority | facility", and since priority is "the low three bits"
> and facility is "the next three bits", LOG_USER alone implies LOG_USER |
Would you also want to fix the reference here:
All of those messages would also show up as "emerg", priority 0, so perhaps line 266 would want LOG_USER|LOG_DEBUG?
(In reply to Peter Portante from comment #6)
> Would you also want to fix the reference here:
> All of those messages would also show up as "emerg", priority 0, so perhaps
> line 266 would want LOG_USER|LOG_DEBUG?
I'm still reviewing the code and behavior but we will certainly try to catch all instances.
If I create a test package that includes a patch for this, would you want to test it out? What version would you want it based on?
Thanks, Brad. Yes, I would like to test that package if you create it, and 0.94.1-13.el7cp would be fine.
Sorry about the delay here, I was traveling last week. I should have this built and tested tomorrow.
Created attachment 1092042
Patch to address syslog priority issue
Once I have your feedback, I'll open an upstream tracker issue and submit my patch.
Did you manage to test with the package I created, Peter?
Sorry Brad, I have not had a chance to test the changes yet. Soon, hopefully, probably by the end of the second week of December, if all goes well.
Okay, no problem, thanks.
Created http://tracker.ceph.com/issues/13993 and submitted a PR for this to save time.
The timing is sticky because upstream's test lab is still down, so upstream hasn't been able to verify the fix and merge it.
(In reply to Ken Dreyer (Red Hat) from comment #18)
> The timing is sticky because upstream's test lab is still down, so upstream
> hasn't been able to verify the fix and merge it.
Not urgent at the moment Ken.
Could I get a set of bits deployable on Fedora 23 for testing?
(In reply to Peter Portante from comment #20)
> Could I get a set of bits deployable on Fedora 23 for testing?
This bug is for RHCS. We don't ship RHCS for Fedora 23, so we can't provide you with RHCS bits for F23.
Right, but there's an upstream bug/fix/package production process as well...
(In reply to Dan Mick from comment #22)
> Right, but there's an upstream bug/fix/package production process as well...
Dan is right, of course. The most appropriate place to discuss Fedora binaries is a Fedora bug. I can build a test package for F23, but I don't believe this is the appropriate place to discuss or provide it, and doing so would not move this Bugzilla forward at all.
Can we verify upstream first, then? Has this fix made it into master yet? If so, can I just install that on F23 first? "ceph-deploy --dev master"?
If it is fixed there, then I can build a RHEL 7.2 box and try out that RHCS version explicitly after.
Does that sound reasonable?
(In reply to Peter Portante from comment #24)
> Can we verify upstream first then? Did this fix make it all the way out to
> master yet? If so, can I just install that on F23 first? "ceph-deploy
> --dev master"?
The upstream tracker is http://tracker.ceph.com/issues/13993, and the PR https://github.com/ceph/ceph/pull/6815 was merged into Jewel. There are PRs for backports to Hammer and Infernalis, but they are not merged yet, possibly because of the holiday season.
$ git show 0a4b7ab20c2979f1de97ac4b0d8bc5a78c5bce16
Merge: d7581cd 8e93f3f
Author: Sage Weil <email@example.com>
Date: Sat Dec 19 13:58:37 2015 -0500
Merge pull request #6815 from badone/wip-13993
common: log: Assign LOG_DEBUG priority to syslog calls
Reviewed-by: Sage Weil <firstname.lastname@example.org>
$ git branch -r --contains 0a4b7ab20c2979f1de97ac4b0d8bc5a78c5bce16
origin/HEAD -> origin/master
$ git branch
$ git blame src/log/Log.cc|grep syslog\(
8e93f3f4 (Brad Hubbard 2015-12-07 11:31:28 +1000 259) syslog(LOG_USER|LOG_DEBUG, "%s", buf);
8e93f3f4 (Brad Hubbard 2015-12-07 11:31:28 +1000 287) syslog(LOG_USER|LOG_DEBUG, "%s", s);
> If it is fixed there, then I can build a RHEL 7.2 box and try out that RHCS
> version explicitly after.
> Does that sound reasonable?
Sure. Let me know via the upstream tracker if you need help with the upstream part.
Okay, I was able to hack and slash my way through an F22 install of upstream master and confirm that the logging fix works there.
Next, I'll build a RHEL 7.2 box to try out the RHCS packages. I'll do that now; if you could then post the proper install instructions for trying this fix out there, that would be great.
Sorry to be a bother, but the instructions in #11 seem to assume I already have RHCS installed.
I'd like to use something very simple and straightforward to get an RHCS install up and running, like https://www.berrange.com/posts/2015/12/21/ceph-single-node-deployment-on-fedora-23/
Can I use ceph-deploy to install from your provided packages with the fix?
(In reply to Peter Portante from comment #28)
> Sorry to be a bother, but for #11, it seems to assume I already have RHCS
> I'd like to use something very simple and straight-forward to get an RHCS
> install up and running, like
> Can I use ceph-deploy to install from your provided packages with the fix?
Just run the yum install command on each machine you want in your cluster and skip the "ceph-deploy install" step.
I do not see any logging at emerg level with these installed.
dev-ack'ing since it's merged upstream; let's get this patch into RHCS 1.3.2 if we can.
(In reply to Ken Dreyer (Red Hat) from comment #32)
> dev-ack'ing since it's merged upstream; let's get this patch into RHCS 1.3.2
> if we can.
Unfortunately this missed the 1.3.2 cut-off :(
*** Bug 1312144 has been marked as a duplicate of this bug. ***
Can we get a hotfix for this? Thanks.
This bug was accidentally moved from POST to MODIFIED due to an automation error; please contact email@example.com with any questions.
This issue appears to be resolved: we are no longer seeing any "emerg" messages in syslog.
Marking this bug as verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.