Bug 1489704

Summary: [Infra] Log JVM’s STDOUT
Product: Red Hat OpenStack Reporter: Nir Yechiel <nyechiel>
Component: opendaylightAssignee: Michael Vorburger <vorburger>
Status: CLOSED ERRATA QA Contact: Itzik Brown <itbrown>
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike)CC: aadam, jchhatba, mkolesni, nyechiel, trozet, vorburger
Target Milestone: betaKeywords: OtherQA, Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: odl_infra
Fixed In Version: opendaylight-8.0.0-1.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
N/A
Last Closed: 2018-06-27 13:36:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nir Yechiel 2017-09-08 07:45:40 UTC
Description of problem:

Make all supported “logging configurations” also preserve not only Karaf’s main log (e.g. karaf.log file, et al) but also the JVM’s STDOUT because that is where the JVM, not Karaf, will log things like memory or thread resource issues, thread death exceptions, instead of in the karaf.log (for technical reasons, e.g. if there is no UncaughtExceptionHandler set, which sometimes there is but sometimes there isn’t, and it’s hard to guarantee and enforce; the more important thing is just not to be blind and lose such messages). 

Today this probably is captured by the systemd service STDOUT capture - it’s possible that this point can be addressed simply by adding a paragraph to some troubleshooting doc somewhere (for systemd system controlled ODL, for containerized it would be different again).

Comment 3 Michael Vorburger 2017-11-29 15:41:05 UTC
We first need the re-discuss and reach agreement the actual requirement here.. the original "Description of problem" in the initial Description, which is copy/pasted from something I wrote somewhere months ago, doesn't make much sense even to myself anymore upon re-reading it now.

If we focus on our supported containerized ODL deployment, then I think what would make sense here is to change ODL's Karaf logging configuration, when containerized, to not write into a karaf.log file stuck hidden inside the container (because, who's ever going to get access into the container to read that, anyway?), but to make it just make its logging spew everything to STDOUT, so that "something" (what?) can grab / archive / monitor that?

Reading e.g. https://www.12factor.net/logs and https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level, my understanding is that is how modern cloud native containerized application should do logging nowadays?

Doing this would implicitly and automatically ALSO cover the requirement initially described, because any "JVM, not Karaf, will log things like memory or thread resource issues, thread death exceptions" which today is not on the karaf.log but on STDOUT will end up in the same place if we make *all* logging go to STDOUT.

Who knows more about where logs in the containerized OSP (13) world have to go and can confirm that if the ODL container just writes (a lot...) to STDOUT that is the right thing to do, and will be piped into something that makes it accessible to Ops somehow?

If we can reach agreement that is what we want here, then it's not particularly hard to do - we would just have to get some script which creates the container to change ODL Karaf Log4j Pax Logging config to instead of karaf.log spew to STDOUT (with the right options to do it fast - there are some options related to this in Log4j.

PS: A completely different alternative to above would be to send ODL logs not into karaf.log nor to the container STDOUT but instead from within the container out to some log capture something - say to systemd journal (Bug 1475951), or.. whatever else is the "right" destination from logs from within containers in OSP.

Comment 4 Tim Rozet 2017-11-29 16:20:34 UTC
We currently run in a container without systemd, so we do not need to even consider that case for now.  The way to get container logs with docker is using this:
https://docs.docker.com/engine/reference/commandline/logs/#options

That will grab STDOUT and STDERR from the container, which is running a foreground karaf process shell.  So if we were to somehow enable logging to karaf console via a configuration file in ODL, I think that would be good enough.

Comment 5 Michael Vorburger 2017-11-29 16:36:46 UTC
> somehow enable logging to karaf console via a configuration file in ODL

when you say "logging to karaf console" that technically really just means "make the ODL Karaf JVM send its logs to STDOUT" (instead of into karaf.log), agreed? Then let me look into and propose the required logging configuration change for that here...

PS: An alternative would be to make ODL Karaf STDOUT for JVM errors go into a karaf.out file next to the usual karaf.log, using the KARAF_REDIRECT env var; that may be of interest instead of having to consult the journal via systemctl for non-containerized standalone - but it doesn't solve the real issue about logging into files in containers; so we'll focus on containers in this issue.

Comment 6 Janki 2017-11-30 05:30:01 UTC
(In reply to Michael Vorburger from comment #3)
> We first need the re-discuss and reach agreement the actual requirement
> here.. the original "Description of problem" in the initial Description,
> which is copy/pasted from something I wrote somewhere months ago, doesn't
> make much sense even to myself anymore upon re-reading it now.
> 
> If we focus on our supported containerized ODL deployment, then I think what
> would make sense here is to change ODL's Karaf logging configuration, when
> containerized, to not write into a karaf.log file stuck hidden inside the
> container (because, who's ever going to get access into the container to
> read that, anyway?), 

All /opt/opendaylight/data/log/Karaf.log files (of ODL container) are ALSO available on overcloud at /var/log/containers/opendaylight. And so are journal and snapshots folders at /var/lib/opendaylight/ on overcloud.

> but to make it just make its logging spew everything to
> STDOUT, so that "something" (what?) can grab / archive / monitor that?
> 
> Reading e.g. https://www.12factor.net/logs and
> https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-
> at-the-node-level, my understanding is that is how modern cloud native
> containerized application should do logging nowadays?
> 
> Doing this would implicitly and automatically ALSO cover the requirement
> initially described, because any "JVM, not Karaf, will log things like
> memory or thread resource issues, thread death exceptions" which today is
> not on the karaf.log but on STDOUT will end up in the same place if we make
> *all* logging go to STDOUT.
> 
> Who knows more about where logs in the containerized OSP (13) world have to
> go and can confirm that if the ODL container just writes (a lot...) to
> STDOUT that is the right thing to do, and will be piped into something that
> makes it accessible to Ops somehow?
> 
> If we can reach agreement that is what we want here, then it's not
> particularly hard to do - we would just have to get some script which
> creates the container to change ODL Karaf Log4j Pax Logging config to
> instead of karaf.log spew to STDOUT (with the right options to do it fast -
> there are some options related to this in Log4j.
> 
> PS: A completely different alternative to above would be to send ODL logs
> not into karaf.log nor to the container STDOUT but instead from within the
> container out to some log capture something - say to systemd journal (Bug
> 1475951), or.. whatever else is the "right" destination from logs from
> within containers in OSP.

Comment 7 Michael Vorburger 2017-12-06 18:02:33 UTC
> All /opt/opendaylight/data/log/Karaf.log files (of ODL container) are
> ALSO available on overcloud at /var/log/containers/opendaylight.

oh OK so those directories are "mounted" from the host into the container, it seems.

But that does not solves our problem here.. I've given this some thought, and don't think we can make the JVM stdout go into the karaf.log file which log4j opens... (Based on https://docs.docker.com/engine/admin/logging/view_container_logs/, I was considering playing tricks like "ln - sf /dev/stdout karaf.log" or setting the log file location to /proc/self/fd/1, but with Log4j going to roll-over log files, neither is going to work reliably; I fear.)

>> somehow enable logging to karaf console via a configuration file in ODL

> when you say "logging to karaf console" that technically really just means
> "make the ODL Karaf JVM send its logs to STDOUT" (instead of into karaf.log),
> agreed? Then let me look into and propose the required logging configuration 

This turns out to be trivial: If in ODL Karaf's etc/org.ops4j.pax.logging.cfg we change the first non-commented line which contains "log4j.rootLogger=INFO, async, osgi:*" to:

log4j.rootLogger=INFO, stdout, osgi:*

note the replacement of "async" by "stdout", then (tested) it will log to STDOUT instead of into karaf.log!  We don't want to / cannot change this upstream, of course - but if somebody could change this in the "right place" (?) where the ODL container image gets created, we would basically be "done" here!


BTW #1: https://logging.apache.org/log4j/2.x/manual/appenders.html#ConsoleAppender documents the "direct" parameter for the ConsoleAppender which is "New since 2.6.2" and "Can give up to 10x performance boost when the output is redirected to file or other process.".  So while we are at tweaking the logging configuration, let us set this as well, by adding the following to the same etc/org.ops4j.pax.logging.cfg a few lines later:

log4j.appender.stdout.direct=true

This currently leads to the following warning, but that's probably just because the Log4j version in PAX Logging is older than 2.6.2; let us set this anyway now already, it cannot hurt, and when that gets upgraded, it will use this "direct" mode and this warning will disappear:

log4j:WARN No such property [direct] in org.apache.log4j.ConsoleAppender


If at all possible, it would be good if to make this tweak for the containerized ODL we could read the original org.ops4j.pax.logging.cfg and only change that line (and add one for the next point below), like by applying a diff, and not just make a downstream copy of, modify and replace the org.ops4j.pax.logging.cfg.  This is because we may evolve the org.ops4j.pax.logging.cfg upstream; for example, I'm working on something else which will modify that layout.ConversionPattern.


BTW #2: While we are at tweaking things here, can I suggest that the Dockerfile for the ODL container image should start it with "bin/karaf server", if this isn't already done?  This sets -Dkaraf.startLocalConsole=false, which make it non-interactive without console on STDIN (we can still SSH into it though), and the early lines in the log during start up are a bit less ugly & confusing.  ("Not starting local console. To activate set karaf.startLocalConsole=true" log.)


Just FTR: The usual old data/log/karaf.log does not completely disappear, something somewhere in Karaf appears to be hard-coded to this and the following few lines appear in it on start-up, until it reads org.ops4j.pax.logging.cfg and realizes we just want STDOUT, this is fine and not a problem, just worth mentioning here:

Dec 06, 2017 6:50:14 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Trying to lock /home/vorburger/dev/ODL/git/odlparent/karaf/opendaylight-karaf-empty/target/assembly/lock
Dec 06, 2017 6:50:15 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Dec 06, 2017 6:50:15 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100

Conclusion: I've locally tested this and can confirm that with this (above), the original stated goal to "preserve not only karaf.log but also the JVM’s STDOUT" is achieved.  This can be verified by changing org.ops4j.pax.logging.cfg as above and starting "bin/karaf >/tmp/bug1489704.log" and then sending a "kill -3 <PID>" - the JVM thread dump will be in bug1489704.log, interspersed with log mesages, which is exactly what we wanted here.


PS: https://docs.docker.com/engine/admin/logging/overview/ mentions the Docker journald logging driver; that would indirectly gives us Bug 1475951 (kind of; it would be "text based", not really properly "structured" logging); seems worthwhile to me to use that driver, but I'll have to let others decide if we want this, and where to enable this.  What's very nice is that, apparently, "docker logs" works for the Docker journald logging driver as well, not just the default json-file driver.

Comment 12 Michael Vorburger 2017-12-13 13:01:28 UTC
FTR: Janki pointed out that in https://review.openstack.org/#/c/510001/ "all services in TripleO are moving to logging to stdout[1] so logs are accessible via 'docker logs'"; confirming that doing this for ODL as well is a Good Idea.

Comment 13 Michael Vorburger 2017-12-18 18:46:40 UTC
FTR: Upstream ODL master (Oxygen) in January is about to upgrade Karaf version and with that its underlying logging framework to Log4J2 (see e.g. https://git.opendaylight.org/gerrit/#/c/66568/) so depending on when we will actually make this change, the diff to apply to etc/org.ops4j.pax.logging.cfg would have to be a little bit adjusted (but everything else discussed here would not be impacted; just minor change in that configuration file).

Comment 14 Janki 2017-12-19 04:51:37 UTC
I had the changes made locally. I am waiting to test running ODL with "server" in TripleO before pushing upstream. I will now push the changes in Januanry once karaf  migration is done.

Comment 23 Itzik Brown 2018-04-08 08:14:04 UTC
Hi Michael,
Can you please suggest a way to verify? (Such as an entry that should be in the output?

Comment 24 Michael Vorburger 2018-04-10 09:37:37 UTC
Itzik, see comment 7 (#c7) above - basically: Sending a "kill -3 <ODL-Karaf-JVM-PID>" - usually that goes to STDOUT, but now the resulting JVM thread dump should be in interspersed with the log mesages where the full Karaf ODL log is now - karaf.log file and/or docker log.

Comment 25 Itzik Brown 2018-04-23 10:22:03 UTC
Verified with:
opendaylight-8.0.0-5.el7ost.noarch

Comment 26 Michael Vorburger 2018-04-23 11:09:37 UTC
> (...) the "direct" parameter for the ConsoleAppender which (...)

==> carried over into new bug 1570635

Comment 28 errata-xmlrpc 2018-06-27 13:36:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086