Bug 1416535

Summary: Custom corosync and pacemaker log locations are not collected
Product: Red Hat Enterprise Linux 7 Reporter: Ken Gaillot <kgaillot>
Component: sosAssignee: Pavel Moravec <pmoravec>
Status: CLOSED ERRATA QA Contact: Miroslav HradĂ­lek <mhradile>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.3CC: abeekhof, agk, bmr, cfeist, gavin, isenfeld, jfriesse, kgaillot, plambri, sbradley
Target Milestone: rcKeywords: OtherQA
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: https://github.com/sosreport/sos/pull/924, https://github.com/sosreport/sos/pull/1002
Whiteboard:
Fixed In Version: sos-3.4-3.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 23:10:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ken Gaillot 2017-01-25 18:13:14 UTC
Description of problem: The corosync and pacemaker plugins hardcode the locations of the logfiles /var/log/cluster/corosync.log and /var/log/pacemaker.log. The user may configure alternate locations for these, in which case they are not collected by sosreport.

Version-Release number of selected component (if applicable):


How reproducible: trivially


Steps to Reproduce:
1. Install corosync and pacemaker.
2. Configure custom log locations.
3. Start corosync and pacemaker to generate some logs.
4. Run sosreport.

Actual results: The logs are not collected.

Expected results: The logs are collected.


Additional info:

Corosync logging options are set in /etc/corosync/corosync.conf as described in the corosync.conf(5) man page. In particular, logs may go to a file (specified by the to_logfile option) and/or to syslog (specified by syslog_facility). Of course an additional complication is that the user may have configured their system logger to send the particular syslog facility (e.g. local1) to a custom location.

Pacemaker has comparable options in /etc/sysconfig/pacemaker: PCMK_logfile to specify a file, and PCMK_logfacility to specify syslog.

Comment 2 Pavel Moravec 2017-01-31 13:53:30 UTC
How often a customer uses non-default location of these logfiles? Just one / very few, or is it more common? And do they disable logging to syslog (if not, check journalctl logs)?

Also how to *uniquely* identify value of logfile option? Does it need to be like:

logfile: /path/to/log.log

or can it be:

logfile:
  /path/to/log.log

?

What if some subsystem changes this? Shall we grep the very first value only and rely on it?

Comment 3 Jan Friesse 2017-01-31 17:02:55 UTC
@Pavel, Ken asked me to reply to your question.

corosync.conf parser is basically very simple (and sometimes little stupid ;) ). Parser is line based. Line can be one of:
- Beginning section. Format is then section_name {. It's line based so nothing like section1_name { subsection_name {.
- End of section. Format is then }. Because it's line based, there cannot be things like }}.
- key + value. Format is key: value. No new lines allowed.


So
----
logfile:
  /path/to/log.log
---
cannot happen.

Spaces (space + tab) at the beginning and end of line are stripped. Lines beginning with # are comments and they are ignored (keep in mind first step is strip of spaces, so "  #comment" is valid comment line and it's ignored).

If key set is found (so key: value), spaces are stripped from beginning and end of both key and value. So "logfile:           /path/to/log.log       " is valid and sets logfile to "/path/to/log.log").

Problem is, that values may be overwritten. So config file:

---
logging {
  log_file: test
  something: some_value
  log_file: /tmp/test.log
}
---

results in log_file set to /tmp/test.log. Same applies to

---
logging {
  log_file: test
}

logging {
  log_file: /tmp/test.log
}
---

again means log_file is /tmp/test.log. As a bonus key can have dot in it's name and it means same as start of section, so:

----
logging {
  log_file: test
}

logging.log_file: /tmp/test.log
---

again means log_file is set to /tmp/test.log.

Of course subsystem change is problematic. Question is, if we really need to try to solve it.

Another question is, if we need to be collect logs even if corosync is not running. Because if not, we can use corosync-cmapctl. This would solve all "parsing" problems.

If collecting logs with not running corosync is needed I would recommend to make it simple. Just grep 'log_file:', read value, check if value is valid existing file and if so, add it to sosreport.

Comment 4 Ken Gaillot 2017-01-31 17:57:13 UTC
We do need to collect logs without the cluster running, so I would go with the parsing.

We can use the currently configured value, and not worry about previous values. Users that change the value typically do so when they initially set up the system, and the window for problems (someone changing the location between when an issue occurs and when they run sosreport) is small enough that we can handle any such case manually.

I wouldn't say it's common that RHEL users change the location (more so upstream and in other distributions, e.g. /var/log/corosync/corosync.log is common), but I believe the impact on investigating issues is high enough to warrant the effort. It doesn't have to be a high priority.

Comment 5 Pavel Moravec 2017-02-05 20:04:37 UTC
Proper processing the logfile still would need more logic than a sos plugin shall have on parsing config files. I suggest searching of (first? all?) reg.exp:

'^\s*(logging.)?log_file:\s*(\S+)\s*$'

(i.e. either "log_file:" followed by filename, or "logging.log_file:" followed by filename, both with whitespaces anywhere around)

I guess log_file can't appear elsewhere than under logging section, so this simplification is safe.


Question: does it make sense to search:
- just for the very first occurrence (usually the generic/non-subsystem one, I expect) of log_file (easier/faster to get, shall be sufficient),
- or do we need/utilize all logfiles (i.e. also from all subsystems (cant we obtain something unwanted?) - more (maybe relevant) logs to get, more lengthy?

Comment 6 Jan Friesse 2017-02-06 07:48:05 UTC
@Pavel:

I believe there is misunderstanding:

> I guess log_file can't appear elsewhere than under logging section, so this simplification is safe.

logging {
  log_file: something
}

is equal to:

logging.log_file: something

but not to:

logging {
  logging.log_file: something
}


For first/all question. If possible go with ALL. Just test if file exists and if so, add it to SOS report. Possibility of storing more information is almost always better than storing less.

Comment 7 Pavel Moravec 2017-02-07 07:57:39 UTC
(In reply to Jan Friesse from comment #6)
> @Pavel:
> 
> I believe there is misunderstanding:
> 
> > I guess log_file can't appear elsewhere than under logging section, so this simplification is safe.
> 
> logging {
>   log_file: something
> }
> 
> is equal to:
> 
> logging.log_file: something
> 
> but not to:
> 
> logging {
>   logging.log_file: something
> }

This "something" will be captured while it shouldnt. No big deal, esp. when assuming customers wont have such (mis)config often.

That simplified parsing shall not skip collecting a log_file but might collect something else that shall be rather syntax error.

If sos would have to parse also "logging" section, the parsing would be more complex with a little gain only.

> 
> 
> For first/all question. If possible go with ALL. Just test if file exists
> and if so, add it to SOS report. Possibility of storing more information is
> almost always better than storing less.

OK, raised it in upstream as https://github.com/sosreport/sos/pull/924, subject to change if needed (esp. wrt. parsing the log_file locations). Feel free to play with the code if it will collect all expected logfiles.

Comment 8 Jan Friesse 2017-02-07 13:08:39 UTC
(In reply to Pavel Moravec from comment #7)
> (In reply to Jan Friesse from comment #6)
> > @Pavel:
> > 
> > I believe there is misunderstanding:
> > 
> > > I guess log_file can't appear elsewhere than under logging section, so this simplification is safe.
> > 
> > logging {
> >   log_file: something
> > }
> > 
> > is equal to:
> > 
> > logging.log_file: something
> > 
> > but not to:
> > 
> > logging {
> >   logging.log_file: something
> > }
> 
> This "something" will be captured while it shouldnt. No big deal, esp. when

Yes. And it's really nothing bad.

> assuming customers wont have such (mis)config often.

Yes.

> 
> That simplified parsing shall not skip collecting a log_file but might
> collect something else that shall be rather syntax error.
> 
> If sos would have to parse also "logging" section, the parsing would be more
> complex with a little gain only.
> 
> > 
> > 
> > For first/all question. If possible go with ALL. Just test if file exists
> > and if so, add it to SOS report. Possibility of storing more information is
> > almost always better than storing less.
> 
> OK, raised it in upstream as https://github.com/sosreport/sos/pull/924,
> subject to change if needed (esp. wrt. parsing the log_file locations). Feel
> free to play with the code if it will collect all expected logfiles.

I was testing the code and it doesn't work as expected. It is probably caused by me, because I was always writing "log_file" what is not correct. Correct one is "logfile". Again, sorry for confusing you. After changing regexp to:

^\s*(logging.)?logfile:\s*(\S+)$

it works as expected.

Comment 9 Pavel Moravec 2017-02-07 13:28:18 UTC
Thanks for the check, fixed in upstream PR in https://github.com/sosreport/sos/pull/924/commits/774e930a0324795077885a7e58a743b3d25e48a2

Comment 10 Ken Gaillot 2017-02-07 14:56:18 UTC
The corosync plugin looks good to me.

The pacemaker log will be easy if you already have common functions for detecting init environment location (/etc/sysconfig vs. /etc/default etc.) and parsing shell-style variables.(VAR=value).

Comment 11 Pavel Moravec 2017-02-19 14:25:14 UTC
Hello,
would you be able to test the fix in candidate RPM for RHEL7.4 (once available)?

Our QE would appreciate it.

Comment 12 Ken Gaillot 2017-02-20 16:30:09 UTC
(In reply to Pavel Moravec from comment #11)
> Hello,
> would you be able to test the fix in candidate RPM for RHEL7.4 (once
> available)?
> 
> Our QE would appreciate it.

Yes, put the link here when it's ready, and I'll test it as soon as I can. Thanks!

Comment 13 Pavel Moravec 2017-02-21 14:53:38 UTC
Closed #924 via 7015223.

Comment 18 Pavel Moravec 2017-05-03 07:52:29 UTC
Ah sorry, forgot the 2nd requirement. Captured here:

https://github.com/sosreport/sos/pull/1002

Comment 23 errata-xmlrpc 2017-08-01 23:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2203