Bug 758507

Summary: Document troubleshooting for luci, using log files
Product: Red Hat Enterprise Linux 6 Reporter: Steven J. Levine <slevine>
Component: doc-Cluster_Administration Assignee: Steven J. Levine <slevine>
Status: CLOSED WONTFIX QA Contact: ecs-bugs
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2 CC: cluster-maint, fdinitto, jpokorny, rmccabe, slevine
Target Milestone: rc Keywords: Documentation
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-15 17:34:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Steven J. Levine 2011-11-29 22:50:13 UTC
Description of problem:

The Cluster Administration document could include information about troubleshooting luci by means of the log files. There is already a troubleshooting section, but it could be improved by adding the information that Jan Pokorny provided in this email.

This information applies to RHEL 5 as well, so after researching this we should probably clone the bug.

Email exchange that yielded this BZ:


On 11/28/2011 11:13 PM, Steven Levine wrote:
> Plus from IRC:
>
> 618321, 742431:  I'm going to need to look at this some more, but I'm
> not sure that this is a user documentation issue (rather than a fix).
> What is it that an admin should be doing here?  That's what I need to
> figure out.
>
> Is it a general issue of documenting the log files (which we don't
> currently do to any sort of detail)?  Would it make sense to open a
> documentation bug to "document the log files" and be sure that we
> include information about queue pruning?  That still seems internal to me.

In general:
-----------

I have no idea to what extent the log files should be considered
internal; it probably depends on how often the respective pieces of
information are required and how much the user gains from working
with such information.

But when the problem is trivial (e.g. network connection not ready),
such information is worth being able to read somewhere before asking
for help (no implication intended here).

Maybe there could be a small section in the doc called "Troubleshooting"
containing something like:

Table X lists log files for cluster components.  You can use the
information contained within these files as a first diagnostic step
for your cluster-related issues.

component     | log file location      | note
--------------+------------------------+--------------------------------
[...]
luci (RHEL 6) | /var/log/luci/luci.log |
--------------+------------------------+--------------------------------
modcluster    | /var/log/clumond.log   | as of <version solving
              |                        | bz618321/bz742431>, info about
              |                        | interventions due to insuffic.
              |                        | environment parameters is
              |                        | logged here
              |                        |            -OR-
              |                        | see knowledge base ...
--------------+------------------------+--------------------------------
ricci         | N/A                    | some messages via syslog
              |                        | (/var/log/messages by default)
--------------+------------------------+--------------------------------
[...]
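
The table above could be turned into a quick first check along the
following lines.  This is only a sketch in Python, assuming the default
locations listed in the table are unchanged on the host; the names
DEFAULT_LOGS and tail are made up for illustration and are not part of
any cluster tool.

```python
from collections import deque
from pathlib import Path

# Default log locations as listed in the table above (assumed unchanged
# on the host; ricci has no file of its own and goes through syslog).
DEFAULT_LOGS = {
    "luci": "/var/log/luci/luci.log",
    "modcluster": "/var/log/clumond.log",
    "ricci": "/var/log/messages",
}


def tail(path, count=20):
    """Return the last `count` lines of `path`, or [] if it cannot be read."""
    try:
        with open(path, errors="replace") as handle:
            return list(deque(handle, maxlen=count))
    except OSError:
        return []


if __name__ == "__main__":
    for component, location in DEFAULT_LOGS.items():
        state = "present" if Path(location).is_file() else "missing"
        print(f"== {component}: {location} ({state})")
        for line in tail(location, 5):
            print("   " + line.rstrip())
```

Reading these files usually requires root privileges, so an unreadable
file is treated the same way as an empty one here.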


Modclusterd specific:
---------------------

In the case of modclusterd, the location of the log file is not very
intuitive (/var/log/clumond.log; it can be changed, but what about
backward compatibility?).  The lack of a man page may also be a
problem here, as man pages IMHO usually state the log files.

Specifically, the information about "queue pruning" is a good
self-diagnostic that users can use to reevaluate their HW
resources, and if they think it is rather "our" SW fault, they can
contact us with details so that things can move forward.
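
As a sketch of that self-diagnostic, something like the following could
pull candidate lines out of the modclusterd log.  The exact wording of
the pruning message is not given in this thread, so the pattern "prun"
is only a guess, and matching_lines is a made-up helper name.

```python
import re


def matching_lines(lines, pattern):
    """Yield (line_number, text) for each line matching `pattern`, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    for number, text in enumerate(lines, start=1):
        if regex.search(text):
            yield number, text.rstrip("\n")


if __name__ == "__main__":
    # Location taken from the table above; "prun" is an assumed substring,
    # not a confirmed modclusterd message.
    try:
        with open("/var/log/clumond.log", errors="replace") as log:
            for number, text in matching_lines(log, r"prun"):
                print(f"{number}: {text}")
    except OSError as error:
        print(f"cannot read log: {error}")
```

The same helper could be pointed at /var/log/messages with a pattern
such as "ricci" to pick out the syslog entries mentioned in the table.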

Maybe the variant of that table with the link to the knowledge base
in the case of modclusterd (and maybe other components) would be
the best solution to keep users (mostly admins IMHO) informed?

Any thoughts?

-- Jan

Comment 1 Steven J. Levine 2011-11-30 16:48:53 UTC
From email...

On 11/29/2011 05:46 PM, Steven Levine wrote:
> I'm going to open a documentation bug for this issue: "Document luci
> troubleshooting". I will open it for now as a RHEL 6.3 bug since we're
> past the development stage for 5.8, but that's just a first shot.

There is a kbase on this:

https://access.redhat.com/kb/docs/DOC-53506

-- Lon

Comment 2 Steven J. Levine 2011-11-30 20:48:15 UTC
Copying more of the email discussion here:

From Jan:

On 11/30/2011 05:42 PM, Lon Hohberger wrote:
> On 11/29/2011 05:46 PM, Steven Levine wrote:
>> I'm going to open a documentation bug for this issue: "Document luci
>> troubleshooting". I will open it for now as a RHEL 6.3 bug since we're
>> past the development stage for 5.8, but that's just a first shot.
>
> There is a kbase on this:
>
> https://access.redhat.com/kb/docs/DOC-53506
>
> -- Lon

I haven't checked KB out.  So this really boils down mainly to
modclusterd, ricci and luci only (provided that nothing else
[cluster-snmp/-cim etc.] performs logging).

I think it's regrettable that modclusterd and ricci (incl. workers)
missed the cluster logging framework (lack of coordination?).
Luci is a detached thing anyway.

Despite the KBs, I think it is still worth considering a summarizing
table as proposed (for all components) possibly accompanied with the
mentioned link (there is also RHEL 6 version: DOC-53585) and detailed
KB link in case of modclusterd and maybe others.

I think so also based on what I found in RHEL 6 cluster doc:

3.5.5. Logging Configuration
- could link such table/section

9.4. Cluster Service Will Not Start
- "then read the messages logs"
  - either "messages" should be TT-formatted (if it refers to /var/log)
    or could link such table/section with defaults (if it refers to some
    abstract logs -- this seems not to be explained properly anywhere)

9.5. Cluster-Controlled Services Fails to Migrate
- ditto

9.10. Fencing Occurs at Random
- ditto

... and maybe others


I just think the concept of "message logs" would deserve some kind
of centralized clarification and part of it can be the proposed table.


Another perception (addressing mainly Lon now) cluster-snmp/-cim is not
a subject of documentation?  Is it supported?  I am getting lost.

-- Jan 

-----------

On 11/30/2011 01:44 PM, Jan Pokorny wrote:
> On 11/30/2011 05:42 PM, Lon Hohberger wrote:
>> On 11/29/2011 05:46 PM, Steven Levine wrote:
>>> I'm going to open a documentation bug for this issue: "Document luci
>>> troubleshooting". I will open it for now as a RHEL 6.3 bug since we're
>>> past the development stage for 5.8, but that's just a first shot.
>>
>> There is a kbase on this:
>>
>> https://access.redhat.com/kb/docs/DOC-53506
>>
>> -- Lon
>
> I haven't checked KB out. So this really boils down mainly to
> modclusterd, ricci and luci only (provided that nothing else
> [cluster-snmp/-cim etc.] performs logging).
>
> I think it's regrettable that modclusterd and ricci (incl. workers)
> missed the cluster logging framework (lack of coordination?).
> Luci is a detached thing anyway.

I think this is because there was a lack of resources, and we never made it a requirement for those components.


> Despite the KBs, I think it is still worth considering a summarizing
> table as proposed (for all components) possibly accompanied with the
> mentioned link (there is also RHEL 6 version: DOC-53585) and detailed
> KB link in case of modclusterd and maybe others.

We typically are not allowed to link to KB from formal documentation, but otherwise, it's not a bad idea.


> I think so also based on what I found in RHEL 6 cluster doc:
>
> 3.5.5. Logging Configuration
> - could link such table/section
>
> 9.4. Cluster Service Will Not Start
> - "then read the messages logs"
> - either "messages" should be TT-formatted (if it refers to /var/log)
> or could link such table/section with defaults (if it refers to some
> abstract logs -- this seems not to be explained properly anywhere)

Sorry, what is "TT-formatted"?

It would be a very good thing to have all of the default message files listed in the documentation, so people know where to look.


> 9.5. Cluster-Controlled Services Fails to Migrate
> - ditto
>
> 9.10. Fencing Occurs at Random
> - ditto
>
> ... and maybe others
>
> I just think the concept of "message logs" would deserve some kind
> of centralized clarification and part of it can be the proposed table.

I don't disagree.


> Another perception (addressing mainly Lon now) cluster-snmp/-cim is not
> a subject of documentation? Is it supported? I am getting lost.

Cluster-snmp / cluster-cim are deprecated.

We have foghorn in RHEL6 for SNMP traps.  Foghorn is currently not documented.  Perhaps this is an area of enhancement we could look in to for RHEL 6.3.

-- Lon 

------------------

On 11/30/2011 08:14 PM, Lon Hohberger wrote:
> On 11/30/2011 01:44 PM, Jan Pokorny wrote:
>> I think it's regrettable that modclusterd and ricci (incl. workers)
>> missed the cluster logging framework (lack of coordination?).
>> Luci is a detached thing anyway.
>
> I think this is because there was a lack of resources, and we never made
> it a requirement for those components.

Understood.

>> Despite the KBs, I think it is still worth considering a summarizing
>> table as proposed (for all components) possibly accompanied with the
>> mentioned link (there is also RHEL 6 version: DOC-53585) and detailed
>> KB link in case of modclusterd and maybe others.
>
> We typically are not allowed to link to KB from formal documentation,
> but otherwise, it's not a bad idea.

Understood.  So for modclusterd in connection with bz618321/bz742431, there can be either nothing or something as per yesterday's email:
> as of <version solving bz618321/bz742431>, info about interventions
> due to insufficient environment parameters is logged here

And that KB is not too large so if there is something really vital, it
can perhaps be mentioned directly.

>
>> I think so also based on what I found in RHEL 6 cluster doc:
>>
>> 3.5.5. Logging Configuration
>> - could link such table/section
>>
>> 9.4. Cluster Service Will Not Start
>> - "then read the messages logs"
>> - either "messages" should be TT-formatted (if it refers to /var/log)
>> or could link such table/section with defaults (if it refers to some
>> abstract logs -- this seems not to be explained properly anywhere)
>
> Sorry, what is "TT-formatted"?

Was too brief here, teletype/typewrite text = monospace.

> It would be a very good thing to have all of the default message files
> listed in the documentation, so people know where to look.
>
>
>> 9.5. Cluster-Controlled Services Fails to Migrate
>> - ditto
>>
>> 9.10. Fencing Occurs at Random
>> - ditto
>>
>> ... and maybe others
>>
>> I just think the concept of "message logs" would deserve some kind
>> of centralized clarification and part of it can be the proposed table.
>
> I don't disagree.
>
>
>> Another perception (addressing mainly Lon now) cluster-snmp/-cim is not
>> a subject of documentation? Is it supported? I am getting lost.
>
> Cluster-snmp / cluster-cim are deprecated.
>
> We have foghorn in RHEL6 for SNMP traps. Foghorn is currently not
> documented. Perhaps this is an area of enhancement we could look in to
> for RHEL 6.3.

Yes, it is documented, at least in preparation [1].  Deprecated since
6.3 or earlier?  People seem to be using it [2].  That doc. explicitly
says foghorn does not allow get-access.

[1] <http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ch-SNMP-Configuration-CA.html>

[2] https://www.redhat.com/archives/linux-cluster/2011-September/msg00069.html

> -- Lon
-- Jan 

-----------

From Lon

On 11/30/2011 02:34 PM, Jan Pokorny wrote:

>>> Despite the KBs, I think it is still worth considering a summarizing
>>> table as proposed (for all components) possibly accompanied with the
>>> mentioned link (there is also RHEL 6 version: DOC-53585) and detailed
>>> KB link in case of modclusterd and maybe others.
>>
>> We typically are not allowed to link to KB from formal documentation,
>> but otherwise, it's not a bad idea.
>
> Understood. So for modclusterd in connection with bz618321/bz742431,
> there can be either nothing or something as per yesterday's email:
>  > as of <version solving bz618321/bz742431>, info about interventions
>  > due to insufficient environment parameters is logged here
>
> And that KB is not too large so if there is something really vital, it
> can perhaps be mentioned directly.

Yes, it would be fine to add a troubleshooting section to the 6.3 documentation and add common problems to it (including issues causing modclusterd warnings when queue gets pruned).

Steve, are there resources for this sort of thing for 6.3?  This would largely be a support/doc effort with some feedback from engineering.


>>
>> Sorry, what is "TT-formatted"?
>
> Was too brief here, teletype/typewrite text = monospace.
>

Ah, ok.

> Yes, it is documented, at least in preparation [1]. Deprecated since
> 6.3 or earlier? People seem to be using it [2]. That doc. explicitly
> says foghorn does not allow get-access.

At least since the introduction of foghorn, but maybe even since RHEL 6.0 GA.

SNMP GET support is the only thing provided by cluster-snmp; it is kind of like the equivalent of "clustat".  Cluster-snmp has installed documentation:

  /usr/share/doc/cluster-snmp-0.16.2/README
  /usr/share/doc/cluster-snmp-0.16.2/README.snmpd

The current plan is that we will not be providing SNMP GET support in RHEL7 nor adding features to it in RHEL6.  We specifically asked product management about the demand while implementing foghorn, and the feedback we received was that there was minimal (or no) need for it.

So, while I agree that we should have comprehensive documentation, I don't see placing cluster-snmp information in the RHEL HA Add-On documentation as a good use of our limited resources.

-- Lon
-------------

From me:
[Adding John Ha and Mike Smith only because we've moved this a little
bit to the area of documentation resources, so if I'm promising
something they should at least know about it...]

Summarizing this down to this question of Lon's:

> Yes, it would be fine to add a troubleshooting section to the 6.3
> documentation and add common problems to it (including issues causing
> modclusterd warnings when queue gets pruned).
>
> Steve, are there resources for this sort of thing for 6.3?  This would
> largely be a support/doc effort with some feedback from engineering.

I think at this point the resources come down to whether I will have
time to do this for 6.3, and as of now I think that's wholly feasible.

I haven't yet begun planning work for RHEL 7, which could change that
equation since from what I've picked up so far this will involve some
major work on new material, but an expanded troubleshooting section for
RHEL 6.3 seems not only possible but very much the sort of thing that
people most frequently request. In general, people tend to go to the
documentation when they are having trouble, so anything that falls under
the category of troubleshooting makes the docs more useful (and saves
the time and resources of our support folks, so there is always a
business case to be made for improving troubleshooting documentation).

I'm keeping this conversation on file as part of BZ#758507, so this is
now at least on the table for RHEL 6.3 in a place where people can ping
it or check up on it or add things to it.

-Steven


On 11/30/2011 01:58 PM, Lon Hohberger wrote:
> On 11/30/2011 02:34 PM, Jan Pokorny wrote:
>
>>>> Despite the KBs, I think it is still worth considering a summarizing
>>>> table as proposed (for all components) possibly accompanied with the
>>>> mentioned link (there is also RHEL 6 version: DOC-53585) and detailed
>>>> KB link in case of modclusterd and maybe others.
>>>
>>> We typically are not allowed to link to KB from formal documentation,
>>> but otherwise, it's not a bad idea.
>>
>> Understood. So for modclusterd in connection with bz618321/bz742431,
>> there can be either nothing or something as per yesterday's email:
>>  > as of <version solving bz618321/bz742431>, info about interventions
>>  > due to insufficient environment parameters is logged here
>>
>> And that KB is not too large so if there is something really vital, it
>> can perhaps be mentioned directly.
>
> Yes, it would be fine to add a troubleshooting section to the 6.3
> documentation and add common problems to it (including issues causing
> modclusterd warnings when queue gets pruned).
>
> Steve, are there resources for this sort of thing for 6.3?  This would
> largely be a support/doc effort with some feedback from engineering.
>
>
>>>
>>> Sorry, what is "TT-formatted"?
>>
>> Was too brief here, teletype/typewrite text = monospace.
>>
>
> Ah, ok.
>
>> Yes, it is documented, at least in preparation [1]. Deprecated since
>> 6.3 or earlier? People seem to be using it [2]. That doc. explicitly
>> says foghorn does not allow get-access.
>
> At least since the introduction of foghorn, but maybe even since RHEL
> 6.0 GA.
>
> SNMP GET support is the only thing provided by cluster-snmp; it is kind
> of like the equivalent of "clustat".  Cluster-snmp has installed
> documentation:
>
>   /usr/share/doc/cluster-snmp-0.16.2/README
>   /usr/share/doc/cluster-snmp-0.16.2/README.snmpd
>
> The current plan is that we will not be providing SNMP GET support in
> RHEL7 nor adding features to it in RHEL6.  We specifically asked product
> management about the demand while implementing foghorn, and the feedback
> we received was that there was minimal (or no) need for it.
>
> So, while I agree that we should have comprehensive documentation, I
> don't see placing cluster-snmp information in the RHEL HA Add-On
> documentation as a good use of our limited resources.
>
> -- Lon
>

Comment 3 Steven J. Levine 2012-03-13 15:52:48 UTC
At this point I'm not sure this will make RHEL 6.3 -- Beta is in a couple of weeks and this could be a large project, requiring input and review and discussion. I have, however, incorporated the information from the kBase article about debug options into the document for 6.3, as per this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=712400

That's a start, but not quite as extensive as what Jan suggested.

I will keep this open and get back to it, but in terms of priority I'm moving this down right now until I can get all the 6.3 new feature work documented. I'm not yet moving this to 6.4, though -- I'll cycle back to this.

Comment 4 Steven J. Levine 2012-03-27 19:34:10 UTC
As per Comment 3, I'm moving this to 6.4. It's a large project.

Comment 14 Steven J. Levine 2016-11-15 17:34:32 UTC
I am moving all of this info into the category of plans for UseCaseFest, to get it out of the RHEL 6 queue and coordinate it better with the other larger non-feature doc plans going forward.  I'm closing this as wontfix since it will not be fixed as part of the bug system, but that's for administrative reasons.

This information is now in this blog post:

https://mojo.redhat.com/people/slevine/blog/2016/11/15/notes-on-debugging-luci-with-log-files

The larger plans for incorporating this into our doc are here:

https://mojo.redhat.com/docs/DOC-1034070

With this information now part of those larger plans we may finally see some action here, if we determine there is still a need at this point in the RHEL 6 cycle.