Bug 1128877 - rgmanager: Log errors during resource load
Summary: rgmanager: Log errors during resource load
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: rgmanager
Version: 6.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ryan McCabe
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1075802 1172231
 
Reported: 2014-08-11 17:30 UTC by John Ruemker
Modified: 2018-12-09 18:20 UTC
6 users

Fixed In Version: rgmanager-3.0.12.1-26.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-11 01:13:10 UTC


Attachments
Log resource load errors on stdout and in logfile (2.35 KB, patch)
2014-08-11 17:43 UTC, John Ruemker


Links
Red Hat Product Errata RHBA-2016:0961: rgmanager bug fix update (priority: normal, status: SHIPPED_LIVE, last updated 2016-05-10 22:57:21 UTC)
Red Hat Knowledge Base (Solution) 1156833

Description John Ruemker 2014-08-11 17:30:38 UTC
Description of problem: When rgmanager fails to load a resource, for example because the resource is referenced more times than allowed or because a reference points to a resource that does not exist, the error is only printed with printf() and is not written to the log file. The resource is then left out of the resource tree, together with all of its children. This makes it hard to tell why certain resources are not being started as part of a service.

A customer ran into this after referencing a global ip resource from two different services, not realizing that this was invalid. One of these references had a script child, so when that ip reference was excluded from the tree, the script was excluded with it. This caused confusion and delayed their rollout by several days: nothing in the logs indicated that anything was failing, the ip resource in that service appeared to be running (it had actually been started by the first reference, in the other service), and the script was never processed at all. Having the error logged would have made the problem much quicker to detect.


Version-Release number of selected component (if applicable): rgmanager-3.0.12.1-19.el6


How reproducible: Easily


Steps to Reproduce:
1. Create a global resource in the <resources> section

<resources>
   <ip address="192.168.143.101/24" monitor_link="1"/>
</resources>

2. Create a reference to the same resource in multiple services, with a child resource nested under the reference in the second service

<service autostart="1" name="svc1" domain="1then2" recovery="relocate">
       <ip ref="192.168.143.101/24"/>
</service>
<service autostart="1" name="svc2" domain="1then2" recovery="relocate">
       <ip address="192.168.143.102/24" monitor_link="1"/>
       <ip ref="192.168.143.101/24">
             <script name="script" file="/usr/local/bin/script.sh"/>
       </ip>
</service>

3. Apply config and start rgmanager
4. Start svc2 via rg_test

  # rg_test test /etc/cluster/cluster.conf start service svc2

Actual results: When rgmanager starts, both services show as started. The double-referenced IP is brought up (because of svc1's reference to it) and the other IP in svc2 is started, but the script is never executed and nothing relevant appears in the log files.


Expected results: rgmanager logs an error clearly explaining why the resource did not load


Additional info:

Comment 1 John Ruemker 2014-08-11 17:43:22 UTC
Created attachment 925847 [details]
Log resource load errors on stdout and in logfile

It seems best to keep the existing printf() calls in addition to using logt_print(). rg_test also uses do_load_resource, so it should keep printing these errors to stdout, while the daemon additionally gets them into the log file.

Comment 3 Cole Towsley 2014-10-14 15:51:24 UTC
Experienced another form of this issue in case #01243042: if a service specifies an ip ref that does not match any defined global ip resource, the service still starts even though no IP address was bound. This caused confusion for the customer about their service state and configuration.

-cole

Comment 25 errata-xmlrpc 2016-05-11 01:13:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0961.html

