Description of problem: When rgmanager fails to load a resource, for example because its reference count is higher than allowed or because it references a non-existent resource, the error is only printed via printf() and never written to the log file. The resource is then left out of the tree, along with all of its children. This is a problem because it is not immediately clear why certain resources are not being started as part of a service.
A customer brought such a situation to us when they had a global ip resource referenced in two different services, not realizing this was invalid. One of these references had a script child, so when that ip reference was excluded from the tree, so was the script. This led to confusion and several days of delay in their rollout: nothing in the logs indicated a failure, the ip resource from that service appeared to be running (although it had actually been started by the primary reference in the other service), and the script was simply not being processed at all. Had the error been logged, the problem would have been detected much more quickly.
Version-Release number of selected component (if applicable): rgmanager-188.8.131.52-19.el6
How reproducible: Easily
Steps to Reproduce:
1. Create a global resource in the <resources> section
<ip address="192.168.143.101/24" monitor_link="1"/>
2. Create a reference to the same resource in multiple services, with a child listed in the 2nd
<service autostart="1" name="svc1" domain="1then2" recovery="relocate">
    <ip ref="192.168.143.101/24"/>
</service>
<service autostart="1" name="svc2" domain="1then2" recovery="relocate">
    <ip address="192.168.143.102/24" monitor_link="1"/>
    <ip ref="192.168.143.101/24">
        <script name="script" file="/usr/local/bin/script.sh"/>
    </ip>
</service>
3. Apply config and start rgmanager
4. Start svc2 via rg_test
# rg_test test /etc/cluster/cluster.conf start service svc2
Actual results: When rgmanager starts, both services show as started, the double-referenced IP is created (because of svc1's ip reference), and the other IP in svc2 is started, but the script is never executed, and nothing relevant is found in the log files.
Expected results: rgmanager logs an error clearly explaining why the resource did not load
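Until the error reaches the log file, a quick offline check of cluster.conf can catch the double reference. The following is only a rough sketch and not part of rgmanager: the function name find_over_referenced, the max_refs limit of 1 (true for the ip case described here, but the real limit depends on the resource type), and the sample configuration are assumptions made for illustration. The sample assumes the services reference the global ip through a ref attribute whose value matches its address.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample modelled on the reproducer; the ref lines are an assumption.
SAMPLE_CONF = """
<cluster name="demo" config_version="1">
  <rm>
    <resources>
      <ip address="192.168.143.101/24" monitor_link="1"/>
    </resources>
    <service autostart="1" name="svc1" domain="1then2" recovery="relocate">
      <ip ref="192.168.143.101/24"/>
    </service>
    <service autostart="1" name="svc2" domain="1then2" recovery="relocate">
      <ip address="192.168.143.102/24" monitor_link="1"/>
      <ip ref="192.168.143.101/24">
        <script name="script" file="/usr/local/bin/script.sh"/>
      </ip>
    </service>
  </rm>
</cluster>
"""


def find_over_referenced(conf_xml, max_refs=1):
    """Return {(tag, ref): count} for references used more than max_refs times."""
    root = ET.fromstring(conf_xml)
    counts = Counter()
    for service in root.iter("service"):
        # Walk the whole service subtree so nested references are counted too.
        for node in service.iter():
            ref = node.get("ref")
            if ref is not None:
                counts[(node.tag, ref)] += 1
    return {key: n for key, n in counts.items() if n > max_refs}


print(find_over_referenced(SAMPLE_CONF))
# prints {('ip', '192.168.143.101/24'): 2}
```

Any entry in the result is a reference that would be dropped from one of the service trees, together with its children.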
Created attachment 925847
Log resource load errors on stdout and in logfile
It seems best to keep the existing printf()s in addition to using logt_print(). rg_test also uses do_load_resource, so it should continue to print these errors on stdout, while the daemon additionally writes them to the logfile.
Experienced this issue in another form in case #01243042: if you specify an ip ref in a service, but that ip ref doesn't match any defined global ip resource, the service will still start despite failing to bind an ip. This caused confusion for a customer regarding their service state and configuration.
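To illustrate the intent only (this is not the patch, which is C and uses printf() plus logt_print()), here is a minimal Python sketch of the same pattern: always print the error, and also write it to a log sink when one is available. The function name report_load_error, the "Error:" prefix, and the message text are hypothetical.

```python
import io
import sys


def report_load_error(message, console=sys.stdout, logfile=None):
    """Print the error on the console and, if a log sink is given, there too."""
    line = f"Error: {message}"
    # Always print, so a command-line test tool keeps showing the error.
    print(line, file=console)
    # Only the daemon-like caller passes a log sink.
    if logfile is not None:
        print(line, file=logfile)
    return line


# Test-tool style call: stdout only.
report_load_error("hypothetical load failure")

# Daemon style call: stdout plus a log sink (StringIO stands in for the logfile).
fake_logfile = io.StringIO()
report_load_error("hypothetical load failure", logfile=fake_logfile)
```

Keeping the console output unconditional means existing users of the command-line path see no change in behaviour.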
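The unmatched-reference case can be checked offline in a similar way. Again this is only a sketch under assumptions: the function name find_dangling_refs and the sample configuration are made up for illustration, and the code assumes that an ip resource is identified by its address attribute and every other resource type by its name attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: svc1 references an ip that is not defined in <resources>.
DANGLING_CONF = """
<cluster name="demo" config_version="1">
  <rm>
    <resources>
      <ip address="192.168.143.101/24" monitor_link="1"/>
    </resources>
    <service autostart="1" name="svc1" domain="1then2" recovery="relocate">
      <ip ref="192.168.143.110/24"/>
    </service>
  </rm>
</cluster>
"""


def find_dangling_refs(conf_xml):
    """Return (service, tag, ref) for references with no matching global resource."""
    root = ET.fromstring(conf_xml)
    defined = set()
    for resources in root.iter("resources"):
        for res in resources:
            # Assumption: ip resources are keyed by address, all others by name.
            key = res.get("address") if res.tag == "ip" else res.get("name")
            defined.add((res.tag, key))
    dangling = []
    for service in root.iter("service"):
        for node in service.iter():
            ref = node.get("ref")
            if ref is not None and (node.tag, ref) not in defined:
                dangling.append((service.get("name"), node.tag, ref))
    return dangling


print(find_dangling_refs(DANGLING_CONF))
# prints [('svc1', 'ip', '192.168.143.110/24')]
```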
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.