Bug 472960 - rg_test script does not provide adequate debugging of a faulty cluster.conf
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Reported: 2008-11-25 20:14 UTC by Stuart R. Kirk
Modified: 2010-01-21 19:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-01-21 19:38:30 UTC
Embargoed:


Attachments
rg_test being run against a valid cluster.conf does not produce a "No errors detected" message. (3.83 KB, text/plain)
2008-11-25 20:14 UTC, Stuart R. Kirk
Introducing a typo and subsequent display of expected output. (370 bytes, text/plain)
2008-11-25 20:17 UTC, Stuart R. Kirk
Introducing several typos which are not found by rg_test (3.42 KB, text/plain)
2008-11-25 20:21 UTC, Stuart R. Kirk

Description Stuart R. Kirk 2008-11-25 20:14:00 UTC
Created attachment 324658
rg_test being run against a valid cluster.conf does not produce a "No errors detected" message.

Description of problem:
Part of the rgmanager package, rg_test can be employed to debug a problematic cluster configuration file.  The documentation located at http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Administration/s1-clust-rsc-testing-config-CA.html gives the appropriate syntax to run this program and explains that it will "Test a configuration (and /usr/share/cluster) for errors or redundant resource agents."

When run with the documented syntax (rg_test test /etc/cluster/cluster.conf), the script does not diagnose or display potential syntax errors in the cluster configuration in any detail.  Purposefully introducing typos or other errors only occasionally produces meaningful output.

Version-Release number of selected component (if applicable):
2.0.38-2


Steps to Reproduce:
1.  Introduce errors into a known, good /etc/cluster/cluster.conf
2.  Run "rg_test test /etc/cluster/cluster.conf"
3.  Observe the output for errors present (or not).
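
The steps above can be scripted roughly as follows. This is a minimal sketch and not part of the original report: the helper names introduce_typo and run_rg_test are hypothetical, the sample fragment is illustrative rather than the reporter's actual file, and the only thing taken from the report is the "rg_test test <path>" invocation. It assumes rg_test (from the rgmanager package) is on PATH when run_rg_test is called.

```python
import os
import subprocess
import tempfile


def introduce_typo(conf_text: str, good: str, bad: str) -> str:
    """Return a copy of the config text with the first `good` replaced by `bad`."""
    if good not in conf_text:
        raise ValueError(f"{good!r} not found in configuration text")
    return conf_text.replace(good, bad, 1)


def run_rg_test(conf_text: str) -> str:
    """Write the text to a scratch file and run `rg_test test` against it.

    Assumption: rg_test is installed and on PATH. The real
    /etc/cluster/cluster.conf is never modified.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as fh:
        fh.write(conf_text)
        path = fh.name
    try:
        result = subprocess.run(
            ["rg_test", "test", path], capture_output=True, text=True
        )
        return result.stdout + result.stderr
    finally:
        os.unlink(path)


# Illustrative fragment only; a real cluster.conf has much more structure.
sample = '<service name="test"><fs ref="armstrong2-oracle-fs"/></service>'
broken = introduce_typo(sample, "ref=", "reff=")
```

Working on a scratch copy keeps the known-good configuration intact, so the output of run_rg_test(sample) and run_rg_test(broken) can be compared side by side.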
  
Actual results:
See attachments.

Expected results:
See attachments.

Additional info:

Comment 1 Stuart R. Kirk 2008-11-25 20:17:16 UTC
Created attachment 324659
Introducing a typo and subsequent display of expected output.

I would expect that when a typo is introduced (intentionally or otherwise) into a cluster configuration, the rg_test script will alert the user to the problem.  That is the case in this example.

Comment 2 Stuart R. Kirk 2008-11-25 20:21:30 UTC
Created attachment 324661
Introducing several typos which are not found by rg_test

I would expect rg_test to alert the user to the misspelling of "ref" in the cluster configuration; however, it does not seem to be as thorough as it should be in checking the XML syntax.

Comment 3 Lon Hohberger 2008-12-03 20:39:51 UTC
"Valid" and "correct" are very different.  Typos don't necessarily make a cluster.conf invalid, but they can make them semantically wrong.

In this particular case, what matters is not what is written in the output, but what is missing from it.  The service structure's output is flat-out *wrong* if you add the typo.
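
To illustrate the difference between "valid" and "correct", here is a minimal sketch using Python's standard XML parser. It is not how rg_test works internally; the helper name is_well_formed and the sample fragments are assumptions made for illustration. A misspelled attribute still parses cleanly, while a structurally broken document does not.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, regardless of what it means."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments, not the reporter's actual configuration.
good = '<service name="test"><fs ref="armstrong2-oracle-fs"/></service>'
typo = '<service name="test"><fs reff="armstrong2-oracle-fs"/></service>'
malformed = '<service name="test"><fs ref="armstrong2-oracle-fs"></service>'

results = [is_well_formed(text) for text in (good, typo, malformed)]
```

A parser can only reject the third fragment. The second one is well-formed XML, so catching it requires knowing which attribute names are meaningful.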

If it helps, here is what the outputs would look like expressed as XML:

<service name="test" autostart="1" hardrecovery="0" exclusive="0"
         nfslock="0" recovery="restart"
         depend_mode="hard" max_restarts="0"
         restart_expire_time="0">
  <lvm name="armstrong2-oracle-vol" vg_name="armstrong2vg"
       lv_name="oracle" nfslock="0" />
  <fs name="armstrong2-oracle-fs"
      mountpoint="/armstrong2/oracle"
      device="/dev/mapper/armstrong2vg-oracle"
      fstype="ext3" force_unmount="1"
      self_fence="1" nfslock="0" fsid="47123"
      force_fsck="0" />
  <ip address="40.1.255.23" monitor_link="1" nfslock="0" />
</service>


With the typo added...

<service name="test" autostart="1" hardrecovery="0"
         exclusive="0" nfslock="0" recovery="restart"
         depend_mode="hard" max_restarts="0"
         restart_expire_time="0" />


We can detect some things easily (say, a misspelled "reff").  This is actually fairly easy to add, and I agree that it should be added.
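
A check for a misspelled "ref" could look roughly like the following. This is a sketch under stated assumptions, not the rgmanager implementation: the function name find_suspect_refs, the similarity cutoff of 0.8, and the sample document are all hypothetical. It flags attributes on a service's child elements that are close to "ref" without being "ref".

```python
import difflib
import xml.etree.ElementTree as ET


def find_suspect_refs(conf_text: str, expected: str = "ref", cutoff: float = 0.8):
    """Return (tag, attribute) pairs whose attribute name looks like a typo of `expected`.

    Only elements nested inside a <service> are inspected; the <service>
    element's own attributes are skipped.
    """
    root = ET.fromstring(conf_text)
    suspects = []
    for service in root.iter("service"):
        for elem in service.iter():
            if elem is service:
                continue
            for attr in elem.attrib:
                if attr == expected:
                    continue
                # A non-empty result means attr is similar enough to `expected`.
                if difflib.get_close_matches(attr, [expected], n=1, cutoff=cutoff):
                    suspects.append((elem.tag, attr))
    return suspects


# Illustrative document only; names are borrowed from the example output above.
sample = (
    '<cluster><rm><service name="test">'
    '<lvm ref="armstrong2-oracle-vol"/>'
    '<fs reff="armstrong2-oracle-fs"/>'
    '<ip address="40.1.255.23"/>'
    "</service></rm></cluster>"
)
suspects = find_suspect_refs(sample)
```

Matching against one well-known name keeps the check narrow: it does not need a list of every legal attribute, which is the harder problem described next.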

However, reporting errors for unrecognized attribute names is more difficult and would probably get us into trouble: we obsolete attribute names from time to time, but this doesn't make the configuration invalid (or even incorrect!).

Because the names of attributes and such are from the resource agents' metadata, and are not something that rg_test is aware of at compile-time (only run-time), maintaining a list of "obsolete" attributes within rg_test is just begging for problems.
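
For illustration, a run-time check against agent metadata might be sketched like this. Everything here is an assumption: the function names are hypothetical, the metadata is a simplified stand-in with <parameter name="..."> elements rather than real resource agent output, and the result is best treated as a list of warnings, since an obsoleted attribute name would also show up as unknown.

```python
import xml.etree.ElementTree as ET


def known_parameters(metadata_xml: str) -> set:
    """Collect parameter names declared in a resource agent's metadata."""
    root = ET.fromstring(metadata_xml)
    return {p.get("name") for p in root.iter("parameter") if p.get("name")}


def unknown_attributes(element_xml: str, metadata_xml: str) -> list:
    """Return the attribute names on the element that the metadata does not declare."""
    elem = ET.fromstring(element_xml)
    known = known_parameters(metadata_xml)
    return sorted(attr for attr in elem.attrib if attr not in known)


# Simplified, hypothetical metadata; real agent metadata carries more detail.
FS_METADATA = """<resource-agent name="fs">
  <parameters>
    <parameter name="name"/>
    <parameter name="mountpoint"/>
    <parameter name="device"/>
    <parameter name="fstype"/>
  </parameters>
</resource-agent>"""

resource = (
    '<fs name="armstrong2-oracle-fs" mountpiont="/armstrong2/oracle" '
    'device="/dev/mapper/armstrong2vg-oracle"/>'
)
warnings = unknown_attributes(resource, FS_METADATA)
```

Because the known names are read from the metadata on each run, nothing has to be compiled into the checker, but the check still cannot tell a typo from an attribute that was deliberately retired.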

Comment 4 Lon Hohberger 2009-02-27 15:24:55 UTC
We are working on updating the cluster.conf schema to be more robust.

Comment 5 RHEL Program Management 2010-01-21 19:38:30 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

