Prior to this update, any resource agent returning corrupted or invalid metadata would cause the rgmanager utility to terminate unexpectedly and the node to be fenced. With this update, rgmanager is able to skip the broken agent and continue operating as expected, thus fixing this bug.
Description Yevheniy Demchenko
2011-04-01 14:31:05 UTC
Created attachment 489417
patch to resolve the issue.
Description of problem:
This issue was initially triggered by an RPM package that wrongly set the executable flag on a resource-script.metadata file in /usr/share/cluster/. After running "cman_tool version -r", the cluster crashed; the last log entry was
rgmanager[47960]: Loading Service Data
Further investigation showed that this behaviour is always triggered if any executable file in /usr/share/cluster produces meta-data output that is not well-formed XML.
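To locate offending files before reloading the configuration, a check along the following lines may help. This is only a rough sketch, not part of rgmanager: the function names are hypothetical, and it assumes that each agent prints its metadata when run with the "meta-data" argument. It uses Python's own XML parser rather than libxml2, so it only approximates what rgmanager sees.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def find_broken_agents(agent_dir, timeout=5):
    """Return names of executable files in agent_dir whose
    'meta-data' output is not well-formed XML (hypothetical helper)."""
    broken = []
    for name in sorted(os.listdir(agent_dir)):
        path = os.path.join(agent_dir, name)
        # Only executable regular files are considered, mirroring the
        # report: a stray exec bit is what exposes a file to the check.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        try:
            result = subprocess.run(
                [path, "meta-data"], capture_output=True, timeout=timeout
            )
            output = result.stdout
        except (OSError, subprocess.TimeoutExpired):
            # Not actually runnable (e.g. a data file with the exec bit set).
            output = b""
        if not is_well_formed(output):
            broken.append(name)
    return broken


if __name__ == "__main__":
    for agent in find_broken_agents("/usr/share/cluster"):
        print("invalid meta-data output:", agent)
```

Running it against /usr/share/cluster executes every agent there with "meta-data", so it is best tried on a test node first.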
Version-Release number of selected component (if applicable):
rgmanager-3.0.12-10.el6.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Configure a 2-node cluster; run "service cman start; service rgmanager start" on 1 node.
2. chmod a+x /usr/share/lvm.metadata (for example)
3. Increase the version number in cluster.conf
4. cman_tool version -r -S
Actual results:
rgmanager segfaults and the node restarts (or all nodes with the incorrect resource script restart).
Expected results:
The cluster survives; perhaps an error is logged.
Additional info:
rgmanager survives if debugging is enabled via the -d flag or RGMANAGER_DEBUG=1. rgmanager also survives its initial startup initialization.
gdb backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4eb07df700 (LWP 21797)]
0x00000031392f8e01 in __vfprintf_chk () from /lib64/libc.so.6
(gdb) backtrace
#0 0x00000031392f8e01 in __vfprintf_chk () from /lib64/libc.so.6
#1 0x000000313c232da8 in vfprintf (ctx=<value optimized out>, msg=0x313c308acd "Entity: line %d: ") at /usr/include/bits/stdio2.h:128
#2 xmlGenericErrorDefaultFunc (ctx=<value optimized out>, msg=0x313c308acd "Entity: line %d: ") at error.c:78
#3 0x000000313c2317e9 in xmlReportError (err=0x7f4e9c044d48, ctxt=0x7f4e9c044af0, str=0x7f4e9c03dfa0 "Start tag expected, '<' not found\n",
channel=0x313c232d00 <xmlGenericErrorDefaultFunc>, data=0x0) at error.c:290
#4 0x000000313c232a55 in __xmlRaiseError (schannel=0, channel=0x313c232120 <xmlParserError__internal_alias>, data=0x7f4e9c044af0, ctx=0x7f4e9c044af0, nod=0x0, domain=1, code=4,
level=XML_ERR_FATAL, file=0x0, line=1, str1=0x0, str2=0x0, str3=0x0, int1=0, col=1, msg=0x313c312ce7 "%s") at error.c:624
#5 0x000000313c236f41 in xmlFatalErrMsg (ctxt=0x7f4e9c044af0, error=<value optimized out>, msg=<value optimized out>) at parser.c:496
#6 0x000000313c24d050 in xmlParseDocument__internal_alias (ctxt=0x7f4e9c044af0) at parser.c:10200
#7 0x000000313c24de45 in xmlSAXParseMemoryWithData__internal_alias (sax=0x0, buffer=<value optimized out>, size=<value optimized out>, recovery=0, data=0x0) at parser.c:13709
#8 0x000000000040c2b0 in read_resource_agent_metadata (rpath=0x4213a7 "/usr/share/cluster", rules=0x7f4eb07ded68)
at /usr/src/debug/rgmanager-3.0.12/rgmanager/src/daemons/resrules.c:991
#9 load_resource_rulefile (rpath=0x4213a7 "/usr/share/cluster", rules=0x7f4eb07ded68) at /usr/src/debug/rgmanager-3.0.12/rgmanager/src/daemons/resrules.c:1015
#10 load_resource_rules (rpath=0x4213a7 "/usr/share/cluster", rules=0x7f4eb07ded68) at /usr/src/debug/rgmanager-3.0.12/rgmanager/src/daemons/resrules.c:1163
#11 0x00000000004070da in init_resource_groups (reconfigure=1, do_init=0) at /usr/src/debug/rgmanager-3.0.12/rgmanager/src/daemons/groups.c:1647
#12 0x00000000004107ac in _event_thread_f (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12/rgmanager/src/daemons/rg_event.c:405
#13 0x00000031396077e1 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031392e151d in clone () from /lib64/libc.so.6
This behaviour is probably caused by incorrect use of xmlInitParser()/xmlCleanupParser() in a threaded application.
The attached patch resolves the issue; please review it, as it may need verification.
Comment 2 RHEL Program Management
2011-04-04 02:10:03 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.
Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
I suspect it's actually related to the DBus integration work, which may or may not be calling xmlInitParser().
Your patch makes the init/cleanup of XML work consistently with other libxml2 programs.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Prior to this update, any resource agent returning corrupted or invalid metadata would cause the rgmanager utility to terminate unexpectedly and the node to be fenced. With this update, rgmanager is able to skip the broken agent and continue operating as expected, thus fixing this bug.
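The skip-and-continue behaviour described in the technical note can be illustrated roughly as follows. This is a sketch in Python, not the actual rgmanager change (which is in C and uses libxml2); the function name, logger name, and input dictionary are assumptions made purely for illustration.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("agent-loader")  # hypothetical logger name


def load_agent_rules(metadata_by_agent):
    """Parse each agent's metadata text.

    Agents whose metadata is not well-formed XML are logged and
    skipped instead of aborting the whole load.
    Returns (rules, skipped): parsed roots by agent name, and the
    names of the agents that were skipped.
    """
    rules = {}
    skipped = []
    for agent, text in metadata_by_agent.items():
        try:
            rules[agent] = ET.fromstring(text)
        except ET.ParseError as err:
            log.warning("skipping agent %s: invalid metadata (%s)", agent, err)
            skipped.append(agent)
    return rules, skipped
```

The point of the design is that a parse failure is confined to the one agent that caused it: the error is reported, and the remaining agents are still loaded.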
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2011-1595.html