Bug 625601 - openais should handle non null terminated chkpoint strings
Summary: openais should handle non null terminated chkpoint strings
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: openais
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ryan O'Hara
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 625947
 
Reported: 2010-08-19 22:08 UTC by Corey Marthaler
Modified: 2016-04-26 14:02 UTC

Fixed In Version: openais-1.1.1-6.el6
Doc Type: Bug Fix
Doc Text:
When a checkpoint name was not terminated with the NULL character, the aisexec process may have terminated unexpectedly with a segmentation fault, causing a cluster outage. With this update, the underlying source code has been modified to resolve this issue, and such strings no longer cause aisexec to crash.
Clone Of:
: 625947 (view as bug list)
Environment:
Last Closed: 2010-11-10 22:13:01 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix printing of non-NULL terminated strings (11.19 KB, patch)
2010-08-20 19:50 UTC, Jonathan Earl Brassow
Minor change to previous patch. (11.19 KB, patch)
2010-08-20 21:51 UTC, Ryan O'Hara
Resolve FIXME issues, changes printf statement to use length. (11.02 KB, patch)
2010-08-23 15:38 UTC, Ryan O'Hara

Description Corey Marthaler 2010-08-19 22:08:44 UTC
Description of problem:

Aug 19 21:36:32 taft-01 clvmd: Cluster LVM daemon started - connected to CMAN
Aug 19 21:36:32 taft-01 kernel: device-mapper: dm-log-userspace: version 1.0.0 loaded
Aug 19 21:36:32 taft-01 dmeventd[2446]: dmeventd ready for processing.
Aug 19 21:36:32 taft-01 lvm[2446]: Monitoring mirror device TAFT-ha1 for events.
Aug 19 21:36:32 taft-01 lvm[2446]: TAFT-ha1 is now in-sync.
Aug 19 21:36:33 taft-01 abrt[2449]: saved core dump of pid 2057 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-1282271793-2057.new/coredump (69464064 bytes)
Aug 19 21:36:33 taft-01 abrtd: Directory 'ccpp-1282271793-2057' creation detected
Aug 19 21:36:33 taft-01 fenced[2112]: cluster is down, exiting
Aug 19 21:36:33 taft-01 gfs_controld[2148]: daemon cpg_dispatch error 2
Aug 19 21:36:33 taft-01 dlm_controld[2128]: cluster is down, exiting
Aug 19 21:36:33 taft-01 gfs_controld[2148]: cluster is down, exiting
Aug 19 21:36:33 taft-01 abrtd: Registered Database plugin 'SQLite3'
Aug 19 21:36:33 taft-01 abrtd: Package 'corosync' isn't signed with proper key
Aug 19 21:36:33 taft-01 abrtd: Corrupted or bad crash /var/spool/abrt/ccpp-1282271793-2057 (res:5), deleting
Aug 19 21:36:35 taft-01 cmirrord[2268]: Sync checkpoint section creation failed: SA_AIS_ERR_LIBRARY
Aug 19 21:36:37 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to export checkpoint for 4
Aug 19 21:36:37 taft-01 kernel: dlm: closing connection to node 4
Aug 19 21:36:37 taft-01 kernel: dlm: closing connection to node 3
Aug 19 21:36:37 taft-01 kernel: dlm: closing connection to node 2
Aug 19 21:36:37 taft-01 kernel: dlm: closing connection to node 1
Aug 19 21:36:37 taft-01 kernel: dlm: clvmd: no userland control daemon, stopping lockspace
Aug 19 21:36:39 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to open checkpoint for 4: SA_AIS_ERR_LIBRARY
Aug 19 21:36:39 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to export checkpoint for 4
Aug 19 21:36:41 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to open checkpoint for 4: SA_AIS_ERR_LIBRARY
Aug 19 21:36:41 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to export checkpoint for 4
Aug 19 21:36:43 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to open checkpoint for 4: SA_AIS_ERR_LIBRARY
Aug 19 21:36:43 taft-01 cmirrord[2268]: [0tBLqj9e] Failed to export checkpoint for 4


(gdb) where
#0  0x0000003def2329b5 in raise () from /lib64/libc.so.6
#1  0x0000003def234195 in abort () from /lib64/libc.so.6
#2  0x0000003def26fe1b in __libc_message () from /lib64/libc.so.6
#3  0x0000003def2fb447 in __fortify_fail () from /lib64/libc.so.6
#4  0x0000003def2f9340 in __chk_fail () from /lib64/libc.so.6
#5  0x0000003def2f8799 in _IO_str_chk_overflow () from /lib64/libc.so.6
#6  0x0000003def273f1c in _IO_default_xsputn_internal () from /lib64/libc.so.6
#7  0x0000003def247b36 in vfprintf () from /lib64/libc.so.6
#8  0x0000003def2f883d in __vsprintf_chk () from /lib64/libc.so.6
#9  0x00007fc81a525ee2 in vsprintf (rec_ident=4294965351, function_name=0x7fc818331c80 "checkpoint_section_find", file_name=0x7fc818330d00 "ckpt.c",
    file_line=919, format=<value optimized out>, ap=<value optimized out>) at /usr/include/bits/stdio2.h:47
#10 _logsys_log_vprintf (rec_ident=4294965351, function_name=0x7fc818331c80 "checkpoint_section_find", file_name=0x7fc818330d00 "ckpt.c", file_line=919,
    format=<value optimized out>, ap=<value optimized out>) at logsys.c:1325
#11 0x00007fc81a5261da in _logsys_log_printf (rec_ident=<value optimized out>, function_name=<value optimized out>, file_name=<value optimized out>,
    file_line=<value optimized out>, format=<value optimized out>) at logsys.c:1398
#12 0x00007fc818329663 in checkpoint_section_find (checkpoint=0x19b2320,
    id=0x7ffffd28c4d8 "sync_bits\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., id_len=9) at ckpt.c:919
#13 0x00007fc81832dd9a in message_handler_req_exec_ckpt_sectioncreate (message=0x7ffffd28c390, nodeid=<value optimized out>) at ckpt.c:1801
#14 0x0000000000406ec0 in deliver_fn (nodeid=1, msg=0x7ffffd28c390, msg_len=<value optimized out>, endian_conversion_required=0) at main.c:758
#15 0x00007fc81a74122f in app_deliver_fn (nodeid=1, msg=<value optimized out>, msg_len=<value optimized out>, endian_conversion_required=0)
    at totempg.c:506
#16 0x00007fc81a7417b3 in totempg_deliver_fn (nodeid=1, msg=0x19b3112, msg_len=0, endian_conversion_required=0) at totempg.c:618
#17 0x00007fc81a73958f in messages_deliver_to_app (instance=0x7fc8135cb010, skip=0, end_point=<value optimized out>) at totemsrp.c:3701
#18 0x00007fc81a73f594 in message_handler_orf_token (instance=<value optimized out>, msg=<value optimized out>, msg_len=<value optimized out>,
    endian_conversion_needed=<value optimized out>) at totemsrp.c:3575
#19 0x00007fc81a7357c3 in rrp_deliver_fn (context=0x194e610, msg=0x1974b3c, msg_len=71) at totemrrp.c:1393
#20 0x00007fc81a7346b6 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0x1974470)
    at totemudp.c:1244
#21 0x00007fc81a73068a in poll_run (handle=5429246941735157760) at coropoll.c:435
#22 0x00000000004056ab in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1607


Version-Release number of selected component (if applicable):
openais-1.1.1-5.el6.x86_64
corosync-1.2.3-20.el6.x86_64

2.6.32-59.1.el6.x86_64

lvm2-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
lvm2-libs-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
lvm2-cluster-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
udev-147-2.22.el6    BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-libs-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-event-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-event-libs-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
cmirror-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010



How reproducible:
Every time I attempt to start clvmd with the cluster in "this state".

Comment 1 Steven Dake 2010-08-19 22:12:50 UTC
The issue is that the sync_bits section id length given to the ckpt service is 9.  This does not account for the null terminator that the pretty printing in ckpt expects.

We can't add a null termination inside ckpt because it will break wire compat.

The two possible solutions are to fix the numerous pretty-printing calls inside the ckpt service, or alternatively to increase the "sync_bits" length (and possibly other length parameters) that cmirror passes into ckpt.
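
To make the second option concrete, here is a minimal sketch of a caller that counts the terminating NUL in the length it hands over, so "sync_bits" would be passed with a length of 10 rather than 9. The struct section_id type and the section_id_with_nul helper are hypothetical stand-ins for illustration only; they are not the actual ckpt or cmirror types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-delimited section id.
 * This is NOT the real checkpoint service type. */
struct section_id {
    const unsigned char *id;
    size_t id_len;
};

/* Build an id whose length includes the terminating NUL, so a consumer
 * that prints it with plain "%s" still finds a terminator in bounds. */
static struct section_id section_id_with_nul(const char *name)
{
    assert(name != NULL);
    struct section_id sid = {
        (const unsigned char *)name,
        strlen(name) + 1   /* +1 counts the trailing '\0' */
    };
    return sid;
}
```

With this convention the receiver never reads past id_len to find the terminator, but every caller has to adopt the longer length, which is the trade-off against fixing the printing inside ckpt.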

Comment 2 Jonathan Earl Brassow 2010-08-20 19:50:37 UTC
Created attachment 440026
Fix printing of non-NULL terminated strings

Don't add support for NULL-terminated strings to the structures, or ask others to pass in NULL-terminated strings; just print the strings as allowed by printf (the "%.*s" format).
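
As a rough illustration of the "%.*s" approach, the sketch below formats a length-delimited id without relying on a terminator. The format_section_id helper and its message text are assumptions made for this example and are not taken from the attached patch; only the standard snprintf behaviour is relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the attached patch): format a
 * length-delimited id into dst without reading past id_len bytes. */
static int format_section_id(char *dst, size_t dst_size,
                             const unsigned char *id, size_t id_len)
{
    assert(dst != NULL && dst_size > 0);
    /* "%.*s" takes the precision as an int argument. Output stops after
     * that many bytes, or at the first NUL if one appears earlier. */
    return snprintf(dst, dst_size, "section id: %.*s",
                    (int)id_len, (const char *)id);
}
```

Called with the buffer from the backtrace (the bytes "sync_bits" followed by 0xff padding) and id_len set to 9, this would emit only the nine id bytes. The cast to int matters because the precision argument of "%.*s" must be an int, not a size_t.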

Comment 3 Steven Dake 2010-08-20 21:25:48 UTC
The general solution in the patch looks reasonable, although attachment 440026 has several FIXMEs which remain unaddressed.  Ryan will have to resolve those issues in an updated patch.

Comment 5 Ryan O'Hara 2010-08-20 21:51:13 UTC
Created attachment 440058
Minor change to previous patch.

Removed an extra ')' in the previous patch, line 1531.

Comment 6 Ryan O'Hara 2010-08-21 01:08:18 UTC
I have the FIXMEs resolved. An updated patch will be posted soon. I need to recreate the bug and test the fix before handing off to QE.

Comment 8 Ryan O'Hara 2010-08-23 15:38:28 UTC
Created attachment 440415
Resolve FIXME issues, changes printf statement to use length.

This should resolve all issues with log_printf and unterminated strings. The issues tagged with FIXME in the original patch have been resolved. Please review this patch.

Comment 9 Ryan O'Hara 2010-08-23 19:27:16 UTC
Fix pushed upstream and pulled into RHEL6 build. Marking this as POST.

Comment 10 Steven Dake 2010-08-23 19:31:30 UTC
Patch reviewed by sdake - is good to go.

Comment 11 Ryan O'Hara 2010-08-23 19:38:25 UTC
Fixed in openais-1.1.1-6 build. Marking this MODIFIED.

Comment 12 Corey Marthaler 2010-08-23 20:34:46 UTC
No longer seeing clvmd start-up issues. Marking verified with the latest build.

openais-1.1.1-6.el6.x86_64
openaislib-1.1.1-6.el6.x86_64

Comment 13 releng-rhel@redhat.com 2010-11-10 22:13:01 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

Comment 14 Jaromir Hradilek 2010-12-07 16:54:37 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When a checkpoint name was not terminated with the NULL character, the aisexec process may have terminated unexpectedly with a segmentation fault, causing a cluster outage. With this update, the underlying source code has been modified to resolve this issue, and such strings no longer cause aisexec to crash.

