Created attachment 348509
3.c

Description of problem:
SUBJ doesn't work and crashes under valgrind.

Version-Release number of selected component (if applicable):
Trunk

How reproducible:
Run the included file.

Steps to Reproduce:
1.
2.
3.

Actual results:
Crashes under valgrind.

Expected results:
The test passes.

Additional info:
This test seems to have two problems:
1. saLckResourceLock returns 5 (TIMEOUT). I really don't know why.
2. Valgrind detects an error in saLckResourceUnlockAsync.
1. I fixed the problem with the return value. The patch will be attached below.
2. What error(s) do you see when using valgrind? I don't see any valgrind errors related to any of the lock service API calls.
Created attachment 349889
Return SA_AIS_OK when unlocking a pending lock request.

This should fix the problem with the return value.
Ryan, regarding your second question (needinfo): more bad news. Valgrind still shows errors:

[root@node-06 saLckResourceUnlockAsync]# valgrind ./3.test
==5275== Memcheck, a memory error detector.
==5275== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==5275== Using LibVEX rev 1884, a library for dynamic binary translation.
==5275== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==5275== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==5275== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==5275== For more details, rerun with: -v
==5275==
[DEBUG]: saLckInitialize
==5275== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==5275==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==5275==    by 0x41C8132: coroipcc_service_connect (coroipcc.c:642)
==5275==    by 0x4035753: saLckInitialize (lck.c:194)
==5275==    by 0x8048C14: main (3.c:125)
==5275==  Address 0xbec5221c is on thread 1's stack
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
==5275==
==5275== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==5275==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==5275==    by 0x41C8132: coroipcc_service_connect (coroipcc.c:642)
==5275==    by 0x4036955: saLckResourceLock (lck.c:913)
==5275==    by 0x8048DC3: main (3.c:155)
==5275==  Address 0xbec5207c is on thread 1's stack
[DEBUG]: saLckResourceLock
==5275==
==5275== Thread 2:
==5275== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==5275==    at 0x4046478: sendmsg (in /lib/libpthread-2.10.1.so)
==5275==    by 0x41C8132: coroipcc_service_connect (coroipcc.c:642)
==5275==    by 0x4036955: saLckResourceLock (lck.c:913)
==5275==    by 0x8048A03: lock_thread (3.c:48)
==5275==    by 0x403E934: start_thread (in /lib/libpthread-2.10.1.so)
==5275==    by 0x413282D: clone (in /lib/libc-2.10.1.so)
==5275==  Address 0x53ccfbc is on thread 2's stack
[DEBUG]: saLckResourceUnlockAsync
==5275==
==5275== Thread 1:
==5275== Invalid write of size 4
==5275==    at 0x403546B: list_del (list.h:71)
==5275==    by 0x4035543: lckLockIdInstanceFinalize (lck.c:113)
==5275==    by 0x4035E0A: saLckDispatch (lck.c:494)
==5275==    by 0x8049083: main (3.c:213)
==5275==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==5275==
==5275== Process terminating with default action of signal 11 (SIGSEGV)
==5275==  Access not within mapped region at address 0x4
==5275==    at 0x403546B: list_del (list.h:71)
==5275==    by 0x4035543: lckLockIdInstanceFinalize (lck.c:113)
==5275==    by 0x4035E0A: saLckDispatch (lck.c:494)
==5275==    by 0x8049083: main (3.c:213)
==5275== If you believe this happened as a result of a stack overflow in your
==5275== program's main thread (unlikely but possible), you can try to increase
==5275== the size of the main thread stack using the --main-stacksize= flag.
==5275== The main thread stack size used in this run was 10485760.
==5275==
==5275== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 19 from 1)
==5275== malloc/free: in use at exit: 844 bytes in 11 blocks.
==5275== malloc/free: 14 allocs, 3 frees, 952 bytes allocated.
==5275== For counts of detected errors, rerun with: -v
==5275== Use --track-origins=yes to see where uninitialised values come from
==5275== searching for pointers to 11 not-freed blocks.
==5275== checked 18,961,252 bytes.
==5275==
==5275== LEAK SUMMARY:
==5275==    definitely lost: 0 bytes in 0 blocks.
==5275==      possibly lost: 136 bytes in 1 blocks.
==5275==    still reachable: 708 bytes in 10 blocks.
==5275==         suppressed: 0 bytes in 0 blocks.
==5275== Rerun with --leak-check=full to see details of leaked memory.
Killed

The problem is that it's not 100% reproducible (I had to run it 20 times before I hit this).
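A note on the repeated "Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)" reports: the flagged bytes are inside coroipcc_service_connect (the corosync IPC client), reached from saLckInitialize and saLckResourceLock, not inside lck.c itself. A common cause of this particular valgrind warning (an assumption here, not checked against coroipcc.c) is a request struct whose padding bytes or unused fields are never written before the struct is handed to sendmsg() through an iovec. The sketch below uses a hypothetical `struct demo_req` and hypothetical helpers `demo_req_init`/`demo_req_send` to show the usual remedy: zero the whole struct with memset() before filling in the fields, so every byte that reaches the kernel is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical request header (NOT the real coroipcc layout). On common
 * ABIs the compiler inserts padding bytes between 'service' and 'size'. */
struct demo_req {
    uint8_t  service;
    uint32_t size;
};

/* Zero the whole struct first so that the padding bytes are defined, then
 * fill in the fields. Assigning the fields alone would leave the padding
 * holding whatever was on the stack, which valgrind reports once the
 * struct is passed to sendmsg(). */
void demo_req_init(struct demo_req *req, uint8_t service, uint32_t size)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->service = service;
    req->size = size;
}

/* Send the request as a single iovec. Returns the number of bytes sent,
 * or -1 on error (errno is set by sendmsg). */
ssize_t demo_req_send(int fd, const struct demo_req *req)
{
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = (void *)req;        /* sendmsg only reads this buffer */
    iov.iov_len = sizeof(*req);

    memset(&msg, 0, sizeof(msg));      /* no address, no control data */
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    return sendmsg(fd, &msg, 0);
}
```

Running something like this over a socketpair() under valgrind should stay quiet; dropping the memset() in demo_req_init for a stack-allocated request is the kind of change that would typically bring the "uninitialised byte(s)" report back.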
Honza,

Please retry with openais 1.1.0.

Regards
-steve
Retried with today's trunk of corosync and openais:

[root@node-06 saLckResourceUnlockAsync]# valgrind ./3.test
==23815== Memcheck, a memory error detector.
==23815== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==23815== Using LibVEX rev 1884, a library for dynamic binary translation.
==23815== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==23815== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==23815== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==23815== For more details, rerun with: -v
==23815==
==23815== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==23815==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==23815==    by 0x41C8035: coroipcc_service_connect (coroipcc.c:697)
==23815==    by 0x4035715: saLckInitialize (lck.c:191)
==23815==    by 0x8048C14: main (3.c:125)
==23815==  Address 0xbeab9204 is on thread 1's stack
==23815==
==23815== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==23815==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==23815==    by 0x41C8035: coroipcc_service_connect (coroipcc.c:697)
==23815==    by 0x40368E5: saLckResourceLock (lck.c:884)
==23815==    by 0x8048DC3: main (3.c:155)
==23815==  Address 0xbeab9064 is on thread 1's stack
==23815==
==23815== Thread 2:
==23815== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==23815==    at 0x4046478: sendmsg (in /lib/libpthread-2.10.1.so)
==23815==    by 0x41C8035: coroipcc_service_connect (coroipcc.c:697)
==23815==    by 0x40368E5: saLckResourceLock (lck.c:884)
==23815==    by 0x8048A03: lock_thread (3.c:48)
==23815==    by 0x403E934: start_thread (in /lib/libpthread-2.10.1.so)
==23815==    by 0x413282D: clone (in /lib/libc-2.10.1.so)
==23815==  Address 0x53ccfc4 is on thread 2's stack
==23815==
==23815== Thread 1:
==23815== Invalid write of size 4
==23815==    at 0x403543B: list_del (list.h:71)
==23815==    by 0x4035513: lckLockIdInstanceFinalize (lck.c:113)
==23815==    by 0x4035DD2: saLckDispatch (lck.c:491)
==23815==    by 0x8049083: main (3.c:213)
==23815==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==23815==
==23815== Process terminating with default action of signal 11 (SIGSEGV)
==23815==  Access not within mapped region at address 0x4
==23815==    at 0x403543B: list_del (list.h:71)
==23815==    by 0x4035513: lckLockIdInstanceFinalize (lck.c:113)
==23815==    by 0x4035DD2: saLckDispatch (lck.c:491)
==23815==    by 0x8049083: main (3.c:213)
==23815== If you believe this happened as a result of a stack overflow in your
==23815== program's main thread (unlikely but possible), you can try to increase
==23815== the size of the main thread stack using the --main-stacksize= flag.
==23815== The main thread stack size used in this run was 10485760.
==23815==
==23815== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 19 from 1)
==23815== malloc/free: in use at exit: 828 bytes in 11 blocks.
==23815== malloc/free: 14 allocs, 3 frees, 928 bytes allocated.
==23815== For counts of detected errors, rerun with: -v
==23815== Use --track-origins=yes to see where uninitialised values come from
==23815== searching for pointers to 11 not-freed blocks.
==23815== checked 18,961,268 bytes.
==23815==
==23815== LEAK SUMMARY:
==23815==    definitely lost: 0 bytes in 0 blocks.
==23815==      possibly lost: 136 bytes in 1 blocks.
==23815==    still reachable: 692 bytes in 10 blocks.
==23815==         suppressed: 0 bytes in 0 blocks.
==23815== Rerun with --leak-check=full to see details of leaked memory.
Killed

So yes, the bug is still there.
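One plausible reading of the crash itself (a hypothesis, not confirmed anywhere in this thread): a list.h-style list_del typically starts with `entry->next->prev = entry->prev`, and in a two-pointer node the `prev` field sits at offset 4 on i386. If the entry that lckLockIdInstanceFinalize removes has a NULL `next` (for example a zero-filled node that was never added to a list), that store lands at address 0x4, which matches "Invalid write of size 4 ... Address 0x4". The sketch below models such a list; `struct list_head` follows the usual list.h layout, while `list_del_checked` is a hypothetical defensive variant written for illustration, not an existing corosync or openais function.

```c
#include <assert.h>
#include <stddef.h>

/* Two-pointer node in the usual list.h style. 'prev' sits at offset 4
 * on i386 and at offset 8 on x86_64. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert 'element' right after 'head'. */
void list_add(struct list_head *element, struct list_head *head)
{
    assert(element != head);
    head->next->prev = element;
    element->next = head->next;
    element->prev = head;
    head->next = element;
}

/* Unchecked removal. If element->next is NULL, the first store writes to
 * NULL + offsetof(struct list_head, prev), i.e. address 0x4 on i386. */
void list_del(struct list_head *element)
{
    element->next->prev = element->prev;
    element->prev->next = element->next;
}

/* Hypothetical defensive variant: refuses to unlink a node that was never
 * linked (NULL pointers) and re-initialises the node after unlinking, so a
 * second removal is a harmless no-op. Returns 0 on success, -1 if refused. */
int list_del_checked(struct list_head *element)
{
    if (element->next == NULL || element->prev == NULL)
        return -1;
    list_del(element);
    list_init(element);
    return 0;
}

int list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

A guard like this would only hide the symptom; if the hypothesis holds, the real question is why a lock-id instance reaches finalization without ever having been linked (or after its memory was cleared), which the intermittent reproduction suggests may be a race between the lock thread and saLckDispatch.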
It is possible that this is due to differences in the type definitions in saAis.h. Steve and I discussed this while I was testing the MSG service with saftest. If I recall correctly, saftest uses its own type definitions for various integers, etc., and I believe they were different from the type definitions that the openais services are compiled with. This caused problems with a few tests, and Steve and I wondered whether it might also be the cause of subtle problems like this one. The problem seen under valgrind only seems to occur on the i386 architecture; I cannot recreate it on x86_64. Steve, do you remember what we changed in saftest's header file to make this work?
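To illustrate why a typedef mismatch could show up on i386 but not on x86_64, here is a minimal sketch. Both typedef sets and both structs are hypothetical; they are not copied from saftest's or openais's saAis.h. The point is only that a "64-bit" type declared as `unsigned long` is 8 bytes on x86_64 (LP64) but 4 bytes on i386 (ILP32), so a structure compiled separately by the test suite and by the library agrees on x86_64 and silently disagrees on i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical library-side typedef: always 64 bits. */
typedef unsigned long long lib_uint64_t;

/* Hypothetical test-suite typedef: 'unsigned long' is 64 bits on LP64
 * (x86_64) but only 32 bits on ILP32 (i386). */
typedef unsigned long test_uint64_t;

/* The "same" structure as compiled by each side. */
struct lib_lock_req {
    unsigned int mode;
    lib_uint64_t timeout;
};

struct test_lock_req {
    unsigned int  mode;
    test_uint64_t timeout;
};

/* Returns 1 if both views agree on the structure size and on the offset
 * of 'timeout', 0 otherwise. */
int layouts_agree(void)
{
    return sizeof(struct lib_lock_req) == sizeof(struct test_lock_req) &&
           offsetof(struct lib_lock_req, timeout) ==
               offsetof(struct test_lock_req, timeout);
}

void print_layouts(void)
{
    assert(sizeof(lib_uint64_t) == 8);  /* library side really is 64-bit */
    printf("library view: size=%zu, timeout at offset %zu\n",
           sizeof(struct lib_lock_req), offsetof(struct lib_lock_req, timeout));
    printf("test view:    size=%zu, timeout at offset %zu\n",
           sizeof(struct test_lock_req), offsetof(struct test_lock_req, timeout));
    printf("layouts %s\n", layouts_agree() ? "agree" : "DIFFER");
}
```

On x86_64 both views come out the same size, so layouts_agree() returns 1. On an i386 build with the usual System V ABI (where a long long member is 4-byte aligned inside a struct), the library view is 12 bytes and the test view 8 bytes, so any data exchanged through such a struct would be misread, and only on 32-bit builds.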
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Closing WONTFIX since openais will be going away in F17.