506772 – SAF Test lck: saLckResourceUnlockAsync/3

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	openais
Sub Component:
Version:	14
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Ryan O'Hara
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	561190

Reported:	2009-06-18 15:56 UTC by Jan Friesse
Modified:	2011-10-03 15:33 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	561190
Environment:
Last Closed:	2011-10-03 15:33:51 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
3.c (7.04 KB, text/x-csrc) 2009-06-18 15:56 UTC, Jan Friesse
Return SA_AIS_OK when unlocking a pending lock request. (473 bytes, patch) 2009-06-29 23:10 UTC, Ryan O'Hara

Description Jan Friesse 2009-06-18 15:56:49 UTC

Created attachment 348509
3.c

Description of problem:
The test named in the subject (saLckResourceUnlockAsync/3) fails, and it crashes when run under valgrind.

Version-Release number of selected component (if applicable):
Trunk

How reproducible:
Run the attached file (3.c).

Steps to Reproduce:
1.
2.
3.
  
Actual results:
The test crashes under valgrind.

Expected results:
The test passes.

Additional info:
This test appears to have two problems:
1. saLckResourceLock returns 5 (TIMEOUT). I really don't know why.
2. Valgrind detects an error in saLckResourceUnlockAsync.
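
For reference, 5 is SA_AIS_ERR_TIMEOUT in the SaAisErrorT numbering of the SA Forum AIS specification. The sketch below decodes the first few values so a test log shows the name next to the number. The enum is a local copy (real code would include saAis.h), and ais_error_name and lock_call_succeeded are hypothetical helpers, not part of openais or saftest.

```c
#include <assert.h>
#include <stdio.h>

/* Local copy of the first few SaAisErrorT values from the AIS
 * specification. Real code would include saAis.h instead. */
typedef enum {
	SA_AIS_OK = 1,
	SA_AIS_ERR_LIBRARY = 2,
	SA_AIS_ERR_VERSION = 3,
	SA_AIS_ERR_INIT = 4,
	SA_AIS_ERR_TIMEOUT = 5,
	SA_AIS_ERR_TRY_AGAIN = 6
} SaAisErrorT;

/* Hypothetical helper: map a return code to its symbolic name. */
const char *ais_error_name(int err)
{
	switch (err) {
	case SA_AIS_OK:            return "SA_AIS_OK";
	case SA_AIS_ERR_LIBRARY:   return "SA_AIS_ERR_LIBRARY";
	case SA_AIS_ERR_VERSION:   return "SA_AIS_ERR_VERSION";
	case SA_AIS_ERR_INIT:      return "SA_AIS_ERR_INIT";
	case SA_AIS_ERR_TIMEOUT:   return "SA_AIS_ERR_TIMEOUT";
	case SA_AIS_ERR_TRY_AGAIN: return "SA_AIS_ERR_TRY_AGAIN";
	default:                   return "UNKNOWN";
	}
}

/* Hypothetical helper: returns 1 on SA_AIS_OK; otherwise logs the
 * decoded code to stderr and returns 0. */
int lock_call_succeeded(const char *what, int err)
{
	assert(what != NULL);
	if (err == SA_AIS_OK)
		return 1;
	fprintf(stderr, "%s failed: %d (%s)\n", what, err, ais_error_name(err));
	return 0;
}
```

A test could wrap each API call, for example lock_call_succeeded("saLckResourceLock", result), and bail out when it returns 0.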

Comment 1 Ryan O'Hara 2009-06-29 21:44:13 UTC

1. I fixed the problem with the return value. The patch will be attached below.
2. What error(s) do you see when using valgrind? I don't see any valgrind errors related to any of the lock service API calls.

Comment 2 Ryan O'Hara 2009-06-29 23:10:29 UTC

Created attachment 349889
Return SA_AIS_OK when unlocking a pending lock request.

This should fix the problem with the return value.

Comment 3 Jan Friesse 2009-06-30 10:04:14 UTC

Ryan,
second needinfo, second piece of bad news. Valgrind still shows an error:

[root@node-06 saLckResourceUnlockAsync]# valgrind ./3.test
==5275== Memcheck, a memory error detector.
==5275== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==5275== Using LibVEX rev 1884, a library for dynamic binary translation.
==5275== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==5275== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==5275== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==5275== For more details, rerun with: -v
==5275==
[DEBUG]: saLckInitialize
==5275== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==5275==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==5275==    by 0x41C8132: coroipcc_service_connect (coroipcc.c:642)
==5275==    by 0x4035753: saLckInitialize (lck.c:194)
==5275==    by 0x8048C14: main (3.c:125)
==5275==  Address 0xbec5221c is on thread 1's stack
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
==5275==
==5275== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==5275==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==5275==    by 0x41C8132: coroipcc_service_connect (coroipcc.c:642)
==5275==    by 0x4036955: saLckResourceLock (lck.c:913)
==5275==    by 0x8048DC3: main (3.c:155)
==5275==  Address 0xbec5207c is on thread 1's stack
[DEBUG]: saLckResourceLock
==5275==
==5275== Thread 2:
==5275== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==5275==    at 0x4046478: sendmsg (in /lib/libpthread-2.10.1.so)
==5275==    by 0x41C8132: coroipcc_service_connect (coroipcc.c:642)
==5275==    by 0x4036955: saLckResourceLock (lck.c:913)
==5275==    by 0x8048A03: lock_thread (3.c:48)
==5275==    by 0x403E934: start_thread (in /lib/libpthread-2.10.1.so)
==5275==    by 0x413282D: clone (in /lib/libc-2.10.1.so)
==5275==  Address 0x53ccfbc is on thread 2's stack
[DEBUG]: saLckResourceUnlockAsync
==5275==
==5275== Thread 1:
==5275== Invalid write of size 4
==5275==    at 0x403546B: list_del (list.h:71)
==5275==    by 0x4035543: lckLockIdInstanceFinalize (lck.c:113)
==5275==    by 0x4035E0A: saLckDispatch (lck.c:494)
==5275==    by 0x8049083: main (3.c:213)
==5275==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==5275==
==5275== Process terminating with default action of signal 11 (SIGSEGV)
==5275==  Access not within mapped region at address 0x4
==5275==    at 0x403546B: list_del (list.h:71)
==5275==    by 0x4035543: lckLockIdInstanceFinalize (lck.c:113)
==5275==    by 0x4035E0A: saLckDispatch (lck.c:494)
==5275==    by 0x8049083: main (3.c:213)
==5275==  If you believe this happened as a result of a stack overflow in your
==5275==  program's main thread (unlikely but possible), you can try to increase
==5275==  the size of the main thread stack using the --main-stacksize= flag.
==5275==  The main thread stack size used in this run was 10485760.
==5275==
==5275== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 19 from 1)
==5275== malloc/free: in use at exit: 844 bytes in 11 blocks.
==5275== malloc/free: 14 allocs, 3 frees, 952 bytes allocated.
==5275== For counts of detected errors, rerun with: -v
==5275== Use --track-origins=yes to see where uninitialised values come from
==5275== searching for pointers to 11 not-freed blocks.
==5275== checked 18,961,252 bytes.
==5275==
==5275== LEAK SUMMARY:
==5275==    definitely lost: 0 bytes in 0 blocks.
==5275==      possibly lost: 136 bytes in 1 blocks.
==5275==    still reachable: 708 bytes in 10 blocks.
==5275==         suppressed: 0 bytes in 0 blocks.
==5275== Rerun with --leak-check=full to see details of leaked memory.
Killed

The problem is that it is not 100% reproducible (I had to run it 20 times before I hit this).
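
One possible reading of the trace, not a confirmed diagnosis: an invalid 4-byte write to address 0x4 inside list_del is what you would expect on a 32-bit build if the node's next pointer is NULL, because the prev field then sits 4 bytes past address 0. The sketch below assumes a node layout of next followed by prev, in the style of list.h. The names list_del_checked and null_next_write_offset are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Doubly linked list node; the next-then-prev layout is an assumption. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/* Make an empty list: the head points at itself. */
void list_init(struct list_head *head)
{
	assert(head != NULL);
	head->next = head;
	head->prev = head;
}

/* Insert node right after head. */
void list_add(struct list_head *node, struct list_head *head)
{
	assert(node != NULL && head != NULL);
	head->next->prev = node;
	node->next = head->next;
	node->prev = head;
	head->next = node;
}

/* Hypothetical defensive unlink: refuses a node that was never linked
 * (NULL next or prev) instead of writing through a NULL pointer. */
int list_del_checked(struct list_head *node)
{
	if (node == NULL || node->next == NULL || node->prev == NULL)
		return -1;
	node->next->prev = node->prev;
	node->prev->next = node->next;
	node->next = node;   /* leave the node self-linked, so a second */
	node->prev = node;   /* unlink of the same node is harmless     */
	return 0;
}

/* Address that "node->next->prev = ..." would write to when
 * node->next is NULL: 4 on i386, 8 on x86_64. */
size_t null_next_write_offset(void)
{
	return offsetof(struct list_head, prev);
}
```

If this reading is right, the lock id instance reaches lckLockIdInstanceFinalize with a list node that was zeroed or never added to a list, which would also fit a race that only shows up occasionally.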

Comment 4 Steven Dake 2009-09-28 15:43:28 UTC

Honza,

Please retry with openais 1.1.0.

Regards
-steve

Comment 5 Jan Friesse 2009-09-29 09:03:06 UTC

Retried with today's trunk of corosync and openais:
[root@node-06 saLckResourceUnlockAsync]# valgrind ./3.test
==23815== Memcheck, a memory error detector.
==23815== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==23815== Using LibVEX rev 1884, a library for dynamic binary translation.
==23815== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==23815== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==23815== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==23815== For more details, rerun with: -v
==23815==
==23815== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==23815==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==23815==    by 0x41C8035: coroipcc_service_connect (coroipcc.c:697)
==23815==    by 0x4035715: saLckInitialize (lck.c:191)
==23815==    by 0x8048C14: main (3.c:125)
==23815==  Address 0xbeab9204 is on thread 1's stack
==23815==
==23815== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==23815==    at 0x4046451: sendmsg (in /lib/libpthread-2.10.1.so)
==23815==    by 0x41C8035: coroipcc_service_connect (coroipcc.c:697)
==23815==    by 0x40368E5: saLckResourceLock (lck.c:884)
==23815==    by 0x8048DC3: main (3.c:155)
==23815==  Address 0xbeab9064 is on thread 1's stack
==23815==
==23815== Thread 2:
==23815== Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised byte(s)
==23815==    at 0x4046478: sendmsg (in /lib/libpthread-2.10.1.so)
==23815==    by 0x41C8035: coroipcc_service_connect (coroipcc.c:697)
==23815==    by 0x40368E5: saLckResourceLock (lck.c:884)
==23815==    by 0x8048A03: lock_thread (3.c:48)
==23815==    by 0x403E934: start_thread (in /lib/libpthread-2.10.1.so)
==23815==    by 0x413282D: clone (in /lib/libc-2.10.1.so)
==23815==  Address 0x53ccfc4 is on thread 2's stack
==23815==
==23815== Thread 1:
==23815== Invalid write of size 4
==23815==    at 0x403543B: list_del (list.h:71)
==23815==    by 0x4035513: lckLockIdInstanceFinalize (lck.c:113)
==23815==    by 0x4035DD2: saLckDispatch (lck.c:491)
==23815==    by 0x8049083: main (3.c:213)
==23815==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==23815==
==23815== Process terminating with default action of signal 11 (SIGSEGV)
==23815==  Access not within mapped region at address 0x4
==23815==    at 0x403543B: list_del (list.h:71)
==23815==    by 0x4035513: lckLockIdInstanceFinalize (lck.c:113)
==23815==    by 0x4035DD2: saLckDispatch (lck.c:491)
==23815==    by 0x8049083: main (3.c:213)
==23815==  If you believe this happened as a result of a stack overflow in your
==23815==  program's main thread (unlikely but possible), you can try to increase
==23815==  the size of the main thread stack using the --main-stacksize= flag.
==23815==  The main thread stack size used in this run was 10485760.
==23815==
==23815== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 19 from 1)
==23815== malloc/free: in use at exit: 828 bytes in 11 blocks.
==23815== malloc/free: 14 allocs, 3 frees, 928 bytes allocated.
==23815== For counts of detected errors, rerun with: -v
==23815== Use --track-origins=yes to see where uninitialised values come from
==23815== searching for pointers to 11 not-freed blocks.
==23815== checked 18,961,268 bytes.
==23815==
==23815== LEAK SUMMARY:
==23815==    definitely lost: 0 bytes in 0 blocks.
==23815==      possibly lost: 136 bytes in 1 blocks.
==23815==    still reachable: 692 bytes in 10 blocks.
==23815==         suppressed: 0 bytes in 0 blocks.
==23815== Rerun with --leak-check=full to see details of leaked memory.
Killed

So yes, the bug is still there.

Comment 6 Ryan O'Hara 2009-09-29 18:30:05 UTC

It is possible that this is due to differences in the type definitions in saAis.h. Steve and I discussed this while I was testing the MSG service with saftest. If I recall correctly, saftest uses its own type definitions for various integers, etc., and I believe they were different from the type definitions that the openais services are compiled with. This caused problems with a few tests, and Steve and I wondered whether it might be the cause of subtle problems like this one.

The problem seen under valgrind seems to exist only on the i386 architecture. In other words, I cannot recreate it on x86_64. Steve, do you remember what/how we fixed the header file in saftest to make this work?
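
To illustrate the kind of mismatch I mean (a sketch only, since I don't remember the actual definitions): if one header declares a 64-bit id as a fixed-width type and another declares it as unsigned long, the two agree on x86_64 but not on i386, where unsigned long is 4 bytes. The typedef names below are hypothetical stand-ins, not the real saAis.h or saftest declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: what the library is assumed to be built with. */
typedef uint64_t lib_lock_id_t;

/* Hypothetical stand-in: what a test suite header might declare. */
typedef unsigned long test_lock_id_t;

/* The fixed-width type is 8 bytes on every architecture. */
_Static_assert(sizeof(lib_lock_id_t) == 8, "lib_lock_id_t must be 64 bits");

/* Returns 1 when both sides agree on the size of a lock id.
 * Expected to be 1 on x86_64 and 0 on i386. */
int lock_id_types_match(void)
{
	return sizeof(lib_lock_id_t) == sizeof(test_lock_id_t);
}

/* Runtime guard a test could call first; aborts on a mismatch. */
void require_matching_lock_id_types(void)
{
	assert(lock_id_types_match());
}
```

A check like this at the start of a test would at least tell us quickly whether the headers disagree on a given architecture.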

Comment 7 Bug Zapper 2009-11-16 10:14:22 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2010-07-30 10:41:16 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 9 Ryan O'Hara 2011-10-03 15:33:51 UTC

Closing WONTFIX since openais will be going away in F17.
