Bug 506933 - SAF Test lck: SaLckLockWaiterCallbackT/6 and 9
Summary: SAF Test lck: SaLckLockWaiterCallbackT/6 and 9
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: openais
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Ryan O'Hara
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-06-19 11:40 UTC by Jan Friesse
Modified: 2009-07-08 03:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-08 03:59:14 UTC
Type: ---
Embargoed:


Attachments
9.c (4.71 KB, text/x-csrc)
2009-06-19 11:40 UTC, Jan Friesse
9-fork.c (5.13 KB, text/x-csrc)
2009-06-19 11:41 UTC, Jan Friesse
6.c (6.01 KB, text/x-csrc)
2009-06-19 11:41 UTC, Jan Friesse
6-fork.c (3.93 KB, text/x-csrc)
2009-06-19 11:42 UTC, Jan Friesse
Remove resource from cleanup list on close. (1.88 KB, patch)
2009-07-08 03:27 UTC, Ryan O'Hara

Description Jan Friesse 2009-06-19 11:40:52 UTC
Created attachment 348642 [details]
9.c

Description of problem:
These tests don't fail, but after they finish the resource is not destroyed (both call saLckResourceClose).

Version-Release number of selected component (if applicable):
Trunk

How reproducible:
Run tests

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Doesn't destroy the resource

Expected results:
Destroy resource

Additional info:

Comment 1 Jan Friesse 2009-06-19 11:41:12 UTC
Created attachment 348643 [details]
9-fork.c

Comment 2 Jan Friesse 2009-06-19 11:41:43 UTC
Created attachment 348644 [details]
6.c

Comment 3 Jan Friesse 2009-06-19 11:42:04 UTC
Created attachment 348645 [details]
6-fork.c

Comment 4 Ryan O'Hara 2009-06-29 19:26:50 UTC
Hozaf,

Can you test these again with the latest lock service code in trunk? Both of these tests work for me. I might have fixed this problem with a previous patch. Let me know.

Comment 5 Jan Friesse 2009-06-30 09:54:02 UTC
Ryan,
No, trunk still has this problem. The problem is not in the tests themselves but in the combination of these two tests with saLckResourceOpen/19, saLckResourceOpenAsync/19 and SaLckResourceOpenCallbackT/9. Those tests check that we can return an error value when opening an uncreated resource, which we can. On a clean start of corosync, saLckResourceOpen/19 and the others work perfectly, and they keep working until SaLckLockWaiterCallbackT/6 or 9 is run.

Comment 6 Jan Friesse 2009-06-30 10:17:34 UTC
Ryan,
this reminds me, SaLckLockWaiterCallbackT/7 (the modified version from 506523) has the same problem.

Comment 7 Ryan O'Hara 2009-06-30 17:20:12 UTC
I don't understand. How do you know that the resource is not destroyed? Please provide some output from running the test or something equivalent. I see no problem with either of these tests.

Comment 8 Jan Friesse 2009-07-01 08:22:35 UTC
Please note this:

*The problem is not in the tests themselves but in the combination of these two tests with saLckResourceOpen/19, saLckResourceOpenAsync/19 and SaLckResourceOpenCallbackT/9.*

Example:
[root@node-06 ~]# aisexec
[root@node-06 ~]# ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/saLckResourceOpen/19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 ~]# cd ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/SaLckLockWaiterCallbackT/
[root@node-06 SaLckLockWaiterCallbackT]# ./9.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[root@node-06 SaLckLockWaiterCallbackT]# cd ../saLckResourceOpen
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
  Does not conform the expected behaviors!
  saLckResourceOpen, Return value: SA_AIS_OK, should be SA_AIS_ERR_NOT_EXIST
[root@node-06 saLckResourceOpen]# killall corosync
[root@node-06 saLckResourceOpen]# aisexec
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 saLckResourceOpen]#

Comment 9 Ryan O'Hara 2009-07-08 03:14:37 UTC
It appears that this problem is due to an issue in saLckResourceClose, which will decrement the reference count for that resource and strip locks appropriately (ignoring the case of orphan locks). The resource is only removed when 1) the reference count is zero and 2) no granted locks exist on that resource.

Also note that when a process exits, lck_lib_exit_fn is called. This function also closes any resources that exist in the process' cleanup list (in private data) and will also decrement the reference count.

The problem is that when we close a resource via saLckResourceClose, we don't remove that resource from the cleanup list. The result is that when the process exits, we close the resource again and decrement the reference count again, which is bad. In the end, the resource doesn't get deleted as it should.

The solution is as simple as removing the resource from the cleanup list when saLckResourceClose is called. A very simple fix for a problem that was only apparent due to the fact that a PR lock (granted) was still present on a resource when another process closed the resource and exited cleanly.

Patch to follow.

Comment 10 Ryan O'Hara 2009-07-08 03:27:00 UTC
Created attachment 350889 [details]
Remove resource from cleanup list on close.

This should fix the problem.

Comment 11 Ryan O'Hara 2009-07-08 03:59:14 UTC
Closing this as fixed upstream.

