Bug 506933 - SAF Test lck: SaLckLockWaiterCallbackT/6 and 9
Product: Fedora
Classification: Fedora
Component: openais
Hardware: All  OS: Linux
Priority: low  Severity: medium
Assigned To: Ryan O'Hara
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-06-19 07:40 EDT by Jan Friesse
Modified: 2009-07-07 23:59 EDT
CC: 3 users

Doc Type: Bug Fix
Last Closed: 2009-07-07 23:59:14 EDT

Attachments
9.c (4.71 KB, text/x-csrc), 2009-06-19 07:40 EDT, Jan Friesse
9-fork.c (5.13 KB, text/x-csrc), 2009-06-19 07:41 EDT, Jan Friesse
6.c (6.01 KB, text/x-csrc), 2009-06-19 07:41 EDT, Jan Friesse
6-fork.c (3.93 KB, text/x-csrc), 2009-06-19 07:42 EDT, Jan Friesse
Remove resource from cleanup list on close. (1.88 KB, patch), 2009-07-07 23:27 EDT, Ryan O'Hara

Description Jan Friesse 2009-06-19 07:40:52 EDT
Created attachment 348642

Description of problem:
These tests don't fail, but after they finish, the resource is not destroyed (both calls

Version-Release number of selected component (if applicable):

How reproducible:
Run tests

Steps to Reproduce:
Actual results:
The resource is not destroyed.

Expected results:
The resource is destroyed.

Additional info:
Comment 1 Jan Friesse 2009-06-19 07:41:12 EDT
Created attachment 348643
Comment 2 Jan Friesse 2009-06-19 07:41:43 EDT
Created attachment 348644
Comment 3 Jan Friesse 2009-06-19 07:42:04 EDT
Created attachment 348645
Comment 4 Ryan O'Hara 2009-06-29 15:26:50 EDT

Can you test these again with the latest lock service code in trunk? Both of these tests work for me. I might have fixed this problem with a previous patch. Let me know.
Comment 5 Jan Friesse 2009-06-30 05:54:02 EDT
No, trunk still has this problem. The problem is not in the tests themselves but in the combination of these two tests with saLckResourceOpen/19, saLckResourceOpenAsync/19 and SaLckResourceOpenCallbckT/9. Those tests check that opening a resource that has not been created returns an error, which it does: on a clean start of corosync, saLckResourceOpen/19 and the others pass, and they keep passing until SaLckLockWaiterCallbackT/6 or 9 is run.
Comment 6 Jan Friesse 2009-06-30 06:17:34 EDT
This reminds me: SaLckLockWaiterCallbackT/7 (the modified version from bug 506523) has the same problem.
Comment 7 Ryan O'Hara 2009-06-30 13:20:12 EDT
I don't understand. How do you know that the resource is not destroyed? Please provide some output from running the test or something equivalent. I see no problem with either of these tests.
Comment 8 Jan Friesse 2009-07-01 04:22:35 EDT
Please note this:

*The problem is not in the tests themselves but in these two tests combined with saLckResourceOpen/19, saLckResourceOpenAsync/19 and SaLckResourceOpenCallbckT/9.*

[root@node-06 ~]# aisexec
[root@node-06 ~]# ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/saLckResourceOpen/19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 ~]# cd ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/SaLckLockWaiterCallbackT/
[root@node-06 SaLckLockWaiterCallbackT]# ./9.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[root@node-06 SaLckLockWaiterCallbackT]# cd ../saLckResourceOpen
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
  Does not conform the expected behaviors!
  saLckResourceOpen, Return value: SA_AIS_OK, should be SA_AIS_ERR_NOT_EXIST
[root@node-06 saLckResourceOpen]# killall corosync
[root@node-06 saLckResourceOpen]# aisexec
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 saLckResourceOpen]#
Comment 9 Ryan O'Hara 2009-07-07 23:14:37 EDT
It appears that this problem is due to an issue in saLckResourceClose, which decrements the reference count for the resource and strips locks appropriately (ignoring the case of orphan locks). The resource is only removed when 1) the reference count is zero and 2) no granted locks exist on that resource.
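The removal rule above can be sketched as a tiny C model. The names `struct lck_resource`, `resource_open` and `resource_close` are hypothetical and are not the openais data structures; stripping the closing handle's locks and the orphan-lock case are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified lock resource (not the openais structures). */
struct lck_resource {
    unsigned int refcount;      /* open handles on this resource */
    unsigned int granted_locks; /* locks currently granted */
    bool deleted;               /* set once the resource has been removed */
};

/* Removal rule: no open handles and no granted locks. */
static bool resource_removable(const struct lck_resource *res)
{
    return res->refcount == 0 && res->granted_locks == 0;
}

static void resource_open(struct lck_resource *res)
{
    res->refcount++;
}

/* Close path sketch: drop one reference, then remove the resource only if
 * the rule above holds. Stripping this handle's locks is not modelled. */
static void resource_close(struct lck_resource *res)
{
    assert(res->refcount > 0);
    res->refcount--;
    if (resource_removable(res))
        res->deleted = true;
}
```

For example, two opens followed by one close leave the resource in place; in this sketch it is removed only by the close that brings the count to zero while no lock is granted.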

Also note that when a process exits, lck_lib_exit_fn is called. This function also closes any resources that exist in the process' cleanup list (in private data) and will also decrement the reference count.
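The exit-time cleanup can be modelled the same way. `struct lck_pd`, `pd_open` and `pd_exit` are hypothetical stand-ins for the per-process private data and `lck_lib_exit_fn`, assuming the cleanup list is a simple fixed-size array of resource pointers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPEN 16 /* arbitrary limit for this sketch */

/* Hypothetical, simplified structures (not the openais ones). */
struct lck_resource {
    unsigned int refcount;
};

struct lck_pd {
    struct lck_resource *cleanup[MAX_OPEN]; /* per-process cleanup list */
    size_t ncleanup;
};

/* Opening takes a reference and records the resource for exit-time cleanup. */
static void pd_open(struct lck_pd *pd, struct lck_resource *res)
{
    assert(pd->ncleanup < MAX_OPEN);
    res->refcount++;
    pd->cleanup[pd->ncleanup++] = res;
}

/* Stand-in for lck_lib_exit_fn: close every resource still on the list,
 * dropping one reference per entry, then empty the list. */
static void pd_exit(struct lck_pd *pd)
{
    for (size_t i = 0; i < pd->ncleanup; i++)
        pd->cleanup[i]->refcount--;
    pd->ncleanup = 0;
}
```

Each entry on the list accounts for exactly one reference, which is why an entry that outlives an explicit close leads to an extra decrement.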

The problem is that when we close a resource via saLckResourceClose, we don't remove that resource from the cleanup list. The result is that when the process exits, we close the resource again and decrement the reference count again, which is bad. In the end, the resource doesn't get deleted as it should.
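Combining both paths reproduces the double decrement. This is a simplified model, not the openais code: the hypothetical `lib_close_buggy` drops a reference but leaves the entry on the cleanup list, so `lib_exit` drops it again. The counter is assumed to be unsigned, so the final close wraps it around instead of reaching zero; with a signed counter it would go negative. Either way the removal check never fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPEN 16 /* arbitrary limit for this sketch */

/* Hypothetical, simplified structures (not the openais ones). */
struct lck_resource {
    unsigned int refcount;
    unsigned int granted_locks;
    bool deleted;
};

struct lck_pd {
    struct lck_resource *cleanup[MAX_OPEN]; /* per-process cleanup list */
    size_t ncleanup;
};

/* Drop one reference; remove the resource if nothing references it. */
static void res_close(struct lck_resource *res)
{
    res->refcount--; /* unsigned: 0 - 1 wraps to UINT_MAX */
    if (res->refcount == 0 && res->granted_locks == 0)
        res->deleted = true;
}

static void lib_open(struct lck_pd *pd, struct lck_resource *res)
{
    assert(pd->ncleanup < MAX_OPEN);
    res->refcount++;
    pd->cleanup[pd->ncleanup++] = res; /* remember for exit-time cleanup */
}

/* BUGGY: releases the reference but leaves the cleanup-list entry behind. */
static void lib_close_buggy(struct lck_pd *pd, struct lck_resource *res)
{
    (void)pd; /* the cleanup list is not updated: this is the bug */
    res_close(res);
}

/* Exit-time cleanup: close everything still on the list. */
static void lib_exit(struct lck_pd *pd)
{
    for (size_t i = 0; i < pd->ncleanup; i++)
        res_close(pd->cleanup[i]);
    pd->ncleanup = 0;
}
```

In the scenario from this report, process A opens the resource and holds a granted PR lock, process B opens, closes and exits, and B's exit drops A's reference. When A later unlocks and closes, the count is already zero and wraps, so the resource is never removed.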

The solution is as simple as removing the resource from the cleanup list when saLckResourceClose is called. This is a very simple fix for a problem that only became apparent because a granted PR lock was still present on the resource when another process closed it and exited cleanly.
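The fix described above can be sketched against the same kind of simplified model. The hypothetical `lib_close_fixed` removes the matching entry from the cleanup list before dropping the reference, so exit-time cleanup only closes handles that were never closed explicitly. This is an illustration under those assumptions, not the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPEN 16 /* arbitrary limit for this sketch */

/* Hypothetical, simplified structures (not the openais ones). */
struct lck_resource {
    unsigned int refcount;
    unsigned int granted_locks;
    bool deleted;
};

struct lck_pd {
    struct lck_resource *cleanup[MAX_OPEN]; /* per-process cleanup list */
    size_t ncleanup;
};

static void res_close(struct lck_resource *res)
{
    assert(res->refcount > 0); /* holds once closes are no longer doubled */
    res->refcount--;
    if (res->refcount == 0 && res->granted_locks == 0)
        res->deleted = true;
}

static void lib_open(struct lck_pd *pd, struct lck_resource *res)
{
    assert(pd->ncleanup < MAX_OPEN);
    res->refcount++;
    pd->cleanup[pd->ncleanup++] = res;
}

/* FIXED: take the entry off the cleanup list, then drop the reference. */
static void lib_close_fixed(struct lck_pd *pd, struct lck_resource *res)
{
    for (size_t i = 0; i < pd->ncleanup; i++) {
        if (pd->cleanup[i] == res) {
            /* swap-remove: the order of the list does not matter */
            pd->cleanup[i] = pd->cleanup[--pd->ncleanup];
            break;
        }
    }
    res_close(res);
}

static void lib_exit(struct lck_pd *pd)
{
    for (size_t i = 0; i < pd->ncleanup; i++)
        res_close(pd->cleanup[i]);
    pd->ncleanup = 0;
}
```

Replaying the same scenario, B's exit now finds an empty list, A's reference survives, and A's final close brings the count to zero with no granted lock, so the resource is removed.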

Patch to follow.
Comment 10 Ryan O'Hara 2009-07-07 23:27:00 EDT
Created attachment 350889
Remove resource from cleanup list on close.

This should fix the problem.
Comment 11 Ryan O'Hara 2009-07-07 23:59:14 EDT
Closing this as fixed upstream.
