Bug 506933

Summary: SAF Test lck: SaLckLockWaiterCallbackT/6 and 9
Product: [Fedora] Fedora Reporter: Jan Friesse <jfriesse>
Component: openais Assignee: Ryan O'Hara <rohara>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhide CC: agk, fdinitto, sdake
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-08 03:59:14 UTC Type: ---
Attachments:
Description                                    Flags
9.c                                            none
9-fork.c                                       none
6.c                                            none
6-fork.c                                       none
Remove resource from cleanup list on close.    none

Description Jan Friesse 2009-06-19 11:40:52 UTC
Created attachment 348642 [details]
9.c

Description of problem:
These tests don't fail, but after they finish the resource is not destroyed (both call
saLckResourceClose).

Version-Release number of selected component (if applicable):
Trunk

How reproducible:
Run tests

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Doesn't destroy resource

Expected results:
Destroy resource

Additional info:

Comment 1 Jan Friesse 2009-06-19 11:41:12 UTC
Created attachment 348643 [details]
9-fork.c

Comment 2 Jan Friesse 2009-06-19 11:41:43 UTC
Created attachment 348644 [details]
6.c

Comment 3 Jan Friesse 2009-06-19 11:42:04 UTC
Created attachment 348645 [details]
6-fork.c

Comment 4 Ryan O'Hara 2009-06-29 19:26:50 UTC
Hozaf,

Can you test these again with the latest lock service code in trunk? Both of these tests work for me. I might have fixed this problem with a previous patch. Let me know.

Comment 5 Jan Friesse 2009-06-30 09:54:02 UTC
Ryan,
no, trunk still has this problem. The problem is not in the tests themselves but in these two tests + saLckResourceOpen/19, saLckResourceOpenAsync/19, SaLckResourceOpenCallbackT/9. Those tests check whether we return an error value when opening a resource that was never created, which we do. On a clean start of corosync, saLckResourceOpen/19, ... work perfectly, and keep working perfectly until SaLckLockWaiterCallbackT/6 or 9 are run.

Comment 6 Jan Friesse 2009-06-30 10:17:34 UTC
Ryan,
this reminds me, SaLckLockWaiterCallbackT/7 (modified version from 506523) has the same problem.

Comment 7 Ryan O'Hara 2009-06-30 17:20:12 UTC
I don't understand. How do you know that the resource is not destroyed? Please provide some output from running the test or something equivalent. I see no problem with either of these tests.

Comment 8 Jan Friesse 2009-07-01 08:22:35 UTC
Please notice this:

*The problem is not in the tests themselves but in these two tests + saLckResourceOpen/19, saLckResourceOpenAsync/19, SaLckResourceOpenCallbackT/9.*

Example:
[root@node-06 ~]# aisexec
[root@node-06 ~]# ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/saLckResourceOpen/19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 ~]# cd ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/SaLckLockWaiterCallbackT/
[root@node-06 SaLckLockWaiterCallbackT]# ./9.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[root@node-06 SaLckLockWaiterCallbackT]# cd ../saLckResourceOpen
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
  Does not conform the expected behaviors!
  saLckResourceOpen, Return value: SA_AIS_OK, should be SA_AIS_ERR_NOT_EXIST
[root@node-06 saLckResourceOpen]# killall corosync
[root@node-06 saLckResourceOpen]# aisexec
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 saLckResourceOpen]#

Comment 9 Ryan O'Hara 2009-07-08 03:14:37 UTC
It appears that this problem is due to an issue in saLckResourceClose, which will decrement the reference count for that resource and strip locks appropriately (ignoring the case of orphan locks). The resource is only removed when 1) the reference count is zero and 2) no granted locks exist on that resource.

Also note that when a process exits, lck_lib_exit_fn is called. This function also closes any resources that exist in the process' cleanup list (in private data) and will also decrement the reference count.

The problem is that when we close a resource via saLckResourceClose, we don't remove that resource from the cleanup list. The result is that when the process exits, we close the resource again and decrement the reference count again, which is bad. In the end, the resource doesn't get deleted as it should.

The solution is as simple as removing the resource from the cleanup list when saLckResourceClose is called. A very simple fix for a problem that was only apparent due to the fact that a PR lock (granted) was still present on a resource when another process closed the resource and exited cleanly.

Patch to follow.

Comment 10 Ryan O'Hara 2009-07-08 03:27:00 UTC
Created attachment 350889 [details]
Remove resource from cleanup list on close.

This should fix the problem.

Comment 11 Ryan O'Hara 2009-07-08 03:59:14 UTC
Closing this as fixed upstream.