Created attachment 348642 [details]
9.c

Description of problem:
These tests don't fail, but after finishing they don't destroy the resource (both call saLckResourceClose).

Version-Release number of selected component (if applicable):
Trunk

How reproducible:
Run tests

Steps to Reproduce:
1.
2.
3.

Actual results:
Resource is not destroyed

Expected results:
Resource is destroyed

Additional info:
Created attachment 348643 [details] 9-fork.c
Created attachment 348644 [details] 6.c
Created attachment 348645 [details] 6-fork.c
Hozaf, can you test these again with the latest lock service code in trunk? Both of these tests work for me. I might have fixed this problem with a previous patch. Let me know.
Ryan, no, trunk still has this problem. The problem is not in the tests themselves but in these two tests combined with saLckResourceOpen/19, saLckResourceOpenAsync/19, and SaLckResourceOpenCallbckT/9. Those tests check that an error is returned when opening a resource that was never created, which normally works. On a clean start of corosync, saLckResourceOpen/19 and the others pass, and keep passing until SaLckLockWaiterCallbackT/6 or 9 is run.
Ryan, this reminds me: SaLckLockWaiterCallbackT/7 (the modified version from bug 506523) has the same problem.
I don't understand. How do you know that the resource is not destroyed? Please provide some output from running the test or something equivalent. I see no problem with either of these tests.
Please notice this: *The problem is not in the tests themselves but in these two tests plus saLckResourceOpen/19, saLckResourceOpenAsync/19, SaLckResourceOpenCallbckT/9.*

Example:

[root@node-06 ~]# aisexec
[root@node-06 ~]# ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/saLckResourceOpen/19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 ~]# cd ats-61/autotest/saftest/AIS-lock-B.01.01/src/operations/SaLckLockWaiterCallbackT/
[root@node-06 SaLckLockWaiterCallbackT]# ./9.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[DEBUG]: saLckResourceLock
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[DEBUG]: saLckResourceUnlock
[DEBUG]: saLckResourceClose
[root@node-06 SaLckLockWaiterCallbackT]# cd ../saLckResourceOpen
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
Does not conform the expected behaviors!
saLckResourceOpen, Return value: SA_AIS_OK, should be SA_AIS_ERR_NOT_EXIST
[root@node-06 saLckResourceOpen]# killall corosync
[root@node-06 saLckResourceOpen]# aisexec
[root@node-06 saLckResourceOpen]# ./19.test
[DEBUG]: saLckInitialize
[DEBUG]: saLckResourceOpen
[root@node-06 saLckResourceOpen]#
It appears that this problem is due to an issue in saLckResourceClose, which decrements the reference count for the resource and strips locks appropriately (ignoring the case of orphan locks). The resource is only removed when 1) the reference count is zero and 2) no granted locks exist on that resource.

Also note that when a process exits, lck_lib_exit_fn is called. This function closes any resources that remain in the process's cleanup list (in private data) and also decrements the reference count.

The problem is that when we close a resource via saLckResourceClose, we don't remove that resource from the cleanup list. The result is that when the process exits, we close the resource again and decrement the reference count a second time, which is wrong. In the end, the resource never gets deleted as it should.

The solution is as simple as removing the resource from the cleanup list when saLckResourceClose is called. A very simple fix for a problem that was only apparent because a PR lock (granted) was still present on a resource when another process closed the resource and exited cleanly. Patch to follow.
Created attachment 350889 [details] Remove resource from cleanup list on close. This should fix the problem.
Closing this as fixed upstream.