Bug 2139737

Summary: found sh_count 4294967295 in lvmlockd log
Product: [Community] LVM and device-mapper Reporter: shan.wu <shan.wu>
Component: lvm2 Assignee: David Teigland <teigland>
lvm2 sub component: lvmlockd QA Contact: cluster-qe <cluster-qe>
Status: NEW --- Docs Contact:
Severity: high    
Priority: unspecified CC: agk, heinzm, jbrassow, prajnoha, teigland, zkabelac
Version: 2.02.180 Flags: pm-rhel: lvm-technical-solution?
pm-rhel: lvm-test-coverage?
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description shan.wu 2022-11-03 10:18:27 UTC

Comment 1 shan.wu 2022-11-03 10:21:43 UTC
Found sh_count 4294967295 in the lvmlockd log. This causes vgs to take a lock but never release it.
Is there a thread safety problem in the code around sh_count++?

Comment 2 David Teigland 2022-11-03 14:31:14 UTC
I don't see any obvious problem related to sh_count.  If you can provide more information it might help us find or reproduce the problem, e.g. the full lvmlockd log, a description of commands using the shared VG, the frequency and concurrency of lvm commands, an lvm command with full debugging like 'pvs -vvvv'.

Comment 3 shan.wu 2022-11-04 05:35:28 UTC
(In reply to David Teigland from comment #2)
> I don't see any obvious problem related to sh_count.  If you can provide
> more information it might help us find or reproduce the problem, e.g. the
> full lvmlockd log, a description of commands using the shared VG, the
> frequency and concurrency of lvm commands, an lvm command with full
> debugging like 'pvs -vvvv'.

I ran vgs -vvvv and saw no error output, but lvmlockd did not release the shared lock on the VG. I then saw sh_count 4294967295 in the lvmlockd log. From what I can observe, this variable is the total number of shared locks that the threads of a client hold on a resource. For example, if a client has five threads that each hold a shared lock on the same VG, sh_count should be 5. If a race leaves the value at 4 and the five threads then release their shared locks (decrementing the value by 5), the value becomes 4294967295. The next time vgs runs, the bug appears.
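
The arithmetic behind that value can be checked with a minimal sketch. It assumes sh_count is a 32-bit unsigned counter, which the logged value 4294967295 (2^32 - 1) suggests but which this report does not confirm. The function name release_shared is hypothetical and is not part of lvmlockd; Python is used only to model the C-style wraparound.

```python
UINT32_MASK = 0xFFFFFFFF  # assumption: the counter is 32 bits wide and unsigned


def release_shared(sh_count: int, releases: int) -> int:
    """Model `releases` decrements of a 32-bit unsigned counter.

    Hypothetical helper for illustration only; it mimics how an
    unsigned counter in C wraps around instead of going negative.
    """
    return (sh_count - releases) & UINT32_MASK


# Balanced case: five locks recorded, five unlocks.
balanced = release_shared(5, 5)

# Unbalanced case: only four locks recorded, but five unlocks arrive.
unbalanced = release_shared(4, 5)

print(balanced, unbalanced)
```

Any imbalance of one extra decrement gives the same result, whether it comes from a lost increment or from an extra unlock, so the logged value alone does not say which of the two happened.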

Comment 4 shan.wu 2022-11-04 05:38:37 UTC
So I want to confirm: is there a thread safety problem with sh_count++ in the code?

Comment 5 shan.wu 2022-11-04 06:08:46 UTC
The log of lvmlockd executing vgs is as follows:

lvmlockd: 1666859132 recv vgs[19817] cl 834 lock vg "926af2f1acac46a4b298ebe1b4122f08" mode sh flags 0
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK action lock sh
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK res_lock cl 834 mode sh
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK lock_san sh at /dev/mapper/926af2f1acac46a4b298ebe1b4122f08-lvmlock:69206016
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK res_lock rv 8 read vb 1010 7847
lvmlockd: 1666859132 send vgs[19817] cl 834 lock vg rv 0
lvmlockd: 1666859132 recv vgs[19817] cl 834 lock vg "926af2f1acac46a4b298ebe1b4122f08" mode un flags 0
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK action lock un
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK res_unlock cl 834
lvmlockd: 1666859132 S lvm_926af2f1acac46a4b298ebe1b4122f08 R VGLK res_unlock sh_count 4294967295
lvmlockd: 1666859132 send vgs[19817] cl 834 lock vg rv 0
lvmlockd: 1666859132 close vgs[19817] cl 834 fd 9

Comment 6 David Teigland 2022-11-04 14:32:44 UTC
I don't see a thread safety issue.  r->sh_count is only used by res_lock(), res_convert(), res_unlock().  These are only called by res_process() which is only called from the lockspace_thread.  There is one lockspace_thread for each lockspace/VG.  The sh_count value is increased when an lvm command requests shared lock, e.g. to read the VG, and then decreases when the command unlocks it or exits.  I suspect the problem is due to a missing unlock from an lvm command.  One area where we have problems like this is dmeventd where lvm commands are not run normally.  What is the lvm version here?  If it's not recent please try a recent version since it's possible this has been fixed.
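
The single-owner argument above can be illustrated with a rough model. This is not lvmlockd code: the Lockspace class, the action strings, and simulate() are hypothetical names chosen for the sketch. It only shows the general pattern in which many clients submit requests concurrently while one worker thread per lockspace is the sole reader and writer of the counter, so the counter itself needs no locking.

```python
import queue
import threading


class Lockspace:
    """Hypothetical model: one worker thread owns sh_count."""

    def __init__(self):
        self.actions = queue.Queue()  # thread-safe hand-off from clients
        self.sh_count = 0             # touched only by the worker thread
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def _run(self):
        # All counter updates happen here, serially, in one thread.
        while True:
            action = self.actions.get()
            if action is None:        # sentinel: stop the worker
                break
            if action == "lock_sh":
                self.sh_count += 1
            elif action == "unlock":
                self.sh_count -= 1

    def submit(self, action):
        self.actions.put(action)

    def stop(self):
        self.actions.put(None)
        self.worker.join()
        return self.sh_count


def simulate(clients=5, rounds=1000):
    """Run concurrent clients that each lock and then unlock."""
    ls = Lockspace()

    def client():
        for _ in range(rounds):
            ls.submit("lock_sh")
            ls.submit("unlock")

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ls.stop()


print(simulate())
```

In this model the final count is 0 whenever every lock request is paired with an unlock, regardless of how many clients run at once. A nonzero or wrapped count would then point to unbalanced requests rather than to a race on the counter.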

Comment 7 shan.wu 2022-11-05 11:59:46 UTC
(In reply to David Teigland from comment #6)
> I don't see a thread safety issue.  r->sh_count is only used by res_lock(),
> res_convert(), res_unlock().  These are only called by res_process() which
> is only called from the lockspace_thread.  There is one lockspace_thread for
> each lockspace/VG.  The sh_count value is increased when an lvm command
> requests shared lock, e.g. to read the VG, and then decreases when the
> command unlocks it or exits.  I suspect the problem is due to a missing
> unlock from an lvm command.  One area where we have problems like this is
> dmeventd where lvm commands are not run normally.  What is the lvm version
> here?  If it's not recent please try a recent version since it's possible
> this has been fixed.

The lvm version is 2.02.180. Could the cause be that the lvm command does not increase r->sh_count by 1? I have seen this on multiple physical machines for the same VG, so it may not be a thread safety problem. The end result is that the VG stays locked with a shared lock, lvcreate and lvextend fail, and I have to restart lvmlockd and sanlock.
When the problem occurs, I run vgs -vvvv but see no error information. How can I find the root cause of this problem?

Comment 8 David Teigland 2022-11-07 14:33:41 UTC
2.02.180 is very old, so it's very likely this has been fixed already.  Please try a recent lvm release.

Comment 9 shan.wu 2022-11-16 02:24:48 UTC
(In reply to David Teigland from comment #8)
> 2.02.180 is very old, so it's very likely this has been fixed already. 
> Please try a recent lvm release.

Without upgrading the kernel, I tried to update lvm to 2.03 by installing it from the scpm, but the thin-provisioning-tools and libcorosync-devel dependencies could not be resolved. How can I compile lvm 2.03 on the current CentOS 7?

Comment 10 David Teigland 2022-11-16 16:29:46 UTC
> Without upgrading the kernel, I tried to update the lvm to lvm2.03, and then
> installed it from the scpm. But I found that the thin-provisioning-tools and
> libcorosync-devel dependencies could not be resolved. So I want to ask how I
> can compile lvm2.03 in the current Centos7?

You will need to compile lvm2 from source, and enable/disable some configure options.  I've never done this so I don't know which configure options to use, but I think it should be possible.