Bug 1374786
| Summary: | 'R VGLK lock_san acquire error' potential errors when attempting shared volume configurations | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | LVM lock daemon / lvmlockd | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | | |
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, mcsontos, prajnoha, rbednar, teigland, zkabelac |
| Version: | 7.3 | Keywords: | Reopened |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.176-5.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-10 15:18:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Corey Marthaler
2016-09-09 15:42:04 UTC
I think dd was hitting the disk with enough io to delay the reads/writes acquiring the lock beyond the 10 sec io timeout. The io timeout caused the lock request to fail with a timeout error. lvmlockd should probably recognize when a lock request fails due to an io timeout and just retry it.

I've seen -212 a few times as well. I'm assuming this is the exact same issue.

```
Oct 20 11:03:40 host-083 qarshd[15762]: Running cmdline: pvs --noheadings -o pv_name,pv_uuid,vg_name
Oct 20 11:03:40 host-083 sanlock[14324]: 2016-10-20 11:03:40-0500 85461 [14330]: r4 ballot 6 larger2 lver[1943] 7 dblock 6
Oct 20 11:03:40 host-083 sanlock[14324]: 2016-10-20 11:03:40-0500 85461 [14330]: r4 ballot 6 larger2 mbal[1943] 1944 dblock 3536
Oct 20 11:03:40 host-083 sanlock[14324]: 2016-10-20 11:03:40-0500 85461 [14330]: r4 ballot 6 larger2 inp[1943] 0 0 0 dblock 1536 1 85461
Oct 20 11:03:40 host-083 sanlock[14324]: 2016-10-20 11:03:40-0500 85461 [14330]: r4 paxos_acquire 6 ballot error -212
Oct 20 11:03:40 host-083 sanlock[14324]: 2016-10-20 11:03:40-0500 85461 [14330]: r4 acquire_token disk error -212
Oct 20 11:03:40 host-083 sanlock[14324]: 2016-10-20 11:03:40-0500 85461 [14330]: r4 cmd_acquire 2,9,14335 acquire_token -212
Oct 20 11:03:40 host-083 lvmlockd[14335]: 1476979420 S lvm_global R VGLK lock_san acquire error -212
```

```
3.10.0-513.el7.x86_64
lvm2-2.02.166-1.el7                        BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-libs-2.02.166-1.el7                   BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-cluster-2.02.166-1.el7                BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-1.02.135-1.el7               BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-libs-1.02.135-1.el7          BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-1.02.135-1.el7         BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7  BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.166-1.el7                     BUILT: Wed Sep 28 02:26:52 CDT 2016
sanlock-3.4.0-1.el7                        BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7                    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.166-1.el7                  BUILT: Wed Sep 28 02:26:52 CDT 2016
```

pushing this out to the next release

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

bugzilla is absolutely idiotic

https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=e52d2e3bd86a006bcc322649ac9cb7c52117b787

There are a number of sanlock errors which mean the lock couldn't be acquired, but the caller can retry. The two errors above (-202 and -212) have been added to this set.

- -202: there were i/o timeouts while trying to acquire the lock.
- -212: there was an (unusual) sort of lock contention and another host holds the lock.

All of the failures that can be retried will often be masked when a retry is successful, but if the condition is persistent enough, the retry limit (configurable) is reached and the error is reported by the command, as seen above.

Testing that produces any sort of lock contention and retries is sufficient here. Reproducing those two specific errors would not be possible and isn't necessary. Also, those error numbers won't be reported to the command any longer (they were sanlock-specific); instead the command should report the lvm-specific error number -221, which represents any lock manager error that can't be handled.
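The sketch below illustrates the retry pattern described in that commit. It is not the lvm2/lvmlockd source: the numeric values (-202, -212, -221) come from the comments above, but every macro name, function name, retry limit, and backoff choice here is an assumption made for the example.

```c
/*
 * Illustrative sketch only -- not the lvmlockd implementation.
 * The values -202, -212 and -221 are taken from this bug report;
 * all identifiers below are invented for the example.
 */
#include <stdio.h>
#include <unistd.h>

#define ERR_SANLOCK_IO_TIMEOUT  -202  /* i/o timeouts while acquiring the lock      */
#define ERR_SANLOCK_CONTENDED   -212  /* unusual contention; another host holds it  */
#define ERR_LVM_LOCK_MANAGER    -221  /* generic code reported back to the command  */

/* Errors after which the acquire can simply be tried again. */
static int lock_error_is_retryable(int rv)
{
	return rv == ERR_SANLOCK_IO_TIMEOUT || rv == ERR_SANLOCK_CONTENDED;
}

/*
 * Retry a lock acquire up to a configurable limit.  A successful retry
 * masks the transient error; a persistent or non-retryable failure is
 * collapsed into the single lvm-specific error code.
 */
static int acquire_with_retries(int (*acquire)(void), int retry_limit)
{
	int rv, retries = 0;

	do {
		rv = acquire();
		if (rv >= 0)
			return rv;                       /* lock acquired */
		if (!lock_error_is_retryable(rv))
			break;                           /* hard failure, stop retrying */
		fprintf(stderr, "lock acquire error %d, retry %d of %d\n",
			rv, retries + 1, retry_limit);
		sleep(1);                                /* brief backoff before retrying */
	} while (++retries <= retry_limit);

	/* Don't leak lock-manager-specific numbers to the command. */
	return ERR_LVM_LOCK_MANAGER;
}
```

The final mapping is the behaviour described above: transient sanlock conditions are hidden by a successful retry, and anything that keeps failing surfaces to the command as one lvm-specific code rather than a sanlock-specific one.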
Marking verified with the latest rpms. Saw no errors after 100 iterations.

```
3.10.0-826.el7.x86_64
lvm2-2.02.176-5.el7                         BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-libs-2.02.176-5.el7                    BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-cluster-2.02.176-5.el7                 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-lockd-2.02.176-5.el7                   BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-python-boom-0.8.1-5.el7                BUILT: Wed Dec 6 04:15:40 CST 2017
cmirror-2.02.176-5.el7                      BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-1.02.145-5.el7                BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-libs-1.02.145-5.el7           BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-1.02.145-5.el7          BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-libs-1.02.145-5.el7     BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-persistent-data-0.7.3-3.el7   BUILT: Tue Nov 14 05:07:18 CST 2017
sanlock-3.6.0-1.el7                         BUILT: Tue Dec 5 11:47:21 CST 2017
sanlock-lib-3.6.0-1.el7                     BUILT: Tue Dec 5 11:47:21 CST 2017
```

```
============================================================
Iteration 100 of 100 started at Fri Jan 5 17:31:36 CST 2018
============================================================
SCENARIO (raid1) - [open_EXT4_on_raid_fsadm_resize_attempt]
Create raid, add fs, and then attempt to resize it while it's mounted
mckinley-03: lvcreate --activate ey --nosync --type raid1 -m 1 -n open_fsadm_resize -L 4G raid_sanity
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
Placing an EXT4 on open_fsadm_resize volume
mke2fs 1.42.9 (28-Dec-2013)
Starting dd io to raid fs to be resized
Attempt to resize the open raid filesystem multiple times with lvextend/fsadm on mckinley-03
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
(lvextend -L +2G -r /dev/raid_sanity/open_fsadm_resize)
resize2fs 1.42.9 (28-Dec-2013)
perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/open_fsadm_resize
Deactivating raid open_fsadm_resize... and removing
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853