Bug 213754
**Summary:** volume group can be reported as inconsistent while displaying from one node and deleting from another

| Field | Value |
|---|---|
| Product | [Retired] Red Hat Cluster Suite |
| Component | cmirror |
| Version | 4 |
| Status | CLOSED CURRENTRELEASE |
| Severity | low |
| Priority | medium |
| Hardware | All |
| OS | Linux |
| Reporter | Corey Marthaler <cmarthal> |
| Assignee | Jonathan Earl Brassow <jbrassow> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | agk, ccaulfie, cfeist, dwysocha, mbroz |
| Doc Type | Bug Fix |
| Last Closed | 2008-08-05 21:37:58 UTC |
**Description** (Corey Marthaler, 2006-11-02 20:54:11 UTC)
This can happen in a cluster when using clvmd, but operating on a NON-CLUSTERED vg. IOW, single machine mirroring is employed, but clvmd is the method of activation. Will try just single machine next. Haven't seen it when bypassing clvmd.

Insufficient info here about what was done, but if VG 'vg' is non-clustered, it should not be accessible across a cluster. If it's visible on more than one machine and not marked CLUSTERED, then expect problems.

I've isolated the code to here: lib/metadata/metadata.c:_vg_read

```c
/* FIXME Also ensure contents same - checksum compare? */
if (correct_vg->seqno != vg->seqno) {
        inconsistent = 1;
        if (vg->seqno > correct_vg->seqno)
                correct_vg = vg;
}
```

I don't know why the sequence numbers would be different if we are holding a lock... we are, aren't we? Adding a simple print, we get:

```
correct_vg->seqno(805) != vg->seqno(806)
  Volume group "vg" inconsistent
```

Created attachment 142406 [details]
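To make the quoted `_vg_read` fragment concrete, here is a standalone C sketch of that arbitration step: among the metadata copies read for a VG, the highest seqno wins, and any mismatch sets the inconsistent flag. The struct and function names are simplified stand-ins, not the real LVM definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for LVM's volume_group;
 * the real struct lives in lib/metadata/metadata.h. */
struct volume_group {
    uint32_t seqno; /* bumped on every committed metadata change */
};

/* Mirrors the quoted fragment: keep the copy with the highest seqno
 * and flag any disagreement as inconsistent. */
static struct volume_group *pick_correct_vg(struct volume_group *vgs,
                                            size_t n, int *inconsistent)
{
    struct volume_group *correct_vg = &vgs[0];
    *inconsistent = 0;
    for (size_t i = 1; i < n; i++) {
        struct volume_group *vg = &vgs[i];
        if (correct_vg->seqno != vg->seqno) {
            *inconsistent = 1;   /* produces "Volume group ... inconsistent" */
            if (vg->seqno > correct_vg->seqno)
                correct_vg = vg; /* newer metadata wins */
        }
    }
    return correct_vg;
}
```

With the seqnos from the debug print above (805 and 806), the 806 copy is selected and the inconsistent flag is set, which is exactly the symptom reported: both copies were validly committed, they just straddle a metadata update made from the other node.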
Output of a failed 'lvs'
This 'lvs -vvvv' gives the inconsistent error
Created attachment 142418 [details]
Metadata at seqno 3315
Created attachment 142419 [details]
Metadata at seqno 3316
These are the two metadata sets that were committed; nothing seems wrong with them.
I took out the partitions and used the whole device for the PVs. I still get the "inconsistent" message, but it prints things out properly. Now I can't remember if this is equivalent to what I was seeing before...

```
  Volume group "vg" inconsistent
  LV       VG         Attr   LSize   Origin Snap%  Move Log     Copy%
  LogVol00 VolGroup00 -wi-ao  36.62G
  LogVol01 VolGroup00 -wi-ao 512.00M
  lv       vg         mwi-a-   5.00G                    lv_mlog   1.80
```

In any case, it is still present. Here are the locking operations made during create/change/delete:

CREATING:

```
Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3288]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU: LCK_READ/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x99) [3288]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU: <UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3288]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU: LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3288]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_READ/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x99) [3288]
  Logical volume "lv" created
Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3288]
```

CHANGING:

```
Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3359]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: <UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3359]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3359]
Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3359]
```

REMOVING:

```
Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3405]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_EXCL/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9d) [3405]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: <UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3405]
Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3405]
  Logical volume "lv" successfully removed
Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3405]
```

And here are the lock ops for an 'lvs':

```
#cluster_locking.c:435 Performing lock operation on V_VolGroup00: LCK_READ/VG (0x1) [31631]
#cluster_locking.c:435 Performing lock operation on V_VolGroup00: LCK_UNLOCK/VG (0x6) [31631]
#cluster_locking.c:435 Performing lock operation on V_vg: LCK_READ/VG (0x1) [31631]
#cluster_locking.c:435 Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [31631]
*** #toollib.c:348 Volume group "vg" inconsistent ***
#cluster_locking.c:435 Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [31631]
#cluster_locking.c:435 Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [31631]
```

It doesn't matter whether O_DIRECT_SUPPORT is defined (unused).

Interesting... I ran lvs in ddd (aka gdb) and simply stopped right after acquiring the VG read lock. This should have the effect of stopping the create/change/delete loop on the other machine, but it doesn't... it just keeps going.

locking problem in lvm

vg locks should have been PR (protected read), not CR (concurrent read). In DLM terms, a CR lock held by the reading node is compatible with a concurrent writer's lock on another node, so the VG metadata can change between the reads that _vg_read compares; a PR lock would block the writer until the read completed.

Have not seen inconsistent errors while executing the test case in comment #0. Marking verified.

Fixed in current release (4.7).
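The hex values in the traces above are bitwise combinations of LVM's lock flags. The sketch below decodes them using constants inferred from the trace lines themselves (e.g. 0x99 = LCK_READ | LCK_LV | LCK_NONBLOCK | LCK_CLUSTER_VG); they match the LCK_* definitions in LVM2's lib/locking/locking.h of this era, but treat the exact values as an assumption rather than the canonical header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lock-flag bit layout, as inferred from the trace output above. */
#define LCK_TYPE_MASK  0x07U
#define LCK_NULL       0x00U   /* printed as <UNKNOWN> by this debug build */
#define LCK_READ       0x01U   /* clvmd maps this to DLM CR (concurrent read) */
#define LCK_WRITE      0x04U
#define LCK_EXCL       0x05U
#define LCK_UNLOCK     0x06U

#define LCK_LV         0x08U   /* scope bit: LV lock rather than VG lock */
#define LCK_NONBLOCK   0x10U
#define LCK_CLUSTER_VG 0x80U

/* Extract the lock type from a flags word, mirroring the trace labels. */
static const char *lock_type_name(uint32_t flags)
{
    switch (flags & LCK_TYPE_MASK) {
    case LCK_NULL:   return "LCK_NULL";
    case LCK_READ:   return "LCK_READ";
    case LCK_WRITE:  return "LCK_WRITE";
    case LCK_EXCL:   return "LCK_EXCL";
    case LCK_UNLOCK: return "LCK_UNLOCK";
    default:         return "<UNKNOWN>";
    }
}
```

Decoding the values seen in the logs: 0x4 is LCK_WRITE on the VG, 0x99 is a nonblocking LCK_READ on an LV in a clustered VG, 0x9d is LCK_EXCL, 0x9e is LCK_UNLOCK, and 0x98 carries type 0 (LCK_NULL), which is why the debug output prints it as `<UNKNOWN>`.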