
Bug 1610245

Summary: Deactivating cache pool sub volumes should fail if corigin is active
Product: Red Hat Enterprise Linux 7
Reporter: Roman Bednář <rbednar>
Component: lvm2
Assignee: Zdenek Kabelac <zkabelac>
lvm2 sub component: Cache Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED NOTABUG
Docs Contact:
Severity: low
Priority: unspecified
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, rbednar, zkabelac
Version: 7.6
Keywords: Regression, TestBlocker
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-28 11:09:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Embargoed:
Attachments: lvchange_vvvv (no flags)

Description Roman Bednář 2018-07-31 09:45:33 UTC
Created attachment 1471758 [details]
lvchange_vvvv

When attempting to deactivate a sub volume of a cache pool while the cache origin is active, the command should fail, as it does with a non-clustered VG. Instead it exits with 0 and appears to do nothing.

So this is applicable to a clustered setup only; a local VG seems to behave as expected (shown in step 1).

Attaching -vvvv output of lvchange.


1) attempt cache sub deactivation on non-clustered vg (fails as expected):

[root@virt-365 ~]# lvchange -an cache_local/corigin_corig
  connect() failed on local socket: No such file or directory
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Device cache_local-corigin_corig (253:4) is used by another device.


2) switch vg to clustered, start cluster and try again:

[root@virt-365 ~]# vgchange -cy cache_local 
  connect() failed on local socket: No such file or directory
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
LVM cluster daemon (clvmd) is not running. Make volume group "cache_local" clustered anyway? [y/n]: y
  Volume group "cache_local" successfully changed

[root@virt-365 ~]# pcs cluster start --all

[root@virt-365 ~]# lvs -a cache_local
  LV                      VG          Attr       LSize  Pool              Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  corigin                 cache_local Cwi-a-C---  4.00g [deactivate_pool] [corigin_corig] 0.00   3.45            0.00            
  [corigin_corig]         cache_local owi-aoC---  4.00g                                                                          
  [deactivate_pool]       cache_local Cwi---C---  2.00g                                   0.00   3.45            0.00            
  [deactivate_pool_cdata] cache_local Cwi-ao----  2.00g                                                                          
  [deactivate_pool_cmeta] cache_local ewi-ao---- 12.00m                                                                          
  [lvol0_pmspare]         cache_local ewi------- 12.00m


3) deactivate command gives no output, exits with 0 and actually does nothing:
[root@virt-365 ~]# lvchange -an cache_local/corigin_corig

[root@virt-365 ~]# echo $?
0

[root@virt-365 ~]# lvs -a cache_local
  LV                      VG          Attr       LSize  Pool              Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  corigin                 cache_local Cwi-a-C---  4.00g [deactivate_pool] [corigin_corig] 0.00   3.45            0.00            
  [corigin_corig]         cache_local owi-aoC---  4.00g                                                                          
  [deactivate_pool]       cache_local Cwi---C---  2.00g                                   0.00   3.45            0.00            
  [deactivate_pool_cdata] cache_local Cwi-ao----  2.00g                                                                          
  [deactivate_pool_cmeta] cache_local ewi-ao---- 12.00m                                                                          
  [lvol0_pmspare]         cache_local ewi------- 12.00m  


[root@virt-365 ~]# pcs resource 
 Clone Set: dlm-clone [dlm]
     Started: [ virt-365 virt-366 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-365 virt-366 ]



lvm2-2.02.180-1.el7.x86_64

============================================================
There's a test for this scenario as well, putting it here for completeness:



SCENARIO - [create_cache_then_deactivate_pool]
Create cache, then attempt to deactivate pool volume

*** Cache info for this scenario ***
*  origin (slow):  /dev/sdd1
*  pool (fast):    /dev/sda1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y --activate ey -L 4G -n corigin cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey -L 2G -n deactivate_pool cache_sanity @fast
lvcreate --activate ey -L 12M -n deactivate_pool_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: cleaner  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writethrough -c 64 --poolmetadata cache_sanity/deactivate_pool_meta cache_sanity/deactivate_pool
  WARNING: Converting cache_sanity/deactivate_pool and cache_sanity/deactivate_pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 1 --cachepool cache_sanity/deactivate_pool cache_sanity/corigin

Attempting to deactivate cache pool sub volumes with an active corigin
Deactivating volume: corigin_corig
should not have been able to deactivate cache pool

Comment 3 Zdenek Kabelac 2018-07-31 12:29:47 UTC
AFAIK there is no bug here when we talk about clustered locking.

With clustered locking, ONLY top-level devices take a lock.

So in this case, if the "corigin" LV has been activated, it takes a lock (with the UUID used as the resource name, which can be validated in the dlm sysfs entries).

However, there are no cluster locks taken for the _cdata, _cmeta, _corigin LVs. So if someone runs 'lvchange -an xxx_corigin', the lock state is empty, thus the LV appears to be inactive and lvchange will successfully exit.

This is different from the file-locking state, where every active device basically appears as a lock holder; thus in file-locking mode we can report the LV as in use.

So I'll likely close this one as working as designed, unless I'm missing something else in this BZ?
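To observe this from the outside, one can compare the local dm table with the dlm lock state. A rough sketch only: the device name is taken from step 1 of the report, and the dlm debug paths vary by kernel configuration:

```shell
# The subLV's device exists in the local dm table even though no
# cluster lock is held for it (device name from the report above):
dmsetup info -c cache_local-corigin_corig

# List dlm lockspaces; with clustered locking only top-level LVs
# take a lock, identified by their UUID as the resource name
# (debugfs may not be mounted on every system):
ls /sys/kernel/dlm/
ls /sys/kernel/debug/dlm/ 2>/dev/null
```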

Comment 4 Roman Bednář 2018-08-01 12:56:39 UTC
> so if someone runs 'lvchange -an xxx_corigin'  - the lock state is empty -
> thus LV appears to be inactivate and lvchange will successfully exit.

This is the part that actually made me file this bz. From lvs output, this lv appears to be active:

[corigin_corig]         cache_local owi-aoC---

lvchange seems to behave well in the sense that it does nothing and returns 0 if the volume is already inactive, but in this case we're running deactivation on an lv that looks active and getting the same result. Based on the info provided, I get the impression that it might really be a bug:

> However there are no cluster locks taken for  _cdata, _cmeta, _corigin LVs - so if
> someone runs 'lvchange -an xxx_corigin'  - the lock state is empty - thus LV appears
> to be inactivate and lvchange will successfully exit.

So with cluster locking, the sub volumes/hidden lvs have an empty lock == are really supposed to be inactive.


> This is different from the file-locking state - where every active device basically 
> appears like a lock holder - thus in file-locking mode we can report LV in use.

and active == locked for file-locking. Together with the statement above, this implies that lock == active and no lock == inactive for both cluster and file locking. Or is this assumption wrong?


If the above is not correct, then what's the relation between activation state and lock state? Can the user even see the lock state? Because this is quite obscure: it sounds like the lv can be active *with* a lock (file-locking) or active *without* a lock (cluster-locking). In the first case lvchange sees the lock and refuses to deactivate the volume; in the second, lvchange ignores the active state completely and silently exits with no message.



Now a bit of history: I did some digging, and this scenario was originally there to check that a thinpool cannot be deactivated while its origin is active. However, based on BZ #1592491 it was changed to try deactivating the cache pool sub volumes ("corigin_corig", "deactivate_pool_cdata", "deactivate_pool_cmeta" in our case). I'm not exactly sure why.

Comment 5 Zdenek Kabelac 2018-08-01 13:26:56 UTC
(In reply to Roman Bednář from comment #4)
> > so if someone runs 'lvchange -an xxx_corigin'  - the lock state is empty -
> > thus LV appears to be inactivate and lvchange will successfully exit.
> 
> This is the part which made me file this bz actually. From lvs output this
> lv appears to be active actually:
> 
> [corigin_corig]         cache_local owi-aoC---

So here the confusion can be caused by the fact that ONLY IF you happen to be on a host where the 'particular' DM device exists in the table do we show values about it being open and other device characteristics.

So yeah, while activation/deactivation is driven by the state of top-level locking, the actual individual device state is always printed by a real sneak peek into the dm table: if the device is there, its status is checked and the information is printed, even though there is 'no lock' held by any individual subLV.
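In other words, the attr bits lvs prints for a subLV come from this local dm-table peek, not from any lock. A sketch of how to see that directly (the device name and the dm-4 minor number are taken from the error message in step 1 of the report; on another host they will differ):

```shell
# "Open count" in this output is what yields the "o" (open) attr bit
# in lvs, and the device's mere presence yields the "a" (active) bit:
dmsetup info cache_local-corigin_corig

# The holder relationship that blocks local deactivation is visible
# in sysfs; dm-4 corresponds to 253:4 from the earlier error message:
ls /sys/block/dm-4/holders/
```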

> If the above is not correct then what's the relation between activation
> state and lock state then? Can user even see the lock state? Cause this is

Cluster locking is simply different from file locking.

> quite obscure, since it sounds like the lv can be active *with* a lock
> (file-locking) or active *without* a lock (cluster-locking). In the first

The design decision is to only grab locks for 'top-level' devices in a cluster.
With file-locking there is no need to protect against activation from other nodes, since you can always quickly check the state of the local dm table; thus file-locking is significantly simpler.

In short: with file-locking you instantly see whether any LV device is active; with cluster-locking we only grab locks for top-level devices. This makes it simpler in some cases and more complex in others, but in general taking fewer locks is usually better.

> case lvchange sees the lock and refuses to deactivate the volume in the
> second lvchange ignores the active state completely and silently exits with
> no message.

You need to see it this way: 'subLV' aka 'component activation' support is designed to work ONLY locally, as an AID for 'recovery'.
It's never ever meant to be used as some 'normal' operation.

So users are NOT supposed to manually deactivate subLVs or even play with them, unless they are locally resolving some disaster case.

Component activation support just makes sure that if LVs are active in parallel on other nodes, you shall not be able to activate them locally. But if the 'top-level' LV is active and keeps the 'top-level lock' on the node, and you try to locally deactivate some subLV which you have not activated locally, you get the right answer that local deactivation has been successful, since in terms of LOCKING no lock has been released.

This is an optimization to avoid always validating all possible top-level locks with every command. As said, the use case is a sort of 'service-mode' operation, and we do not want to pay the price of a major slowdown in the regular workflow (since yes, working with locks IS expensive).



> Now a bit of history: I did some digging and this scenario was there to
> check if thinpool can not be deactivated if origin is active. However based
> on BZ #1592491 it's been changed to try deactivating the cache pool sub
> volumes ("corigin_corig", "deactivate_pool_cdata", "deactivate_pool_cmeta"
> in our case). I'm not exactly sure why.

I assume it's some extended testing checking whether 'component activation' works reasonably well and does not leave the system in an inconsistent state.

Comment 6 Zdenek Kabelac 2018-08-01 13:32:57 UTC
I also should probably note that component activation support is not meant to be used as a 'vehicle' to manually activate and deactivate components in order to make a user's top-level devices available or unavailable.

An active 'component' device usually makes the top-level device inaccessible, and likewise it's impossible to exclusively activate a 'subLV' while the top-level device is already active.

The primary use-case (and in fact the only major one) is to simplify access to the component devices of e.g. a failing thin-pool.
The user can then easily activate and grab a copy of the _tmeta or _cmeta device.
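That recovery flow might look roughly like the following. This is a hedged sketch, not an endorsed procedure: the VG and LV names are hypothetical, and it assumes the top-level LV is already inactive, since component activation is refused while the top-level device is active:

```shell
# Deactivate the top-level thin LV/pool first; component activation
# is not possible while the top-level device is still active:
lvchange -an vg00/pool

# Activate just the hidden metadata component LV for inspection:
lvchange -ay vg00/pool_tmeta

# Grab a copy of the metadata for offline analysis or repair:
dd if=/dev/vg00/pool_tmeta of=/root/pool_tmeta.img bs=1M

# Deactivate the component again when done:
lvchange -an vg00/pool_tmeta
```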

Comment 7 Corey Marthaler 2018-08-01 19:36:39 UTC
Did something change since 7.5? Certainly these deactivate attempts returned non-zero in 7.5 and earlier? This same test case works fine (i.e. returns non-zero) in single machine mode and in cluster lvmlockd mode. Also, this test case seems to come up every other release with a new behavior change. :( First bug 1108380, then bug 1592491, and now this one.

If this is no longer a valid test scenario, then we'll take it out and stop attempting this and then the behaviors can continue to change going forward and not cause test failures.



## SINGLE MACHINE

SCENARIO - [create_cache_then_deactivate_pool]
Create cache, then attempt to deactivate pool volume

*** Cache info for this scenario ***
*  origin (slow):  /dev/sdh1
*  pool (fast):    /dev/sdd1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y  -L 4G -n corigin cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate  -L 2G -n deactivate_pool cache_sanity @fast
lvcreate  -L 12M -n deactivate_pool_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: smq  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy smq --cachemode writethrough -c 64 --poolmetadata cache_sanity/deactivate_pool_meta cache_sanity/deactivate_pool
  WARNING: Converting cache_sanity/deactivate_pool and cache_sanity/deactivate_pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/deactivate_pool cache_sanity/corigin

Attempting to deactivate cache pool sub volumes with an active corigin
Deactivating volume: corigin_corig
  Device cache_sanity-corigin_corig (253:9) is used by another device.
unable to deactivate corigin_corig volume
Deactivating volume: deactivate_pool_cdata
  Device cache_sanity-deactivate_pool_cdata (253:7) is used by another device.
unable to deactivate deactivate_pool_cdata volume
Deactivating volume: deactivate_pool_cmeta
  Device cache_sanity-deactivate_pool_cmeta (253:8) is used by another device.
unable to deactivate deactivate_pool_cmeta volume

Separating cache pool (lvconvert --splitcache) cache_sanity/corigin from cache origin
Removing cache pool cache_sanity/deactivate_pool
Removing cache origin volume cache_sanity/corigin
lvremove -f /dev/cache_sanity/corigin


## LVM LOCKD

SCENARIO - [create_cache_then_deactivate_pool]
Create cache, then attempt to deactivate pool volume

*** Cache info for this scenario ***
*  origin (slow):  /dev/mapper/mpatha1
*  pool (fast):    /dev/mapper/mpathd1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y --activate ey -L 4G -n corigin cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey -L 2G -n deactivate_pool cache_sanity @fast
lvcreate --activate ey -L 12M -n deactivate_pool_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: cleaner  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writethrough -c 32 --poolmetadata cache_sanity/deactivate_pool_meta cache_sanity/deactivate_pool
  WARNING: Converting cache_sanity/deactivate_pool and cache_sanity/deactivate_pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/deactivate_pool cache_sanity/corigin

Attempting to deactivate cache pool sub volumes with an active corigin
Deactivating volume: corigin_corig
  Device cache_sanity-corigin_corig (253:22) is used by another device.
unable to deactivate corigin_corig volume
Deactivating volume: deactivate_pool_cdata
  Device cache_sanity-deactivate_pool_cdata (253:20) is used by another device.
unable to deactivate deactivate_pool_cdata volume
Deactivating volume: deactivate_pool_cmeta
  Device cache_sanity-deactivate_pool_cmeta (253:21) is used by another device.
unable to deactivate deactivate_pool_cmeta volume

Uncaching cache origin (lvconvert --uncache) cache_sanity/corigin from cache origin
Removing cache origin volume cache_sanity/corigin
lvremove -f /dev/cache_sanity/corigin



## CLVMD

SCENARIO - [create_cache_then_deactivate_pool]
Create cache, then attempt to deactivate pool volume

*** Cache info for this scenario ***
*  origin (slow):  /dev/mapper/mpathb1
*  pool (fast):    /dev/mapper/mpathd1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y --activate ey -L 4G -n corigin cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey -L 2G -n deactivate_pool cache_sanity @fast
lvcreate --activate ey -L 12M -n deactivate_pool_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: smq  mode: writeback
lvconvert --yes --type cache-pool --cachepolicy smq --cachemode writeback -c 64 --poolmetadata cache_sanity/deactivate_pool_meta cache_sanity/deactivate_pool
  WARNING: Converting cache_sanity/deactivate_pool and cache_sanity/deactivate_pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/deactivate_pool cache_sanity/corigin

Attempting to deactivate cache pool sub volumes with an active corigin
Deactivating volume: corigin_corig
should not have been able to deactivate cache pool

### NOTE: as discussed above, I believe it's not actually being deactivated, just reporting that the deactivation worked, which presumably it did not do in 7.5.

[root@mckinley-03 ~]# lvchange -an cache_sanity/deactivate_pool_cdata
[root@mckinley-03 ~]# lvchange -an cache_sanity/deactivate_pool_cmeta
# all three are still active
[root@mckinley-03 ~]# lvs -a -o +devices
  LV                      VG               Attr       LSize   Pool              Origin          Data%  Meta%  Move Log Cpy%Sync Convert Devices                 
  corigin                 cache_sanity     Cwi-a-C---   4.00g [deactivate_pool] [corigin_corig] 0.03   2.41            0.00             corigin_corig(0)        
  [corigin_corig]         cache_sanity     owi-aoC---   4.00g                                                                           /dev/mapper/mpathb1(0)  
  [deactivate_pool]       cache_sanity     Cwi---C---   2.00g                                   0.03   2.41            0.00             deactivate_pool_cdata(0)
  [deactivate_pool_cdata] cache_sanity     Cwi-ao----   2.00g                                                                           /dev/mapper/mpathd1(0)  
  [deactivate_pool_cmeta] cache_sanity     ewi-ao----  12.00m                                                                           /dev/mapper/mpathd1(512)
  [lvol0_pmspare]         cache_sanity     ewi-------  12.00m                                                                           /dev/mapper/mpatha1(0)  
[root@mckinley-03 ~]# lvchange -an cache_sanity/deactivate_pool_cmeta
[root@mckinley-03 ~]# echo $?
0