Bug 1473201 - validator mismatch (old=index vs new=sm_bitmap) error and switching thin pool lvm to read-only mode
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware: x86_64 Linux
Priority: unspecified, Severity: unspecified
Target Milestone: rc
Assigned To: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
Reported: 2017-07-20 04:33 EDT by Eugene
Modified: 2017-08-17 14:31 EDT
CC: 8 users

Type: Bug


Attachments
/var/log/messages (265.32 KB, text/plain), 2017-07-20 04:33 EDT, Eugene

Description Eugene 2017-07-20 04:33:01 EDT
Created attachment 1301596 [details]
/var/log/messages

Hello,

A few days ago the thin pool got corrupted on one of the servers, which caused FS corruption on the VMs:
------------------------- 
Jul 15 05:39:38 compute-02 kernel: device-mapper: block manager: validator mismatch (old=index vs new=sm_bitmap) for block 2521
Jul 15 05:39:38 compute-02 kernel: device-mapper: space map common: dm_tm_shadow_block() failed
Jul 15 05:39:38 compute-02 kernel: device-mapper: space map common: unable to decrement a reference count below 0
Jul 15 05:39:38 compute-02 kernel: device-mapper: thin: 253:9: metadata operation 'dm_pool_commit_metadata' failed: error = -22
Jul 15 05:39:38 compute-02 kernel: device-mapper: thin: 253:9: aborting current metadata transaction
Jul 15 05:39:38 compute-02 kernel: device-mapper: thin: 253:9: switching pool to read-only mode
Jul 15 05:39:38 compute-02 kernel: device-mapper: thin: 253:9: metadata operation 'dm_pool_commit_metadata' failed: error = -1
Jul 15 05:39:38 compute-02 kernel: device-mapper: thin: 253:9: aborting current metadata transaction
-------------------------
[#]> dmsetup status | grep cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool
cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool: 0 7271776256 thin-pool 43 7471/23552 919068/1775336 - ro no_discard_passdown queue_if_no_space needs_check 
------------------------- 
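For reference, that thin-pool status line can be read roughly as follows (field order as described in the kernel dm-thin documentation; this is only a reading aid, not command output):
-------------------------
# <start> <length> thin-pool <transaction id>
#   <used metadata blocks>/<total metadata blocks>
#   <used data blocks>/<total data blocks>
#   <held metadata root, or '-'>
#   ro|rw  [no_]discard_passdown  error_if_no_space|queue_if_no_space
#   needs_check (or '-')
#
# So above: transaction id 43, metadata 7471/23552 blocks used, data
# 919068/1775336 blocks used, the pool has been forced read-only (ro),
# and the needs_check flag is set, i.e. the kernel wants the metadata
# checked/repaired before the pool is used for writing again.
-------------------------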

I have an identical LVM configuration on two hardware servers:
-------------------------
Linux compute-02 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

filter = [ "a|^/dev/sda|", "a|^/dev/sdb|", "a|^/dev/sdi|", "r|.*/|" ]
global_filter = [ "a|^/dev/sda|", "a|^/dev/sdb|", "a|^/dev/sdi|", "r|.*/|" ]

[#]> pvs /dev/sdb[4-5]
  PV         VG                 Fmt  Attr PSize   PFree  
  /dev/sdb4  cinder-volumes-hdd lvm2 a--    3.00t      0 
  /dev/sdb5  cinder-volumes-hdd lvm2 a--  570.99g 174.40g

[#]> vgs cinder-volumes-hdd
  VG                 #PV #LV #SN Attr   VSize VFree  
  cinder-volumes-hdd   2  24   0 wz--n- 3.56t 174.40g

[#]> lvs cinder-volumes-hdd/cinder-volumes-hdd-pool
  LV                      VG                 Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-hdd-pool cinder-volumes-hdd twi-aotz-- 3.39t             52.05  28.03

 --- Physical volumes ---
  PV Name               /dev/sdb4     
  PV UUID               CUFme3-uVXt-6qf1-IZLY-Rdsk-O2fV-PibutC
  PV Status             allocatable
  Total PE / Free PE    786431 / 0
   
  PV Name               /dev/sdb5     
  PV UUID               2Pr211-RCyv-c2Ey-l5M5-auI7-w95h-1u4clD
  PV Status             allocatable
  Total PE / Free PE    146173 / 44890
------------------------- 

According to the monitoring graphs, data and metadata usage for this thin pool was ~50%, so the thin pool was not overloaded. I also checked the RAID and SMART status for all disks in the RAID, and everything is OK.
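(For completeness, such a disk health check typically looks something like the following; /dev/sda is only an example device, and the RAID-controller side depends on the vendor CLI:)
-------------------------
[#]> smartctl -H /dev/sda     # overall SMART health verdict
[#]> smartctl -a /dev/sda     # full SMART attributes and error log
# repeat for every member disk of the RAID set
-------------------------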

Here is the log from the repair attempt:
------------------------- 
[#]> vgchange -an cinder-volumes-hdd
[#]> lvconvert --verbose --repair cinder-volumes-hdd/cinder-volumes-hdd-pool
  Using default stripesize 64.00 KiB.
    activation/volume_list configuration setting not defined: Checking only host tags for cinder-volumes-hdd/lvol0_pmspare.
    Creating cinder--volumes--hdd-lvol0_pmspare
    Loading cinder--volumes--hdd-lvol0_pmspare table (253:7)
    Resuming cinder--volumes--hdd-lvol0_pmspare (253:7)
    activation/volume_list configuration setting not defined: Checking only host tags for cinder-volumes-hdd/cinder-volumes-hdd-pool_tmeta.
    Creating cinder--volumes--hdd-cinder--volumes--hdd--pool_tmeta
    Loading cinder--volumes--hdd-cinder--volumes--hdd--pool_tmeta table (253:8)
    Resuming cinder--volumes--hdd-cinder--volumes--hdd--pool_tmeta (253:8)
    Executing: /usr/sbin/thin_repair  -i /dev/mapper/cinder--volumes--hdd-cinder--volumes--hdd--pool_tmeta -o /dev/mapper/cinder--volumes--hdd-lvol0_pmspare
    Piping: /usr/sbin/thin_dump /dev/mapper/cinder--volumes--hdd-lvol0_pmspare
  Transaction id 44 from pool "cinder-volumes-hdd/cinder-volumes-hdd-pool" does not match repaired transaction id 43 from /dev/mapper/cinder--volumes--hdd-lvol0_pmspare.
    Removing cinder--volumes--hdd-cinder--volumes--hdd--pool_tmeta (253:8)
    Removing cinder--volumes--hdd-lvol0_pmspare (253:7)
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove cinder-volumes-hdd/cinder-volumes-hdd-pool_meta0 volume.
  WARNING: Use pvmove command to move cinder-volumes-hdd/cinder-volumes-hdd-pool_tmeta on the best fitting PV.
[#]> vgchange -ay cinder-volumes-hdd
[#]> dmsetup status | grep cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool
cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool: 0 7271776256 thin-pool 45 7582/23552 894047/1775336 - rw no_discard_passdown queue_if_no_space - 
------------------------- 
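For reference, a minimal sketch of sanity checks after such a repair, before trusting the pool again (the pool_meta0 cleanup is the step the repair warnings above refer to; treat this as a sketch, not an exact procedure):
-------------------------
# confirm the pool came back read-write and that needs_check is cleared
# (last fields of the status line, as in the dmsetup output above)
[#]> dmsetup status cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool
# activate and fsck/mount every thin LV to verify its contents; only then
# remove the old metadata copy, as the repair warning suggests
[#]> lvremove cinder-volumes-hdd/cinder-volumes-hdd-pool_meta0
-------------------------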

I also noticed a rather strange warning that had been printed during each LVM modification (create/delete of a snapshot/volume) for a long time. It too was fixed by restarting lvm2-lvmetad.service:
-------------------------
Jul 11 05:00:37 hydra-compute-02 lvm[1677]: WARNING: Device for PV 2Pr211-RCyv-c2Ey-l5M5-auI7-w95h-1u4clD not found or rejected by a filter.
Jul 11 05:00:37 hydra-compute-02 lvm[1677]: Cannot change VG cinder-volumes-hdd while PVs are missing.
Jul 11 05:00:37 hydra-compute-02 lvm[1677]: Consider vgreduce --removemissing.
Jul 11 05:00:37 hydra-compute-02 lvm[1677]: Cannot process volume group cinder-volumes-hdd
Jul 11 05:00:37 hydra-compute-02 lvm[1677]: Failed to extend thin pool cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool.
-------------------------
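For reference, the kind of refresh that clears such a stale-cache warning (the service name is the RHEL 7 one; pvscan --cache explicitly repopulates the lvmetad cache from the devices currently visible):
-------------------------
[#]> systemctl restart lvm2-lvmetad.service
[#]> pvscan --cache          # rebuild the lvmetad device cache
[#]> pvs /dev/sdb5           # the PV should now be reported without warnings
-------------------------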

I have attached the full /var/log/messages log. Could you please advise what the root cause of the issue could be?
Comment 2 Zdenek Kabelac 2017-07-20 05:02:00 EDT
A few comments:

Let's start from the end.

--

LVM2 is informing the user that it CANNOT work without fixing the VG first.
The VG has a missing PV device (UUID 2Pr211-RCyv-c2Ey-l5M5-auI7-w95h-1u4clD).

So was there some 'disk' damage/loss?

LVM2 requires the VG to be put into a consistent state first
(vgreduce --removemissing).
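A cautious sketch of that step, assuming the PV really is absent (the --test run makes no changes; --removemissing can discard LVs that used the missing device, so verify first):
-------------------------
[#]> pvs -o pv_name,pv_uuid,vg_name                       # is the PV really gone?
[#]> vgck cinder-volumes-hdd                              # VG metadata consistency check
[#]> vgreduce --removemissing --test cinder-volumes-hdd   # dry run
[#]> vgreduce --removemissing cinder-volumes-hdd
-------------------------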

--

lvconvert --repair 

This command is capable of handling only simple errors. Here, however, we seem to be dealing with multiple errors at once, which is beyond the capabilities of this command and requires manual user/admin intervention.

I'd propose attaching more details to this BZ, i.e. the lvm2 metadata and the thin-pool metadata, so we can take a look.
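A rough sketch of collecting that, assuming the pool is active and the thin tools from device-mapper-persistent-data are installed (device names follow the log above; the metadata snapshot avoids dumping metadata while it is being modified):
-------------------------
# lvm2 (text) metadata of the VG
[#]> vgcfgbackup -f /tmp/cinder-volumes-hdd.vg cinder-volumes-hdd
# thin-pool kernel metadata, dumped via a metadata snapshot
[#]> dmsetup message cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool 0 reserve_metadata_snap
[#]> thin_dump -m /dev/mapper/cinder--volumes--hdd-cinder--volumes--hdd--pool_tmeta > /tmp/thin-pool-metadata.xml
[#]> dmsetup message cinder--volumes--hdd-cinder--volumes--hdd--pool-tpool 0 release_metadata_snap
-------------------------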

--

Before we start with that, however, it is good to also be aware of what happened before the actual error on the thin-pool device.

--

Getting back to the front/beginning of your report: the metadata corruption was caused by FS corruption on a VM. This is a rather hard case to solve, since it is quite unclear how big the corruption is (hence the metadata attachment is needed).

A generic piece of advice applies here: lvm2 always discourages weird 'stacks' where the actual thin-pool data & metadata devices are not placed on direct raw devices but are instead routed via numerous stacked loop devices, which adds many more possible points of failure. (Basically, there could be a whole presentation on why this is all wrong; the generic rule is that layers should never be mixed: fs-layer -> block-layer, and never BACK to the fs-layer.)

Damage to thin-pool metadata might turn out to be unfixable, and thus the whole content of the thin pool can be irreversibly lost. While the thin pool itself has a large set of protections against continued usage of damaged metadata, and there is usually a good chance of fixing a single initial failure, it gets increasingly hard to fix errors once they are combined with errors from stacked devices and layers...

That said, for fixing this individual case the metadata needs to be collected and examined to see what 'state' it is in and whether there is some chance of rescuing it.
Comment 3 Eugene 2017-07-20 09:21:39 EDT
Hello Zdenek,

Thanks for the prompt reply and the detailed explanation.

>Lvm2 is informing user that it CANNOT work without fixing VG first.
>VG has a missing PV device (UUID  2Pr211-RCyv-c2Ey-l5M5-auI7-w95h-1u4clD)
>So was there some 'disk' damage/lost ?

No, there was no damage or loss. As I said before, the pvs command shows that sdb5 is operable and not missing, but LVM prints to /var/log/messages that it is missing every time I create an LV or a snapshot. Moreover, LVM stopped printing this warning after a restart of lvm2-lvmetad.service. I would also like to add that I extended the VG about a month ago, and it looks like the system has been printing the message about the missing device from the very beginning. In other words, something went wrong while extending the VG, but pvs showed that everything was OK.
-------
fdisk /dev/sdb      (created new partition /dev/sdb5)
partprobe /dev/sdb
pvcreate /dev/sdb5
vgextend cinder-volumes-hdd /dev/sdb5
lvextend -L+550G cinder-volumes-hdd/cinder-volumes-hdd-pool
--------
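(In hindsight, a quick check along these lines right after the extension would probably have caught the stale lvmetad cache; this is a sketch, not what was actually run:)
-------
[#]> pvscan --cache /dev/sdb5                   # push the new PV into the lvmetad cache
[#]> pvs -o pv_name,pv_uuid,vg_name /dev/sdb5   # is the PV visible with the expected UUID?
[#]> vgck cinder-volumes-hdd                    # VG metadata consistency check
-------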

Also, 'lvs -a -o+devices' shows '/dev/sdb5(0)', so sdb5 was not used at all during this time.

>As we are getting to the front/beginning of your report - metadata corruption was caused by  FS  corruption on VM -  
>this is rather hard case to solve - since it's quite unclear how 
>big corruption has happened (hence the attachment of metadata is needed).

Do I understand you correctly that it is possible to break the thin LVM metadata on the hypervisor by corrupting the file system inside a VM? I thought that was not possible. Your comments are highly appreciated. Just in case, here is my setup:
RAID->OS Linux(/dev/sda->PV->VG->thin pool LV->LV)->iscsi connection->Instance/VM(/dev/vda->VG->LV(not thin)->ext4->OS)
-------
LV inside OS:

  --- Logical volume ---
  LV Path                /dev/cinder-volumes-hdd/volume-2892d42b-18f7-4d87-8810-1a2f99db0a80
  LV Name                volume-2892d42b-18f7-4d87-8810-1a2f99db0a80
  VG Name                cinder-volumes-hdd
  LV UUID                DYl0pu-2393-r6CM-fLlx-81x8-VfDa-VaBxi1
  LV Write Access        read/write
  LV Creation host, time compute-01.local, 2017-05-19 13:10:39 +0000
  LV Pool name           cinder-volumes-hdd-pool
  LV Status              available
  # open                 1
  LV Size                30.00 GiB
  Mapped size            100.00%
  Current LE             7680
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:42

Libvirt xml description of disk:

   <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/dev/disk/by-path/ip-10.30.30.100:3260-iscsi-iqn.2010-10.org.openstack:volume-2892d42b-18f7-4d87-8810-1a2f99db0a80-lun-0'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <serial>649e5a35-b280-45d2-8dce-97a06aa74465</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
  </disk>
-------


>That said - for fixing this individual case - metadata needs to be collected and examined in which 'state' 
>they are and if there is some chance to rescue them.

No need to recover data. I restored the corrupted VMs from backup and moved all other instances to another thin pool.
