Created attachment 1378229
Tarfile containing a compressed image of the meta device and vgcfgbackup file

Description of problem:
I took a disk image prior to any repair. I attempted a repair with LVM 2.02.116 and thin_repair 0.3.1 on kernel 4.4.98, which failed. I then restored from my image and tried again from a bootable Arch Linux USB key running:

LVM Version: 2.02.176(2) (2017-11-03)
thin_repair --version: 0.7.5
Linux archiso 4.13.12-1-ARCH #1 SMP PREEMPT Wed Nov 8 11:54:06 CET 2017 x86_64 GNU/Linux

I got the same failure:

Executing: /usr/bin/thin_check --clear-needs-check-flag /dev/mapper/pve-data_tmeta
examining superblock
examining devices tree
examining mapping tree
missing all mappings for devices: [0, -]
value size mismatch: expected 8, but got 24. This is not the btree you are looking for. (block 70)
/usr/bin/thin_check failed: 1
Check of pool pve/data failed (status:1). Manual repair required!
Removing pve-data_tdata (254:3)
Removing pve-data_tmeta (254:2)

Version-Release number of selected component (if applicable):

How reproducible:
Unknown

Steps to Reproduce:
1. Run out of space.
2. Reboot.
3. Repeat the same thing that caused you to run out of space, so that it runs out again.
4. Reboot.
5. LVM fails to come up, and thin_repair can't fix it, even when given a 300 MB LV to repair into.

Actual results:
No mapping for any thin LV; data inaccessible.

Expected results:
At the very least, the data should be recoverable.

Additional info:
Proxmox VE 4.4-20/2650b7b5
Linux atomic 4.4.98-2-pve #1 SMP PVE 4.4.98-101 (Mon, 18 Dec 2017 13:36:02 +0100) x86_64 GNU/Linux
LVM 2.02.116
thin_repair 0.3.1
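For anyone decoding the thin_check message above: in the Linux kernel's dm-thin on-disk format (drivers/md/dm-thin-metadata.c), every value in the two-level mapping tree is a single little-endian 64-bit word, while values in the separate device-details tree are `struct disk_device_details`, which is 24 bytes. The Python sketch below only restates those layouts with the standard `struct` module to show where the 8 and the 24 come from. One possible reading, offered as a guess rather than a diagnosis, is that block 70 holds device-details entries but was reached while walking the mapping tree. The constant and helper names are hypothetical.

```python
import struct

# Little-endian layouts mirroring on-disk values declared in the Linux kernel's
# drivers/md/dm-thin-metadata.c. Names below are hypothetical, for illustration.

# Mapping tree value: one __le64. At the top level it is the root block of a
# per-device btree; at the bottom level it packs (data block << 24) | time.
MAPPING_VALUE = "<Q"

# Device-details tree value, struct disk_device_details:
#   __le64 mapped_blocks; __le64 transaction_id;
#   __le32 creation_time; __le32 snapshotted_time;
DEVICE_DETAILS_VALUE = "<QQII"


def pack_block_time(block, time):
    """Bottom-level mapping value: data block in the high 40 bits, 24-bit time.
    The 24-bit mask on time is added here for safety."""
    return (block << 24) | (time & 0xFFFFFF)


def unpack_block_time(value):
    """Inverse of pack_block_time: returns (data block, time)."""
    return value >> 24, value & 0xFFFFFF


print("mapping tree value size:", struct.calcsize(MAPPING_VALUE))
print("device details value size:", struct.calcsize(DEVICE_DETAILS_VALUE))
```

A btree node whose header declares 24-byte values therefore cannot be a mapping-tree node, which is the condition thin_check reports for block 70.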
I forgot to add the checksum:
SHA512(lvm-meta-cfgbackup_atomic.tar)= f10730f50ae9c51bbf35c769d22640d1ca6764fd29291b80d6f581bab8f674ec387bc61b5678a585919cf611856ad1bd414fff85c527448816924c9c45035b9b
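The posted digest can be compared by eye against `sha512sum` output, or programmatically as in this Python sketch using the standard `hashlib` module. The local path of the downloaded attachment is an assumption; pass wherever you saved it on the command line.

```python
import hashlib
import sys

# Digest posted above for lvm-meta-cfgbackup_atomic.tar
EXPECTED_SHA512 = (
    "f10730f50ae9c51bbf35c769d22640d1ca6764fd29291b80d6f581bab8f674ec"
    "387bc61b5678a585919cf611856ad1bd414fff85c527448816924c9c45035b9b"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so a large image need not fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected=EXPECTED_SHA512):
    """Compare against a published hex digest, ignoring case and whitespace."""
    return sha512_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Usage: python3 verify_sha512.py /path/to/lvm-meta-cfgbackup_atomic.tar
    for path in sys.argv[1:]:
        print(path, "OK" if matches(path) else "MISMATCH")
```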
Created attachment 1380237
Requested metadata and cfg

Similar circumstances to Alex: an unrecognized overprovisioned disk, then a power outage that caused corruption.
Created attachment 1380239
meta1, meta2, and /etc/lvm
Created attachment 1382185
Kernel logs
[Excerpts from the first time I ran out of space (after which I rebooted successfully and everything ended up fine).]

Dec 23 19:53:56 atomic lvm[1109]: Thin pve-data-tpool is now 95% full.
Dec 23 20:56:32 atomic kernel: [5126470.856309] device-mapper: thin: 251:4: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
Dec 23 20:56:32 atomic kernel: [5126470.856350] device-mapper: thin: 251:4: aborting current metadata transaction
Dec 23 20:56:33 atomic kernel: [5126471.462387] device-mapper: thin: 251:4: switching pool to read-only mode
Dec 23 20:57:00 atomic kernel: [5126499.188962] Buffer I/O error on dev dm-23, logical block 46016, lost async page write
Dec 23 20:57:31 atomic kernel: [5126529.684233] buffer_io_error: 23 callbacks suppressed
Dec 23 20:57:53 atomic kernel: [5126551.427673] buffer_io_error: 164 callbacks suppressed
<...>
Dec 23 21:17:31 atomic kernel: [5127730.237332] buffer_io_error: 19462 callbacks suppressed
Dec 23 21:17:31 atomic kernel: [5127730.237744] Buffer I/O error on dev dm-23, logical block 22480714, lost async page write
Dec 23 21:17:34 atomic kernel: [5127732.988048] VFS: Dirty inode writeback failed for block device dm-23 (err=-5).
<...>
reboot
<...>
Dec 23 21:46:29 atomic kernel: [ 0.918952] device-mapper: uevent: version 1.0.3
Dec 23 21:46:29 atomic kernel: [ 0.919008] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: dm-devel
<...>
Dec 23 21:46:31 atomic lvm[1059]: Thin pve-data-tpool is now 98% full.
Dec 23 21:46:29 atomic kernel: [ 9.077473] Adding 7340028k swap on /dev/mapper/pve-swap. Priority:-1 extents:1 across:7340028k FS
Dec 23 21:46:29 atomic kernel: [ 17.543264] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
<...>
(At this point everything seems fine, so I copy the same file over again, run out of space again, and...)
<...>
Dec 23 22:11:07 atomic kernel: [ 1507.448976] device-mapper: thin: 251:4: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
Dec 23 22:11:07 atomic kernel: [ 1507.448995] device-mapper: thin: 251:4: aborting current metadata transaction
Dec 23 22:11:07 atomic kernel: [ 1507.505452] device-mapper: thin: 251:4: switching pool to read-only mode
Dec 23 22:11:08 atomic kernel: [ 1507.601996] attempt to access beyond end of device
Dec 23 22:11:08 atomic kernel: [ 1507.601999] dm-2: rw=0, want=2800344, limit=180224
Dec 23 22:11:08 atomic kernel: [ 1507.602002] device-mapper: thin: __process_bio_read_only: dm_thin_find_block() failed: error = -5
<..thousand lines later..>
Dec 23 22:11:08 atomic kernel: [ 1507.617040] Buffer I/O error on dev dm-22, logical block 24090176, async page read
<...>
Dec 23 22:11:21 atomic kernel: [ 1521.163384] device-mapper: thin: dm_thin_get_highest_mapped_block returned -5
Dec 23 22:11:21 atomic kernel: [ 1521.163419] device-mapper: thin: dm_thin_get_highest_mapped_block returned -15
<...>
Dec 23 22:11:31 atomic kernel: [ 1530.602487] device-mapper: btree spine: node_check failed: blocknr 0 != wanted 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602523] device-mapper: block manager: btree_node validator check failed for block 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602568] device-mapper: btree spine: node_check failed: blocknr 0 != wanted 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602600] device-mapper: block manager: btree_node validator check failed for block 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602634] device-mapper: btree spine: node_check failed: blocknr 0 != wanted 21938
<...>
Dec 23 22:11:31 atomic kernel: [ 1531.523072] device-mapper: thin: dm_thin_get_highest_mapped_block returned -22
<...>
Dec 23 22:20:17 atomic kernel: [ 2057.082854] device-mapper: thin: __process_bio_read_only: dm_thin_find_block() failed: error = -5
Dec 23 22:20:17 atomic kernel: [ 2057.082893] Buffer I/O error on dev dm-14, logical block 1108686, async page read
Dec 23 22:21:24 atomic kernel: [ 0.000000] Initializing cgroup subsys cpuset (reboot)
<...>
Dec 23 22:21:24 atomic lvm[425]: Check of pool pve/data failed (status:1). Manual repair required!

(Now it seems to be FUBAR'd... I took a full disk image at this point. The following is an excerpt from thin_check:)

examining mapping tree
missing all mappings for devices: [0, -]
value size mismatch: expected 8, but got 24. This is not the btree you are looking for. (block 70)
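The kernel's "attempt to access beyond end of device" message reports `want` (the last sector the request needed) and `limit` (the device size) in 512-byte sectors, and dm-thin uses 4 KiB metadata blocks. The sketch below converts the dm-2 figures from the log. Which device dm-2 is on this host is not stated in the logs; if it is the pool's metadata device (an assumption), its 180224 sectors would hold 22528 metadata blocks, so block 21938 from the node_check errors lies inside it, while the failed read ended roughly 1.3 GiB in, far past the end. Helper names are hypothetical.

```python
SECTOR_BYTES = 512          # kernel "want"/"limit" values are in 512-byte sectors
THIN_METADATA_BLOCK = 4096  # dm-thin metadata block size in bytes


def sectors_to_mib(sectors):
    """Convert a 512-byte sector count to MiB."""
    return sectors * SECTOR_BYTES / (1024 * 1024)


def sectors_to_metadata_blocks(sectors):
    """Number of whole 4 KiB metadata blocks covered by a sector count."""
    return sectors * SECTOR_BYTES // THIN_METADATA_BLOCK


# Figures from "dm-2: rw=0, want=2800344, limit=180224".
# ASSUMPTION: dm-2 is the pool's metadata device; the log does not confirm this.
want, limit = 2800344, 180224
print(f"limit: {sectors_to_mib(limit):.1f} MiB, {sectors_to_metadata_blocks(limit)} metadata blocks")
print(f"want:  {sectors_to_mib(want):.1f} MiB")
print("block 21938 inside device:", 21938 < sectors_to_metadata_blocks(limit))
```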
Created attachment 1382186
My condensed version of the kernel logs and story.
thin_repair still fails to recover any volumes, even with the recent changes in 0.8.5.