Bug 1532071
| Summary: | Out of space on Thin LVM, thin_repair failed to recover properly | ||
|---|---|---|---|
| Product: | [Community] LVM and device-mapper | Reporter: | Alex <qwertylex> |
| Component: | lvm2 | Assignee: | Joe Thornber <thornber> |
| lvm2 sub component: | Thin Provisioning | QA Contact: | cluster-qe <cluster-qe> |
| Status: | ASSIGNED --- | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, redhat, terry, thornber, zkabelac |
| Version: | 2.02.116 | Flags: | rule-engine: lvm-technical-solution?, rule-engine: lvm-test-coverage? |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
I forgot to add the checksum: SHA512(lvm-meta-cfgbackup_atomic.tar)= f10730f50ae9c51bbf35c769d22640d1ca6764fd29291b80d6f581bab8f674ec387bc61b5678a585919cf611856ad1bd414fff85c527448816924c9c45035b9b
Created attachment 1380237 [details]
Requested metadata and cfg
Similar circumstances to Alex: an unrecognized overprovisioned disk, then a power outage that caused corruption.
Created attachment 1380239 [details]
meta1, meta2, and /etc/lvm
Created attachment 1382185 [details]
Kernel logs
[Excerpts from the first time running out of space (after which the reboot succeeded and everything ended up fine).]
Dec 23 19:53:56 atomic lvm[1109]: Thin pve-data-tpool is now 95% full.
Dec 23 20:56:32 atomic kernel: [5126470.856309] device-mapper: thin: 251:4: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
Dec 23 20:56:32 atomic kernel: [5126470.856350] device-mapper: thin: 251:4: aborting current metadata transaction
Dec 23 20:56:33 atomic kernel: [5126471.462387] device-mapper: thin: 251:4: switching pool to read-only mode
Dec 23 20:57:00 atomic kernel: [5126499.188962] Buffer I/O error on dev dm-23, logical block 46016, lost async page write
Dec 23 20:57:31 atomic kernel: [5126529.684233] buffer_io_error: 23 callbacks suppressed
Dec 23 20:57:53 atomic kernel: [5126551.427673] buffer_io_error: 164 callbacks suppressed
<...>
Dec 23 21:17:31 atomic kernel: [5127730.237332] buffer_io_error: 19462 callbacks suppressed
Dec 23 21:17:31 atomic kernel: [5127730.237744] Buffer I/O error on dev dm-23, logical block 22480714, lost async page write
Dec 23 21:17:34 atomic kernel: [5127732.988048] VFS: Dirty inode writeback failed for block device dm-23 (err=-5).
<...>
reboot
<...>
Dec 23 21:46:29 atomic kernel: [ 0.918952] device-mapper: uevent: version 1.0.3
Dec 23 21:46:29 atomic kernel: [ 0.919008] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: dm-devel
<...>
Dec 23 21:46:31 atomic lvm[1059]: Thin pve-data-tpool is now 98% full.
Dec 23 21:46:29 atomic kernel: [ 9.077473] Adding 7340028k swap on /dev/mapper/pve-swap. Priority:-1 extents:1 across:7340028k FS
Dec 23 21:46:29 atomic kernel: [ 17.543264] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
<...>
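The negative numbers in these kernel messages are errno values returned with a minus sign, following the kernel's usual return-code convention. A minimal Python sketch (the helper `decode_kernel_error` is hypothetical, not part of any LVM tool) decodes them with the standard `errno` and `os` modules. It uses the errno table of the machine it runs on, so run it on Linux to be sure the numbers line up with the kernel's. Decoded this way, the `error = -28` from `dm_pool_alloc_data_block` is ENOSPC (no space left on device), matching the pool running out of data space, and the `err=-5` in the writeback failure is EIO.

```python
import errno
import os


def decode_kernel_error(code: int) -> str:
    """Translate a negative kernel return code such as -28 into 'ENOSPC: ...'.

    Kernel functions report failure as -errno, so the sign is dropped before
    the value is looked up in the host's errno table.
    """
    value = abs(code)
    name = errno.errorcode.get(value, "UNKNOWN")
    return f"{name}: {os.strerror(value)}"


if __name__ == "__main__":
    # Codes copied from the first out-of-space excerpt in this report.
    for code in (-28, -5):
        print(code, decode_kernel_error(code))
```

The same helper applies to the other codes that appear later in the logs (-15, -22); the symbolic name alone does not say why a particular kernel function returned it.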
(At this point everything seems fine, so I copy the same file over again, run out of space again, and...)
<...>
Dec 23 22:11:07 atomic kernel: [ 1507.448976] device-mapper: thin: 251:4: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
Dec 23 22:11:07 atomic kernel: [ 1507.448995] device-mapper: thin: 251:4: aborting current metadata transaction
Dec 23 22:11:07 atomic kernel: [ 1507.505452] device-mapper: thin: 251:4: switching pool to read-only mode
Dec 23 22:11:08 atomic kernel: [ 1507.601996] attempt to access beyond end of device
Dec 23 22:11:08 atomic kernel: [ 1507.601999] dm-2: rw=0, want=2800344, limit=180224
Dec 23 22:11:08 atomic kernel: [ 1507.602002] device-mapper: thin: __process_bio_read_only: dm_thin_find_block() failed: error = -5
<...a thousand lines later...>
Dec 23 22:11:08 atomic kernel: [ 1507.617040] Buffer I/O error on dev dm-22, logical block 24090176, async page read
<...>
Dec 23 22:11:21 atomic kernel: [ 1521.163384] device-mapper: thin: dm_thin_get_highest_mapped_block returned -5
Dec 23 22:11:21 atomic kernel: [ 1521.163419] device-mapper: thin: dm_thin_get_highest_mapped_block returned -15
<...>
Dec 23 22:11:31 atomic kernel: [ 1530.602487] device-mapper: btree spine: node_check failed: blocknr 0 != wanted 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602523] device-mapper: block manager: btree_node validator check failed for block 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602568] device-mapper: btree spine: node_check failed: blocknr 0 != wanted 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602600] device-mapper: block manager: btree_node validator check failed for block 21938
Dec 23 22:11:31 atomic kernel: [ 1530.602634] device-mapper: btree spine: node_check failed: blocknr 0 != wanted 21938
<...>
Dec 23 22:11:31 atomic kernel: [ 1531.523072] device-mapper: thin: dm_thin_get_highest_mapped_block returned -22
<...>
Dec 23 22:20:17 atomic kernel: [ 2057.082854] device-mapper: thin: __process_bio_read_only: dm_thin_find_block() failed: error = -5
Dec 23 22:20:17 atomic kernel: [ 2057.082893] Buffer I/O error on dev dm-14, logical block 1108686, async page read
Dec 23 22:21:24 atomic kernel: [ 0.000000] Initializing cgroup subsys cpuset
(reboot)
<...>
Dec 23 22:21:24 atomic lvm[425]: Check of pool pve/data failed (status:1). Manual repair required!
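The "attempt to access beyond end of device" line reports `want` (the end of the request) and `limit` (the device capacity) in 512-byte sectors. The small Python sketch below converts them. It assumes that dm-2 is the pool's metadata sub-device (the log above identifies dm-3 as the data device, and the later Arch Linux session lists pve-data_tmeta as 254:2, although minor numbers can change between boots) and that the metadata uses 4 KiB blocks; the helper names are hypothetical. On those assumptions the device holds 88 MiB, or 22528 metadata blocks, while the failing read ended near 1367 MiB, more than 15 times past the end. Block 21938 from the node_check messages is inside the device, which suggests that check failed on the block's contents rather than on its offset.

```python
SECTOR_BYTES = 512           # the block layer reports want/limit in 512-byte sectors
METADATA_BLOCK_BYTES = 4096  # assumed thin-pool metadata block size (4 KiB)
MIB = 1024 * 1024


def sectors_to_mib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to MiB."""
    return sectors * SECTOR_BYTES / MIB


def sectors_to_metadata_blocks(sectors: int) -> int:
    """Number of whole 4 KiB metadata blocks covered by `sectors`."""
    return sectors * SECTOR_BYTES // METADATA_BLOCK_BYTES


if __name__ == "__main__":
    # Values from "dm-2: rw=0, want=2800344, limit=180224".
    want, limit = 2800344, 180224
    print(f"device size : {sectors_to_mib(limit):.2f} MiB "
          f"({sectors_to_metadata_blocks(limit)} metadata blocks)")
    print(f"request end : {sectors_to_mib(want):.2f} MiB "
          f"({sectors_to_metadata_blocks(want)} metadata blocks)")
    # Block 21938 from the node_check messages: inside the device or not?
    print("block 21938 in range:", 21938 < sectors_to_metadata_blocks(limit))
```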
(Now it seems to be FUBAR'd... I took a full disk image at this point. The following is an excerpt from thin_check:)
examining mapping tree
missing all mappings for devices: [0, -]
value size mismatch: expected 8, but got 24. This is not the btree you are looking for. (block 70)
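Both thin_check's "value size mismatch ... (block 70)" and the kernel's "node_check failed: blocknr 0 != wanted 21938" are checks on the small header at the start of every metadata btree node. The sketch below decodes that header from a metadata image (such as the attached dumps), assuming the packed little-endian layout of `struct node_header` in the kernel's `drivers/md/persistent-data/dm-btree-internal.h` and 4 KiB metadata blocks; the helper names and the simplified `describe` check (no checksum validation) are hypothetical. In thin-pool metadata, mapping-tree values are 8 bytes (a subtree root or a packed block/time pair), whereas per-device details entries are 24 bytes, so one plausible reading of the block 70 message is that a details-tree node is reachable from the mapping tree. A zero-filled block would likewise report blocknr 0.

```python
import struct
import sys
from dataclasses import dataclass

METADATA_BLOCK_BYTES = 4096  # assumed thin-pool metadata block size
# Assumed packed little-endian header: csum u32, flags u32, blocknr u64,
# nr_entries u32, max_entries u32, value_size u32, padding u32 (32 bytes).
NODE_HEADER = struct.Struct("<IIQIIII")
INTERNAL_NODE = 1
LEAF_NODE = 2


@dataclass
class NodeHeader:
    csum: int
    flags: int
    blocknr: int
    nr_entries: int
    max_entries: int
    value_size: int


def read_node_header(image: bytes, block: int) -> NodeHeader:
    """Decode the btree node header stored at metadata block `block`."""
    offset = block * METADATA_BLOCK_BYTES
    raw = image[offset:offset + NODE_HEADER.size]
    if len(raw) < NODE_HEADER.size:
        raise ValueError(f"block {block} is beyond the end of the image")
    csum, flags, blocknr, nr, max_nr, vsize, _pad = NODE_HEADER.unpack(raw)
    return NodeHeader(csum, flags, blocknr, nr, max_nr, vsize)


def describe(image: bytes, block: int, expected_value_size: int = 8) -> str:
    """Mimic the two header checks seen in this report (checksum not verified)."""
    hdr = read_node_header(image, block)
    if hdr.blocknr != block:
        # A node must record the block it lives in; zeroed blocks record 0.
        return f"blocknr {hdr.blocknr} != wanted {block}"
    if hdr.value_size != expected_value_size:
        return (f"value size mismatch: expected {expected_value_size}, "
                f"but got {hdr.value_size} (block {block})")
    kind = "leaf" if hdr.flags & LEAF_NODE else "internal"
    return f"block {block}: {kind} node, {hdr.nr_entries} entries"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python node_header.py <metadata-image> <block>
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print(describe(data, int(sys.argv[2])))
```

For example, `python node_header.py meta.bin 70` would report on block 70 of an image named `meta.bin` (both file names are hypothetical).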
Created attachment 1382186 [details]
My condensed version of the kernel logs and story.
thin_repair still fails to recover any volumes, even with the recent changes (0.8.5).
Created attachment 1378229 [details]
Tarfile containing a compressed image of the meta device and vgcfgbackup file

Description of problem:
I took a disk image prior to any repair. I attempted a repair on LVM 2.02.116 with thin_repair 0.3.1 on kernel 4.4.98, which failed. Then I restored from my image and tried again from a bootable Arch Linux USB key running:

LVM Version: 2.02.176(2) (2017-11-03)
thin_repair --version: 0.7.5
Linux archiso 4.13.12-1-ARCH #1 SMP PREEMPT Wed Nov 8 11:54:06 CET 2017 x86_64 GNU/Linux

And I got the same failure:

Executing: /usr/bin/thin_check --clear-needs-check-flag /dev/mapper/pve-data_tmeta
examining superblock
examining devices tree
examining mapping tree
missing all mappings for devices: [0, -]
value size mismatch: expected 8, but got 24. This is not the btree you are looking for. (block 70)
/usr/bin/thin_check failed: 1
Check of pool pve/data failed (status:1). Manual repair required!
Removing pve-data_tdata (254:3)
Removing pve-data_tmeta (254:2)

Version-Release number of selected component (if applicable):

How reproducible:
Unknown

Steps to Reproduce:
1. Run out of space.
2. Reboot.
3. Repeat the same thing that caused you to run out of space.
4. Reboot.
5. LVM fails to come up, and thin_repair can't fix it, even when given a 300MB LV to repair into.

Actual results:
No mapping for any thin LV; data inaccessible.

Expected results:
Data should at least be recoverable.

Additional info:
Proxmox VE 4.4-20/2650b7b5
Linux atomic 4.4.98-2-pve #1 SMP PVE 4.4.98-101 (Mon, 18 Dec 2017 13:36:02 +0100) x86_64 GNU/Linux
LVM 2.02.116
thin_repair 0.3.1