Bug 1286500 - Tool thin_dump failing to show 'mappings'
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assigned To: Mike Snitzer
QA Contact: Bruno Goncalves
Duplicates: 1290824
Depends On:
Blocks: 1313485 1295577
Reported: 2015-11-29 22:56 EST by Gudge
Modified: 2016-11-03 10:25 EDT
CC List: 12 users

Fixed In Version: kernel-3.10.0-366.el7
Doc Type: Bug Fix
Clone Of:
: 1290912
Last Closed: 2016-11-03 10:25:32 EDT
Type: Bug


Attachments
Patch on Kernel 3.10.0-327.4.5 (6.50 KB, patch)
2016-02-16 11:37 EST, Gudge
shankhabanerjee: review? (thornber)

Description Gudge 2015-11-29 22:56:03 EST
Description of problem:
Tool thin_dump failing to show 'mappings'

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 7.2

How reproducible:
Every time.

Steps to Reproduce: (Reproducible on a Virtual Machine)
1. Create a new volume group, a thin pool, and a thin volume within the thin pool.

   pvcreate /dev/sdc       # Size of the physical disk is 75GB
   vgcreate VG /dev/sdc
   lvcreate -y --extents 100%free --thin VG/POOL
   lvcreate -y --name LV1 --virtualsize 30GB --thinpool VG/POOL 

2. Create an XFS file system on the logical volume and mount it.
    mkfs.xfs -L STP /dev/mapper/VG-LV1
    mkdir -p /LV1
    mount -L STP /LV1

3. Two Shell Scripts:

a. Script for writing data to the logical volume: dd.sh
 
#! /bin/bash

i="0"
while true; do
    i=$((i + 1))
    if [ $i -eq 5 ]; then
        exit 0
    fi
    t='t'_$i.txt
    dd if=/dev/zero of=$t bs=2GB count=2 seek=1
    sync
done

b. Script for dumping thin_dump output to files: thin_dump.sh
#! /bin/bash

thin_dump_data='/lvm_scripts/thin_dump_data'

POOL='/dev/mapper/VG-POOL'
POOL_TMETA=$POOL'_tmeta'
POOL_TPOOL=$POOL'-tpool'
i="0"
while true; do
    i=$((i + 1))

    if [ $i -eq 100 ]; then
        exit 0
    fi

    t='thinDump'_$i.xml

    echo "Iteration : " $i

    dmsetup message $POOL_TPOOL 0 reserve_metadata_snap
    echo "Reserve message snapshot message status : " $?

    block_no=`dmsetup status $POOL_TPOOL | cut -f 7 -d " "`
    thin_dump -r -f xml $POOL_TMETA -m $block_no > $thin_dump_data/$t
    echo "Thin dump message status " : $?

    dmsetup message $POOL_TPOOL 0 release_metadata_snap
    echo "Release snapshot message status " : $?

    sleep 3
done


4. Create directory structure:

mkdir -p /lvm_scripts/thin_dump_data

5. Place scripts 3(a) and 3(b) in the directory /lvm_scripts.

6. In one shell, from inside the mounted volume (/LV1), run dd.sh (it writes its files to the current directory).

This starts writing data to the logical volume.

7. In a second shell, run thin_dump.sh.

This writes thin_dump output, in XML format, to /lvm_scripts/thin_dump_data.


8. Once dd.sh finishes, you can kill thin_dump.sh.

9. If you look at the XML files generated by thin_dump, you will notice that some of them are empty: they contain no single mappings or range mappings.
 

10. I have written another script that parses the XML files, adds up the mappings, and prints the size based on the number of mapped blocks and the chunk size.


#!/usr/bin/python
# Walk /lvm_scripts/thin_dump_data, parse each thin_dump XML file and print,
# per thin device, the blocks listed in its mappings and the resulting size.
# print(...) with one argument and // division behave the same on Python 2
# (the RHEL 7 default) and Python 3.

import os
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError


class Device:
    def __init__(self, device_id, mapped_blocks, chunk_size):
        self.device_id = int(device_id)
        self.mapped_blocks = int(mapped_blocks)  # mapped_blocks attribute of <device>
        self.chunk_size = chunk_size             # pool chunk size in KB
        self.block_tuple_list = []               # (first data block, length) pairs

    def add_block_info(self, data, length='1'):
        self.block_tuple_list.append((int(data), int(length)))


def get_size(device):
    # Sum the lengths of all single (length 1) and range mappings.
    total_blocks = sum(x[1] for x in device.block_tuple_list)
    total_size = total_blocks * device.chunk_size * 1024  # bytes
    total_size_kb = total_size // 1024
    total_size_mb = total_size // (1024 * 1024)
    total_size_gb = total_size // (1024 * 1024 * 1024)
    print('device_id = {} chunk_size = {} KB total_blocks = {} mapped_blocks = {} '
          'size = {} {} KB {} MB {} GB'.format(
              device.device_id, device.chunk_size, total_blocks,
              device.mapped_blocks, total_size, total_size_kb,
              total_size_mb, total_size_gb))


def lvm_size_wrapper(xfile):
    if not os.path.exists(xfile):
        return
    print('File = {}'.format(xfile))
    try:
        root = ET.parse(xfile).getroot()
    except ParseError:
        # e.g. an empty file left behind by a failed thin_dump run
        print('Could not parse file')
        return
    nr_data_blocks = root.attrib['nr_data_blocks']
    data_block_size = root.attrib['data_block_size']  # in 512-byte sectors
    lvm_chunk_size = (int(data_block_size) * 512) // 1024
    print('nr_data_blocks = {} data_block_size = {} chunk_size = {} KB'.format(
        nr_data_blocks, data_block_size, lvm_chunk_size))
    device_id_list = []
    for child in root:  # one <device> element per thin volume
        device = Device(child.attrib['dev_id'], child.attrib['mapped_blocks'],
                        lvm_chunk_size)
        device_id_list.append(device)
        for e in child:
            if e.tag == 'single_mapping':
                device.add_block_info(e.attrib['data_block'])
            elif e.tag == 'range_mapping':
                device.add_block_info(e.attrib['data_begin'], e.attrib['length'])

    for d in device_id_list:
        get_size(d)


def lvm_helper():
    path = '/lvm_scripts/thin_dump_data'
    list_of_files = []
    for (dirpath, dirnames, filenames) in os.walk(path):
        for filename in filenames:
            if filename.endswith('.xml'):
                list_of_files.append(os.path.join(dirpath, filename))

    for f in list_of_files:
        lvm_size_wrapper(f)


lvm_helper()



11. The output shows the size gradually increasing, then suddenly dropping to zero, and then picking up again from where it dropped.
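The pattern in step 11 can be checked mechanically. This is a minimal sketch, not part of the original report: it assumes the per-dump totals (for example the total_blocks values printed by the script in step 10) have been collected in iteration order. Both helper names, `iteration_number` and `find_drops`, are hypothetical. Sorting numerically matters because os.walk returns files in no particular order, and a plain string sort puts thinDump_10.xml before thinDump_2.xml.

```python
import re
from typing import List, Sequence


def iteration_number(path: str) -> int:
    # thinDump_<i>.xml -> i (hypothetical helper); unnumbered names sort first.
    m = re.search(r"_(\d+)\.xml$", path)
    return int(m.group(1)) if m else -1


def find_drops(totals: Sequence[int]) -> List[int]:
    # Indices of dumps whose total is zero right after a non-zero total.
    return [i for i in range(1, len(totals))
            if totals[i] == 0 and totals[i - 1] > 0]


if __name__ == "__main__":
    files = ["thinDump_10.xml", "thinDump_2.xml", "thinDump_1.xml"]
    print(sorted(files, key=iteration_number))
    # ['thinDump_1.xml', 'thinDump_2.xml', 'thinDump_10.xml']

    # Illustrative totals only, not measurements from this bug.
    print(find_drops([0, 1200, 5400, 0, 9100, 13000]))  # [3]
```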


Actual results:
thin_dump does not dump the mappings correctly, even when run on a snapshot of the metadata.


Expected results:
thin_dump should report all block mappings correctly every time it is run on a snapshot of the metadata.
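One way to make this expected result testable: assuming that in a consistent metadata snapshot each device's mapped_blocks attribute should equal the number of blocks listed in its single_mapping and range_mapping elements (the script in step 10 already prints both numbers), a dump with missing mappings shows up as a mismatch. A minimal sketch follows; the function names are hypothetical and SAMPLE is hand-written XML shaped like thin_dump output, not a real dump.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def mapping_counts(xml_text: str) -> Dict[int, Tuple[int, int]]:
    """Map dev_id -> (mapped_blocks attribute, blocks counted from mappings)."""
    root = ET.fromstring(xml_text)
    counts = {}
    for dev in root.findall("device"):
        counted = 0
        for m in dev:
            if m.tag == "single_mapping":
                counted += 1
            elif m.tag == "range_mapping":
                counted += int(m.attrib["length"])
        counts[int(dev.attrib["dev_id"])] = (int(dev.attrib["mapped_blocks"]), counted)
    return counts


def inconsistent_devices(xml_text: str) -> List[int]:
    # Devices whose listed mappings do not add up to mapped_blocks
    # (assumes the two should agree in a consistent snapshot).
    return sorted(dev for dev, (claimed, counted) in mapping_counts(xml_text).items()
                  if claimed != counted)


# Hand-written sample: device 2 claims 3 mapped blocks but lists none.
SAMPLE = """<superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="1024">
  <device dev_id="1" mapped_blocks="5" transaction="0" creation_time="0" snap_time="0">
    <range_mapping origin_begin="0" data_begin="0" length="4" time="0"/>
    <single_mapping origin_block="10" data_block="7" time="0"/>
  </device>
  <device dev_id="2" mapped_blocks="3" transaction="0" creation_time="0" snap_time="0">
  </device>
</superblock>"""

if __name__ == "__main__":
    print(mapping_counts(SAMPLE))        # {1: (5, 5), 2: (3, 0)}
    print(inconsistent_devices(SAMPLE))  # [2]
```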


Additional info:
It takes barely 2 minutes to reproduce the issue, and it is 100% reproducible. This happens not just on Red Hat Enterprise Linux 7 but on CentOS 7 as well.
Comment 1 Gudge 2015-12-01 09:57:04 EST
Please do let me know if you are able to reproduce the issue. If need be, I can share my VM; that may help in reproducing the issue.

Thanks for all your help.
Comment 2 Zdenek Kabelac 2015-12-01 11:02:15 EST
I've tried a simpler reproducer myself locally:

shell1:

lvcreate -T -L20 vg/pool
while :
do
  lvcreate -V 20 vg/pool -n th
  dd if=/dev/zero of=/dev/vg/th bs=1M count=1 conv=fdatasync
  lvremove -ff vg/th
done

----

shell2

while :
do
        dmsetup message vg-pool-tpool 0 reserve_metadata_snap
        thin_dump -r -f xml /dev/mapper/vg-pool_tmeta -m $(dmsetup status vg-pool-tpool | cut -f 7 -d " ")
        dmsetup message vg-pool-tpool 0 release_metadata_snap
done

----
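For reference, the `cut -f 7 -d " "` in these loops selects the held metadata root from the thin-pool status line; the kernel's thin-provisioning documentation describes that field as "-" when no metadata snapshot is held. A minimal Python equivalent is sketched here. The function name is hypothetical and the status lines are hand-written examples, not output captured from this machine. Unlike `cut -d " "`, `str.split()` tolerates repeated spaces, and returning None for "-" lets a caller skip thin_dump instead of passing "-" to -m when the reserve message failed.

```python
from typing import Optional


def held_metadata_root(status_line: str) -> Optional[int]:
    """Return the held metadata root from one `dmsetup status` line of a
    thin-pool target, or None if no metadata snapshot is held.

    Field layout (same numbering as `cut -f N`):
      1 start, 2 length, 3 "thin-pool", 4 transaction id,
      5 used/total metadata blocks, 6 used/total data blocks,
      7 held metadata root ("-" if none), ...
    """
    fields = status_line.split()
    if len(fields) < 7 or fields[2] != "thin-pool":
        raise ValueError("not a thin-pool status line: %r" % status_line)
    root = fields[6]
    return None if root == "-" else int(root)


if __name__ == "__main__":
    # Hand-written example lines, not captured output.
    print(held_metadata_root(
        "0 40960 thin-pool 1 12/4096 30/320 - rw discard_passdown queue_if_no_space -"))  # None
    print(held_metadata_root(
        "0 40960 thin-pool 1 13/4096 30/320 217 rw discard_passdown queue_if_no_space -"))  # 217
```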


The end result was a machine deadlock on a bare-metal T61 within a few seconds of running both loops in parallel:


device-mapper: block manager: recursive lock detected in metadata
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map common: unable to decrement a reference count below 0
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map common: unable to decrement a reference count below 0
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map common: unable to decrement a reference count below 0
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: thin: dm_thin_get_highest_mapped_block returned -22
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: block manager: validator mismatch (old=sm_bitmap vs new=btree_node) for block 1
device-mapper: thin: dm_thin_get_highest_mapped_block returned -22
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: bufio: leaked buffer 6, hold count 1, list 0
------------[ cut here ]------------
kernel BUG at drivers/md/dm-bufio.c:1484!
invalid opcode: 0000 [#1] SMP 
Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_analog snd_hda_codec_generic iTCO_wdt iTCO_vendor_support arc4 ppdev coretemp kvm_intel kvm mt7601u iwl3945 iwlegacy joydev mac80211 snd_hda_intel snd_hda_codec i2c_i801 cfg80211 snd_hda_core lpc_ich snd_hwdep r592 snd_seq memstick snd_seq_device snd_pcm e1000e snd_timer shpchp thinkpad_acpi ptp snd pps_core wmi parport_pc soundcore tpm_tis fjes parport rfkill tpm acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc sunrpc loop sdhci_pci i915 sdhci i2c_algo_bit drm_kms_helper drm mmc_core serio_raw ata_generic yenta_socket pata_acpi video
CPU: 1 PID: 103 Comm: kworker/u4:7 Not tainted 4.3.0-0.rc7.git2.2.fc24.x86_64 #1
Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
Workqueue: dm-thin do_worker [dm_thin_pool]
task: ffff8800b9b80000 ti: ffff880135e30000 task.ti: ffff880135e30000
RIP: 0010:[<ffffffff8160b2da>]  [<ffffffff8160b2da>] dm_bufio_client_destroy+0x14a/0x1d0
RSP: 0018:ffff880135e33be8  EFLAGS: 00010287
RAX: ffff880135def238 RBX: ffff880135def200 RCX: 0000000000000000
RDX: ffff880135def220 RSI: ffff88013bb0dff8 RDI: ffff88013bb0dff8
RBP: ffff880135e33c10 R08: 0000000000000001 R09: 0000000000000523
R10: ffff8800ac637000 R11: 0000000000000523 R12: ffff880135def248
R13: 0000000000000002 R14: ffff880135def228 R15: ffff880135def210
FS:  0000000000000000(0000) GS:ffff88013bb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd6f4874c30 CR3: 0000000001c0b000 CR4: 00000000000006e0
Stack:
 ffff88013204d0f0 ffff88013217b000 00000000ffffffea ffff8800ac636800
 ffff8800a9e37e20 ffff880135e33c28 ffffffffa07561f5 ffff88013217b000
 ffff880135e33c40 ffffffffa077124a ffff88013217b158 ffff880135e33c68
Call Trace:
 [<ffffffffa07561f5>] dm_block_manager_destroy+0x15/0x20 [dm_persistent_data]
 [<ffffffffa077124a>] __destroy_persistent_data_objects+0x3a/0x40 [dm_thin_pool]
 [<ffffffffa0773593>] dm_pool_abort_metadata+0x63/0xa0 [dm_thin_pool]
 [<ffffffffa076cfbe>] metadata_operation_failed+0x5e/0x100 [dm_thin_pool]
 [<ffffffffa076e0bb>] alloc_data_block.isra.48+0x8b/0x190 [dm_thin_pool]
 [<ffffffffa07708c0>] process_cell+0x2b0/0x510 [dm_thin_pool]
 [<ffffffff810d4008>] ? dequeue_entity+0x3b8/0xa80
 [<ffffffff81655fd7>] ? skb_release_data+0xa7/0xd0
 [<ffffffffa076b075>] ? process_prepared+0x75/0xc0 [dm_thin_pool]
 [<ffffffffa076fd3e>] do_worker+0x26e/0x830 [dm_thin_pool]
 [<ffffffff810b759e>] process_one_work+0x19e/0x3f0
 [<ffffffff810b783e>] worker_thread+0x4e/0x450
 [<ffffffff8177a801>] ? __schedule+0x371/0x980
 [<ffffffff810b77f0>] ? process_one_work+0x3f0/0x3f0
 [<ffffffff810b77f0>] ? process_one_work+0x3f0/0x3f0
 [<ffffffff810bd6a8>] kthread+0xd8/0xf0
 [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
 [<ffffffff8177f25f>] ret_from_fork+0x3f/0x70
 [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
Code: d2 75 7e 48 8b 53 50 48 85 d2 75 54 48 8b bb 80 00 00 00 e8 49 af ff ff 48 89 df e8 51 73 bf ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 49 89 d7 41 8b 57 40 49 8b 77 28 44 89 
RIP  [<ffffffff8160b2da>] dm_bufio_client_destroy+0x14a/0x1d0
 RSP <ffff880135e33be8>
---[ end trace 8ef1e20cefdaef36 ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff810bdd00>] kthread_data+0x10/0x20
PGD 1c0e067 PUD 1c10067 PMD 0 
Oops: 0000 [#2] SMP 
Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_analog snd_hda_codec_generic iTCO_wdt iTCO_vendor_support arc4 ppdev coretemp kvm_intel kvm mt7601u iwl3945 iwlegacy joydev mac80211 snd_hda_intel snd_hda_codec i2c_i801 cfg80211 snd_hda_core lpc_ich snd_hwdep r592 snd_seq memstick snd_seq_device snd_pcm e1000e snd_timer shpchp thinkpad_acpi ptp snd pps_core wmi parport_pc soundcore tpm_tis fjes parport rfkill tpm acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc sunrpc loop sdhci_pci i915 sdhci i2c_algo_bit drm_kms_helper drm mmc_core serio_raw ata_generic yenta_socket pata_acpi video
CPU: 0 PID: 103 Comm: kworker/u4:7 Tainted: G      D         4.3.0-0.rc7.git2.2.fc24.x86_64 #1
Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
task: ffff8800b9b80000 ti: ffff880135e30000 task.ti: ffff880135e30000
RIP: 0010:[<ffffffff810bdd00>]  [<ffffffff810bdd00>] kthread_data+0x10/0x20
RSP: 0018:ffff880135e338b8  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81f27e40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800b9b80000
RBP: ffff880135e338b8 R08: ffff8800b9b80088 R09: 00000020e0e3ef77
R10: ffff8800b9b80060 R11: 0000000000000003 R12: 0000000000016c80
R13: ffff8800b9b80000 R14: ffff88013ba16c80 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000001c0b000 CR4: 00000000000006f0
Stack:
 ffff880135e338d0 ffffffff810b8401 ffff88013ba16c80 ffff880135e33918
 ffffffff8177aab0 ffff880100000000 ffff8800b9b80000 ffff880135e34000
 ffff880135e33970 ffff880135e33970 ffff880135e33468 ffff880135e33468
Call Trace:
 [<ffffffff810b8401>] wq_worker_sleeping+0x11/0x90
 [<ffffffff8177aab0>] __schedule+0x620/0x980
 [<ffffffff8177ae43>] schedule+0x33/0x80
 [<ffffffff810a229a>] do_exit+0x80a/0xae0
 [<ffffffff8101888a>] oops_end+0x9a/0xd0
 [<ffffffff81018d4b>] die+0x4b/0x70
 [<ffffffff81015d11>] do_trap+0xb1/0x140
 [<ffffffff810160b9>] do_error_trap+0x89/0x110
 [<ffffffff8160b2da>] ? dm_bufio_client_destroy+0x14a/0x1d0
 [<ffffffff8177e96e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
 [<ffffffff8118901e>] ? irq_work_queue+0x8e/0xa0
 [<ffffffff810f47f1>] ? console_unlock+0x201/0x520
 [<ffffffff81016670>] do_invalid_op+0x20/0x30
 [<ffffffff81780a9e>] invalid_op+0x1e/0x30
 [<ffffffff8160b2da>] ? dm_bufio_client_destroy+0x14a/0x1d0
 [<ffffffff8160b2fc>] ? dm_bufio_client_destroy+0x16c/0x1d0
 [<ffffffffa07561f5>] dm_block_manager_destroy+0x15/0x20 [dm_persistent_data]
 [<ffffffffa077124a>] __destroy_persistent_data_objects+0x3a/0x40 [dm_thin_pool]
 [<ffffffffa0773593>] dm_pool_abort_metadata+0x63/0xa0 [dm_thin_pool]
 [<ffffffffa076cfbe>] metadata_operation_failed+0x5e/0x100 [dm_thin_pool]
 [<ffffffffa076e0bb>] alloc_data_block.isra.48+0x8b/0x190 [dm_thin_pool]
 [<ffffffffa07708c0>] process_cell+0x2b0/0x510 [dm_thin_pool]
 [<ffffffff810d4008>] ? dequeue_entity+0x3b8/0xa80
 [<ffffffff81655fd7>] ? skb_release_data+0xa7/0xd0
 [<ffffffffa076b075>] ? process_prepared+0x75/0xc0 [dm_thin_pool]
 [<ffffffffa076fd3e>] do_worker+0x26e/0x830 [dm_thin_pool]
 [<ffffffff810b759e>] process_one_work+0x19e/0x3f0
 [<ffffffff810b783e>] worker_thread+0x4e/0x450
 [<ffffffff8177a801>] ? __schedule+0x371/0x980
 [<ffffffff810b77f0>] ? process_one_work+0x3f0/0x3f0
 [<ffffffff810b77f0>] ? process_one_work+0x3f0/0x3f0
 [<ffffffff810bd6a8>] kthread+0xd8/0xf0
 [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
 [<ffffffff8177f25f>] ret_from_fork+0x3f/0x70
 [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
Code: cf 48 89 e7 e8 92 e4 6b 00 e9 55 ff ff ff e8 18 12 fe ff 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 80 05 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
RIP  [<ffffffff810bdd00>] kthread_data+0x10/0x20
 RSP <ffff880135e338b8>
CR2: ffffffffffffffd8
---[ end trace 8ef1e20cefdaef37 ]---
Fixing recursive fault but reboot is needed!
Comment 3 Mike Snitzer 2015-12-01 11:58:22 EST
(In reply to Zdenek Kabelac from comment #2)
> I've tried simplier reproducer myself locally:
> 
> shell1:
> 
> lvcreate -T -L20 vg/pool
> while :
> do
>   lvcreate -V 20 vg/pool -n th
>   dd if=/dev/zero of=/dev/vg/th bs=1M count=1 conv=fdatasync
>   lvremove -ff vg/th
> done
> 
> ----
> 
> shell2
> 
> while :
> do
>         dmsetup message vg-pool-tpool 0 reserve_metadata_snap
>         thin_dump -r -f xml /dev/mapper/vg-pool_tmeta -m $(dmsetup status vg-pool-tpool | cut -f 7 -d " ")
>         dmsetup message vg-pool-tpool 0 release_metadata_snap
> done
> 
> ----
> 
> 
> end result was machine deadlock on bare metal T61 in a few seconds of
> parallel run:

We obviously need to add much more safety to the kernel interfaces.
But IIRC the thin-pool should be suspended _before_ the 'reserve_metadata_snap'.
(So it seems like the kernel code should fail the 'reserve_metadata_snap' if the pool isn't suspended).

That said, I could be mistaken.  Joe is really the one who needs to weigh-in here.
Comment 4 Joe Thornber 2015-12-01 12:05:41 EST
I'll take the bug. I won't be able to look at it for a couple of days, however.
Comment 5 Gudge 2015-12-01 12:17:06 EST
> We obviously need to add much more safety to the kernel interfaces.
> But IIRC the thin-pool should be suspended _before_ the
> 'reserve_metadata_snap'.
> (So it seems like the kernel code should fail the 'reserve_metadata_snap' if
> the pool isn't suspended).
> 
> That said, I could be mistaken.  Joe is really the one who needs to weigh-in
> here.

I did try suspending the metadata volume. On my setup, if I suspend the metadata volume, I am not able to take a snapshot: the command hangs.

dmsetup suspend /dev/mapper/vg-pool_tmeta
dmsetup message vg-pool-tpool 0 reserve_metadata_snap  ---> Hangs
Comment 6 Mike Snitzer 2015-12-01 19:06:44 EST
(In reply to Gudge from comment #5)
> > We obviously need to add much more safety to the kernel interfaces.
> > But IIRC the thin-pool should be suspended _before_ the
> > 'reserve_metadata_snap'.
> > (So it seems like the kernel code should fail the 'reserve_metadata_snap' if
> > the pool isn't suspended).
> > 
> > That said, I could be mistaken.  Joe is really the one who needs to weigh-in
> > here.
> 
> I did try suspending the metadata volume. On my setup If you suspend the
> metadata volume I am not able to take a snapshot. The command hangs.
> 
> dmsetup suspend /dev/mapper/vg-pool_tmeta
> dmsetup message vg-pool-tpool 0 reserve_metadata_snap  ---> Hangs

No, I was saying suspend the thin-pool (not its underlying metadata device).

But I checked with Joe, and he said that the suspend is only needed to get consistent usage info (and to get all mappings associated with outstanding I/O onto disk). The thin-pool suspend isn't required to avoid the crashes reported in comment #2.

So there are 2 different things related to this BZ that need to be explored.
Comment 7 Gudge 2015-12-01 20:06:21 EST
(In reply to Mike Snitzer from comment #6)
> No, I was saying suspend the thin-pool (not its underlying metadata device).
> 

Side question:

If one suspends the thin pool, wouldn't that affect the I/Os on the logical volumes that are part of the thin pool?

Will they wait for the thin pool to be resumed, or will the I/Os go through?
Comment 8 Joe Thornber 2015-12-04 09:10:47 EST
Addressing the original issue here rather than Kabi's crash (which I haven't reproduced yet):

If the pool is active the kernel can be updating the metadata at any time.  This means userland tools cannot expect a consistent view of the metadata from an active pool (think of btree nodes being updated by the kernel at the same time as thin_dump is reading them).

There is a facility to get around this called metadata snapshots. Basically, the pool is told, via a dmsetup message, to take a snapshot of the metadata. You then pass the -m switch to thin_dump (I recommend not giving the snap location; thin_dump will pick up the snap location itself). Make sure you drop the metadata snap once you've finished examining the metadata, since it causes a small performance penalty for thin I/O.

I've written an example test for you that takes you through this process:

https://github.com/jthornber/device-mapper-test-suite/commit/d46300f61bae34f242fe63a7a77ccb343d86c1d5

Note the results of thin_dump reflect the metadata as it was when the 'reserve_metadata_snap' message was sent to the pool.

The thin devices and pools do not need to be suspended before taking the metadata snap.
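The reserve / dump / release sequence described in this comment can be wrapped so that the snapshot is always released, even if thin_dump fails. This is a minimal sketch, not the dmtest test linked above: the function name and the injectable `run` parameter are hypothetical, and the thin_dump arguments follow the recommendation to pass -m without a snapshot location.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], bytes]


def dump_with_metadata_snap(pool_tpool: str, tmeta: str,
                            run: Runner = subprocess.check_output) -> bytes:
    """Reserve a metadata snapshot, dump it as XML with thin_dump, and
    release the snapshot afterwards, whether or not thin_dump succeeded."""
    run(["dmsetup", "message", pool_tpool, "0", "reserve_metadata_snap"])
    try:
        # -m without a block number: thin_dump finds the snapshot itself.
        return run(["thin_dump", "-f", "xml", tmeta, "-m"])
    finally:
        # A held snapshot slows thin I/O slightly, so never leave it behind.
        run(["dmsetup", "message", pool_tpool, "0", "release_metadata_snap"])
```

On a real system (as root) this would be called with the names from comment 2, e.g. `dump_with_metadata_snap("vg-pool-tpool", "/dev/mapper/vg-pool_tmeta")`; passing a fake `run` makes the ordering testable without touching a pool. The XML reflects the metadata as it was when the reserve message was sent.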
Comment 9 Joe Thornber 2015-12-04 09:45:36 EST
Kabi's bug reproduced with this test:

https://github.com/jthornber/device-mapper-test-suite/commit/3e2c2df1a79eda6cb640e2fd5466b708df3d56c3
Comment 12 Bruno Goncalves 2015-12-11 09:46:03 EST
*** Bug 1290824 has been marked as a duplicate of this bug. ***
Comment 14 Gudge 2016-02-16 11:37 EST
Created attachment 1127630 [details]
Patch on Kernel 3.10.0-327.4.5

Hi,
Could someone please review the patch and let me know if it is correct?
I have ported the patches mentioned in Comment 10 to 3.10.0-327.4.5.

Thanks
Comment 15 Gudge 2016-02-16 12:18:40 EST
(In reply to Gudge from comment #14)
> Created attachment 1127630 [details]
> Patch on Kernel 3.10.0-327.4.5
> 
> Hi,
> Could someone please review the patch and let me know if it is correct. 
> I have ported the patches mentioned on Comment 10 to 3.10.0-327.4.5
> 
> Thanks

The patch does fix the issue reported in Comment 2.

I still see metadata corruption when taking a metadata snapshot while writing to a thin volume in parallel (on patched 3.10.0-327.4.5).

I do not have a small reproducible test case.

Thanks
Comment 16 Rafael Aquini 2016-03-21 07:20:15 EDT
Patch(es) available on kernel-3.10.0-366.el7
Comment 19 Bruno Goncalves 2016-05-19 10:54:18 EDT
The patch worked well.
Tested on RHEL-7.2 with kernel 3.10.0-366.el7


# dmtest run --profile mytest --suite thin-provisioning -t /ToolsTests/
ToolsTests
  metadata_snap_stress1...PASS
  metadata_snap_stress2...iteration 0
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
iteration 7
iteration 8
iteration 9
PASS
<snip>
Comment 22 errata-xmlrpc 2016-11-03 10:25:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2574.html
