Bug 1902515

Summary: Calling release_metadata_snap causes device-mapper: space map common: ref count insert failed
Product: Red Hat Enterprise Linux 7
Reporter: bugzilla
Component: lvm2
lvm2 sub component: Thin Provisioning
Assignee: Joe Thornber <thornber>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
CC: agk, cswanson, heinzm, jbrassow, msnitzer, prajnoha, thornber, zkabelac
Version: 7.8
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2022-05-29 07:27:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description bugzilla 2020-11-29 21:08:40 UTC
Description of problem:

A few days ago I ran reserve_metadata_snap to do some large metadata dumps. During that time the metadata usage went from 85% to 98%, so a large amount of new metadata was written. When I ran release_metadata_snap the kernel started spewing errors; there were so many that I have grouped them into counts to show which ones are most common:

~]# dmesg | cut -c17-|sort |uniq -c | sort -n
     30 device-mapper: space map metadata: unable to allocate new metadata block
   1639 device-mapper: space map common: dm_tm_shadow_block() failed
  12793 device-mapper: space map common: ref count insert failed


Version-Release number of selected component (if applicable):

We are running Linux 4.19.93 on el7.8.  If this is a known issue that has already been fixed, but the fix (in a RH kernel or upstream) has not landed in 4.19, then please link to the patch so we can apply it.

How reproducible:

Not sure.

Steps to Reproduce:
1. dmsetup message /dev/mapper/data-data--pool-tpool 0 reserve_metadata_snap
2. make a bunch of snapshots while doing parallel IO on multiple snapshots
3. wait a while
4. dmsetup message /dev/mapper/data-data--pool-tpool 0 release_metadata_snap
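A minimal sketch of the reproduction steps above. The pool device name is taken from the report; the thin device IDs and the use of create_snap messages for step 2 are assumptions. Since the real commands need root and an active thin pool, the helper below only prints what it would run:

```shell
# Print-only runner; swap in `run() { "$@"; }` to execute for real (needs root).
run() { echo "+ $*"; }

POOL=/dev/mapper/data-data--pool-tpool   # tpool device from the report

# Step 1: hold a read-only snapshot of the pool metadata.
run dmsetup message "$POOL" 0 reserve_metadata_snap

# Step 2: create thin snapshots (dev ids 101.. of origin dev 1 are hypothetical).
for id in 101 102 103; do
  run dmsetup message "$POOL" 0 "create_snap $id 1"
done

# Steps 3-4: after a while of parallel I/O, drop the metadata snapshot.
run dmsetup message "$POOL" 0 release_metadata_snap
```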

Actual results:

Lots of kernel messages, possibly metadata corruption.

Expected results:

The message should simply release the metadata snapshot, without kernel errors.


Additional info:

Documentation/device-mapper/thin-provisioning.txt states:
  Snapshots are created with another message to the pool.
  N.B.  If the origin device that you wish to snapshot is active, you
  must suspend it before creating the snapshot to avoid corruption.
  This is NOT enforced at the moment, so please be careful!

However, it does not mention whether reserve_metadata_snap/release_metadata_snap should also be called while the pool is suspended.

The following bug looks related; it raises the question of whether metadata reserve/release must be invoked while the pool is suspended: "But [Mike Snitzer] checked with Joe and he said that the suspend is only needed to get consistent usage info (and all mappings, associated with outstanding IO, on disk).  The thin-pool suspend isn't required to avoid crashes that were reported"
  https://bugzilla.redhat.com/show_bug.cgi?id=1286500

If reserve/release does need to be called while the pool is suspended, then it would be a good idea to update the documentation and to emit a printk warning when reserve/release is sent to an active pool, so the user is informed that a suspend is necessary.
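If suspend does turn out to be required, the usage pattern would presumably look like the sketch below. This is not a verified fix: the suspend/resume bracketing and the helper name are assumptions, and the runner only echoes commands since the real ones need root and a live pool.

```shell
run() { echo "+ $*"; }   # replace with `run() { "$@"; }` to execute for real

# Quiesce the pool around a metadata-snap message so no in-flight I/O
# mutates the metadata while the snapshot is taken or dropped.
msg_while_suspended() {
  pool=$1; shift
  run dmsetup suspend "$pool"
  run dmsetup message "$pool" 0 "$@"
  run dmsetup resume "$pool"
}

msg_while_suspended /dev/mapper/data-data--pool-tpool reserve_metadata_snap
# ... dump metadata from the held snapshot with thin_dump -m ...
msg_while_suspended /dev/mapper/data-data--pool-tpool release_metadata_snap
```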

Comment 6 RHEL Program Management 2022-05-29 07:27:22 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.