Bug 1639470 - Queued pool create message exists for existing volume
Summary: Queued pool create message exists for existing volume
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-15 19:27 UTC by bugzilla
Modified: 2021-09-03 12:52 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-18 16:44:58 UTC
Target Upstream Version:
Embargoed:



Description bugzilla 2018-10-15 19:27:37 UTC
Description of problem:

We tried to snapshot a volume and got an LVM error about a volume unrelated to the one being snapshotted.


Version-Release number of selected component (if applicable):

lvm2-2.02.177-4.el7.x86_64


How reproducible:

We have seen it twice, each time on a different system at a different customer, but we do not have a way to reproduce it.


Actual results:

Note that this attempts to create a snapshot of proxysql1_0, but the error refers to mgmt.

[root@hv2 ~]# lvcreate -n 2018-10-15_14-46-34_manual_proxysql1_0 -s centos_hv2/proxysql1_0
  Can't create snapshot 2018-10-03_14-00-03_hourly_mgmt as origin mgmt is not suspended.
  Failed to suspend centos_hv2/pool0 with queued messages.
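
In our case this failure went along with the pool's kernel transaction_id lagging behind what the LVM metadata expected (visible in the lvchange output further down). A minimal way to compare the two, assuming the pool and device names from this report:

# Kernel-side view of the pool; the first value after the "thin-pool"
# target name in the status line is the pool's current transaction_id.
dmsetup status centos_hv2-pool0-tpool

# LVM's view of the same pool; transaction_id is a regular lvs report field.
lvs -o lv_name,transaction_id centos_hv2/pool0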



Expected results:

[root@hv2 ~]# lvcreate -n 2018-10-15_14-46-34_manual_proxysql1_0 -s centos_hv2/proxysql1_0
  Logical volume "2018-10-15_14-46-34_manual_proxysql1_0" created.



Additional info:

We were able to get the volume group back into working order with the following series of operations:

[root@hv2 ~]# lvremove /dev/centos_hv2/2018-10-03_14-00-03_hourly_mgmt
  Logical volume "2018-10-03_14-00-03_hourly_mgmt" successfully removed

[root@hv2 ~]# lvcreate -n 2018-10-15_14-46-34_manual_proxysql1_0 -s centos_hv2/proxysql1_0
  device-mapper: message ioctl on  (253:7) failed: File exists
  Failed to process thin pool message "create_snap 333 15".
  Failed to suspend thin snapshot origin centos_hv2/proxysql1_0.
  Internal error: Writing metadata in critical section.
  Releasing activation in critical section.
  libdevmapper exiting with 1 device(s) still suspended.
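
In hindsight, the "File exists" from create_snap suggests the kernel's pool metadata already contained device_id 333. One way to confirm that (a sketch of what could be checked, not something we ran at the time; the device names follow this report and standard LVM naming) is to dump a metadata snapshot of the live pool:

# reserve a metadata snapshot, dump it, and look for device_id 333
dmsetup message centos_hv2-pool0-tpool 0 reserve_metadata_snap
thin_dump --metadata-snap /dev/mapper/centos_hv2-pool0_tmeta | grep 'dev_id="333"'
dmsetup message centos_hv2-pool0-tpool 0 release_metadata_snap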

[root@hv2 ~]# dmsetup message centos_hv2-pool0-tpool 0 'delete 333'

[root@hv2 ~]# lvcreate -n 2018-10-15_14-46-34_manual_proxysql1_0 -s centos_hv2/proxysql1_0
  Attempted to decrement suspended device counter below zero.
  device-mapper: reload ioctl on  (253:25) failed: No data available
  Failed to activate thin 2018-10-15_14-46-34_manual_proxysql1_0.

[root@hv2 ~]# lvchange -an centos_hv2

[root@hv2 ~]# lvchange -ay centos_hv2
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.
  Thin pool centos_hv2-pool0-tpool (253:7) transaction_id is 342, while expected 344.

[root@hv2 ~]# vgcfgbackup centos_hv2 --quiet --file pool0-cfg

[root@hv2 ~]# vim pool0-cfg # Modified pool transaction_id to 342

[root@hv2 ~]# vgcfgrestore -f pool0-cfg centos_hv2 --force


After this, we were able to snapshot without error.
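
For reference, the manual vim edit can be scripted the same way (a sketch under the same assumptions; the sed expression below blindly swaps 344 for 342 and should be checked against the actual backup file first, since thin volume segments carry transaction_id values of their own):

vgcfgbackup centos_hv2 --quiet --file pool0-cfg
# make LVM's expected transaction_id (344) match the kernel's (342)
sed -i 's/transaction_id = 344/transaction_id = 342/' pool0-cfg
vgcfgrestore -f pool0-cfg centos_hv2 --force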

After solving the problem, we looked at an old VG backup and noticed that the pool contains a queued create message for a volume that already exists. Is this normal, or is this the cause of the problem? What might cause a create message to stay queued for a volume that already exists? Both sections are pasted below:

pool0 {
        id = "MMEkjB-ddEk-y49D-LGXW-yRU1-Oz8Y-a3Itj0"
        status = ["READ", "WRITE", "VISIBLE"]
        flags = []
        creation_time = 1536186477      # 2018-09-05 18:27:57 -0400
        creation_host = "hv2-cm"
        segment_count = 1

        segment1 {
                start_extent = 0
                extent_count = 524288   # 2 Terabytes

                type = "thin-pool"
                metadata = "pool0_tmeta"
                pool = "pool0_tdata"
                transaction_id = 334
                chunk_size = 128        # 64 Kilobytes
                discards = "passdown"
                zero_new_blocks = 1

                message1 {
                        create = "2018-10-03_14-00-03_hourly_mgmt"
                }
        }
}

2018-10-03_14-00-03_hourly_mgmt {
        id = "iiv4D2-toU3-2aH1-6Q6N-Aq3y-RurL-UtaZ6l"
        status = ["READ", "WRITE", "VISIBLE"]
        flags = ["ACTIVATION_SKIP"]
        creation_time = 1538589603      # 2018-10-03 14:00:03 -0400
        creation_host = "hv2-cm"
        segment_count = 1

        segment1 {
                start_extent = 0
                extent_count = 4096     # 16 Gigabytes

                type = "thin"
                thin_pool = "pool0"
                transaction_id = 333
                device_id = 333
                origin = "mgmt"
        }
}
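
A quick way to spot this pattern in a metadata backup (a minimal sketch; "pool0-cfg" stands in for whichever backup or /etc/lvm/archive file is being inspected):

# show any queued pool messages and what they try to create
grep -A 2 'message' pool0-cfg

# check whether the volume named in the create message is already defined
grep -A 3 '2018-10-03_14-00-03_hourly_mgmt {' pool0-cfg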

Comment 3 Zdenek Kabelac 2020-11-18 16:44:58 UTC
Likely not going to reach RHEL 7 - only the upstream version will provide the improved behavior.

