1511127 – scary messages when creating multiple vdo devices

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1511127 - scary messages when creating multiple vdo devices

Summary: scary messages when creating multiple vdo devices

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	kmod-kvdo
Sub Component:
Version:	7.5
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Thomas Jaskiewicz
QA Contact:	Jakub Krysl
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2017-11-08 18:06 UTC by Corey Marthaler
Modified:	2021-09-03 12:05 UTC (History)
CC List:	6 users (show)
Fixed In Version:	6.1.0.168
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-10-30 09:38:49 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:3094	0	None	None	None	2018-10-30 09:39:26 UTC

Description Corey Marthaler 2017-11-08 18:06:26 UTC

Description of problem:

Is there a real corruption problem here? If not customers aren't going to want to see "error" and "corrupt" messages.

vdo create --name vdo1 --device /dev/mapper/mpathb
[...]
vdo create --name vdo7 --device /dev/mapper/mpathh
Creating VDO vdo7
Starting VDO vdo7
Starting compression on VDO vdo7
VDO instance 6 volume is ready at /dev/mapper/vdo7

[root@harding-02 ~]# vdo list
vdo3
vdo2
vdo1
vdo7
vdo6
vdo5
vdo4

[root@harding-02 ~]# grep -i corrupt /var/log/messages
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)

[root@harding-02 ~]# grep -i error /var/log/messages
Nov  8 12:00:38 harding-02 UDS/vdoformat[4572]: ERROR  (vdoformat/4572) loadVolumeGeometry ID mismatch, expected 5, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:38 harding-02 UDS/vdoformat[4572]: ERROR  (vdoformat/4572) decodeSuperBlock ID mismatch, expected 0, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:49 harding-02 UDS/vdoformat[4624]: ERROR  (vdoformat/4624) loadVolumeGeometry ID mismatch, expected 5, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:49 harding-02 UDS/vdoformat[4624]: ERROR  (vdoformat/4624) decodeSuperBlock ID mismatch, expected 0, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)


Version-Release number of selected component (if applicable):
3.10.0-772.el7.x86_64

lvm2-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-libs-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-cluster-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-lockd-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-python-boom-0.8-2.el7    BUILT: Fri Nov  3 07:48:54 CDT 2017
cmirror-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-libs-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-event-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-event-libs-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017
sanlock-3.5.0-1.el7    BUILT: Wed Apr 26 09:37:30 CDT 2017
sanlock-lib-3.5.0-1.el7    BUILT: Wed Apr 26 09:37:30 CDT 2017
vdo-6.1.0.34-8    BUILT: Fri Nov  3 06:58:45 CDT 2017
kmod-kvdo-6.1.0.34-7.el7    BUILT: Fri Nov  3 06:44:06 CDT 2017


How reproducible:
Often

Comment 2 Bryan Gurney 2017-11-08 19:02:22 UTC

This is reproducible if you create just one VDO volume.  One example on a test VM, for the creation of "vdo2" on /dev/sdc (a 20 GB device, which I zeroed prior to the "vdo create" command):

Nov  8 13:54:03 rhel75test-20171023 UDS/vdoformat[13914]: ERROR  (vdoformat/13914) loadVolumeGeometry ID mismatch, expected 5, got 0: VDO Status: Component id mismatch in decoder (2059)
Nov  8 13:54:03 rhel75test-20171023 UDS/vdoformat[13914]: ERROR  (vdoformat/13914) decodeSuperBlock version mismatch, expected 12.0, got 0.0: VDO Status: Unsupported component version (2058)
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: starting device 'vdo2' device instantiation 19 (ti=ffff9c78c0e91040) write policy sync
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: underlying device, REQ_FLUSH: supported, REQ_FUA: not supported
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: Setting UDS index target state to online
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: device 'vdo2' started
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Initializing index session for index: dev=/dev/sdc offset=4096 size=2781704192
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Loading index from saved state and rebuilding if necessary.
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: resuming device 'vdo2'
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: device 'vdo2' resumed
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: bad region table magic number: UDS Error: Corrupt saved component (1030)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: cannot read superblock header: UDS Error: Corrupt saved component (1030)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Failed to initialize grid: UDS Error: Corrupt saved component (1030)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Closed index session (0, ffff8c95f7557960)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Initializing index session for index: dev=/dev/sdc offset=4096 size=2781704192
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Creating new index.
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Using 2 indexing zones for concurrency.
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:packerQ: compression is enabled
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Created index session (34)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Opened block context (25)

The VDO volume was successfully created.

Comment 3 Andy Walsh 2017-11-08 19:22:13 UTC

I believe this is the by product of fixing rhbz#1507996.  I believe the fix was to try and read a geometry block and if you can't (resulting in these errors above), then you're safe to create a new VDO volume.

It will need to be investigated on whether we can squelch these messages in that situation.

Comment 4 Bryan Gurney 2017-11-08 19:32:48 UTC

If you then try to create a VDO volume after having removed the previous one, you see a remnant of what's covered in BZ 1510558, where vdoformat leaves the previous metadata alone:

Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: starting device 'vdo2' device instantiation 20 (ti=ffff9c78c0a7a040) write policy sync
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: underlying device, REQ_FLUSH: supported, REQ_FUA: not supported
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: Setting UDS index target state to online
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: device 'vdo2' started
Nov  8 14:08:08 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Initializing index session for index: dev=/dev/sdc offset=4096 size=2781704192
Nov  8 14:08:08 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Loading index from saved state and rebuilding if necessary.
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: resuming device 'vdo2'
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: device 'vdo2' resumed
Nov  8 14:08:08 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Using 2 indexing zones for concurrency.
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:packerQ: compression is enabled
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: read index page map, last update 0
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: index_0: loaded index from chapter 0 through chapter 0
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Created index session (35)
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Opened block context (26)

Comment 5 Thomas Jaskiewicz 2017-11-10 05:11:17 UTC

There is indeed no corruption in this case.  The first time we setup a new index we will try to load the index, notice that it has not been initialized yet, and then proceed to create a new index.   The "corrupt saved component" message is being logged 3 times when this happens.

We will change this to a single message, and "Corrupt saved component (1030)" will be replaced by "No index found (1061)".

Comment 6 Thomas Jaskiewicz 2017-11-10 05:11:29 UTC

There is indeed no corruption in this case.  The first time we setup a new index we will try to load the index, notice that it has not been initialized yet, and then proceed to create a new index.   The "corrupt saved component" message is being logged 3 times when this happens.

We will change this to a single message, and "Corrupt saved component (1030)" will be replaced by "No index found (1061)".

Comment 8 Andy Walsh 2017-11-17 22:41:49 UTC

The QA contact got reset to vdo-qe instead of jkrysl.  Jakub, can you please take assignment and re-ack the qa_ack?

Thank You,
-Andy

Comment 9 Andy Walsh 2017-11-17 22:43:29 UTC

The pm_ack got reset as a result of moving the component from vdo to kmod-kvdo.  Louis, can you please re-ack the pm_ack?

Thank You,
-Andy

Comment 12 Jakub Krysl 2017-11-21 11:47:41 UTC

kmod-kvdo-6.1.0.55-9.el7

[ 1207.270696] uds: loading out-of-tree module taints kernel. 
[ 1207.278022] uds: modprobe: loaded version 6.1.0.31 
[ 1207.284636] kvdo: modprobe: loaded version 6.1.0.55 
[ 1211.534732] kvdo0:dmsetup: starting device 'vdo' device instantiation 0 (ti=ffffa71806aaa040) write policy sync 
[ 1211.544928] kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5 
[ 1211.852678] kvdo0:dmsetup: Setting UDS index target state to online 
[ 1211.859003] kvdo0:dmsetup: uds: kvdo0:dedupeQ: loading or rebuilding index: dev=/dev/sdc offset=4096 size=2781704192 
[ 1211.859269] uds: kvdo0:dedupeQ: Failed loading or rebuilding index: UDS Error: No index found (1061) 
uds: kvdo0:dedupeQ: creating index: dev=/dev/sdc offset=4096 size=2781704192 
uds: kvdo0:dedupeQ: Using 12 indexing zones for concurrency. 
 
[ 1211.891255] device 'vdo' started 
[ 1211.894816] kvdo0:dmsetup: resuming device 'vdo' 
[ 1211.899486] kvdo0:dmsetup: device 'vdo' resumed 
[ 1211.920910] kvdo0:packerQ: compression is enabled 


# grep -i corrupt /var/log/messages
# grep -i error /var/log/messages
Nov 21 11:26:35 storageqe-24 UDS/vdoformat[14044]: ERROR  (vdoformat/14044) release version 0 requires an upgrade: VDO Status: Unsupported component version (2058)
Nov 21 11:26:38 storageqe-24 UDS/vdoformat[14044]: ERROR  (vdoformat/14044) decodeSuperBlock ID mismatch, expected 0, got 5: VDO Status: Component id mismatch in decoder (2059)
Nov 21 11:26:40 storageqe-24 kernel: : UDS Error: No index found (1061)

So there is no more corruption message, but each VDO creation produces 3 error messages in this version. AFAIK they all are product of missing UDS index, which is expected in this situation. So is there a way to get rid of these too?

Comment 13 Andy Walsh 2017-11-21 14:04:06 UTC

Clearing the 'vdo-internal' needinfo, and adding it for Tom (the new assignee)

Tom, can you please review and respond with your thoughts?

Comment 14 Thomas Jaskiewicz 2017-11-27 21:42:05 UTC

The 2 vdoFormat ERROR messages are produced by a different component than the kernel UDS Error message.  These will be hard to deal with.

But the 2 vdoFormat messages should be condensed into a single message.  Maybe even both of them can be eliminated.  I'll ask sweettea.

Comment 15 Jakub Krysl 2017-11-28 08:31:01 UTC

Thanks Tom, returning to be further fixed.

Comment 18 sclafani 2018-03-29 20:18:24 UTC

Everyone who sees even the reduced messages seems to get scared/sidetracked by them, so it's probably worth fixing this sooner than later.

Comment 19 sclafani 2018-05-01 23:42:29 UTC

None of these should be logged anymore when starting up a fresh VDO.

Comment 21 Jakub Krysl 2018-07-04 09:22:20 UTC

Tested on:
RHEL-7.6-20180626.0
kernel-3.10.0-915.el7
kmod-vdo-6.1.1.99-1.el7
vdo-6.1.1.99-2.el7

# vdo create --name vdo --device /dev/sdb --verbose
Creating VDO vdo
    grep MemAvailable /proc/meminfo
    pvcreate -qq --test /dev/sdb
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7
    vdodumpconfig /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7
Starting VDO vdo
    dmsetup status vdo
    grep MemAvailable /proc/meminfo
    modprobe kvdo
    vdodumpconfig /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7
    dmsetup create vdo --uuid VDO-0e71c26d-c718-46a7-8ebd-56c5b3237fc3 --table '0 4278400280 vdo /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 4096 disabled 0 32768 16380 on auto vdo ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1'
    dmsetup status vdo
Starting compression on VDO vdo
    dmsetup message vdo 0 compression on
    dmsetup status vdo
VDO instance 18 volume is ready at /dev/mapper/vdo

/var/log/messages:
[159630.984103] kvdo18:dmsetup: starting device 'vdo'
[159631.007087] kvdo18:dmsetup: underlying device, REQ_FLUSH: not supported, REQ_FUA: not supported
[159631.050807] kvdo18:dmsetup: Using write policy sync automatically.
[159631.081297] kvdo18:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
[159631.295523] kvdo18:journalQ: VDO commencing normal operation
[159631.322357] kvdo18:dmsetup: uds: kvdo18:dedupeQ: creating index: dev=/dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 offset=4096 size=2781704192
uds: kvdo18:dedupeQ: Using 6 indexing zones for concurrency.

[159631.411527] Setting UDS index target state to online
[159631.436282] kvdo18:dmsetup: device 'vdo' started
[159631.458530] kvdo18:dmsetup: resuming device 'vdo'
[159631.481002] kvdo18:dmsetup: device 'vdo' resumed
[159631.524222] kvdo18:packerQ: compression is enabled

The messages are the same when creating multiple VDOs.

Comment 23 errata-xmlrpc 2018-10-30 09:38:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3094

Note You need to log in before you can comment on or make changes to this bug.