Bug 1511127

Summary: scary messages when creating multiple vdo devices
Product: Red Hat Enterprise Linux 7
Reporter: Corey Marthaler <cmarthal>
Component: kmod-kvdo
Assignee: Thomas Jaskiewicz <tjaskiew>
Status: CLOSED ERRATA
QA Contact: Jakub Krysl <jkrysl>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.5
CC: awalsh, bgurney, jkrysl, limershe, tjaskiew, vdo-internal
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 6.1.0.168
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-30 09:38:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2017-11-08 18:06:26 UTC
Description of problem:

Is there a real corruption problem here? If not, customers aren't going to want to see "error" and "corrupt" messages.

vdo create --name vdo1 --device /dev/mapper/mpathb
[...]
vdo create --name vdo7 --device /dev/mapper/mpathh
Creating VDO vdo7
Starting VDO vdo7
Starting compression on VDO vdo7
VDO instance 6 volume is ready at /dev/mapper/vdo7

[root@harding-02 ~]# vdo list
vdo3
vdo2
vdo1
vdo7
vdo6
vdo5
vdo4

[root@harding-02 ~]# grep -i corrupt /var/log/messages
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)

[root@harding-02 ~]# grep -i error /var/log/messages
Nov  8 12:00:38 harding-02 UDS/vdoformat[4572]: ERROR  (vdoformat/4572) loadVolumeGeometry ID mismatch, expected 5, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:38 harding-02 UDS/vdoformat[4572]: ERROR  (vdoformat/4572) decodeSuperBlock ID mismatch, expected 0, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:39 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:49 harding-02 UDS/vdoformat[4624]: ERROR  (vdoformat/4624) loadVolumeGeometry ID mismatch, expected 5, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:49 harding-02 UDS/vdoformat[4624]: ERROR  (vdoformat/4624) decodeSuperBlock ID mismatch, expected 0, got 268482810: VDO Status: Component id mismatch in decoder (2059)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)
Nov  8 12:00:51 harding-02 kernel: : UDS Error: Corrupt saved component (1030)


Version-Release number of selected component (if applicable):
3.10.0-772.el7.x86_64

lvm2-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-libs-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-cluster-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-lockd-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
lvm2-python-boom-0.8-2.el7    BUILT: Fri Nov  3 07:48:54 CDT 2017
cmirror-2.02.176-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-libs-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-event-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-event-libs-1.02.145-2.el7    BUILT: Fri Nov  3 07:46:53 CDT 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017
sanlock-3.5.0-1.el7    BUILT: Wed Apr 26 09:37:30 CDT 2017
sanlock-lib-3.5.0-1.el7    BUILT: Wed Apr 26 09:37:30 CDT 2017
vdo-6.1.0.34-8    BUILT: Fri Nov  3 06:58:45 CDT 2017
kmod-kvdo-6.1.0.34-7.el7    BUILT: Fri Nov  3 06:44:06 CDT 2017


How reproducible:
Often

Comment 2 Bryan Gurney 2017-11-08 19:02:22 UTC
This is reproducible if you create just one VDO volume.  One example on a test VM, for the creation of "vdo2" on /dev/sdc (a 20 GB device, which I zeroed prior to the "vdo create" command):

Nov  8 13:54:03 rhel75test-20171023 UDS/vdoformat[13914]: ERROR  (vdoformat/13914) loadVolumeGeometry ID mismatch, expected 5, got 0: VDO Status: Component id mismatch in decoder (2059)
Nov  8 13:54:03 rhel75test-20171023 UDS/vdoformat[13914]: ERROR  (vdoformat/13914) decodeSuperBlock version mismatch, expected 12.0, got 0.0: VDO Status: Unsupported component version (2058)
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: starting device 'vdo2' device instantiation 19 (ti=ffff9c78c0e91040) write policy sync
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: underlying device, REQ_FLUSH: supported, REQ_FUA: not supported
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: Setting UDS index target state to online
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: device 'vdo2' started
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Initializing index session for index: dev=/dev/sdc offset=4096 size=2781704192
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Loading index from saved state and rebuilding if necessary.
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: resuming device 'vdo2'
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:dmsetup: device 'vdo2' resumed
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: bad region table magic number: UDS Error: Corrupt saved component (1030)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: cannot read superblock header: UDS Error: Corrupt saved component (1030)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Failed to initialize grid: UDS Error: Corrupt saved component (1030)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Closed index session (0, ffff8c95f7557960)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Initializing index session for index: dev=/dev/sdc offset=4096 size=2781704192
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Creating new index.
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Using 2 indexing zones for concurrency.
Nov  8 13:54:04 rhel75test-20171023 kernel: kvdo19:packerQ: compression is enabled
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Created index session (34)
Nov  8 13:54:04 rhel75test-20171023 kernel: uds: kvdo19:dedupeQ: Opened block context (25)

The VDO volume was successfully created.
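The "zeroed prior to the 'vdo create' command" step above can be sketched as follows. This is a hedged example against a scratch file rather than real hardware; on an actual test device you would point these commands at the block device itself (e.g. /dev/sdc, as in this report), which is destructive.

```shell
# Zero the header region of a scratch "device" before a vdo create, mimicking
# the preparation described above. DEV is an example path, not from this bug.
# On real hardware, substitute the block device and expect data loss.
DEV=/tmp/scratch.img
truncate -s 20G "$DEV"                                    # sparse 20 GB test file
dd if=/dev/zero of="$DEV" bs=1M count=16 conv=notrunc status=none  # zero first 16 MiB
```

On a real device, `wipefs -a` is another common way to clear old signatures before formatting.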

Comment 3 Andy Walsh 2017-11-08 19:22:13 UTC
I believe this is a by-product of fixing rhbz#1507996.  I believe the fix was to try to read a geometry block; if that read fails (producing the errors above), then it is safe to create a new VDO volume.

We will need to investigate whether we can squelch these messages in that situation.

Comment 4 Bryan Gurney 2017-11-08 19:32:48 UTC
If you then try to create a VDO volume after having removed the previous one, you see a remnant of what's covered in BZ 1510558, where vdoformat leaves the previous metadata alone:

Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: starting device 'vdo2' device instantiation 20 (ti=ffff9c78c0a7a040) write policy sync
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: underlying device, REQ_FLUSH: supported, REQ_FUA: not supported
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: Setting UDS index target state to online
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: device 'vdo2' started
Nov  8 14:08:08 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Initializing index session for index: dev=/dev/sdc offset=4096 size=2781704192
Nov  8 14:08:08 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Loading index from saved state and rebuilding if necessary.
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: resuming device 'vdo2'
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:dmsetup: device 'vdo2' resumed
Nov  8 14:08:08 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Using 2 indexing zones for concurrency.
Nov  8 14:08:08 rhel75test-20171023 kernel: kvdo20:packerQ: compression is enabled
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: read index page map, last update 0
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: index_0: loaded index from chapter 0 through chapter 0
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Created index session (35)
Nov  8 14:08:09 rhel75test-20171023 kernel: uds: kvdo20:dedupeQ: Opened block context (26)

Comment 5 Thomas Jaskiewicz 2017-11-10 05:11:17 UTC
There is indeed no corruption in this case.  The first time we set up a new index, we try to load the index, notice that it has not been initialized yet, and then proceed to create a new index.  The "corrupt saved component" message is logged 3 times when this happens.

We will change this to a single message, and "Corrupt saved component (1030)" will be replaced by "No index found (1061)".
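Until that change lands, a log scan can whitelist the messages this bug identifies as benign first-start noise so that only unexpected errors surface. This is a sketch, not a supported tool: the patterns are copied from the logs in this report, and other VDO versions may log different text.

```shell
# filter_vdo_errors LOGFILE
# Print UDS/VDO error lines that are NOT the benign first-start messages
# discussed in this bug (message texts taken from the logs above).
filter_vdo_errors() {
  grep -iE 'uds error|vdoformat\[[0-9]+\]: ERROR' "$1" \
    | grep -Ev 'Corrupt saved component \(1030\)|No index found \(1061\)|Component id mismatch in decoder \(2059\)|Unsupported component version \(2058\)' \
    || true   # empty output (nothing suspicious) is not a failure
}
```

Example: `filter_vdo_errors /var/log/messages` prints nothing on a host that has only seen these known first-creation messages.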

Comment 8 Andy Walsh 2017-11-17 22:41:49 UTC
The QA contact got reset to vdo-qe instead of jkrysl.  Jakub, can you please take assignment and re-ack the qa_ack?

Thank You,
-Andy

Comment 9 Andy Walsh 2017-11-17 22:43:29 UTC
The pm_ack got reset as a result of moving the component from vdo to kmod-kvdo.  Louis, can you please re-ack the pm_ack?

Thank You,
-Andy

Comment 12 Jakub Krysl 2017-11-21 11:47:41 UTC
kmod-kvdo-6.1.0.55-9.el7

[ 1207.270696] uds: loading out-of-tree module taints kernel. 
[ 1207.278022] uds: modprobe: loaded version 6.1.0.31 
[ 1207.284636] kvdo: modprobe: loaded version 6.1.0.55 
[ 1211.534732] kvdo0:dmsetup: starting device 'vdo' device instantiation 0 (ti=ffffa71806aaa040) write policy sync 
[ 1211.544928] kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5 
[ 1211.852678] kvdo0:dmsetup: Setting UDS index target state to online 
[ 1211.859003] kvdo0:dmsetup: uds: kvdo0:dedupeQ: loading or rebuilding index: dev=/dev/sdc offset=4096 size=2781704192 
[ 1211.859269] uds: kvdo0:dedupeQ: Failed loading or rebuilding index: UDS Error: No index found (1061) 
uds: kvdo0:dedupeQ: creating index: dev=/dev/sdc offset=4096 size=2781704192 
uds: kvdo0:dedupeQ: Using 12 indexing zones for concurrency. 
 
[ 1211.891255] device 'vdo' started 
[ 1211.894816] kvdo0:dmsetup: resuming device 'vdo' 
[ 1211.899486] kvdo0:dmsetup: device 'vdo' resumed 
[ 1211.920910] kvdo0:packerQ: compression is enabled 


# grep -i corrupt /var/log/messages
# grep -i error /var/log/messages
Nov 21 11:26:35 storageqe-24 UDS/vdoformat[14044]: ERROR  (vdoformat/14044) release version 0 requires an upgrade: VDO Status: Unsupported component version (2058)
Nov 21 11:26:38 storageqe-24 UDS/vdoformat[14044]: ERROR  (vdoformat/14044) decodeSuperBlock ID mismatch, expected 0, got 5: VDO Status: Component id mismatch in decoder (2059)
Nov 21 11:26:40 storageqe-24 kernel: : UDS Error: No index found (1061)

So there is no more corruption message, but each VDO creation still produces 3 error messages in this version. AFAIK they are all a product of the missing UDS index, which is expected in this situation. So is there a way to get rid of these too?

Comment 13 Andy Walsh 2017-11-21 14:04:06 UTC
Clearing the 'vdo-internal' needinfo, and adding it for Tom (the new assignee)

Tom, can you please review and respond with your thoughts?

Comment 14 Thomas Jaskiewicz 2017-11-27 21:42:05 UTC
The 2 vdoFormat ERROR messages are produced by a different component than the kernel UDS Error message.  These will be hard to deal with.

But the 2 vdoFormat messages should be condensed into a single message.  Maybe even both of them can be eliminated.  I'll ask sweettea.

Comment 15 Jakub Krysl 2017-11-28 08:31:01 UTC
Thanks Tom, returning to be further fixed.

Comment 18 sclafani 2018-03-29 20:18:24 UTC
Everyone who sees even the reduced messages seems to get scared or sidetracked by them, so it's probably worth fixing this sooner rather than later.

Comment 19 sclafani 2018-05-01 23:42:29 UTC
None of these should be logged anymore when starting up a fresh VDO.

Comment 21 Jakub Krysl 2018-07-04 09:22:20 UTC
Tested on:
RHEL-7.6-20180626.0
kernel-3.10.0-915.el7
kmod-vdo-6.1.1.99-1.el7
vdo-6.1.1.99-2.el7

# vdo create --name vdo --device /dev/sdb --verbose
Creating VDO vdo
    grep MemAvailable /proc/meminfo
    pvcreate -qq --test /dev/sdb
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7
    vdodumpconfig /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7
Starting VDO vdo
    dmsetup status vdo
    grep MemAvailable /proc/meminfo
    modprobe kvdo
    vdodumpconfig /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7
    dmsetup create vdo --uuid VDO-0e71c26d-c718-46a7-8ebd-56c5b3237fc3 --table '0 4278400280 vdo /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 4096 disabled 0 32768 16380 on auto vdo ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1'
    dmsetup status vdo
Starting compression on VDO vdo
    dmsetup message vdo 0 compression on
    dmsetup status vdo
VDO instance 18 volume is ready at /dev/mapper/vdo

/var/log/messages:
[159630.984103] kvdo18:dmsetup: starting device 'vdo'
[159631.007087] kvdo18:dmsetup: underlying device, REQ_FLUSH: not supported, REQ_FUA: not supported
[159631.050807] kvdo18:dmsetup: Using write policy sync automatically.
[159631.081297] kvdo18:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
[159631.295523] kvdo18:journalQ: VDO commencing normal operation
[159631.322357] kvdo18:dmsetup: uds: kvdo18:dedupeQ: creating index: dev=/dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 offset=4096 size=2781704192
uds: kvdo18:dedupeQ: Using 6 indexing zones for concurrency.

[159631.411527] Setting UDS index target state to online
[159631.436282] kvdo18:dmsetup: device 'vdo' started
[159631.458530] kvdo18:dmsetup: resuming device 'vdo'
[159631.481002] kvdo18:dmsetup: device 'vdo' resumed
[159631.524222] kvdo18:packerQ: compression is enabled

The messages are the same when creating multiple VDOs.

Comment 23 errata-xmlrpc 2018-10-30 09:38:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3094