Bug 1535476 - VDO allows removal with backing device missing
Summary: VDO allows removal with backing device missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: vdo
Version: 7.5
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Joseph Chapman
QA Contact: Jakub Krysl
URL:
Whiteboard:
Depends On:
Blocks: 1577173 1684248
TreeView+ depends on / blocked
 
Reported: 2018-01-17 13:34 UTC by Jakub Krysl
Modified: 2019-08-06 13:08 UTC (History)
8 users (show)

Fixed In Version: 6.1.2.35
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1684248 (view as bug list)
Environment:
Last Closed: 2019-08-06 13:08:04 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2233 None None None 2019-08-06 13:08:14 UTC

Description Jakub Krysl 2018-01-17 13:34:11 UTC
Description of problem:
When for some reason the backing iSCSI device is not reachable (accidental removal, lost connection etc.), VDO still tries to remove it leading to following calltrace in /var/log/messages:
( command 'vdo remove --name vdo --device /dev/sdm' )
[10856.609571] kvdo1:dmsetup: suspending device 'vdo_test'
[10856.661388] kvdo1:dmsetup: device 'vdo_test' suspended
[10856.666582] kvdo1:dmsetup: stopping device 'vdo_test'
[10856.673044] kvdo1:journalQ: Completing write VIO of type 10 for physical block 679128 with error: System error -5 (-5)
[10856.683823] kvdo1:dmsetup: stopKernelLayer: Close device failed -5 (System error -5: System error -5)
[10856.693075] kvdo1:dmsetup: uds: kvdo1:dedupeQ: index_0: beginning save (vcn 4294967295)
[10856.693770] uds: kvdo1:dedupeQ: save: cannot prepare index ris_prepareSave: Input/output error (5)
uds: kvdo1:dedupeQ: index_0: save failed
[10856.693773] uds: kvdo1:dedupeQ: index router save state problem: Input/output error (5)
uds: kvdo1:dedupeQ: ignoring error from saveIndexSession: Input/output error (5)
[10856.700763] kvdo1:dedupeQ: Error closing index dev=/dev/sdm offset=4096 size=2781704192: System error 5 (5)

[10856.738882] Setting UDS index target state to closed
[10856.773637] ------------[ cut here ]------------
[10856.778254] WARNING: CPU: 0 PID: 31634 at fs/sysfs/group.c:219 sysfs_remove_group+0x93/0xa0
[10856.786608] sysfs group ffffffff980e4100 not found for kobject 'target34:0:0'
[10856.793736] Modules linked in: kvdo(O) uds(O) macsec vsock_diag vsock sctp_diag sctp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xfs nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi raid456 kvm_intel kvm async_raid6_recov async_memcpy async_pq irqbypass raid6_pq libcrc32c crc32_pclmul ghash_clmulni_intel aesni_intel async_xor lrw gf128mul xor iTCO_wdt async_tx glue_helper iTCO_vendor_support joydev ablk_helper cryptd ses enclosure mei_me sg mei lpc_ich i2c_i801 ioatdma pcspkr ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter acpi_pad dm_multipath ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops ttm libahci ixgbe drm libata mpt3sas crct10dif_pclmul crct10dif_common crc32c_intel mdio ptp i2c_core raid_class pps_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: uds]
[10856.894660] CPU: 0 PID: 31634 Comm: dmsetup Tainted: G        W  O   ------------   3.10.0-830.el7.x86_64 #1
[10856.904474] Hardware name: Supermicro SSG-6028R-OSD072/X10DRH-iT, BIOS 1.1 05/15/2015
[10856.912298] Call Trace:
[10856.914764]  [<ffffffff97af4dea>] dump_stack+0x19/0x1b
[10856.919900]  [<ffffffff97490008>] __warn+0xd8/0x100
[10856.924771]  [<ffffffff9749008f>] warn_slowpath_fmt+0x5f/0x80
[10856.930518]  [<ffffffff9769dc38>] ? kernfs_find_and_get_ns+0x48/0x60
[10856.936868]  [<ffffffff976a1923>] sysfs_remove_group+0x93/0xa0
[10856.942695]  [<ffffffff97874127>] dpm_sysfs_remove+0x57/0x60
[10856.948361]  [<ffffffff978674c5>] device_del+0x45/0x210
[10856.953579]  [<ffffffff9789e4c7>] scsi_target_reap_usercontext+0x27/0x40
[10856.960270]  [<ffffffff974afb57>] execute_in_process_context+0x67/0x70
[10856.966786]  [<ffffffff978a0188>] scsi_target_reap+0xc8/0xf0
[10856.972439]  [<ffffffff978a1ec6>] scsi_device_dev_release_usercontext+0xe6/0x130
[10856.979828]  [<ffffffff974afb57>] execute_in_process_context+0x67/0x70
[10856.986357]  [<ffffffff978a1ddc>] scsi_device_dev_release+0x1c/0x20
[10856.992634]  [<ffffffff97866a82>] device_release+0x32/0xa0
[10856.998116]  [<ffffffff9774880e>] kobject_release+0x7e/0x1b0
[10857.003767]  [<ffffffff977486a8>] kobject_put+0x28/0x60
[10857.008985]  [<ffffffff97866ce7>] put_device+0x17/0x20
[10857.014116]  [<ffffffff97892eab>] scsi_device_put+0x2b/0x30
[10857.019690]  [<ffffffffc02f3560>] scsi_disk_put+0x30/0x50 [sd_mod]
[10857.025869]  [<ffffffffc02f3782>] sd_release+0x42/0xc0 [sd_mod]
[10857.031805]  [<ffffffff9765709c>] __blkdev_put+0x16c/0x1a0
[10857.037282]  [<ffffffff97657a1c>] blkdev_put+0x4c/0x130
[10857.042504]  [<ffffffffc0071493>] dm_put_table_device+0x63/0xa0 [dm_mod]
[10857.049202]  [<ffffffffc00748d4>] dm_put_device+0x54/0xa0 [dm_mod]
[10857.055390]  [<ffffffffc0a25958>] vdoDtr+0x108/0x1b0 [kvdo]
[10857.060959]  [<ffffffff97545c80>] ? dyntick_save_progress_counter+0x30/0x30
[10857.067913]  [<ffffffffc0075670>] dm_table_destroy+0x70/0x120 [dm_mod]
[10857.074440]  [<ffffffffc00716f6>] __dm_destroy+0x136/0x230 [dm_mod]
[10857.080706]  [<ffffffffc0073b43>] dm_destroy+0x13/0x20 [dm_mod]
[10857.086636]  [<ffffffffc0079afe>] dev_remove+0x11e/0x1a0 [dm_mod]
[10857.092726]  [<ffffffffc007a99f>] ctl_ioctl+0x20f/0x4e0 [dm_mod]
[10857.098730]  [<ffffffffc00799e0>] ? dev_suspend+0x260/0x260 [dm_mod]
[10857.105095]  [<ffffffffc007ac7e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
[10857.111098]  [<ffffffff9762ca9d>] do_vfs_ioctl+0x33d/0x540
[10857.116598]  [<ffffffff976d470f>] ? file_has_perm+0x9f/0xb0
[10857.122173]  [<ffffffff9762cd41>] SyS_ioctl+0xa1/0xc0
[10857.127217]  [<ffffffff97b07afd>] system_call_fastpath+0x16/0x1b
[10857.133213] ---[ end trace 377630da41905649 ]---
[10857.145381] kvdo1:dmsetup: device 'vdo_test' stopped


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
Easy way to reproduce this is to disconnect iSCSI LUN before VDO removal:
1. connect iscsi LUN
2. vdo create --name vdo --device /dev/iSCSI_LUN
3. disconnect iSCSI LUN: iscsiadm -m node -u
4. vdo remove --all --verbose

Actual results:
VDO successfully removed from local machine, no changes on LUN (superblock not cleaned)
calltrace in /var/log/messages

Expected results:
Possibly error saying the device is not reachable
? promt to remove VDO ?
no calltrace

Additional info:

Comment 2 Jakub Krysl 2018-01-17 13:38:04 UTC
Versions used:
vdo-6.1.0.114-14
kmod-kvdo-6.1.0.114-11
iscsi-initiator-utils-6.2.0.874-6

Comment 3 Jakub Krysl 2018-01-17 14:26:44 UTC
Also when said LUN is added later ('iscsiadm -m node -l') and removed again ('iscsiadm -m node -u)', same calltrace appears on removal. So the calltrace itself might be issue of iscsi-initiator-utils. These 2 steps can be repeated with the same result. I will wait with creating BZ for iscsi component before it is clear how this state happened...

Comment 4 Jakub Krysl 2018-01-17 15:20:31 UTC
BZ 1534568 appears to have almost the same calltrace "sysfs group ffffffffad4e4100 not found for kobject '3:0:0:0'".

Comment 5 Ewan D. Milne 2018-01-17 17:15:46 UTC
I expect you will find this was caused by the sysfs changes introduced in
kernel-3.10.0-828.el7.  Suggest you re-test with -827 and if the problem
does not appear, this would be a duplicate of bug 1534568.

Comment 6 Jakub Krysl 2018-01-17 17:45:42 UTC
With kernel .827 this is not reproducible, I am only getting this occasionally:
[  117.902936] blk_update_request: I/O error, dev dm-3, sector 4294967168                                
[  117.909471] blk_update_request: I/O error, dev dm-3, sector 4294967168
[  117.915992] Buffer I/O error on dev dm-3, logical block 536870896, async page read 
So that part will be solved in BZ 1534568 as it is duplicate of that one.

What is missing here is the "ERROR: Device under VDO not found. Overwrite with --force". Changing the name to reflect that.
The reasoning behind this is that vdo is not able to clean up the superblock when it cannot reach the device itself. Server admin should be made aware of this remnant so he can clean it up himself to avoid errors like:
"vdo: ERROR - vdoformat: Cannot format device already containing a valid VDO!"

Comment 9 Jakub Krysl 2018-07-04 10:10:10 UTC
Tested on:
RHEL-7.6-20180626.0
kernel-3.10.0-915.el7
kmod-vdo-6.1.1.99-1.el7
vdo-6.1.1.99-2.el7

Now when removing vdo with backing device missing the admin is notified in terminal there is the missing device issue and pointed to solution using --force. But at that point the device is already removed, just the superblock is not cleared:
# vdo remove --name vdo --verbose
Removing VDO vdo
Stopping VDO vdo
    dmsetup status vdo
    mount
    udevadm settle
    dmsetup remove vdo
vdo: ERROR - Device /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 not found. Remove VDO with --force.
[root@storageqe-74 vdo]# vdo remove --name vdo --verbose
Removing VDO vdo
Stopping VDO vdo
    dmsetup status vdo
vdo: ERROR - Device /dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 not found. Remove VDO with --force.
# iscsiadm -m node -l
Logging in to [iface: default, target: iqn.2001-05.com.equallogic:0-af1ff6-6db5d9bd9-f7c9f42596f59bfb-vdo-general, portal: *****] (multiple)
Login to [iface: default, target: iqn.2001-05.com.equallogic:0-af1ff6-6db5d9bd9-f7c9f42596f59bfb-vdo-general, portal: *****] successful.
[root@storageqe-74 vdo]# vdo remove --name vdo --verbose
Removing VDO vdo
Stopping VDO vdo
    dmsetup status vdo
    dd if=/dev/zero of=/dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 oflag=direct bs=4096 count=1

/var/log/messages:

[162456.240454] kvdo2:dmsetup: suspending device 'vdo'
[162456.265472] kvdo2:dmsetup: synchronous flush failed: System error -5 (-5)
[162456.300191] kvdo2:dmsetup: suspend of device 'vdo' failed with error: -5
[162456.332379] kvdo2:dmsetup: stopping device 'vdo'
[162456.356679] kvdo2:journalQ: Completing write VIO of type 10 for physical block 679128 with error: System error -5 (-5)
[162456.407045] kvdo2:dmsetup: stopKernelLayer: Close device failed -5 (System error -5: System error -5)
[162456.450874] kvdo2:dmsetup: uds: kvdo2:dedupeQ: index_0: beginning save (vcn 4294967295)
[162456.451745] uds: kvdo2:dedupeQ: save: cannot prepare index ris_prepareSave: Input/output error (5)
uds: kvdo2:dedupeQ: index_0: save failed
[162456.451750] uds: kvdo2:dedupeQ: index router save state problem: Input/output error (5)
uds: kvdo2:dedupeQ: ignoring error from saveIndexSession: Input/output error (5)
[162456.473634] kvdo2:dedupeQ: Error closing index dev=/dev/disk/by-id/scsi-360fff19abdd9b56dfb9bf59625f4c9f7 offset=4096 size=2781704192: System error 5 (5)

[162456.685616] Setting UDS index target state to closed
[162456.753027] kvdo2:dmsetup: device 'vdo' stopped

All those messages in /var/log/messages appear when the command 'vdo remove' is run the first time. Though all changes requested in comment #6 are done, all these errors could be prevented if vdo stopped removal on first sight of issues.  As this BZ is about the removal itself and not just the comment #6 and VDO actually still allows the removal, I am not sure this BZ should be closed. Probably we should discuss this a bit more to come with proper way how to tackle those new errors.

Comment 14 Jakub Krysl 2019-05-27 15:18:09 UTC
vdo-6.1.2.41-4.el7.x86_64

VDO now sends error before actually doing anything. Reconnecting the device starts the VDO again
# vdo create --name vdo --device /dev/sdm --verbose
Creating VDO vdo
    grep MemAvailable /proc/meminfo
    pvcreate --config devices/scan_lvs=1 -qq --test /dev/sdm
    blkid -p /dev/sdm
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 /dev/disk/by-id/scsi-360fff19aad198f7e32a79578cd06903b
    vdodumpconfig /dev/disk/by-id/scsi-360fff19aad198f7e32a79578cd06903b
Starting VDO vdo
    dmsetup status --target vdo vdo
    grep MemAvailable /proc/meminfo
    modprobe kvdo
    vdodumpconfig /dev/disk/by-id/scsi-360fff19aad198f7e32a79578cd06903b
    dmsetup create vdo --uuid VDO-5e56e8a1-fd8c-426e-9ec1-a8dbc16e12c4 --table '0 12557488 vdo /dev/disk/by-id/scsi-360fff19aad198f7e32a79578cd06903b 4096 disabled 0 32768 16380 on auto vdo ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1'
    dmsetup status --target vdo vdo
Starting compression on VDO vdo
    dmsetup message vdo 0 compression on
    vdodmeventd -r vdo
    dmsetup status --target vdo vdo
VDO instance 2 volume is ready at /dev/mapper/vdo

# iscsiadm -m node -u
Logging out of session [sid: 1, target: iqn.2001-05.com.equallogic:0-af1ff6-7e8f19ad9-3b9006cd7895a732-vdo-small, portal: XXXXXX,3260]
Logout of [sid: 1, target: iqn.2001-05.com.equallogic:0-af1ff6-7e8f19ad9-3b9006cd7895a732-vdo-small, portal: XXXXXX,3260] successful.

# vdo remove --all --verbose
Removing VDO vdo
vdo: ERROR - Device /dev/disk/by-id/scsi-360fff19aad198f7e32a79578cd06903b not found. Remove VDO with --force.

Comment 16 errata-xmlrpc 2019-08-06 13:08:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2233


Note You need to log in before you can comment on or make changes to this bug.