Bug 1687005

Summary: Loading table with too big physical size spams of type 6 VIO producing System Error -5 (plus soft lockup)
Product: Red Hat Enterprise Linux 7 Reporter: Andy Walsh <awalsh>
Component: kmod-kvdoAssignee: corwin <corwin>
Status: CLOSED ERRATA QA Contact: Jakub Krysl <jkrysl>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.6CC: awalsh, bgurney, corwin, jkrysl, limershe, sweettea
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 6.1.2.41 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1678785 Environment:
Last Closed: 2019-08-06 13:08:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1678785    
Bug Blocks:    

Description Andy Walsh 2019-03-08 21:15:47 UTC
+++ This bug was initially created as a clone of Bug #1678785 +++

Description of problem:
Loading new table with bigger physical size (without growing the device under) produces these errors:
[ 6425.517509] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108652269 with error: System error -5 (-5) 

This message seems to be produced for every extra physical block (that just is not there), so if the given size is much bigger, this is lots of spam.

Plus if there is iscsi device connected on the system, I am getting iscsid soft lockup trace (leaving the -5 errors in between as they appeared in the log):
[ 6352.289203]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4301009255, last ping 4301014529, now 4301019648 
[ 6356.005253] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651718 with error: System error -5 (-5) 
[ 6379.976348] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [iscsid:962] 
[ 6380.104410] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651719 with error: System error -5 (-5) 
[ 6380.152540] Modules linked in: nfsv3 nfs_acl dm_service_time rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache kvdo(O) uds(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc dm_multipath intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
[ 6380.203568] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651720 with error: System error -5 (-5) 
[ 6380.254415]  coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support 
[ 6380.305065] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651721 with error: System error -5 (-5) 
[ 6380.355872]  intel_cstate ipmi_ssif intel_uncore intel_rapl_perf i2c_i801 pcspkr sg 
[ 6380.406400] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651722 with error: System error -5 (-5) 
[ 6380.460047]  lpc_ich hpilo hpwdt ipmi_si ipmi_devintf acpi_tad ipmi_msghandler ioatdma wmi acpi_power_meter xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci drm libata ixgbe crc32c_intel serio_raw tg3 mdio dca dm_mirror dm_region_hash dm_log dm_mod 
[ 6380.460072] CPU: 3 PID: 962 Comm: iscsid Kdump: loaded Tainted: G           O     --------- -  - 4.18.0-67.el8.x86_64 #1 
[ 6380.510838] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651723 with error: System error -5 (-5) 
[ 6380.563913] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/25/2017 
[ 6380.563918] RIP: 0010:native_queued_spin_lock_slowpath+0x1d9/0x1f0 
[ 6380.563920] Code: ff ff ff f3 90 48 8b 02 48 85 c0 74 f6 48 89 c6 e9 60 ff ff ff 89 c1 e9 6d fe ff ff 85 c9 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 c3 66 2e 0f 1f 84 00 00 00 00 
[ 6380.563921] RSP: 0018:ffffb092c175fe58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13 
[ 6380.563923] RAX: 0000000000140101 RBX: ffff9531ead1ec88 RCX: 0000000000000001 
[ 6380.563924] RDX: 0000000000000101 RSI: 0000000000000000 RDI: ffff9531ead1ec88 
[ 6380.563925] RBP: ffff9531ead1ec00 R08: 0000000000000008 R09: 0000000000000000 
[ 6380.563926] R10: 0000000000000000 R11: ffff9531ead1ec00 R12: 00007ffe885746c0 
[ 6380.563927] R13: 000000000000000d R14: 0000000000000001 R15: 0000000000000008 
[ 6380.563929] FS:  00007f910f669d00(0000) GS:ffff953237ac0000(0000) knlGS:0000000000000000 
[ 6380.563933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 6380.615645] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651724 with error: System error -5 (-5) 
[ 6380.670862] CR2: 000055bf6f526094 CR3: 000000026610c006 CR4: 00000000001606e0 
[ 6380.670863] Call Trace: 
[ 6380.670868]  _raw_spin_lock_bh+0x27/0x30 
[ 6380.670873]  lock_sock_nested+0x1e/0x60 
[ 6380.721490] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651725 with error: System error -5 (-5) 
[ 6380.773148]  sock_setsockopt+0x6d/0xa10 
[ 6380.773152]  __sys_setsockopt+0xce/0xe0 
[ 6380.822598] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651726 with error: System error -5 (-5) 
[ 6380.874459]  __x64_sys_setsockopt+0x20/0x30 
[ 6380.874463]  do_syscall_64+0x5b/0x1b0 
[ 6380.925584] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651727 with error: System error -5 (-5) 
[ 6380.976692]  entry_SYSCALL_64_after_hwframe+0x65/0xca 
[ 6380.976695] RIP: 0033:0x7f910e19608e 
[ 6380.976696] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca fd 2b 00 f7 d8 64 89 01 48 
[ 6380.976697] RSP: 002b:00007ffe885746b8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036 
[ 6380.976699] RAX: ffffffffffffffda RBX: 000055bf6f535c98 RCX: 00007f910e19608e 
[ 6380.976700] RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000009 
[ 6380.976701] RBP: 000055bf6f530220 R08: 0000000000000008 R09: 0000000000000005 
[ 6380.976702] R10: 00007ffe885746c0 R11: 0000000000000293 R12: 000055bf6f538428 
[ 6380.976702] R13: 0000000000000003 R14: 0000000000000001 R15: 000055bf6f538160

Version-Release number of selected component (if applicable):
kernel-4.18.0-67.el8.x86_64
vdo-6.2.0.293-10.el8.x86_64
kmod-kvdo-6.2.0.293-47.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-home
2. dmsetup table                                                                                                                                                                               
mpatha: 0 20981760 multipath 0 1 alua 1 1 service-time 0 1 2 8:16 1 1           
rhel_storageqe--74-home: 0 853467136 linear 8:2 16345088                                                                                                                                                           
rhel_storageqe--74-swap: 0 16343040 linear 8:2 2048                                                                                                                                                                
rhel_storageqe--74-root: 0 104857600 linear 8:2 869812224                                
vdo: 0 845632240 vdo V2 /dev/disk/by-id/dm-uuid-LVM-t9CAkhzrxCmr3d9xS6dx1zjmExag3plHk8LIlJWgv0kmRu26GLtQqwbotyDVPMcM 106683392 4096 32768 16380 on auto vdo maxDiscard 1 ack 1 bio 4 bioRotationInterval 64 cpu 2 h
ash 1 logical 1 physical 1
3. dmsetup suspend vdo
# load table with size 106683392 -> 108683392
4. dmsetup reload vdo --table '0 845632240 vdo V2 /dev/disk/by-id/dm-uuid-LVM-t9CAkhzrxCmr3d9xS6dx1zjmExag3plHk8LIlJWgv0kmRu26GLtQqwbotyDVPMcM 108683392 4096 32768 16380 on auto vdo maxDiscar
d 1 ack 1 bio 4 bioRotationInterval 64 cpu 2 hash 1 logical 1 physical 1'                                                                                                                                          
5. dmsetup resume vdo                   
device-mapper: resume ioctl on vdo  failed: Input/output error                                                                                                                                                     
Command failed.              

Actual results:
spam of [ 6380.510838] kvdo1:journalQ: Completing write VIO of type 6 for physical block 108651723 with error: System error -5 (-5)
plus [ 6379.976348] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [iscsid:962]

Expected results:
No spam, no soft lockup

Additional info:
Placed the spam and soft lockup into single ticket as per discussion with sweettea.

Comment 4 Jakub Krysl 2019-05-31 13:33:04 UTC
As the reproducer depends on modifying physical size in DM table and in RHEL-7 there is old V1 table format without the physical size, it is not possible to modify the size on the run. The issue itself should manifest any time there are enough I/O errors from VDO, but I could not reproduce it. Regression testing passed.
kernel-3.10.0-1048.el7.x86_64
vdo-6.1.2.41-4.el7.x86_64
kmod-kvdo-6.1.2.41-5.el7.x86_64

Comment 6 errata-xmlrpc 2019-08-06 13:08:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2233