Bug 2119143 - vdo filesystem deadlock while doing looping online resizing during I/O
Summary: vdo filesystem deadlock while doing looping online resizing during I/O
Keywords:
Status: CLOSED DUPLICATE of bug 2109047
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kmod-kvdo
Version: 8.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ken Raeburn
QA Contact: vdo-qe
URL:
Whiteboard:
Depends On: 2064802
Blocks: 2109047
 
Reported: 2022-08-17 16:49 UTC by Corey Marthaler
Modified: 2022-08-23 17:59 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2064802
Environment:
Last Closed: 2022-08-23 17:59:19 UTC
Type: Bug
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-131354 0 None None None 2022-08-17 16:51:48 UTC

Comment 4 Corey Marthaler 2022-08-17 16:55:15 UTC
Was this not fixed in rhel8.7?

kernel-4.18.0-417.el8    BUILT: Wed Aug 10 15:40:43 CDT 2022
vdo-6.2.7.17-14.el8    BUILT: Tue Jul 19 10:05:39 CDT 2022
kmod-kvdo-6.2.7.17-87.el8    BUILT: Thu Aug 11 13:47:21 CDT 2022

lvm2-2.03.14-6.el8    BUILT: Fri Jul 29 05:40:53 CDT 2022
lvm2-libs-2.03.14-6.el8    BUILT: Fri Jul 29 05:40:53 CDT 2022



Aug 17 11:37:41 hayes-01 qarshd[5627]: Running cmdline: lvextend --yes --resizefs -L +500M vdo_sanity/vdo_lv
Aug 17 11:37:41 hayes-01 kernel: kvdo4:lvextend: Using write policy sync automatically.
Aug 17 11:37:41 hayes-01 kernel: kvdo4:lvextend: preparing to modify device 'vdo_sanity-vdo_pool-vpool'
Aug 17 11:37:41 hayes-01 kernel: kvdo4:lvextend: Preparing to resize logical to 26470656
Aug 17 11:37:41 hayes-01 kernel: kvdo4:lvextend: Done preparing to resize logical
Aug 17 11:37:41 hayes-01 kernel: kvdo4:lvextend: suspending device 'vdo_sanity-vdo_pool-vpool'
Aug 17 11:37:41 hayes-01 kernel: kvdo4:packerQ: compression is disabled
Aug 17 11:37:41 hayes-01 kernel: kvdo4:packerQ: compression is enabled
Aug 17 11:37:42 hayes-01 kernel: uds: lvextend: beginning save (vcn 10)
Aug 17 11:37:42 hayes-01 kernel: uds: lvextend: finished save (vcn 10)
Aug 17 11:37:42 hayes-01 kernel: kvdo4:lvextend: device 'vdo_sanity-vdo_pool-vpool' suspended
Aug 17 11:37:42 hayes-01 kernel: kvdo4:lvextend: resuming device 'vdo_sanity-vdo_pool-vpool'
Aug 17 11:37:42 hayes-01 kernel: kvdo4:lvextend: Resizing logical to 26470656
Aug 17 11:37:42 hayes-01 kernel: kvdo4:lvextend: Logical blocks now 26470656
Aug 17 11:37:42 hayes-01 kernel: kvdo4:lvextend: device 'vdo_sanity-vdo_pool-vpool' resumed
Aug 17 11:37:42 hayes-01 dmeventd[3089]: No longer monitoring VDO pool vdo_sanity-vdo_pool-vpool.
Aug 17 11:39:26 hayes-01 kernel: kvdo4:lvextend: suspending device 'vdo_sanity-vdo_pool-vpool'
Aug 17 11:39:26 hayes-01 kernel: kvdo4:packerQ: compression is disabled
Aug 17 11:39:26 hayes-01 kernel: kvdo4:packerQ: compression is enabled
Aug 17 11:39:31 hayes-01 kernel: uds: lvextend: beginning save (vcn 28)
Aug 17 11:39:31 hayes-01 kernel: uds: lvextend: finished save (vcn 28)
Aug 17 11:39:31 hayes-01 kernel: kvdo4:lvextend: device 'vdo_sanity-vdo_pool-vpool' suspended
Aug 17 11:39:31 hayes-01 kernel: kvdo4:lvextend: resuming device 'vdo_sanity-vdo_pool-vpool'
Aug 17 11:39:31 hayes-01 kernel: kvdo4:lvextend: device 'vdo_sanity-vdo_pool-vpool' resumed
Aug 17 11:39:31 hayes-01 dmeventd[3089]: Monitoring VDO pool vdo_sanity-vdo_pool-vpool.
Aug 17 11:41:29 hayes-01 kernel: INFO: task kworker/10:5:3997 blocked for more than 120 seconds.
Aug 17 11:41:29 hayes-01 kernel:      Tainted: G           O     --------- -  - 4.18.0-417.el8.x86_64 #1
Aug 17 11:41:29 hayes-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 11:41:29 hayes-01 kernel: task:kworker/10:5    state:D stack:    0 pid: 3997 ppid:     2 flags:0x80004080
Aug 17 11:41:29 hayes-01 kernel: Workqueue: xfs-conv/dm-2 xfs_end_io [xfs]
Aug 17 11:41:29 hayes-01 kernel: Call Trace:
Aug 17 11:41:29 hayes-01 kernel: __schedule+0x2d1/0x860
Aug 17 11:41:29 hayes-01 kernel: schedule+0x35/0xa0
Aug 17 11:41:29 hayes-01 kernel: rwsem_down_write_slowpath+0x30c/0x5c0
Aug 17 11:41:30 hayes-01 kernel: xfs_trans_alloc_inode+0x86/0x160 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_iomap_write_unwritten+0xaa/0x2a0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_end_ioend+0x107/0x160 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_end_io+0xaf/0xe0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: process_one_work+0x1a7/0x360
Aug 17 11:41:30 hayes-01 kernel: ? create_worker+0x1a0/0x1a0
Aug 17 11:41:30 hayes-01 kernel: worker_thread+0x30/0x390
Aug 17 11:41:30 hayes-01 kernel: ? create_worker+0x1a0/0x1a0
Aug 17 11:41:30 hayes-01 kernel: kthread+0x10b/0x130
Aug 17 11:41:30 hayes-01 kernel: ? set_kthread_struct+0x50/0x50
Aug 17 11:41:30 hayes-01 kernel: ret_from_fork+0x35/0x40
Aug 17 11:41:30 hayes-01 kernel: INFO: task dd:5492 blocked for more than 120 seconds.
Aug 17 11:41:30 hayes-01 kernel:      Tainted: G           O     --------- -  - 4.18.0-417.el8.x86_64 #1
Aug 17 11:41:30 hayes-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 11:41:30 hayes-01 kernel: task:dd              state:D stack:    0 pid: 5492 ppid:  5489 flags:0x00000080
Aug 17 11:41:30 hayes-01 kernel: Call Trace:
Aug 17 11:41:30 hayes-01 kernel: __schedule+0x2d1/0x860
Aug 17 11:41:30 hayes-01 kernel: schedule+0x35/0xa0
Aug 17 11:41:30 hayes-01 kernel: rwsem_down_write_slowpath+0x30c/0x5c0
Aug 17 11:41:30 hayes-01 kernel: xfs_vn_update_time+0xe9/0x1e0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: file_update_time+0xe5/0x130
Aug 17 11:41:30 hayes-01 kernel: xfs_file_write_checks+0x1ee/0x290 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_file_buffered_write+0x6f/0x2c0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: new_sync_write+0x112/0x160
Aug 17 11:41:30 hayes-01 kernel: vfs_write+0xa5/0x1b0
Aug 17 11:41:30 hayes-01 kernel: ksys_write+0x4f/0xb0
Aug 17 11:41:30 hayes-01 kernel: do_syscall_64+0x5b/0x1b0
Aug 17 11:41:30 hayes-01 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xc6
Aug 17 11:41:30 hayes-01 kernel: RIP: 0033:0x7fa57ae175c8
Aug 17 11:41:30 hayes-01 kernel: Code: Unable to access opcode bytes at RIP 0x7fa57ae1759e.
Aug 17 11:41:30 hayes-01 kernel: RSP: 002b:00007ffe9dca6258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug 17 11:41:30 hayes-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa57ae175c8
Aug 17 11:41:30 hayes-01 kernel: RDX: 0000000000100000 RSI: 00007fa57b1d9000 RDI: 0000000000000001
Aug 17 11:41:30 hayes-01 kernel: RBP: 0000000000100000 R08: 00000000ffffffff R09: 0000000000000000
Aug 17 11:41:30 hayes-01 kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 00007fa57b1d9000
Aug 17 11:41:30 hayes-01 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00007fa57b1d9000
Aug 17 11:41:30 hayes-01 kernel: INFO: task kworker/u81:0:5613 blocked for more than 120 seconds.
Aug 17 11:41:30 hayes-01 kernel:      Tainted: G           O     --------- -  - 4.18.0-417.el8.x86_64 #1
Aug 17 11:41:30 hayes-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 11:41:30 hayes-01 kernel: task:kworker/u81:0   state:D stack:    0 pid: 5613 ppid:     2 flags:0x80004080
Aug 17 11:41:30 hayes-01 kernel: Workqueue: writeback wb_workfn (flush-253:2)
Aug 17 11:41:30 hayes-01 kernel: Call Trace:
Aug 17 11:41:30 hayes-01 kernel: __schedule+0x2d1/0x860
Aug 17 11:41:30 hayes-01 kernel: ? generic_make_request_no_check+0x202/0x330
Aug 17 11:41:30 hayes-01 kernel: ? xfs_buf_read_map+0x1bd/0x320 [xfs]
Aug 17 11:41:30 hayes-01 kernel: schedule+0x35/0xa0
Aug 17 11:41:30 hayes-01 kernel: schedule_timeout+0x278/0x300
Aug 17 11:41:30 hayes-01 kernel: ? bio_add_page+0x46/0x60
Aug 17 11:41:30 hayes-01 kernel: ? _xfs_buf_ioapply+0x2e9/0x430 [xfs]
Aug 17 11:41:30 hayes-01 kernel: ? xfs_buf_read_map+0x1bd/0x320 [xfs]
Aug 17 11:41:30 hayes-01 kernel: wait_for_completion+0x96/0x100
Aug 17 11:41:30 hayes-01 kernel: ? __xfs_buf_submit+0x126/0x1d0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_buf_iowait+0x22/0xd0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: __xfs_buf_submit+0x126/0x1d0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_buf_read_map+0x1bd/0x320 [xfs]
Aug 17 11:41:30 hayes-01 kernel: ? xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
Aug 17 11:41:30 hayes-01 kernel: ? xfs_alloc_read_agfl+0x7c/0xc0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
Aug 17 11:41:30 hayes-01 kernel: ? xfs_alloc_read_agfl+0x7c/0xc0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_alloc_read_agfl+0x7c/0xc0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_alloc_fix_freelist+0x29c/0x480 [xfs]
Aug 17 11:41:30 hayes-01 kernel: ? prep_new_page+0xf5/0x100
Aug 17 11:41:30 hayes-01 kernel: xfs_alloc_vextent+0x19f/0x540 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_bmap_btalloc+0x432/0x820 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_bmapi_allocate+0xab/0x2f0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_bmapi_convert_delalloc+0x274/0x460 [xfs]
Aug 17 11:41:30 hayes-01 kernel: xfs_map_blocks+0x1a6/0x3f0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: iomap_writepage_map+0xe0/0x670
Aug 17 11:41:30 hayes-01 kernel: write_cache_pages+0x197/0x420
Aug 17 11:41:30 hayes-01 kernel: ? iomap_invalidatepage+0xe0/0xe0
Aug 17 11:41:30 hayes-01 kernel: ? blk_update_request+0x9f/0x380
Aug 17 11:41:30 hayes-01 kernel: iomap_writepages+0x1c/0x40
Aug 17 11:41:30 hayes-01 kernel: xfs_vm_writepages+0x7e/0xb0 [xfs]
Aug 17 11:41:30 hayes-01 kernel: do_writepages+0xc2/0x1c0
Aug 17 11:41:30 hayes-01 kernel: __writeback_single_inode+0x39/0x300
Aug 17 11:41:30 hayes-01 kernel: writeback_sb_inodes+0x1ea/0x450
Aug 17 11:41:30 hayes-01 kernel: __writeback_inodes_wb+0x5f/0xd0
Aug 17 11:41:30 hayes-01 kernel: wb_writeback+0x24c/0x2e0
Aug 17 11:41:30 hayes-01 kernel: ? get_nr_inodes+0x35/0x60
Aug 17 11:41:30 hayes-01 kernel: wb_workfn+0x380/0x4d0
Aug 17 11:41:30 hayes-01 kernel: ? newidle_balance+0x279/0x3c0
Aug 17 11:41:30 hayes-01 kernel: process_one_work+0x1a7/0x360
Aug 17 11:41:30 hayes-01 kernel: ? create_worker+0x1a0/0x1a0
Aug 17 11:41:30 hayes-01 kernel: worker_thread+0x30/0x390
Aug 17 11:41:30 hayes-01 kernel: ? create_worker+0x1a0/0x1a0
Aug 17 11:41:30 hayes-01 kernel: kthread+0x10b/0x130
Aug 17 11:41:30 hayes-01 kernel: ? set_kthread_struct+0x50/0x50
Aug 17 11:41:30 hayes-01 kernel: ret_from_fork+0x35/0x40
Aug 17 11:43:32 hayes-01 kernel: INFO: task kworker/10:5:3997 blocked for more than 120 seconds.



SCENARIO - open_xfs_fsadm_vdo_resize:  Create an XFS filesystem on VDO, mount it, and then attempt multiple online fsadm resizes with data checking (bug 2064802)
adding entry to the devices file for /dev/sde1
creating PV on hayes-01 using device /dev/sde1
pvcreate --yes -ff   /dev/sde1
  Physical volume "/dev/sde1" successfully created.
adding entry to the devices file for /dev/sdd1
creating PV on hayes-01 using device /dev/sdd1
pvcreate --yes -ff   /dev/sdd1
  Physical volume "/dev/sdd1" successfully created.
adding entry to the devices file for /dev/sdh1
creating PV on hayes-01 using device /dev/sdh1
pvcreate --yes -ff   /dev/sdh1
  Physical volume "/dev/sdh1" successfully created.
adding entry to the devices file for /dev/sdi1
creating PV on hayes-01 using device /dev/sdi1
pvcreate --yes -ff   /dev/sdi1
  Physical volume "/dev/sdi1" successfully created.
creating VG on hayes-01 using PV(s) /dev/sde1 /dev/sdd1 /dev/sdh1 /dev/sdi1
vgcreate    vdo_sanity /dev/sde1 /dev/sdd1 /dev/sdh1 /dev/sdi1
  Volume group "vdo_sanity" successfully created
lvcreate --yes --type linear -n vdo_pool  -L 50G vdo_sanity  
Logical volume "vdo_pool" created.

lvconvert --yes --type vdo-pool -n vdo_lv  -V 100G vdo_sanity/vdo_pool
The VDO volume can address 46 GB in 23 data slabs, each 2 GB.
    It can grow to address at most 16 TB of physical storage in 8192 slabs.
    If a larger maximum size might be needed, use bigger slabs.
  Logical volume "vdo_lv" created.
  Converted vdo_sanity/vdo_pool to VDO pool volume and created virtual vdo_sanity/vdo_lv VDO volume.
WARNING: Converting logical volume vdo_sanity/vdo_pool to VDO pool volume with formating.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

mkfs --type xfs -f /dev/vdo_sanity/vdo_lv
meta-data=/dev/vdo_sanity/vdo_lv isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

mount  /dev/vdo_sanity/vdo_lv /mnt/vdo_lv
Writing files to /mnt/vdo_lv
/usr/tests/sts-rhel8.7/bin/checkit -w /mnt/vdo_lv -f /tmp/Filesystem.1855530 -n 5000
checkit starting with:
CREATE
Num files:          5000
Random Seed:        5408
Verify XIOR Stream: /tmp/Filesystem.1855530
Working dir:        /mnt/vdo_lv
Checking files from /mnt/vdo_lv
/usr/tests/sts-rhel8.7/bin/checkit -w /mnt/vdo_lv -f /tmp/Filesystem.1855530 -v
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/Filesystem.1855530
Working dir:        /mnt/vdo_lv
Starting dd io to vdo fs to be resized

Attempt to resize the open vdo filesystem multiple times with lvextend/fsadm on hayes-01
+++ itr 1 +++
Adding additional space to vdo_sanity/vdo_lv on hayes-01
lvextend --yes --resizefs -L +500M vdo_sanity/vdo_lv 
  Size of logical volume vdo_sanity/vdo_lv changed from 100.00 GiB (25600 extents) to <100.49 GiB (25725 extents).
  Logical volume vdo_sanity/vdo_lv successfully resized.
meta-data=/dev/mapper/vdo_sanity-vdo_lv isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 26214400 to 26342400


PRE:104857600.0 POST:105369600.0
PRE:{'104806400'} POST:{'105318400'}
Checking files from /mnt/vdo_lv
/usr/tests/sts-rhel8.7/bin/checkit -w /mnt/vdo_lv -f /tmp/Filesystem.1855530 -v
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/Filesystem.1855530
Working dir:        /mnt/vdo_lv

+++ itr 2 +++
Adding additional space to vdo_sanity/vdo_lv on hayes-01
lvextend --yes --resizefs -L +500M vdo_sanity/vdo_lv 
  Size of logical volume vdo_sanity/vdo_lv changed from <100.49 GiB (25725 extents) to <100.98 GiB (25850 extents).
  Logical volume vdo_sanity/vdo_lv successfully resized.
meta-data=/dev/mapper/vdo_sanity-vdo_lv isize=512    agcount=5, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=26342400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 26342400 to 26470400


PRE:105369600.0 POST:105881600.0
PRE:{'105318400'} POST:{'105830400'}
Checking files from /mnt/vdo_lv
/usr/tests/sts-rhel8.7/bin/checkit -w /mnt/vdo_lv -f /tmp/Filesystem.1855530 -v
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/Filesystem.1855530
Working dir:        /mnt/vdo_lv

+++ itr 3 +++
Adding additional space to vdo_sanity/vdo_lv on hayes-01
lvextend --yes --resizefs -L +500M vdo_sanity/vdo_lv 
40000+0 records in
40000+0 records out
41943040000 bytes (42 GB, 39 GiB) copied, 625.646 s, 67.0 MB/s

[STUCK]
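The reproduction above boils down to a short loop: create an XFS filesystem on a VDO volume, run background dd I/O, and repeatedly grow the LV and filesystem online. A minimal dry-run sketch (bash; volume, mount-point, and size values are taken from the log above, while the dd output file name and the `run` recorder are illustrative — the real run needs root and dedicated block devices, so commands are recorded rather than executed):

```shell
#!/bin/bash
# Dry-run sketch of the reproducer. The run() helper records each
# command instead of executing it; swap in run() { "$@"; } to run
# for real on a prepared volume group.
CMDS=()
run() { CMDS+=("$*"); }

VG=vdo_sanity
run lvcreate --yes --type linear -n vdo_pool -L 50G "$VG"
run lvconvert --yes --type vdo-pool -n vdo_lv -V 100G "$VG/vdo_pool"
run mkfs --type xfs -f "/dev/$VG/vdo_lv"
run mount "/dev/$VG/vdo_lv" /mnt/vdo_lv
# Background I/O to the mounted fs (output file name is illustrative)
run dd if=/dev/zero of=/mnt/vdo_lv/ddfile bs=1M count=40000

# Loop the online resize; the log above hung on iteration 3
for i in 1 2 3; do
    run lvextend --yes --resizefs -L +500M "$VG/vdo_lv"
done

printf '%s\n' "${CMDS[@]}"
```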

Comment 5 Ken Raeburn 2022-08-17 17:23:36 UTC
Not fixed in 8.7. As noted in bug #2064802, the core problem (bugs in the suspend/resume cycle used during the resizing) comes up in more cases in RHEL 8 than in RHEL 9, and the fix that worked for RHEL 9 doesn't cover them all. So I'm still investigating.
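For context, the suspend/resume cycle referred to here is visible in the kvdo log lines above ("preparing to modify device" / "suspending device" / "resuming device"). A dry-run sketch of the device-mapper sequence lvm2 drives for the pool device during `lvextend` — reload first, then suspend, then resume (the pool name is from the log; the updated table contents are computed by lvm2 and elided here):

```shell
#!/bin/bash
# Dry-run sketch of the dm cycle behind lvextend on a VDO pool.
# Commands are recorded, not executed; '$NEW_TABLE' is a literal
# placeholder for the lvm2-generated table with the larger size.
DM_DEV=vdo_sanity-vdo_pool-vpool
CYCLE=()
step() { CYCLE+=("$*"); }

step dmsetup reload "$DM_DEV" --table '$NEW_TABLE'  # kvdo: "preparing to modify device"
step dmsetup suspend "$DM_DEV"                      # quiesce and flush in-flight I/O
step dmsetup resume "$DM_DEV"                       # new table takes effect, I/O restarts

printf '%s\n' "${CYCLE[@]}"
```

The deadlock window is between the suspend and the resume: I/O issued against the suspended device (here, XFS writeback and log I/O) queues until resume, and a bug in that cycle can leave it queued forever.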

Comment 6 corwin 2022-08-23 17:59:19 UTC

*** This bug has been marked as a duplicate of bug 2109047 ***

