Created attachment 2027179
virt-resize executed with LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 log

Description of problem:
When resizing a `qcow2` volume, `virt-resize` fails.

Version-Release number of selected component (if applicable):
guestfish 1.52.0

How reproducible:
Consistently fails on any qcow2 image file.

Steps to Reproduce:
1. Download the Arch basic image from
>https://geo.mirror.pkgbuild.com/images/v20240415.229275/Arch-Linux-x86_64-basic.qcow2
and save it as beforeresize.qcow2
2. Execute:
>qemu-img create -f qcow2 -o preallocation=metadata,compression_type=zstd newdisk.qcow2 100G
3. Execute:
>virt-resize --expand /dev/sda3 beforeresize.qcow2 newdisk.qcow2

Actual results:
virt-resize: error: libguestfs error: appliance closed the connection unexpectedly.
This usually means the libguestfs appliance crashed.
...

Expected results:
newdisk.qcow2 should be a copy of beforeresize.qcow2, but with /dev/sda3 increased to 100 GB.

Additional info:
Happens with other unrelated images too. Never used to happen. This was introduced in the past week or two by an Arch Linux pacman -Syu upgrade.

OS: Arch Linux x86_64
Kernel: 6.8.5-arch1-1
CPU: Intel i5-6600K
[ 18.533479] BUG: kernel NULL pointer dereference, address: 0000000000000600
[ 18.534302] #PF: supervisor read access in kernel mode
[ 18.534862] #PF: error_code(0x0000) - not-present page
[ 18.535429] PGD 0 P4D 0
[ 18.535715] Oops: 0000 [#1] PREEMPT SMP PTI
[ 18.536175] CPU: 0 PID: 43 Comm: kswapd0 Not tainted 6.8.5-arch1-1 #1 5f12b795066ab8d27a5fe9971245067df4fb99ed
[ 18.537241] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 18.538220] RIP: 0010:memcg_page_state+0x9/0x30
[ 18.538721] Code: c3 cc cc cc cc eb f9 e9 05 b8 ff ff 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48
[ 18.540694] RSP: 0018:ffff9c4840163af0 EFLAGS: 00010246
[ 18.541256] RAX: 00000000fffff33f RBX: ffff9c4840163bc0 RCX: 0000000000000002
[ 18.542026] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000
[ 18.542791] RBP: 0000000000000000 R08: ffff8d64c314e000 R09: 0000000000000000
[ 18.543554] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d650ffdb780
[ 18.544317] R13: ffff8d64c1c28400 R14: 0000000000000000 R15: ffff8d64c31e1d80
[ 18.545083] FS: 0000000000000000(0000) GS:ffff8d650de00000(0000) knlGS:0000000000000000
[ 18.545948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.546570] CR2: 0000000000000600 CR3: 00000000040f6003 CR4: 0000000000370ef0
[ 18.547338] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.548102] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 18.548873] Call Trace:
[ 18.549147] <TASK>
[ 18.549389] ? __die+0x23/0x70
[ 18.549732] ? page_fault_oops+0x171/0x4e0
[ 18.550178] ? free_unref_page_list+0x2f4/0x400
[ 18.550677] ? exc_page_fault+0x7f/0x180
[ 18.551106] ? asm_exc_page_fault+0x26/0x30
[ 18.551564] ? memcg_page_state+0x9/0x30
[ 18.552001] zswap_shrinker_count+0xb4/0x120
[ 18.552470] do_shrink_slab+0x37/0x360
[ 18.552885] shrink_slab+0xc7/0x3c0
[ 18.553274] ? try_to_shrink_lruvec+0x1bf/0x290
[ 18.553772] shrink_one+0x123/0x1b0
[ 18.554163] shrink_node+0xa7f/0xbc0
[ 18.554557] ? psi_group_change+0x213/0x3c0
[ 18.555014] balance_pgdat+0x523/0x960
[ 18.555442] ? psi_task_switch+0xd6/0x230
[ 18.555886] ? __switch_to_asm+0x3e/0x70
[ 18.556320] ? finish_task_switch.isra.0+0x94/0x2f0
[ 18.556852] kswapd+0x20d/0x400
[ 18.557203] ? __pfx_autoremove_wake_function+0x10/0x10
[ 18.557774] ? __pfx_kswapd+0x10/0x10
[ 18.558174] kthread+0xe5/0x120
[ 18.558525] ? __pfx_kthread+0x10/0x10
[ 18.558943] ret_from_fork+0x31/0x50
[ 18.559337] ? __pfx_kthread+0x10/0x10
[ 18.559752] ret_from_fork_asm+0x1b/0x30
[ 18.560183] </TASK>
[ 18.560429] Modules linked in: vfat fat dm_mod btrfs blake2b_generic xor raid6_pq virtio_snd snd_pcm snd_timer snd soundcore libcrc32c crc8 crc7 crc4 crc_itu_t virtiofs fuse ext4 mbcache jbd2 virtio_vdpa vdpa virtio_mmio virtio_mem virtio_input virtio_dma_buf virtio_balloon virtio_vfio_pci virtio_pci virtio_pci_modern_dev virtio_pci_legacy_dev vfio_pci_core irqbypass vfio iommufd virtio_scsi virtio_rpmsg_bus rpmsg_ns rpmsg_core nd_virtio virtio_net net_failover failover virtio_iommu virtio_crypto crypto_engine virtio_console virtio_rng virtio_bt bluetooth rfkill crc16 ecdh_generic virtio_blk ata_piix trusted asn1_encoder tee crc32c_generic crc32_generic crct10dif_pclmul crc32c_intel crc32_pclmul
[ 18.566908] CR2: 0000000000000600
[ 18.567254] ---[ end trace 0000000000000000 ]---

This is a recent kernel bug, we saw it on Arch too:
https://github.com/libguestfs/libguestfs/issues/139#issuecomment-2056607791

It's a kernel bug, we have no idea yet what causes it.
Thanks for moving on this so quickly, Richard. I had wondered whether to report it directly as a kernel bug, but decided to report it here in the first instance.
I have reported my findings in https://lore.kernel.org/all/3iccc6vjl5gminut3lvpl4va2lbnsgku5ei2d7ylftoofy3n2v@gcfdvtsq6dx2/
Hi, I recently updated from 6.8.5 to 6.8.9 and started to have problems with zswap. I noticed "mm: zswap: fix shrinker NULL crash with cgroup_disable=memory" is the only change to zswap between these two versions. So although I have no idea if this is related, I decided to report it here. Here are some stack traces:

A hang inside zswap:

INFO: task [redacted]:2870 blocked for more than 122 seconds.
Tainted: P O 6.8.9-zen1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:[redacted] state:D stack:0 pid:2870 tgid:2870 ppid:2788 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x5fe/0xaf0
schedule+0x6e/0xc0
schedule_preempt_disabled+0x15/0x30
__mutex_lock+0x28c/0x6a0
__zswap_load+0x5d/0x1f0
? srso_alias_return_thunk+0x5/0xfbef5
? xas_store+0x3f1/0x5c0
zswap_load+0xbd/0x270
? srso_alias_return_thunk+0x5/0xfbef5
swap_read_folio+0x75/0x6e0
? workingset_refault+0x26e/0x4d0
? srso_alias_return_thunk+0x5/0xfbef5
? __read_swap_cache_async+0x1fe/0x2b0
swapin_readahead+0x437/0x450
? srso_alias_return_thunk+0x5/0xfbef5
? __filemap_get_folio+0x3b/0x320
do_swap_page+0x1a6/0xaa0
? __pte_offset_map+0x1d/0xf0
handle_mm_fault+0x7f4/0xc60
do_user_addr_fault+0x46a/0x690
exc_page_fault+0x62/0x150
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x78c1145a5f51
RSP: 002b:00007ffc0fd057b0 EFLAGS: 00010206
RAX: 0000000001e28840 RBX: 0000000000000020 RCX: 000078c1146ecb30
RDX: 000078c1146ecb40 RSI: 000000000211b890 RDI: 000078c1146ecac0
RBP: 000078c1146ecac0 R08: 0000000000000005 R09: 0000000000000004
R10: 000078c1146ecac0 R11: 0000000001ed0db0 R12: 0000000000000002
R13: 0000000000000014 R14: 0000000000000039 R15: 0000000000000000
</TASK>

And a kernel BUG:

kernel BUG at mm/zswap.c:1395!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 283 Comm: kswapd0 Kdump: loaded Tainted: P W O 6.8.9-zen1
Hardware name: [redacted]
RIP: 0010:__zswap_load+0x1dc/0x1f0
Code: 04 25 28 00 00 00 48 3b 44 24 48 75 14 48 83 c4 50 5b 41 5c 41 5d 41 5e 41 5f 5d e9 a9 39 9c 00 cc e8 68 cc 6e 00 90 0f 0b 90 <0f> 0b 90 0f 0b cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 66 0f
RSP: 0018:ffff91ab80b83910 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: ffff8c3f02a42c30 RCX: 00000000ffffffea
RDX: 000000000000000a RSI: ffff91ab81343000 RDI: ffff8c4a2da2a4c0
RBP: ffff91ab80b83938 R08: 0000000000000005 R09: ffff91ab8210d510
R10: ffff91ab8210d4c0 R11: ffff91ab82106020 R12: ffffdcc9fb95ffc0
R13: ffff91ab80b83918 R14: ffff8c3b8008c1c0 R15: ffff8c4a2da3eb58
FS: 0000000000000000(0000) GS:ffff8c4a2da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffb956d008 CR3: 0000000bffe46000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die_body+0x68/0xb0
? die+0xa4/0xd0
? do_trap+0xa5/0x180
? __zswap_load+0x1dc/0x1f0
? __zswap_load+0x1dc/0x1f0
? handle_invalid_op+0x65/0x80
? __zswap_load+0x1dc/0x1f0
? exc_invalid_op+0x39/0x50
? asm_exc_invalid_op+0x1a/0x20
? __zswap_load+0x1dc/0x1f0
shrink_memcg_cb+0x25c/0x530
? sysvec_call_function_single+0xe/0x80
? zswap_shrinker_count+0x170/0x170
__list_lru_walk_one+0x110/0x220
? zswap_shrinker_count+0x170/0x170
list_lru_walk_one+0x5e/0x80
zswap_shrinker_scan+0xc4/0x140
do_shrink_slab+0x160/0x330
shrink_slab+0x354/0x4d0
shrink_one+0xbe/0x1f0
shrink_node+0xcab/0xea0
kswapd+0x95d/0xf70
? srso_alias_return_thunk+0x5/0xfbef5
? __schedule+0x606/0xaf0
? shrink_all_memory+0x170/0x170
kthread+0xe8/0x110
? kthread_blkcg+0x40/0x40
ret_from_fork+0x37/0x50
? kthread_blkcg+0x40/0x40
ret_from_fork_asm+0x11/0x20
</TASK>
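
For anyone triaging a similar failure, here is a minimal sketch of how the crash headline and faulting symbol could be pulled out of a saved console log (for example a LIBGUESTFS_DEBUG=1 log like the attachment, or dmesg output). The function name `find_oops` and both regular expressions are hypothetical, written only against the two traces posted in this bug; they assume an x86_64 kernel-mode `RIP: 0010:` line and will not recognise other oops formats.

```python
import re

# "RIP: 0010:symbol+0xOFF/0xLEN" as printed for a kernel-mode fault on x86_64.
# A user-space RIP (segment 0033, raw address) deliberately does not match.
RIP_RE = re.compile(r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Headlines seen in the two crashes in this report (assumption: not exhaustive).
HEADLINE_RE = re.compile(
    r"BUG: kernel NULL pointer dereference|kernel BUG at \S+"
)


def find_oops(log_text):
    """Return (headline, faulting_symbol) for the first crash found, else None.

    faulting_symbol is None if a headline is present but no kernel-mode
    RIP line follows it.
    """
    headline = HEADLINE_RE.search(log_text)
    if headline is None:
        return None
    # Only look for the RIP line after the headline, so that an earlier,
    # unrelated trace in the same log is not picked up.
    rip = RIP_RE.search(log_text, headline.end())
    symbol = rip.group("symbol") if rip else None
    return headline.group(0), symbol


if __name__ == "__main__":
    import sys

    # Usage: python find_oops.py < saved-console.log
    print(find_oops(sys.stdin.read()))
```

Run against the first trace in this bug, the sketch should point at memcg_page_state, and against the 6.8.9-zen1 trace at __zswap_load, which is a quick way to tell the two crashes apart before comparing full backtraces.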