Bug 2015755

Summary: zram: zram leak with warning when running zram02.sh in ltp
Product: Red Hat Enterprise Linux 8
Component: kernel
kernel sub component: Block Layer
Version: 8.5
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Keywords: Bugfix
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Ming Lei <minlei>
Assignee: Ming Lei <minlei>
QA Contact: ChanghuiZhong <czhong>
CC: czhong, jmoyer, yizhan
Fixed In Version: kernel-4.18.0-353.el8
Clone Of: 2015754
Bug Depends On: 2015754
Last Closed: 2022-05-10 15:02:54 UTC
Type: Bug

Description Ming Lei 2021-10-20 02:45:46 UTC
+++ This bug was initially created as a clone of Bug #2015754 +++

Description of problem:

https://lore.kernel.org/linux-block/20210927163805.808907-1-mcgrof@kernel.org/T/#m12965e4b1c6ef5ae19f5fc019493a37b1993c2f6

When running the following script, the kernel warning 'Error: Removing
state 63 which has instances left.' is triggered and the RHEL 9
test VM reboots:

cd testcases/kernel/device-drivers/zram

while true; do
        PATH=$PATH:$PWD:$PWD/../../../lib/ ./zram02.sh;
done &

while true; do
        PATH=$PATH:$PWD:$PWD/../../../lib/ ./zram02.sh;
done


[   38.765210] ------------[ cut here ]------------
[   38.766161] Error: Removing state 63 which has instances left.
[   38.767287] WARNING: CPU: 15 PID: 1602 at kernel/cpu.c:2127 __cpuhp_remove_state_cpuslocked+0xea/0xf0
[   38.769042] Modules linked in: zram(-) rfkill nls_utf8 isofs vfat fat intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel bochs_drm drm_vram_helper drm_ttm_helper kvm ttm drm_kms_helper irqbypass rapl ppdev syscopyarea iTCO_wdt sysfillrect sysimgblt iTCO_vendor_support fb_sys_fops i2c_i801 cec parport_pc i2c_smbus lpc_ich joydev parport pcspkr drm fuse xfs libcrc32c sr_mod sd_mod cdrom sg ahci libahci crct10dif_pclmul crc32_pclmul nvme uas crc32c_intel libata virtio_net nvme_core usb_storage ghash_clmulni_intel serio_raw virtio_scsi net_failover virtio_blk t10_pi failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
[   38.779575] CPU: 15 PID: 1602 Comm: rmmod Kdump: loaded Not tainted 5.14.0-1.el9.x86_64 #1
[   38.781691] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[   38.783771] RIP: 0010:__cpuhp_remove_state_cpuslocked+0xea/0xf0
[   38.785266] Code: c6 43 21 00 48 c7 43 18 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 87 fd 95 00 0f 0b 44 89 e6 48 c7 c7 d8 62 52 b8 e8 d9 b4 90 00 <0f> 0b eb b0 66 90 0f 1f 44 00 00 55 89 fd 53 89 f3 e8 20 f1 95 00
[   38.789385] RSP: 0018:ffffaac300efbe98 EFLAGS: 00010282
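
For scripted runs, the warning above can be picked out of the console log mechanically. A minimal sketch, assuming the log text arrives on stdin; the helper name leaked_cpuhp_states is hypothetical and is not part of LTP or the kernel tree:

```shell
# Hypothetical helper: read a kernel log on stdin and print the state
# number from every "Removing state N which has instances left." line.
# Timestamps and trailing carriage returns are ignored by the pattern.
leaked_cpuhp_states() {
    sed -n 's/.*Error: Removing state \([0-9][0-9]*\) which has instances left\..*/\1/p'
}
```

For example, `dmesg | leaked_cpuhp_states` prints one state number per matching warning and prints nothing when the log is clean, so a reproducer loop could stop as soon as the output is non-empty.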



Version-Release number of selected component (if applicable):


How reproducible:

100%


Steps to Reproduce:

See above.


Actual results:

kernel warning and panic

Expected results:

no kernel warning or panic


Additional info:

The issue can be reproduced on the upstream v5.15-rc kernel.

Comment 2 Ming Lei 2021-11-16 00:21:43 UTC
Hi Yi,

Can you help to review and ack this BZ since Changhui is on PTO this week?

BTW, this one blocks another MR(!1678) too.

Thanks,

Comment 3 ChanghuiZhong 2021-11-16 01:06:08 UTC
(In reply to Ming Lei from comment #2)
> Hi Yi,
> 
> Can you help to review and ack this BZ since Changhui is on PTO this week?
> 
> BTW, this one blocks another MR(!1678) too.
> 
> Thanks,

Thanks Ming and Yi. Yi is also on PTO this week.
I will report the test results later.

Thanks

Comment 4 Ming Lei 2021-11-16 01:29:54 UTC
(In reply to ChanghuiZhong from comment #3)
> Thanks Ming and Yi. Yi is also on PTO this week.
> I will report the test results later.

Sorry for disturbing you guys, and thanks for handling these things, have a
nice PTO!

Comment 5 ChanghuiZhong 2021-11-16 14:36:48 UTC
Reproduced this issue on 4.18.0-348.6.el8.x86_64:

[ 1576.883529] ------------[ cut here ]------------ 
[ 1576.907691] Error: Removing state 61 which has instances left. 
[ 1576.937921] WARNING: CPU: 11 PID: 31190 at kernel/cpu.c:1905 __cpuhp_remove_state_cpuslocked+0xaf/0x100 
[ 1576.980289] Modules linked in: zram(-) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif iTCO_wdt irqbypass iTCO_vendor_support crct10dif_pclmul crc32_pclmul mgag200 i2c_algo_bit ghash_clmulni_intel rapl drm_kms_helper intel_cstate syscopyarea sysfillrect intel_uncore pcspkr sysimgblt fb_sys_fops i2c_i801 drm lpc_ich acpi_ipmi hpilo hpwdt ioatdma ipmi_si dca wmi ipmi_devintf ipmi_msghandler acpi_tad acpi_power_meter xfs libcrc32c sd_mod sg ahci libahci crc32c_intel tg3 libata nvme hpsa nvme_core scsi_transport_sas t10_pi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zram] 
[ 1577.279325] CPU: 11 PID: 31190 Comm: rmmod Kdump: loaded Not tainted 4.18.0-348.6.el8.x86_64 #1 
[ 1577.318439] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 05/21/2018 
[ 1577.355698] RIP: 0010:__cpuhp_remove_state_cpuslocked+0xaf/0x100 
[ 1577.382727] Code: c9 31 d2 44 89 e6 89 df e8 1e f9 ff ff eb c4 48 8b 85 b8 ce c4 85 48 85 c0 74 11 44 89 e6 48 c7 c7 d8 f7 6c 85 e8 fa df ff ff <0f> 0b 5b 48 c7 85 a8 ce c4 85 00 00 00 00 48 c7 c7 60 ee c4 85 48 
[ 1577.473309] RSP: 0018:ffffacfd4303bea8 EFLAGS: 00010286 
[ 1577.496420] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 
[ 1577.528556] RDX: ffff9ba7afce7320 RSI: ffff9ba7afcd6858 RDI: ffff9ba7afcd6858 
[ 1577.560641] RBP: 0000000000000988 R08: 0000000000000000 R09: c0000000ffff7fff 
[ 1577.592656] R10: 0000000000000001 R11: ffffacfd4303bcc0 R12: 000000000000003d 
[ 1577.625195] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 
[ 1577.657508] FS:  00007f864ae27740(0000) GS:ffff9ba7afcc0000(0000) knlGS:0000000000000000 
[ 1577.693998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 1577.719819] CR2: 00007f8649d40b70 CR3: 0000000139f46006 CR4: 00000000001706e0 
[ 1577.751920] Call Trace: 
[ 1577.762890]  __cpuhp_remove_state+0x2e/0x80 
[ 1577.781667]  __x64_sys_delete_module+0x139/0x280 
[ 1577.802372]  do_syscall_64+0x5b/0x1a0 
[ 1577.818791]  entry_SYSCALL_64_after_hwframe+0x65/0xca 
[ 1577.841651] RIP: 0033:0x7f8649e0283b 
[ 1577.858165] Code: 73 01 c3 48 8b 0d 4d 16 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 16 2c 00 f7 d8 64 89 01 48 
[ 1577.950045] RSP: 002b:00007ffc29a54e18 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 
[ 1577.985434] RAX: ffffffffffffffda RBX: 000055fea108e800 RCX: 00007f8649e0283b 
[ 1578.018076] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055fea108e868 
[ 1578.050432] RBP: 0000000000000000 R08: 00007ffc29a53d91 R09: 0000000000000000 
[ 1578.083148] R10: 00007f8649e763a0 R11: 0000000000000206 R12: 00007ffc29a55040 
[ 1578.116481] R13: 00007ffc29a56ef4 R14: 000055fea108e2a0 R15: 000055fea108e800 
[ 1578.148649] ---[ end trace bd3570ed5228c4b7 ]--- 



Confirmed that this issue cannot be reproduced on 4.18.0-349.el8.mr1662_211115_1341.x86_64;
there is no kernel warning or panic.
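
The state number in the warning can be cross-checked against the register dump. The bytes `44 89 e6` shortly before the trapping instruction in the Code line decode to `mov %r12d,%esi`, so R12 appears to carry the state number passed to the warning message. A quick conversion of the R12 value from the trace above, as a sketch:

```shell
# R12 value copied from the register dump in the 4.18.0-348.6.el8 trace.
r12=000000000000003d

# Interpret the hex digits as a number and print it in decimal.
state=$(printf '%d' "0x$r12")
echo "$state"   # prints 61
```

This matches "Removing state 61" in the warning text.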

Comment 6 ChanghuiZhong 2021-11-16 14:42:57 UTC
(In reply to Ming Lei from comment #2)
> Hi Yi,
> 
> Can you help to review and ack this BZ since Changhui is on PTO this week?
> 
> BTW, this one blocks another MR(!1678) too.
> 
> Thanks,

Hello Ming,

I cannot find MR!1678 at https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests.
Which BZ is this about?

Thanks

Comment 7 Ming Lei 2021-11-16 15:24:28 UTC
(In reply to ChanghuiZhong from comment #6)
> Hello Ming,
> 
> I cannot find MR!1678 at
> https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests.
> Which BZ is this about?
> 
> Thanks

https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1648

Comment 8 ChanghuiZhong 2021-11-17 01:58:22 UTC
(In reply to Ming Lei from comment #7)
> https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1648

Thanks, Ming.
This is a memory management issue, so it looks like there is nothing I can do;
another team member will handle it.

Comment 12 ChanghuiZhong 2021-12-01 08:49:06 UTC
Verified that this issue cannot be reproduced on kernel-4.18.0-353.el8;
there is no kernel warning or panic.

The fix patches have been included in the kernel tree:
$ git log kernel-4.18.0-353.el8 --oneline --grep=2015755
530d462bef59 Merge: zram: several bug fixes
2951396a37b4 zram: replace fsync_bdev with sync_blockdev
91f0ad82123b zram: avoid race between zram_remove and disksize_store
2278cffd63b9 zram: don't fail to remove zram during unloading module
4cc586d5e7d3 zram: fix race between zram_reset_device() and disksize_store()
3eb7b38aa261 zram: register default groups with device_add_disk()


Moving to VERIFIED.
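
To check whether a given RHEL 8 system already runs a build at or above the one named in "Fixed In Version", the release strings can be compared with a version sort. This is only a sketch: the names FIXED_IN and kernel_has_fix are hypothetical, it assumes a `sort` that supports `-V` (as GNU coreutils does), and it compares release strings only, so scratch builds such as the mr1662 test kernel from comment 5 are not recognized as fixed.

```shell
# Build named in "Fixed In Version", without the "kernel-" prefix and ".el8" suffix.
FIXED_IN="4.18.0-353"

# Hypothetical helper: return 0 if the given el8 kernel release is at or
# above FIXED_IN, 1 if it is older, 2 if it is not an el8 release string.
kernel_has_fix() {
    case "$1" in
        *.el8*) ;;
        *) return 2 ;;
    esac
    # Strip ".el8" and everything after it,
    # e.g. "4.18.0-348.6.el8.x86_64" -> "4.18.0-348.6".
    rel="${1%%.el8*}"
    # Version-sort both strings; the fix is present if FIXED_IN sorts first.
    lowest=$(printf '%s\n%s\n' "$FIXED_IN" "$rel" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_IN" ]
}
```

Typical use would be `kernel_has_fix "$(uname -r)" && echo "fix present"`.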

Comment 14 errata-xmlrpc 2022-05-10 15:02:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1988