Description of problem:
Running fsstress (from the xfstests tree) for a while, unmounting the gfs2 filesystem, and then running fsck.gfs2 causes vmscan warnings about "negative objects to delete" to be printed.

Version-Release number of selected component (if applicable):
Current mainline kernel (v4.18-rc4)

How reproducible:
Not 100%, but the chances increase with the running time of fsstress.

Steps to Reproduce:
1. mkfs.gfs2 -p lock_nolock /dev/foo
2. mount /dev/foo /mnt/test
3. ./fsstress -d /mnt/test/ -p3 -l0 -n 10000000 -v -c
4. (Wait about an hour)
5. ^C
6. umount /mnt/test
7. fsck.gfs2 /dev/foo

Actual results:
At some point starting in pass 1, a lot of these warnings hit the console:

[10322.608787] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
[10322.611004] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
[10322.615502] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
[10322.619220] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
...

Expected results:
No warnings

Additional info:
https://www.redhat.com/archives/cluster-devel/2018-April/msg00019.html
I'm guessing this is a simple overflow problem. GFS2 uses an atomic_t to keep track of the number of items on the lru list. That's read as an int (4 bytes) for the calculation. I bet this goes away if we change that to an atomic64_t. I'll whip up a quick patch and maybe Andy can test it. Reassigning to myself.
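A minimal sketch of that idea (illustrative only; the counter name with the "64" suffix and the "_sketch" callback are stand-ins, not the actual GFS2 symbols): widen the LRU counter so the shrinker calculation never sees a truncated 32-bit value.

/* Before: a 32-bit counter, read back as an int. */
static atomic_t lru_count = ATOMIC_INIT(0);

/* After (proposed): a 64-bit counter. */
static atomic64_t lru_count64 = ATOMIC64_INIT(0);

/* Hypothetical shrinker count callback using the widened counter. */
static unsigned long gfs2_glock_shrink_count_sketch(struct shrinker *shrink,
						    struct shrink_control *sc)
{
	s64 n = atomic64_read(&lru_count64);	/* full 64-bit read */

	return n > 0 ? (unsigned long)n : 0;	/* never report a negative count */
}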
Created attachment 1626473 [details]
Proposed upstream and rhel8 patch

Andy, here's the full patch.
I can't reproduce this at all any more. Scanning through commits, it seems very likely that this one would have fixed the issue:

commit 7881ef3f33bb80f459ea6020d1e021fc524a6348
Author: Ross Lagerwall <ross.lagerwall>
Date:   Wed Mar 27 17:09:17 2019 +0000

    gfs2: Fix lru_count going negative

So I'm going to close this on that basis.
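For the record, per that commit's description the bug was a race rather than a plain overflow: a glock moved to a private dispose list could be put back on the LRU by another thread (which saw the glock was already "on a list" and skipped the increment), so lru_count was decremented twice for a single removal. A rough sketch of the fixed pattern, keyed off a per-object LRU flag instead of list membership (simplified and hypothetical names, not the real GFS2 code):

struct glock_sketch {
	unsigned long flags;		/* bit flags, including the LRU bit */
	struct list_head lru;
};

#define GLF_LRU_SKETCH	0		/* stand-in for the real GLF_LRU bit */

static LIST_HEAD(lru_list);
static DEFINE_SPINLOCK(lru_lock);
static atomic_t lru_count = ATOMIC_INIT(0);

static void lru_add_sketch(struct glock_sketch *gl)
{
	spin_lock(&lru_lock);
	/* Only count the add if the flag actually flips. */
	if (!test_and_set_bit(GLF_LRU_SKETCH, &gl->flags)) {
		list_add_tail(&gl->lru, &lru_list);
		atomic_inc(&lru_count);
	}
	spin_unlock(&lru_lock);
}

static void lru_remove_sketch(struct glock_sketch *gl)
{
	spin_lock(&lru_lock);
	/* Decrement exactly once per real removal, even if the glock
	 * was temporarily parked on a dispose list in between. */
	if (test_and_clear_bit(GLF_LRU_SKETCH, &gl->flags)) {
		list_del_init(&gl->lru);
		atomic_dec(&lru_count);
	}
	spin_unlock(&lru_lock);
}

With the flag set and cleared in the same critical section as the list manipulation, the increment and decrement are always paired, so lru_count can no longer go negative.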