Description of problem:
Running fsstress (from the xfstests tree) for a while, unmounting the gfs2 filesystem, and then running fsck.gfs2 causes vmscan warnings about "negative objects to delete" to be printed.

Version-Release number of selected component (if applicable):
Current mainline kernel (v4.18-rc4)

How reproducible:
Not 100%, but the chances increase with the running time of fsstress.

Steps to Reproduce:
1. mkfs.gfs2 -p lock_nolock /dev/foo
2. mount /dev/foo /mnt/test
3. ./fsstress -d /mnt/test/ -p3 -l0 -n 10000000 -v -c
4. (Wait about an hour)
5. ^C
6. umount /mnt/test
7. fsck.gfs2 /dev/foo

Actual results:
At some point starting in pass 1, a lot of these warnings hit the console:

[10322.608787] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
[10322.611004] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
[10322.615502] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
[10322.619220] vmscan: shrink_slab: gfs2_glock_shrink_scan+0x0/0x2d0 negative objects to delete nr=-9223372036854775718
...

Expected results:
No warnings

Additional info:
https://www.redhat.com/archives/cluster-devel/2018-April/msg00019.html
I'm guessing this is a simple overflow problem. GFS2 uses an atomic_t to keep track of the number of items on the lru list. That's read as an int (4 bytes) for the calculation. I bet this goes away if we change that to an atomic64_t. I'll whip up a quick patch and maybe Andy can test it. Reassigning to myself.
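A minimal sketch of that idea (illustrative only; the counter name with the "64" suffix and the "_sketch" callback are stand-ins, not the actual GFS2 symbols): widen the LRU counter so the shrinker calculation never sees a truncated 32-bit value.

/* Before: a 32-bit counter, read back as an int. */
static atomic_t lru_count = ATOMIC_INIT(0);

/* After (proposed): a 64-bit counter. */
static atomic64_t lru_count64 = ATOMIC64_INIT(0);

/* Hypothetical shrinker count callback using the widened counter. */
static unsigned long gfs2_glock_shrink_count_sketch(struct shrinker *shrink,
						    struct shrink_control *sc)
{
	s64 n = atomic64_read(&lru_count64);	/* full 64-bit read */

	return n > 0 ? (unsigned long)n : 0;	/* never report a negative count */
}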
Created attachment 1626473 [details]
Proposed upstream and rhel8 patch

Andy, here's the full patch.
I can't reproduce this at all any more. Scanning through commits, it seems very likely that this one would have fixed the issue:

commit 7881ef3f33bb80f459ea6020d1e021fc524a6348
Author: Ross Lagerwall <ross.lagerwall>
Date:   Wed Mar 27 17:09:17 2019 +0000

    gfs2: Fix lru_count going negative

So I'm going to close this on that basis.
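For the record, per that commit's description the bug was a race rather than a plain overflow: a glock moved to a private dispose list could be put back on the LRU by another thread (which saw the glock was already "on a list" and skipped the increment), so lru_count was decremented twice for a single removal. A rough sketch of the fixed pattern, keyed off a per-object LRU flag instead of list membership (simplified and hypothetical names, not the real GFS2 code):

struct glock_sketch {
	unsigned long flags;		/* bit flags, including the LRU bit */
	struct list_head lru;
};

#define GLF_LRU_SKETCH	0		/* stand-in for the real GLF_LRU bit */

static LIST_HEAD(lru_list);
static DEFINE_SPINLOCK(lru_lock);
static atomic_t lru_count = ATOMIC_INIT(0);

static void lru_add_sketch(struct glock_sketch *gl)
{
	spin_lock(&lru_lock);
	/* Only count the add if the flag actually flips. */
	if (!test_and_set_bit(GLF_LRU_SKETCH, &gl->flags)) {
		list_add_tail(&gl->lru, &lru_list);
		atomic_inc(&lru_count);
	}
	spin_unlock(&lru_lock);
}

static void lru_remove_sketch(struct glock_sketch *gl)
{
	spin_lock(&lru_lock);
	/* Decrement exactly once per real removal, even if the glock
	 * was temporarily parked on a dispose list in between. */
	if (test_and_clear_bit(GLF_LRU_SKETCH, &gl->flags)) {
		list_del_init(&gl->lru);
		atomic_dec(&lru_count);
	}
	spin_unlock(&lru_lock);
}

With the flag set and cleared in the same critical section as the list manipulation, the increment and decrement are always paired, so lru_count can no longer go negative.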