Bug 2215429 - Panic in __percpu_counter_sum via nfsd_reply_cache_stats_show
Summary: Panic in __percpu_counter_sum via nfsd_reply_cache_stats_show
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.2
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Assignee: Jeff Layton
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-16 01:52 UTC by Eirik Fuller
Modified: 2023-11-07 11:01 UTC
CC List: 8 users

Fixed In Version: kernel-5.14.0-334.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:48:08 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net (3.89 KB, patch)
2023-06-16 14:32 UTC, Jeff Layton


Links
System ID Private Priority Status Summary Last Updated
Gitlab redhat/centos-stream/src/kernel centos-stream-9 merge_requests 2735 0 None opened nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net 2023-06-27 10:38:50 UTC
Red Hat Issue Tracker RHELPLAN-160026 0 None None None 2023-06-16 01:54:43 UTC
Red Hat Product Errata RHSA-2023:6583 0 None None None 2023-11-07 08:48:49 UTC

Description Eirik Fuller 2023-06-16 01:52:51 UTC
Description of problem:
nfsd_reply_cache_stats_show does not check whether per-netns stats counters are initialized

Version-Release number of selected component (if applicable):
kernel-5.14.0-284.11.1.el9_2

How reproducible:
Easily reproducible with suitable load

Steps to Reproduce:
1. Run the NetworkManager-ci tests on aarch64

Actual results:
Unable to handle kernel paging request at virtual address ffff2c5edffcf000

Expected results:
The system should not panic

Additional info:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci has the source code for the NetworkManager-ci tests.

The one recipe in https://beaker.engineering.redhat.com/jobs/7973547 reported a panic during NetworkManager-ci testing.

https://beaker.engineering.redhat.com/recipes/14091713/logs/console.log has the full panic message; an excerpt follows.


[15022.250125] CPU: 1 PID: 465528 Comm: sosreport Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.11.1.el9_2.aarch64 #1
[15022.250938] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[15022.251405] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[15022.251871] pc : __percpu_counter_sum+0x4c/0x90
[15022.252182] lr : __percpu_counter_sum+0x5c/0x90
[15022.252490] sp : ffff80000b053a80
[15022.252714] x29: ffff80000b053a80 x28: 0000000000000001 x27: 0000000000400cc0
[15022.253191] x26: 000000007ffff000 x25: ffff0000c514d308 x24: ffffd3a49e62ec48
[15022.253667] x23: 0000000000000000 x22: 0000000000000003 x21: ffffd3a49e6342e0
[15022.254147] x20: ffff00024d9e3218 x19: 0000000000000000 x18: ffffffffffffffff
[15022.254625] x17: 0000000000000000 x16: ffffd3a49cf74b40 x15: ffff0001c1e7c000
[15022.255104] x14: 0000000000000000 x13: ffff0001c1e7b04e x12: 20203a7374656b63
[15022.255586] x11: 0000000000000000 x10: 000000000000000a x9 : ffffd3a49cf74b64
[15022.256065] x8 : 000000000000000a x7 : ffffffffffffffff x6 : 0000000000000000
[15022.256544] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[15022.257022] x2 : 0000000000000000 x1 : ffff2c5edffcf000 x0 : 0000000000000000
[15022.257501] Call trace:
[15022.257668]  __percpu_counter_sum+0x4c/0x90
[15022.257953]  nfsd_reply_cache_stats_show+0x90/0x1c0 [nfsd]
[15022.258346]  seq_read_iter+0x1cc/0x4e0
[15022.258601]  seq_read+0xc4/0x104
[15022.258822]  vfs_read+0xb4/0x1d4
[15022.259044]  ksys_read+0x6c/0xf4
[15022.259266]  __arm64_sys_read+0x20/0x2c
[15022.259526]  invoke_syscall.constprop.0+0x7c/0xd0
[15022.259847]  el0_svc_common.constprop.0+0x154/0x160
[15022.260179]  do_el0_svc+0x2c/0xb0
[15022.260405]  el0_svc+0x38/0x190
[15022.260623]  el0t_64_sync_handler+0xb4/0x130
[15022.260911]  el0t_64_sync+0x174/0x178
[15022.261168] Code: f9400693 14000005 f860daa1 f9401282 (b8a26822)


As indicated in that excerpt, the command which triggered the panic is sosreport, in its attempt to gather NFS stats.

The following excerpt of a crash session shows multiple sosreport processes active at the time of the panic, which presumably makes this panic more likely than it would be with a single sosreport process.


      KERNEL: /usr/lib/debug/lib/modules/5.14.0-284.11.1.el9_2.aarch64/vmlinux  [TAINTED]
    DUMPFILE: /var/crash/127.0.0.1-2023-06-14-14:34:51/vmcore  [PARTIAL DUMP]
        CPUS: 3
        DATE: Wed Jun 14 14:34:49 EDT 2023
      UPTIME: 04:10:22
LOAD AVERAGE: 0.68, 0.65, 0.68
       TASKS: 439
    NODENAME: ampere-mtsnow-altra-02-vm-08.lab.eng.rdu2.redhat.com
     RELEASE: 5.14.0-284.11.1.el9_2.aarch64
     VERSION: #1 SMP PREEMPT_DYNAMIC Wed Apr 12 11:23:11 EDT 2023
     MACHINE: aarch64  (unknown Mhz)
      MEMORY: 14 GB
       PANIC: "Unable to handle kernel paging request at virtual address ffff2c5edffcf000"
         PID: 465528
     COMMAND: "sosreport"
        TASK: ffff000187c52a00  [THREAD_INFO: ffff000187c52a00]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> ps sosreport
      PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
   464762  456731   0  ffff0002282cd400  IN   0.4   952676    61216  sosreport
   465035  456731   1  ffff000188092a00  IN   0.4   952676    61216  sosreport
   465036  456731   1  ffff000188095400  IN   0.4   952676    61216  sosreport
   465037  456731   1  ffff0000c0325400  RU   0.4   952676    61216  sosreport
   465038  456731   1  ffff0001c125bf00  IN   0.4   952676    61216  sosreport
   465087  456731   0  ffff000228889500  RU   0.4   952676    61216  sosreport
   465105  456731   1  ffff0000c66d1500  IN   0.4   952676    61216  sosreport
   465429  456731   0  ffff00024d1a0000  RU   0.4   952676    61216  sosreport
   465435  456731   2  ffff000148016900  IN   0.4   952676    61216  sosreport
   465441  456731   0  ffff0000c493d400  IN   0.4   952676    61216  sosreport
>  465528  456731   1  ffff000187c52a00  RU   0.4   952676    61216  sosreport
crash> 


The virtual address in the Actual results section above suggests a null percpu pointer, based on the following information from the vmcore.


crash> printf "%lx\n", __per_cpu_offset[0]
ffff2c5edffcf000
crash> 


That address also matches the x1 register value in the panic message, which is consistent with the instruction at the pc value:


crash> x/i __percpu_counter_sum+0x4c
   0xffffd3a49cf74b8c <__percpu_counter_sum+76>:        ldrsw   x2, [x1, x2]
crash> 


The null percpu counters pointers can also be seen in the vmcore, as follows.


crash> mod -s nfsd
     MODULE       NAME                          BASE           SIZE  OBJECT FILE
ffffd3a42c97ee40  nfsd                    ffffd3a42c90b000   638976  /usr/lib/debug/lib/modules/5.14.0-284.11.1.el9_2.aarch64/kernel/fs/nfsd/nfsd.ko.debug 
crash> gdb set $nn = (struct nfsd_net *) init_net.gen.ptr[nfsd_net_id]
crash> print $nn->nfsd_net_up
$1 = false
crash> print $nn->counter[NFSD_NET_DRC_MEM_USAGE]
$2 = {
  lock = {
    raw_lock = {
      {
        val = {
          counter = 1
        },
        {
          locked = 1 '\001',
          pending = 0 '\000'
        },
        {
          locked_pending = 1,
          tail = 0
        }
      }
    }
  },
  count = 0,
  list = {
    next = 0xdead000000000100,
    prev = 0xdead000000000122
  },
  counters = 0x0
}
crash> 


The first percpu counters pointer nfsd_reply_cache_stats_show attempts to use is 0x0. The lock counter is presumably 1 because __percpu_counter_sum incremented it. The other counter also has a null pointer, but no increment in its lock.


crash> print $nn->counter[NFSD_NET_PAYLOAD_MISSES]
$3 = {
  lock = {
    raw_lock = {
      {
        val = {
          counter = 0
        },
        {
          locked = 0 '\000',
          pending = 0 '\000'
        },
        {
          locked_pending = 0,
          tail = 0
        }
      }
    }
  },
  count = 0,
  list = {
    next = 0xdead000000000100,
    prev = 0xdead000000000122
  },
  counters = 0x0
}
crash> 


It makes sense that the fault address is __per_cpu_offset[0] because __percpu_counter_sum loops over CPUs.
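
For reference, __percpu_counter_sum looks roughly like this (paraphrased from lib/percpu_counter.c; details may differ slightly in the exact RHEL 9 source):


s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
        s64 ret;
        int cpu;
        unsigned long flags;

        raw_spin_lock_irqsave(&fbc->lock, flags);       /* why lock val.counter reads 1 above */
        ret = fbc->count;
        for_each_online_cpu(cpu) {
                /* with fbc->counters == NULL this is just NULL + __per_cpu_offset[cpu] */
                s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
                ret += *pcount;                         /* the faulting load: ldrsw x2, [x1, x2] */
        }
        raw_spin_unlock_irqrestore(&fbc->lock, flags);
        return ret;
}
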

Based on my reading of the code, this problem apparently exists in all RHEL 9 kernels, but not in RHEL 8. The Version-Release number was chosen on the basis of the panic that brought this issue to my attention, but I've also triggered the panic with 5.14.0-325.el9.aarch64, and I won't be surprised if the Fedora Rawhide recipe still running the same tests also panics (I have not found an upstream fix).

The apparent trigger of this panic is a combination of sosreport bombardment and multiple network interfaces going up and down.

Comment 1 Jeff Layton 2023-06-16 11:22:29 UTC
Making this bug public, and cc'ing Chuck in case he has thoughts. Looking at nfsd_umount:

static void nfsd_umount(struct super_block *sb)
{
        struct net *net = sb->s_fs_info;

        nfsd_shutdown_threads(net);

        kill_litter_super(sb);
        put_net(net);
}

It looks like we hold a reference to the net while the nfsdfs superblock is mounted, so it seems unlikely that we'd tear down the net while someone was still scraping the file. Does this mean that we're presenting the "filecache" file in nfsdfs before the reply cache has been initialized?

Eirik, are you able to tell from the core whether this happened while the net was being brought up or shut down?

Comment 2 Jeff Layton 2023-06-16 11:56:47 UTC
If it happens while the net is coming up, then it should be sufficient to just make nfsd_reply_cache_stats_show check nfsd_net_up before trying to touch any of the stats. If it's happening on net shutdown, though, then things are a little trickier, but I'm thinking that's not possible given that the superblock apparently holds a ref to the net.

Comment 3 Jeff Layton 2023-06-16 13:14:09 UTC
Ok, I think this is a regression from this patch: 

commit f5f9d4a314da88c0a5faa6d168bf69081b7a25ae
Author: Jeff Layton <jlayton>
Date:   Wed Jan 11 11:19:59 2023 -0500

    nfsd: move reply cache initialization into nfsd startup
    
    There's no need to start the reply cache before nfsd is up and running,
    and doing so means that we register a shrinker for every net namespace
    instead of just the ones where nfsd is running.
    
    Move it to the per-net nfsd startup instead.
    
    Reported-by: Dai Ngo <dai.ngo>
    Signed-off-by: Jeff Layton <jlayton>
    Signed-off-by: Chuck Lever <chuck.lever>

The filecache file is available before nfsd is started, so you can probably reproduce this by just creating a new net namespace, mounting up its /proc/fs/nfsd, and then reading its filecache file without ever starting nfsd. We need to move the initialization of the percpu variables back into nfsd_init_net. I'll see if I can reproduce this and then send a patch upstream.
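
Something like this should do it (untested sketch, namespace name arbitrary; reading reply_cache_stats, which is where this crash actually is, should trip the same uninitialized counters):


ip netns add nfsd-repro
ip netns exec nfsd-repro sh -c 'mount -t nfsd nfsd /proc/fs/nfsd && cat /proc/fs/nfsd/reply_cache_stats'
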

I'll mark this bug for 9.3, but I think we'll need a backport to 9.2.z as well.

Nice catch, Eirik!

Comment 4 Eirik Fuller 2023-06-16 13:34:45 UTC
This panic did not occur during Fedora Rawhide testing, with kernel 6.4.0-0.rc6.20230614gitb6dad5178cea.49.fc39; it's not yet clear whether that's due to an upstream fix, or the intermittent nature of this problem, but the panics have been consistent enough with RHEL9 kernels to suggest the former.

The panic did not occur with kernel 5.14.0-325.bz2215429.el9, which added just the following patch to kernel 5.14.0-325.el9:


diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 041faa13b852..d02c2ac37cec 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -614,7 +614,7 @@ int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
                   atomic_read(&nn->num_drc_entries));
        seq_printf(m, "hash buckets:          %u\n", 1 << nn->maskbits);
        seq_printf(m, "mem usage:             %lld\n",
-                  percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE]));
+                  nn->nfsd_net_up ? percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE]) : 0);
        seq_printf(m, "cache hits:            %lld\n",
                   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]));
        seq_printf(m, "cache misses:          %lld\n",
@@ -622,7 +622,7 @@ int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
        seq_printf(m, "not cached:            %lld\n",
                   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]));
        seq_printf(m, "payload misses:        %lld\n",
-                  percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES]));
+                  nn->nfsd_net_up ? percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES]) : 0);
        seq_printf(m, "longest chain len:     %u\n", nn->longest_chain);
        seq_printf(m, "cachesize at longest:  %u\n", nn->longest_chain_cachesize);
        return 0;

Comment 5 Jeff Layton 2023-06-16 14:32:46 UTC
Created attachment 1971173 [details]
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net

Eirik, would you be able to test this patch? I think this should fix the underlying issue.

Comment 6 Eirik Fuller 2023-06-16 14:46:28 UTC
Yes, I'll test the patch.

The following command sequence triggers a panic with kernel 5.14.0-284.11.1.el9_2:


mount -t nfsd nfsd /proc/fs/nfsd
cat /proc/fs/nfsd/reply_cache_stats


That seems to be easily reproducible.

The first time I tried those commands on Fedora Rawhide I saw the following.


[root@localhost ~]# mount -t nfsd nfsd /proc/fs/nfsd
[root@localhost ~]# cat /proc/fs/nfsd/
clients/                 export_stats             max_block_size           nfsv4leasetime           pool_threads             supported_krb5_enctypes  unlock_ip                
export_features          filecache                max_connections          nfsv4recoverydir         portlist                 threads                  v4_end_grace             
exports                  filehandle               nfsv4gracetime           pool_stats               reply_cache_stats        unlock_filesystem        versions                 
[root@localhost ~]# cat /proc/fs/nfsd/reply_cache_stats 
Segmentation fault
[root@localhost ~]# 


(that intermediate output is bash tab completion)

However, the second time I tried those commands, the system panicked (kernel 6.4.0-0.rc6.20230614gitb6dad5178cea.49.fc39.aarch64).

Comment 7 Jeff Layton 2023-06-16 16:06:20 UTC
Strange. I can't reproduce this on Fedora kernels on x86_64.

Basically, it appears that nn->counter is never NULL in my testing, though it seems like it should be. I added some trace_printk's to nfsd_reply_cache_stats_init (which is what should initialize nn->counter) and nfsd_reply_cache_stats_show. It looks like nn->counter is already initialized on this host, even though the nfsd_reply_cache_stats_init trace_printk doesn't fire until you start up the nfs server.
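
The trace_printk's were along these lines (a sketch, not the exact lines I used):


trace_printk("%s: nn=%p counters=%p\n", __func__,
             nn, nn->counter[NFSD_NET_DRC_MEM_USAGE].counters);
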

Is this issue arch-specific? Maybe percpu vars are implemented differently on x86 vs. ARM?

Comment 8 Jeff Layton 2023-06-16 16:12:53 UTC
FWIW, I also can't seem to reproduce this on centos9 x86_64 kernels. I'll see if I can check out an ARM host to test with.

Comment 9 Eirik Fuller 2023-06-16 16:50:42 UTC
Bug 2031604 comment 11 suggests a possible reason null percpu pointers don't trigger panics with x86_64, the upshot of which is "passing NULL to this_cpu_ptr returns fixed_percpu_data on x86_64", i.e. the resulting pointer is mapped (but pity the fool who tries to modify memory accessed that way :)
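
To spell that out (my rough understanding, not verified against the linker scripts, so treat this as a sketch):


/*
 * per_cpu_ptr(p, cpu) is essentially (typeof(p))((unsigned long)p + __per_cpu_offset[cpu]).
 *
 * On x86_64 the .data..percpu section is linked at address 0, so
 * __per_cpu_offset[cpu] is itself the mapped base of that CPU's per-CPU
 * area; NULL + offset therefore points at its first object
 * (fixed_percpu_data) and the bogus read "succeeds".  On aarch64 and
 * ppc64le the offset is only a relocation delta, so NULL + offset is an
 * unmapped address, hence the faults in this bug.
 */
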

I have a scratch build with the patch from attachment 1971173 [details] ready for testing. I'll report the results of that testing here.

Comment 10 Jeff Layton 2023-06-16 17:05:39 UTC
(In reply to Eirik Fuller from comment #9)
> Bug 2031604 comment 11 suggests a possible reason null percpu
> pointers don't trigger panics with x86_64, the upshot of which is "passing
> NULL to this_cpu_ptr returns fixed_percpu_data on x86_64", i.e. the
> resulting pointer is mapped (but pity the fool who tries to modify memory
> accessed that way :)
> 

Dear lord! That's certainly unexpected. I wonder if that's intentional, or if the behavior of this_cpu_ptr in this case is just a "happy" accident?

Many thanks for testing the patch. I'll plan to post it upstream if it looks good.

Comment 11 Eirik Fuller 2023-06-16 18:30:48 UTC
The NetworkManager-ci tests are still happily churning away on ampere-mtsnow-altra-02-vm-03, but in the meantime:


[root@ampere-mtsnow-altra-02-vm-03 ~]# uname -a
Linux ampere-mtsnow-altra-02-vm-03.lab.eng.rdu2.redhat.com 5.14.0-327.bz2215429.el9.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Jun 16 11:26:40 EDT 2023 aarch64 aarch64 aarch64 GNU/Linux
[root@ampere-mtsnow-altra-02-vm-03 ~]# mount -t nfsd nfsd /proc/fs/nfsd
[root@ampere-mtsnow-altra-02-vm-03 ~]# cat /proc/fs/nfsd/reply_cache_stats
max entries:           0
num entries:           0
hash buckets:          1
mem usage:             0
cache hits:            0
cache misses:          0
not cached:            0
payload misses:        0
longest chain len:     0
cachesize at longest:  0
[root@ampere-mtsnow-altra-02-vm-03 ~]# umount /proc/fs/nfsd
[root@ampere-mtsnow-altra-02-vm-03 ~]#

Comment 12 Jeff Layton 2023-06-16 19:34:12 UTC
Thanks for testing it. Patch posted upstream:

    https://lore.kernel.org/linux-nfs/20230616191744.202292-1-jlayton@kernel.org/T/#u

Once this goes into mainline I'll make a MR for 9.3.

I'll also nominate this for 9.2.z as it looks like we took the patch that caused the regression into kernel-5.14.0-284.3.1.el9_2.

Comment 13 Eirik Fuller 2023-06-16 19:35:57 UTC
This panic also occurs on ppc64le:


[   20.115939] Kernel attempted to read user page (ffd2e0000) - exploit attempt? (uid: 0) 
[   20.115946] BUG: Unable to handle kernel data access on read at 0xffd2e0000 
[   20.115947] Faulting instruction address: 0xc00000000086fa50 
[   20.115950] Oops: Kernel access of bad area, sig: 11 [#1] 
[   20.115968] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries 
[   20.115973] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace bonding tls rfkill sunrpc pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror dm_region_hash dm_log dm_mod 
[   20.116002] CPU: 3 PID: 4049 Comm: cat Kdump: loaded Not tainted 5.14.0-327.el9.ppc64le #1 
[   20.116007] NIP:  c00000000086fa50 LR: c00000000086fa6c CTR: c00000000086f9e0 
[   20.116011] REGS: c000000006f93810 TRAP: 0300   Not tainted  (5.14.0-327.el9.ppc64le) 
[   20.116015] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48222401  XER: 20040000 
[   20.116025] CFAR: c00000000086fa7c DAR: 0000000ffd2e0000 DSISR: 40000000 IRQMASK: 1  
[   20.116025] GPR00: c00000000086fa10 c000000006f93ab0 c000000002b3bf00 0000000000000000  
[   20.116025] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[   20.116025] GPR08: c000000002b73060 0000000000000000 0000000ffd2e0000 c0080000033200c8  
[   20.116025] GPR12: c00000000086f9e0 c000000fffffd480 0000000000000000 0000000000000000  
[   20.116025] GPR16: 0000000020000000 0000000000000000 0000000000020000 0000000000000001  
[   20.116025] GPR20: 00007ffffcb56068 0000000000400cc0 c00000009458eb88 000000007fff0000  
[   20.116025] GPR24: fffffffffffff000 0000000000000000 c00000009458eb78 0000000000000000  
[   20.116025] GPR28: c000000002b771b8 c000000002b778b0 c00000000b9a8e18 0000000000000000  
[   20.116074] NIP [c00000000086fa50] __percpu_counter_sum+0x70/0xe0 
[   20.116082] LR [c00000000086fa6c] __percpu_counter_sum+0x8c/0xe0 
[   20.116087] Call Trace: 
[   20.116089] [c000000006f93ab0] [c00000000b9a8c00] 0xc00000000b9a8c00 (unreliable) 
[   20.116096] [c000000006f93b00] [c0080000032d5ff8] nfsd_reply_cache_stats_show+0xb0/0x1f0 [nfsd] 
[   20.116126] [c000000006f93b80] [c0000000005ccafc] seq_read_iter+0x25c/0x6b0 
[   20.116132] [c000000006f93c60] [c0000000005cd038] seq_read+0xe8/0x150 
[   20.116137] [c000000006f93d10] [c000000000584648] vfs_read+0xc8/0x240 
[   20.116142] [c000000006f93d60] [c000000000584e14] ksys_read+0x84/0x140 
[   20.116148] [c000000006f93db0] [c00000000002f434] system_call_exception+0x164/0x310 
[   20.116154] [c000000006f93e10] [c00000000000bfe8] system_call_vectored_common+0xe8/0x278 
[   20.116161] --- interrupt: 3000 at 0x7fff9013a8b4 
[   20.116164] NIP:  00007fff9013a8b4 LR: 0000000000000000 CTR: 0000000000000000 
[   20.116167] REGS: c000000006f93e80 TRAP: 3000   Not tainted  (5.14.0-327.el9.ppc64le) 
[   20.116171] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 42222408  XER: 00000000 
[   20.116183] IRQMASK: 0  
[   20.116183] GPR00: 0000000000000003 00007ffffcb559b0 0000000109d87f00 0000000000000003  
[   20.116183] GPR04: 00007fff8ffd0000 0000000000020000 0000000000000022 0000000000000000  
[   20.116183] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[   20.116183] GPR12: 0000000000000000 00007fff903fa5c0 0000000000000000 0000000000000000  
[   20.116183] GPR16: 0000000020000000 0000000000000000 0000000000020000 0000000000000000  
[   20.116183] GPR20: 00007ffffcb56068 0000000000000002 0000000000000000 0000000109d61e50  
[   20.116183] GPR24: 0000000109d80110 0000000000020000 0000000109d680e0 000000007ff00000  
[   20.116183] GPR28: 0000000000000003 00007fff8ffd0000 0000000000020000 0000000000000003  
[   20.116228] NIP [00007fff9013a8b4] 0x7fff9013a8b4 
[   20.116231] LR [0000000000000000] 0x0 
[   20.116234] --- interrupt: 3000 
[   20.116236] Instruction dump: 
[   20.116239] 3d220004 7c7b1b78 3860ffff 3ba9b9b0 48000028 60000000 60000000 60000000  
[   20.116249] 3d220003 39097160 e93e0020 7d48502a <7d2a4aaa> 7fff4a14 38a30001 38800800  


The trigger for that panic was the following command.


mount -t nfsd nfsd /proc/fs/nfsd && cat /proc/fs/nfsd/reply_cache_stats && umount /proc/fs/nfsd

Comment 14 Jeff Layton 2023-06-16 19:38:23 UTC
Makes total sense that ppc64 would also fail. The x86 behavior was making me think I must be crazy. Thanks for explaining it!

Comment 15 Eirik Fuller 2023-06-16 19:44:04 UTC
I expected ppc64le to be susceptible, based on past experience (bug 2031604 describes various ppc64le panics).

As for the x86_64 behavior ... a case could be made for "that behavior is what's crazy" :)

Comment 16 Eirik Fuller 2023-06-16 20:58:35 UTC
I discovered additional details about the Segmentation fault mentioned in comment 6.

With kernel 6.4.0-0.rc6.20230614gitb6dad5178cea.49.fc39.aarch64 (Fedora Rawhide) I repeated the Segmentation fault (strace revealed, not surprisingly, that the SIGSEGV occurred during a read on the file descriptor returned by openat for "/proc/fs/nfsd/reply_cache_stats"), but dmesg output revealed the usual panic messages,


[24178.543628] Unable to handle kernel paging request at virtual address ffff5d359e56f000
[24178.544152] Mem abort info:
[24178.544331]   ESR = 0x0000000096000004
[24178.544569]   EC = 0x25: DABT (current EL), IL = 32 bits
[24178.544904]   SET = 0, FnV = 0
[24178.545097]   EA = 0, S1PTW = 0
[24178.545297]   FSC = 0x04: level 0 translation fault
[24178.545604] Data abort info:
[24178.545788]   ISV = 0, ISS = 0x00000004
[24178.546032]   CM = 0, WnR = 0
[24178.546220] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000004f7cd000
[24178.546643] [ffff5d359e56f000] pgd=0000000000000000, p4d=0000000000000000
[24178.547072] Internal error: Oops: 0000000096000004 [#1] SMP
[24178.547425] Modules linked in: veth qrtr rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs wireguard libcurve25519_generic ip_gre gre ipip ip_tunnel act_gact cls_flower act_mirred cls_matchall sch_ingress sch_tbf sch_prio sch_sfq nf_conntrack_netbios_ns nf_conntrack_broadcast echainiv ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 ipcomp ipcomp6 xfrm6_tunnel xfrm_ipcomp tunnel6 chacha20poly1305 camellia_generic xcbc sha512_arm64 des_generic libdes af_key sch_netem openvswitch nsh nf_conncount macsec xt_conntrack xt_comment xt_MASQUERADE iptable_nat ip_tables tun rdma_ucm ib_uverbs rpcrdma rdma_cm iw_cm ib_cm nfsd auth_rpcgss nfs_acl lockd grace raid0 vrf vxlan ip6_udp_tunnel udp_tunnel macvlan ipt_REJECT nft_compat nft_reject_ipv4 nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp binfmt_misc nft_meta_bridge team_mode_random team_mode_activebackup team_mode_broadcast team_mode_loadbalance
[24178.547500]  team_mode_roundrobin team nft_masq ppp_deflate bsd_comp ppp_async pppoe pppox ppp_generic slhc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set ib_core bluetooth bonding tls 8021q garp mrp nf_tables nfnetlink bridge stp llc dummy rfkill sunrpc vfat fat virtio_net net_failover failover fuse loop zram crct10dif_ce polyval_ce polyval_generic ghash_ce virtio_console virtio_blk virtio_mmio qemu_fw_cfg
[24178.553277] Unloaded tainted modules: netdevsim(OE):4 [last unloaded: veth]
[24178.556855] CPU: 1 PID: 663311 Comm: cat Tainted: G           OE     -------  ---  6.4.0-0.rc6.20230614gitb6dad5178cea.49.fc39.aarch64 #1
[24178.557664] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[24178.558117] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[24178.558577] pc : __percpu_counter_sum+0x60/0xb8
[24178.558881] lr : __percpu_counter_sum+0x7c/0xb8
[24178.559187] sp : ffff80000c5cbab0
[24178.559407] x29: ffff80000c5cbab0 x28: ffff0000c68824e8 x27: 0000000000400cc0
[24178.559877] x26: 000000007ffff000 x25: 0000000000000000 x24: ffffa2cde0174f30
[24178.560348] x23: ffffa2cde016f050 x22: ffffa2cde016ec50 x21: ffffa2cde016f850
[24178.560819] x20: ffff0002eac26608 x19: 0000000000000000 x18: ffffffffffffffff
[24178.561292] x17: 0000000000000000 x16: ffffa2cdddfb6388 x15: ffff80000c5cb9b0
[24178.561762] x14: 0000000000000000 x13: ffff000273aa204e x12: 20203a7374656b63
[24178.562234] x11: 0000000000000001 x10: 000000000000000a x9 : ffffa2cdddfb63bc
[24178.562705] x8 : 000000000000000a x7 : 0000000000000004 x6 : 0000000000000000
[24178.563178] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
[24178.563649] x2 : 0000000000000003 x1 : ffffa2cde016f050 x0 : ffff5d359e56f000
[24178.564120] Call trace:
[24178.564284]  __percpu_counter_sum+0x60/0xb8
[24178.564564]  nfsd_reply_cache_stats_show+0x94/0x1a0 [nfsd]
[24178.564970]  seq_read_iter+0xe4/0x490
[24178.565216]  seq_read+0x98/0xd8
[24178.565425]  vfs_read+0xc8/0x2a8
[24178.565640]  ksys_read+0x78/0x118
[24178.565861]  __arm64_sys_read+0x24/0x38
[24178.566115]  invoke_syscall+0x78/0x100
[24178.566365]  el0_svc_common.constprop.0+0xd4/0x100
[24178.566682]  do_el0_svc+0x34/0x50
[24178.566903]  el0_svc+0x34/0x108
[24178.567114]  el0t_64_sync_handler+0x114/0x120
[24178.567404]  el0t_64_sync+0x194/0x198
[24178.567647] Code: 52800003 14000005 f860db00 f9401284 (b8a46800) 
[24178.568047] ---[ end trace 0000000000000000 ]---
[24178.568353] note: cat[663311] exited with irqs disabled
[24178.568726] note: cat[663311] exited with preempt_count 1


but with no accompanying vmcore (nor reboot).

This experiment, like the one in comment 6, was performed in the reservation of a Beaker recipe in which the NetworkManager-ci tests ran to completion without a panic.

A second attempt to "cat /proc/fs/nfsd/reply_cache_stats" resulted in the system becoming unresponsive. Checking the console logs revealed a number of soft lockups in sshd, and a second look at the console log from comment 6 revealed the same thing there (that system never did become responsive).

In short, with newer kernels this issue does not seem to result in a clean panic, by which I mean the system does not reboot on its own, nor does it save a vmcore. In any case I'd expect the patch linked in comment 12 (or possibly a future incarnation of it) to prevent such trouble, but it might be worthwhile to ponder why Fedora Rawhide kernels apparently don't panic cleanly.

Comment 17 Eirik Fuller 2023-06-16 22:43:47 UTC
A Beaker recipe is running the NetworkManager-ci tests with kernel 5.14.0-289.el9.aarch64 (which predates the commit linked in comment 3).

I expect that to not panic, based on the following output from the same system.


[root@ampere-mtsnow-altra-02-vm-06 ~]# mount -t nfsd nfsd /proc/fs/nfsd && cat /proc/fs/nfsd/reply_cache_stats && umount /proc/fs/nfsd
max entries:           119488
num entries:           0
hash buckets:          2048
mem usage:             0
cache hits:            0
cache misses:          0
not cached:            0
payload misses:        0
longest chain len:     0
cachesize at longest:  0
[root@ampere-mtsnow-altra-02-vm-06 ~]# uname -r
5.14.0-289.el9.aarch64
[root@ampere-mtsnow-altra-02-vm-06 ~]#

Comment 25 Eirik Fuller 2023-06-27 14:15:24 UTC
The command at the end of comment 13 triggers a panic with kernel 5.14.0-332.el9.ppc64le and with kernel 5.14.0-332.el9.aarch64, but the merge request kernels do not panic in response to the same command.


[root@ibm-p9z-15-lp3 ~]# mount -t nfsd nfsd /proc/fs/nfsd && cat /proc/fs/nfsd/reply_cache_stats && umount /proc/fs/nfsd
max entries:           0
num entries:           0
hash buckets:          1
mem usage:             0
cache hits:            0
cache misses:          0
not cached:            0
payload misses:        0
longest chain len:     0
cachesize at longest:  0
[root@ibm-p9z-15-lp3 ~]# uname -a
Linux ibm-p9z-15-lp3.khw3.lab.eng.bos.redhat.com 5.14.0-332.2735_913139648.el9.ppc64le #1 SMP Tue Jun 27 11:56:05 UTC 2023 ppc64le ppc64le ppc64le GNU/Linux
[root@ibm-p9z-15-lp3 ~]# 


[root@ampere-mtsnow-altra-01-vm-01 ~]# mount -t nfsd nfsd /proc/fs/nfsd && cat /proc/fs/nfsd/reply_cache_stats && umount /proc/fs/nfsd
max entries:           0
num entries:           0
hash buckets:          1
mem usage:             0
cache hits:            0
cache misses:          0
not cached:            0
payload misses:        0
longest chain len:     0
cachesize at longest:  0
[root@ampere-mtsnow-altra-01-vm-01 ~]# uname -a
Linux ampere-mtsnow-altra-01-vm-01.lab.eng.rdu2.redhat.com 5.14.0-332.2735_913139648.el9.aarch64 #1 SMP PREEMPT_DYNAMIC Tue Jun 27 12:00:11 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
[root@ampere-mtsnow-altra-01-vm-01 ~]# 


I might also run the NetworkManager-ci tests on those systems with the merge request kernel.

Comment 26 Eirik Fuller 2023-06-28 00:48:48 UTC
The NetworkManager-ci tests finished with kernel 5.14.0-332.2735_913139648.el9 on the two systems which ran the tests in comment 25. There were assorted failures, but no panics.

Comment 33 errata-xmlrpc 2023-11-07 08:48:08 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6583

