Bug 2231596 - regression: krb5 nfs mounts fail with kernel 6.4
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-12 09:31 UTC by Enrico Scholz
Modified: 2023-11-16 21:46 UTC
CC List: 17 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-11-16 21:46:25 UTC
Type: ---
Embargoed:



Description Enrico Scholz 2023-08-12 09:31:37 UTC
1. Please describe the problem:

When using 'sec=krb5i' (or one of the other krb5 variants) with kernel 6.4 or later, mounting NFS shares fails:

| # mount   -t nfs -o nodev,noexec,nosuid,ro,sec=krb5i sciurus.intern.sigma-chemnitz.de:/mirror /mnt/
| mount.nfs: access denied by server while mounting sciurus.intern.sigma-chemnitz.de:/mirror

After reverting to 6.3.13-200.fc38.x86_64, things work as expected.


In the good case, tcpdump shows that the client tries to start two sessions. The first one contains a plaintext `EXCHANGE_ID` and is aborted with "Access denied". The client then starts another session with GSS data in `EXCHANGE_ID`, and this session is accepted.

In the bad case, only the plaintext `EXCHANGE_ID` seems to be sent.


The server runs RHEL 8.8 (kernel-4.18.0-477.21.1.el8_8.x86_64).


2. What is the Version-Release number of the kernel:
3. Did it work previously in Fedora? If so, in what kernel version did the issue
   *first* appear?
5. Does this problem occur with the latest Rawhide kernel?

Bad:

kernel-6.4.4-200.fc38.x86_64
kernel-6.4.8-200.fc38.x86_64
kernel-6.5.0-0.rc5.20230811git25aa0bebba72.40.fc40.x86_64



Ok:

kernel-6.3.13-200.fc38.x86_64


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:


To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

no

Reproducible: Always

Comment 1 Enrico Scholz 2023-08-12 13:44:13 UTC
Kernel 6.4 offers more enctypes. For example, kernel 6.3 shows

| handle_gssd_upcall(0x7f6d05a6d840): 'mech=krb5 uid=0 service=* enctypes=18,17' (nfs/clnt0)

while 6.4 has

| handle_gssd_upcall(0x7fd007647840): 'mech=krb5 uid=0 service=* enctypes=20,19,26,25,18,17' (nfs/clnt0)


rpc-gssd seems to send packets only for the first enctype (20), which the server does not support. Removing the unsupported enctypes from the server keytab restores operation.
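To make the upcall lines above easier to read, here is a minimal sketch that decodes the enctype numbers and models why trimming the server keytab helps. The name table uses the IANA enctype numbers from RFC 3962, RFC 6803 and RFC 8009. The selection rule in the hypothetical `pick_enctype` helper is a simplifying assumption (the first client-preferred enctype for which the service principal has a key), not the actual KDC or rpc-gssd logic; the keytab contents and the set of enctypes the RHEL 8 server kernel accepts are likewise hypothetical values chosen to match the logs in this report.

```python
# Sketch: decode gssd upcall enctypes and model session-key selection.
# ASSUMPTION: the negotiated enctype is the first one in the client's
# preference list for which the service principal has a key. Real KDC
# selection also depends on KDC policy; this is only an illustration.
from typing import List, Optional, Set

# IANA Kerberos enctype numbers (RFC 3962, RFC 6803, RFC 8009).
ENCTYPE_NAMES = {
    17: "aes128-cts-hmac-sha1-96",
    18: "aes256-cts-hmac-sha1-96",
    19: "aes128-cts-hmac-sha256-128",
    20: "aes256-cts-hmac-sha384-192",
    25: "camellia128-cts-cmac",
    26: "camellia256-cts-cmac",
}


def parse_upcall_enctypes(line: str) -> List[int]:
    """Return the enctypes=... list from a handle_gssd_upcall log line."""
    for field in line.split():
        # Fields may carry the surrounding single quotes from the log line.
        key, sep, value = field.strip("'").partition("=")
        if sep and key == "enctypes":
            return [int(e) for e in value.split(",")]
    return []


def pick_enctype(client_prefs: List[int], service_keys: Set[int]) -> Optional[int]:
    """Hypothetical model: first client-preferred enctype the service has a key for."""
    for etype in client_prefs:
        if etype in service_keys:
            return etype
    return None


if __name__ == "__main__":
    line_64 = ("handle_gssd_upcall(0x7fd007647840): 'mech=krb5 uid=0 "
               "service=* enctypes=20,19,26,25,18,17' (nfs/clnt0)")
    prefs = parse_upcall_enctypes(line_64)
    # Hypothetical keytab contents and server-kernel support set.
    full_keytab = {20, 19, 18, 17}
    trimmed_keytab = {18, 17}
    server_kernel_ok = {18, 17}
    for label, keys in (("full keytab", full_keytab), ("trimmed keytab", trimmed_keytab)):
        etype = pick_enctype(prefs, keys)
        status = "ok" if etype in server_kernel_ok else "rejected by server kernel"
        print(f"{label}: {etype} ({ENCTYPE_NAMES.get(etype, 'none')}) -> {status}")
```

With the hypothetical full keytab, the model selects enctype 20, which the server kernel cannot handle; with the trimmed keytab it falls back to 18, mirroring the keytab workaround described above.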

Comment 2 Troels Arvin 2023-09-17 15:57:12 UTC
I don't see how this can be closed. I have a number of F38 clients that suddenly cannot mount Kerberized NFS shares from anything other than F38-based NFS servers. For example, I have an NFS server running CentOS Stream 8; it works fine with NFS clients running Ubuntu 22 or CentOS Stream 8, but none of the Fedora 38 clients can mount from it.

I spent many hours trying all sorts of things (Kerberized NFS is not the easiest thing in the first place) before I finally realized that the rather minor kernel version difference was the cause.

On the CentOS Stream 8 server, the journal shows the following when I run "rpcdebug -m rpc -s all" and then try to mount from an F38 host:

| Sep 17 17:32:38 servername.somedomain kernel: gss_kerberos_mech: unsupported krb5 enctype 20
| Sep 17 17:32:38 servername.somedomain kernel: RPC:       gss_import_sec_context_kerberos: returning -22
| Sep 17 17:32:38 servername.somedomain kernel: RPC:       gss_delete_sec_context deleting 000000002bf496ea

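When collecting server-side logs like the ones above, a small filter can pull out which enctypes the server kernel rejected. This is a minimal sketch, assuming the kernel messages have been captured as text (for example with `journalctl -k`) and that the rejection message has exactly the `gss_kerberos_mech: unsupported krb5 enctype N` form shown in this report; the `rejected_enctypes` helper is hypothetical.

```python
# Sketch: find krb5 enctypes rejected by the NFS server's kernel.
# ASSUMPTION: log lines contain the exact message format seen in this report.
import re
from typing import Iterable, List

UNSUPPORTED_RE = re.compile(r"gss_kerberos_mech: unsupported krb5 enctype (\d+)")

# Names for the enctypes newer clients offer first (RFC 8009, RFC 6803).
ENCTYPE_NAMES = {
    19: "aes128-cts-hmac-sha256-128",
    20: "aes256-cts-hmac-sha384-192",
    25: "camellia128-cts-cmac",
    26: "camellia256-cts-cmac",
}


def rejected_enctypes(lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated enctype numbers the server refused."""
    found = set()
    for line in lines:
        match = UNSUPPORTED_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


if __name__ == "__main__":
    log = [
        "Sep 17 17:32:38 servername.somedomain kernel: gss_kerberos_mech: unsupported krb5 enctype 20",
        "Sep 17 17:32:38 servername.somedomain kernel: RPC:       gss_import_sec_context_kerberos: returning -22",
        "Sep 17 17:32:38 servername.somedomain kernel: RPC:       gss_delete_sec_context deleting 000000002bf496ea",
    ]
    for etype in rejected_enctypes(log):
        print(etype, ENCTYPE_NAMES.get(etype, "unknown"))
```

On the log above this reports enctype 20 (aes256-cts-hmac-sha384-192). The `returning -22` line that follows is `-EINVAL`, which is consistent with the context import failing on that enctype.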
In my opinion, this is a rather serious regression, and something has to be done about it (but I'm not sure what).

Comment 3 Troels Arvin 2023-11-16 21:46:25 UTC
After having upgraded both the server (from Stream 8 to Stream 9) and the clients (from Fedora 38 to 39), I can no longer reproduce the problem, so I'm not going to keep this case open.

