Bug 1582186 - [RFE] Populate nfs4_unique_id module parameter in nfs-config.service
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Depends On:
Blocks: 1711360
Reported: 2018-05-24 13:12 UTC by Frank Sorenson
Modified: 2021-09-09 14:13 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1801326
Environment:
Last Closed: 2020-02-10 16:24:02 UTC
Target Upstream Version:


Attachments
potential nfs4_unique_id implementation V1 (24.58 KB, patch)
2019-07-08 17:47 UTC, Frank Sorenson
program to test expansion of unique_id strings (4.69 KB, patch)
2019-07-08 17:58 UTC, Frank Sorenson
conffile expansions patch v2 (15.16 KB, patch)
2019-07-16 18:57 UTC, Frank Sorenson
nfs4_unique_id patch v2 (14.88 KB, patch)
2019-07-16 19:00 UTC, Frank Sorenson


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3457671 | 0 | Troubleshoot | None | NFSv4: Getting Input/output error with NFS shares while reading contents of file in OpenStack environment | 2019-05-08 18:10:36 UTC
Red Hat Knowledge Base (Solution) | 3753211 | 0 | Configure | None | RHEL7.6 : NFS4ERR_EXPIRED seen due to 2 or more clients using the same hostname/nodename | 2019-05-08 18:10:18 UTC

Description Frank Sorenson 2018-05-24 13:12:00 UTC
Description of problem:

In some environments, the default uniform client string may not be sufficiently unique, making the server unable to distinguish between clients.

Set the 'nfs' kernel module's nfs4_unique_id parameter (or write /sys/module/nfs/parameters/nfs4_unique_id) during nfs-config so that the client string is more unique.
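For illustration, here is a minimal Python sketch of setting the parameter at runtime through the sysfs file named above. It assumes root privileges and an already-loaded nfs module. The helper names are hypothetical and not part of nfs-utils. The kernel builds the client string when it sets up a client, so the write would presumably need to happen before the first NFSv4 mount to have any effect.

```python
from pathlib import Path

# sysfs path from this report; writable only by root once the nfs module is loaded.
SYSFS_PARAM = Path("/sys/module/nfs/parameters/nfs4_unique_id")


def set_nfs4_unique_id(value: str, param_path: Path = SYSFS_PARAM) -> None:
    """Write the uniquifier (hypothetical helper).

    The value is written without a trailing newline, as a precaution so that
    no newline becomes part of the identifier.
    """
    param_path.write_text(value)


def get_nfs4_unique_id(param_path: Path = SYSFS_PARAM) -> str:
    """Read the current uniquifier; sysfs appends a newline when reading."""
    return param_path.read_text().rstrip("\n")
```

On a real system this would be a single set_nfs4_unique_id(value) call during the nfs-config step. The param_path argument exists only so the helpers can be exercised against an ordinary file.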


Version-Release number of selected component (if applicable):

RHEL 7 kernel and nfs-utils (all versions)


How reproducible:

see below

Steps to Reproduce:

two NFS clients with the same IP address, each behind its own system/device providing NAT
on both clients, mount from the same NFS server using NFSv4


Actual results:

Both clients will make SETCLIENTID (NFSv4.0) or EXCHANGE_ID (NFSv4.1+) calls with the same client string. The server will conclude that the second call is the result of the first client restarting, and will invalidate/expire the first clientid, preventing the first client from communicating with it.
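The failure can be illustrated with a deliberately simplified toy model in Python. It is not the real NFSv4.0/4.1 server state machine, which also involves SETCLIENTID_CONFIRM, leases, and more. It models only the part that matters here: the server keys its client records by the client string and treats a new boot verifier for a known string as a reboot. The class and method names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model: the server keys client records by the client string."""
    # client string -> (boot verifier, clientid)
    records: dict = field(default_factory=dict)
    next_id: itertools.count = field(default_factory=lambda: itertools.count(1))

    def setclientid(self, client_string: str, verifier: bytes) -> int:
        known = self.records.get(client_string)
        if known is not None and known[0] == verifier:
            return known[1]  # same client instance retrying: same clientid
        # New string, or a known string with a new verifier. In the latter
        # case the server assumes the client rebooted and drops the old clientid.
        clientid = next(self.next_id)
        self.records[client_string] = (verifier, clientid)
        return clientid

    def is_valid(self, clientid: int) -> bool:
        return any(cid == clientid for _, cid in self.records.values())
```

Suppose two clones present the same string with different boot verifiers. After the second setclientid() call, is_valid() is False for the first clientid. That is the toy analogue of the expired clientid described above. With distinct strings, both clientids stay valid.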


Expected results:

nfs clients use sufficiently unique client strings


Additional info:

Comment 2 Benjamin Coddington 2018-05-24 13:38:51 UTC
Let's use the contents of /etc/machine-id as the ID in the nfs module parameter, set in an /etc/modprobe.d/nfs.conf file. The nfs-config service can check whether modprobe.d/nfs.conf already contains that parameter, and add it if it does not. We'll also need a way to detect whether an admin has already set that parameter, perhaps by prefixing the value with something static, or by marking the config option with a comment.
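Here is a minimal Python sketch of the proposed nfs-config check, assuming the marker-comment approach. It works on the file's text rather than on the real /etc/modprobe.d/nfs.conf so that it can be tested in isolation. MARKER and the function name are hypothetical. Any existing, uncommented "options nfs ... nfs4_unique_id=" line is left alone, whether an admin or an earlier run wrote it. The marker only records which line the tool added.

```python
import re

# Hypothetical marker so a later run can tell its own line from an admin's.
MARKER = "# nfs4_unique_id added automatically by nfs-config"

# An uncommented "options nfs ..." line that already sets nfs4_unique_id.
_OPTION_RE = re.compile(r"^[ \t]*options[ \t]+nfs[ \t]+.*\bnfs4_unique_id=",
                        re.MULTILINE)


def ensure_unique_id_option(conf_text: str, unique_id: str) -> str:
    """Return modprobe.d/nfs.conf contents that set nfs4_unique_id,
    leaving any existing setting (admin-provided or ours) untouched."""
    if _OPTION_RE.search(conf_text):
        return conf_text
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + f"{MARKER}\noptions nfs nfs4_unique_id={unique_id}\n"
```

The function is idempotent: running it a second time on its own output returns the text unchanged. A commented-out option line does not count as a setting.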

See the MACHINE-ID(5) man page for information on the contents of /etc/machine-id; it seems to be exactly what we want.
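A Python sketch of deriving a value from /etc/machine-id follows. MACHINE-ID(5) describes the file as a single 32-character lowercase hexadecimal ID. It advises against using the ID directly, recommending a keyed hash with a fixed, application-specific key instead. APP_KEY and the truncation to 32 hex characters below are assumptions made for illustration. This does nothing for clones that share the same machine-id, which is the concern raised in the following comments. It only rejects an empty or malformed file.

```python
import hashlib
import hmac
import re
from pathlib import Path

# Hypothetical application-specific key; any fixed string unique to this use.
APP_KEY = b"nfs4_unique_id"

_MACHINE_ID_RE = re.compile(r"[0-9a-f]{32}")


def unique_id_from_machine_id(path: Path = Path("/etc/machine-id")) -> str:
    machine_id = path.read_text().strip()
    if not _MACHINE_ID_RE.fullmatch(machine_id):
        raise ValueError(f"{path}: not a 32-character lowercase hex ID")
    # Keyed hash instead of the raw ID, as MACHINE-ID(5) recommends; truncated
    # to keep the value short, since the module parameter has a fixed-size buffer.
    digest = hmac.new(APP_KEY, machine_id.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:32]
```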

Comment 7 Frank Sorenson 2019-05-23 16:30:19 UTC
Looking at the NFS client code, the client string is built as follows:

'migration' mount option:
    'nfs4_unique_id' is set:
        string is: Linux NFSv<MAJOR_VERSION>.<MINOR_VERSION> <UNIQUE_STRING>/<CLIENT_HOSTNAME>
    'nfs4_unique_id' is not set:
        string is: Linux NFSv<MAJOR_VERSION>.<MINOR_VERSION> <CLIENT_HOSTNAME>/<SERVER_IP>
'nomigration' mount option:
    'nfs4_unique_id' is set:
        string is: Linux NFSv4.0 <CLIENT_HOSTNAME>/<UNIQUE_STRING>/<SERVER_IP>
    'nfs4_unique_id' is not set:
        string is: Linux NFSv4.0 <CLIENT_HOSTNAME>

So <CLIENT_HOSTNAME> is used in all cases, and including it in nfs4_unique_id would therefore be pointless. <SERVER_IP> is also already included in the 'nomigration + nfs4_unique_id' case, and would be pointless anyway: the problem is that the client string is not unique as seen by the server, so the server's own address cannot help distinguish between clients.
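The two "'nfs4_unique_id' is set" formats listed above can be written as a small Python function. It transcribes those formats, and its name is hypothetical. The sketch makes the argument concrete: two clones that share a hostname get distinct strings only if their uniquifiers differ. A uniquifier derived from the hostname therefore adds nothing.

```python
def client_string(unique_id: str, hostname: str, *, migration: bool,
                  minor: int = 0, server_ip: str = "") -> str:
    """Client string when nfs4_unique_id is set, per the formats listed above."""
    if migration:
        # Linux NFSv<MAJOR_VERSION>.<MINOR_VERSION> <UNIQUE_STRING>/<CLIENT_HOSTNAME>
        return f"Linux NFSv4.{minor} {unique_id}/{hostname}"
    # Linux NFSv4.0 <CLIENT_HOSTNAME>/<UNIQUE_STRING>/<SERVER_IP>
    return f"Linux NFSv4.0 {hostname}/{unique_id}/{server_ip}"


# Two clones with the same hostname, mounting from the same server.
same = [client_string(h, h, migration=True, minor=1) for h in ("clone01", "clone01")]
distinct = [client_string(u, "clone01", migration=True, minor=1) for u in ("id-a", "id-b")]
print(same[0] == same[1], distinct[0] == distinct[1])  # True False
```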

The /etc/machine-id sounds like a good option, however the problem is typically seen between nfs clients that have been cloned from the same master, and the /etc/machine-id is not unique (obviously it should be, but in practice isn't).



What about something like a short nfs-utils program to generate (and set?) the nfs4_unique_id based on a template in /etc/nfs.conf?  something like:

[general]
# current default - empty
# nfs4-unique-id=

# nfs4-unique-id=some static string

# from /etc/machine-id
# nfs4-unique-id=%{machine_id}

# generated completely randomly from /proc/sys/kernel/random/uuid
# nfs4-unique-id=%{random_uuid}

# use a script/program to generate the string
# nfs4-unique-id=%{exec=/path/to/script}

# use current time
# nfs4-unique-id=%{epoch}


(or some combination of the above)
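The template idea can be sketched in Python as follows. The sketch assumes the [general] section and nfs4-unique-id key proposed above and covers only %{machine_id}, %{random_uuid} and %{epoch}. The %{exec=...} form is left out and would be copied literally. One practical detail: Python's configparser must be created with interpolation=None. Its default interpolation treats % specially and would reject %{...}. Function and resolver names are hypothetical.

```python
import configparser
import re
import time
import uuid
from pathlib import Path

# Resolvers for three of the proposed keywords. uuid.uuid4() stands in for
# reading /proc/sys/kernel/random/uuid so the sketch also runs off Linux.
DEFAULT_RESOLVERS = {
    "machine_id": lambda: Path("/etc/machine-id").read_text().strip(),
    "random_uuid": lambda: str(uuid.uuid4()),
    "epoch": lambda: str(int(time.time())),
}

_KEYWORD_RE = re.compile(r"%\{(\w+)\}")


def unique_id_from_conf(conf_text: str, resolvers=DEFAULT_RESOLVERS) -> str:
    """Read [general] nfs4-unique-id and expand %{keyword} references."""
    # interpolation=None: the default BasicInterpolation rejects "%{".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # A missing section or key gives the current default: an empty string.
    template = parser.get("general", "nfs4-unique-id", fallback="")

    def _expand(match):
        resolver = resolvers.get(match.group(1))
        # Unknown keywords are copied literally.
        return resolver() if resolver else match.group(0)

    return _KEYWORD_RE.sub(_expand, template)
```

Combinations work as expected. For example, "nfs4-unique-id=%{machine_id}-%{epoch}" expands each keyword in place.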

Comment 8 J. Bruce Fields 2019-05-23 18:27:13 UTC
(In reply to Frank Sorenson from comment #7)
> The /etc/machine-id sounds like a good option, however the problem is
> typically seen between nfs clients that have been cloned from the same
> master, and the /etc/machine-id is not unique (obviously it should be, but
> in practice isn't).
> 
> 
> 
> What about something like a short nfs-utils program to generate (and set?)
> the nfs4_unique_id based on a template in /etc/nfs.conf?  something like:

How does it persist across boots?  If we're storing the result in the filesystem,
what happens if someone then clones that filesystem?

Surely we must not be the only subsystem with this problem.  I don't see how we can get around fixing /etc/machine-id.

Comment 9 Frank Sorenson 2019-05-23 20:27:57 UTC
(In reply to J. Bruce Fields from comment #8)
> (In reply to Frank Sorenson from comment #7)

> > What about something like a short nfs-utils program to generate (and set?)
> > the nfs4_unique_id based on a template in /etc/nfs.conf?  something like:
> 
> How does it persist across boots?  If we're storing the result in the
> filesystem,
> what happens if someone then clones that filesystem?

Yes, if the template for nfs4_unique_id only includes elements read directly from the filesystem, then they'll experience the same issue; hence the suggestions that the template could use the epoch seconds or a completely unique value instead, or as well.


> Surely we must not be the only subsystem with this problem.

No, we're definitely not.  A quick search of bugzilla shows a number of places where either machine-id is not becoming uniquely set, or where that causes a problem as a result.


> I don't see how
> we can get around fixing /etc/machine-id.

Sure, but currently there's nothing that sets /sys/module/nfs/parameters/nfs4_unique_id to machine-id, or to anything else.  If we add something to set nfs4_unique_id to machine-id (which I agree is a reasonable solution), is there a case where the customer might want an alternate string?  How do we detect that condition?  The drawback to setting nfs4_unique_id to machine-id (unless it's already populated) is that we then have nfs-related configuration items that are configured outside of nfs-utils.

The template suggestion was intended to make that configuration easier, and all from within nfs-utils.  But I'm not stuck on the idea.  Just a thought on how we could make setting nfs4_unique_id easily configurable.

Comment 10 J. Bruce Fields 2019-05-23 20:58:42 UTC
(In reply to Frank Sorenson from comment #9)
> Sure, but currently there's nothing that sets
> /sys/module/nfs/parameters/nfs4_unique_id to machine-id, or to anything
> else.  If we add something to set nfs4_unique_id to machine-id (which I
> agree is a reasonable solution), is there a case where the customer might
> want an alternate string?

OK, got it, so we could use your proposal to default to machine-id while still allowing for some exceptions, that makes sense.

Though I worry that the typical user experience won't be great: clone a bunch of VMs; find they don't work; realize it's because NFS mounts are erroring out; stare at network traces for a day; finally figure out the right Google keywords to realize it's something to do with nfs4_unique_id, and find documentation pointing to /etc/nfs.conf.  A single clone app that everyone knows to use and that can fix up things like machine-id would be nice.  But I assume people have already tried that.

Maybe I'm too worried about persistence.  What actually happens if every NFS client just generates a new random ID every time it boots?

I think the only effect for now is that everybody has to wait a lease period for the rebooted client's locks to get revoked.  Whereas if it boots back up with the same ID in less than a lease period, it can tell the server to release its old state early.  Maybe that's not a huge deal.

There's also client delegation reclaim--in theory a client that keeps cached data on disk can try to reclaim its old delegations on reboot to avoid throwing out its cache.  We don't implement that, but maybe we will some day.

Comment 11 fj-lsoft-rh-net 2019-06-10 01:55:21 UTC
Hi,

Will the discussed solution be included in RHEL 7.7?

Regards,
Ikarashi

Comment 12 Christian Horn 2019-06-10 02:32:03 UTC
Customers upgrading from earlier RHEL 7 setups to RHEL 7.6 are running into this, so I'm marking it as a regression.

Comment 17 J. Bruce Fields 2019-06-11 00:28:21 UTC
Looks reasonable to me.  So, "nfs4_unique_id = ${machine_id}" would be the default on a new install?

Comment 18 Murphy Zhou 2019-06-11 03:50:42 UTC
If this can't make it into 7.7, we will have to defer it to RHEL 8.

Thanks.

Comment 19 Frank Sorenson 2019-06-11 11:56:48 UTC
Yes, I think we'd want to default to "nfs4_unique_id = ${machine_id}" rather than to the empty string used by the current behavior.


At this point, moving this to RHEL 8 is probably appropriate.

Comment 32 Frank Sorenson 2019-07-08 17:47:58 UTC
Created attachment 1588467 [details]
potential nfs4_unique_id implementation V1

This patch adds the nfs4_unique_id program, which parses and, where needed, expands the [general]/unique_id setting in nfs.conf (or a string given on the command line).  The program then sets the nfs module's nfs4_unique_id parameter to the expanded string.


usage: nfs4_unique_id [options]
      -h, -?, --help      Print this help screen.
      -s                  Set the nfs4_unique_id string.
      -u <unique_id_string>, --unique_id=<unique_id_string>
                          String to parse.
      -n, --null          Null out the system's nfs4_unique_id string.
      -x, --hex           Dump the expanded nfs4_unique_id string in hex.
      -v, --verbose       Display verbose information about the expansion.
      -V, --version       Display version information and exit.


expansion rules and keywords:
        input string is null-terminated
        expanded string is not terminated (0x00 is a valid nfs4_unique_id character)

        input string:
                empty string - empty string is returned

                string beginning with $ - replaced by the value of an environment variable

                regular characters - copied to the expanded string as-is
                        (including spaces but not a trailing \n, unless escaped or quoted)
                %{keyword} - expanded/replaced as defined
                        %{machine_id} - replaced by hash of contents of /etc/machine-id
                        %{net_ns} - replaced by the network namespace pointed to by /proc/self/ns/net
                        %{file=/path/to/file} - replaced by contents of the given file
                        %{exec=/path/to/executable} - execute the given program/script, and use the output
                        %{env=ENVIRONMENT_VAR} - replaced by the value of an environment variable

                        %{epoch} - replaced by the epoch time in seconds
                        %{epoch_ns} - replaced by the epoch time in nanoseconds
                        %{random_uuid} - replaced by uuid generated by /proc/sys/kernel/random/uuid

                        %{hostname} - expands to hostname; does not add uniqueness

                        broken %{... without terminating } - copy literal
                        unknown %{keyword} - copy literal

                % (without the next character being {) - literal %
                \% - literal %
                \" - literal "
                \\ - literal \
                \{ - literal {
                \x## - replace this hex sequence with the 8-bit character value
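To make the escaping rules above concrete, here is a Python sketch of an expander that covers a subset of them. It is not the code in the attached patch. The function name and the resolvers dictionary, which maps keyword names such as "hostname" to callables, are hypothetical. It returns bytes because the rules allow 0x00 in the expanded string.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def expand_unique_id(s: str, resolvers: dict) -> bytes:
    """Expand a unique_id string per a subset of the rules above.

    Not handled: the $VAR rule, the escaped/quoted exception for the trailing
    newline, and %{file=...}/%{exec=...}/%{env=...}, which are copied
    literally unless a resolver is registered under that exact text.
    """
    if s.endswith("\n"):
        s = s[:-1]  # drop a trailing newline
    out = bytearray()
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            nxt = s[i + 1]
            if nxt in '%"\\{':
                out += nxt.encode()  # \% \" \\ \{ -> the literal character
                i += 2
                continue
            digits = s[i + 2:i + 4]
            if nxt == "x" and len(digits) == 2 and set(digits) <= HEX_DIGITS:
                out.append(int(digits, 16))  # \x## -> one byte
                i += 4
                continue
        elif c == "%" and s.startswith("{", i + 1):
            end = s.find("}", i + 2)
            if end != -1:
                resolver = resolvers.get(s[i + 2:end])
                # Known keyword: substitute; unknown: copy "%{...}" literally.
                out += (resolver() if resolver else s[i:end + 1]).encode()
                i = end + 1
                continue
            # "%{" with no closing brace falls through and is copied literally.
        out += c.encode()  # ordinary character, lone %, or unrecognised escape
        i += 1
    return bytes(out)
```

For example, expand_unique_id("id-%{hostname}", {"hostname": socket.gethostname}) would yield b"id-" followed by the local host name. As the rules note, that particular keyword adds no uniqueness.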

Comment 33 Frank Sorenson 2019-07-08 17:58:34 UTC
Created attachment 1588469 [details]
program to test expansion of unique_id strings

this patch adds a test_expansions program to test the various expansions of the nfs4_unique_id program

Comment 34 Steve Whitehouse 2019-07-12 09:21:02 UTC
I'm assuming that we think this is a reasonable target for 7.8 at this point in time? So I've set devel ack. Alice, feel free to assign this bug to yourself if this is the one that you mentioned that you were working on the other day.

Comment 35 Alice Mitchell 2019-07-12 09:34:06 UTC
Frank has done all the work, I'm just chipping in with some suggestions wrt the nfs.conf parts and springfield integration

Comment 37 Frank Sorenson 2019-07-16 18:53:34 UTC
Alice suggested extending conffile/nfsconf settings with this expansion capability, rather than building it directly into a tool to set just the nfs4_unique_id, which makes complete sense, and which I had totally missed.

I'm attaching new patches which do just this.  The first adds the expansion capabilities to the configuration settings, and the second adds the nfs4_unique_id-specific pieces.

Comment 38 Frank Sorenson 2019-07-16 18:57:43 UTC
Created attachment 1591153 [details]
conffile expansions patch v2

Comment 39 Frank Sorenson 2019-07-16 19:00:25 UTC
Created attachment 1591155 [details]
nfs4_unique_id patch v2

add the nfs4_unique_id utility

Comment 40 Dave Wysochanski 2019-08-14 16:40:46 UTC
Hey Frank - are you wanting review or should you just submit your patches to linux-nfs?

Comment 43 Dave Wysochanski 2020-02-10 16:24:02 UTC
CLOSING WONTFIX for RHEL7 and will open clone to RHEL8

Comment 44 Dave Wysochanski 2020-02-10 16:28:03 UTC
RHEL8 bug https://bugzilla.redhat.com/show_bug.cgi?id=1801326

