Bug 1486132

Summary: Write to a file does not get propagated to another NFS client on RHEL 7.4
Product: Red Hat Enterprise Linux 7
Component: nfs-utils
Version: 7.4
Status: CLOSED WONTFIX
Severity: unspecified
Priority: medium
Reporter: Miroslav Novak <mnovak>
Assignee: Steve Dickson <steved>
QA Contact: Yongcheng Yang <yoyang>
CC: bfields, cdewolf, jiyin, mnovak, rmj, steved, toneata, xzhou, zlang
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-01-15 07:41:39 UTC

Attachments:
live.groovy
backup.groovy

Description Miroslav Novak 2017-08-29 06:46:44 UTC
Created attachment 1319325 [details]
live.groovy

Description of problem:
There is a change in behaviour in data synchronization between two NFSv4 clients between RHEL 7.3 and RHEL 7.4.
Artemis (a messaging broker) uses NFSv4 as the shared store for its messaging journal, in order to provide HA in case the Artemis live server fails. The backup activates once there is no file lock on the server.lock file in the Artemis journal (on the NFSv4 mount) and takes over all duties from the live server.
The change we see is not with file locks but with how data changes made to the server.lock file by the live server are propagated to the backup server.

To return to the original state, the live server must be started again and the backup must be notified to shut down. The live server notifies the backup by writing a special character to the 1st byte of the server.lock file. The backup, which holds a lock on server.lock (but not on the first byte), periodically reads the 1st byte of server.lock; once the special character is there, it shuts itself down and releases its file lock on server.lock, after which the live server acquires the file lock on server.lock and activates.

The problem is that the special character written to server.lock by the live server does not get propagated to the backup, so the backup never shuts down. This is the change we see after upgrading from RHEL 7.3 to 7.4.
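For illustration, a minimal Groovy sketch of the mechanism described above follows. It is not the attached reproducer (the attached live.groovy and backup.groovy are authoritative); the path, polling interval and marker handling below are assumptions based on this description.

// sketch-backup.groovy: hypothetical simplification of the backup side
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Paths
import static java.nio.file.StandardOpenOption.*

def ch = FileChannel.open(Paths.get('/mnt/hornetq/client/server.lock'), READ, WRITE)
def backupLock = ch.lock(1, 1, false)   // exclusive lock on the 2nd byte only
def buf = ByteBuffer.allocate(1)
while (true) {
    buf.clear()
    ch.read(buf, 0)                     // re-read the 1st byte on every iteration
    if (buf.get(0) == 'F'.bytes[0]) {   // live wrote the shutdown marker
        break
    }
    sleep(100)                          // assumed polling interval
}
backupLock.release()                    // let the live server take the lock
ch.close()

// sketch-live.groovy: hypothetical simplification of the live side signalling shutdown
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Paths
import static java.nio.file.StandardOpenOption.*

def ch = FileChannel.open(Paths.get('/mnt/hornetq/client/server.lock'), READ, WRITE)
ch.write(ByteBuffer.wrap('F'.bytes), 0) // write the marker to the 1st byte
ch.force(true)                          // push the write towards the server
ch.close()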

Mount options (taken from /etc/fstab):
10.40.4.25:/mnt/shared /mnt/hornetq/client nfs4 minorversion=0,rw,sync,nosuid,nodev,intr,tcp,noac,soft,lookupcache=none 0 0

#mount
10.40.4.25:/mnt/shared on /mnt/hornetq/client type nfs4 (rw,nosuid,nodev,relatime,sync,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.40.5.160,lookupcache=none,local_lock=none,addr=10.40.4.25)

Version-Release number of selected component (if applicable):
RHEL 7.4/NFSv4

How reproducible:
Download the attached Groovy scripts and follow the steps to reproduce.

Steps to Reproduce:
1. On two different machines, mount the NFSv4 directory using the mount options above
2. Run the backup.groovy script with "groovy backup.groovy" on the first machine
3. Run the live.groovy script with "groovy live.groovy" on the second machine

Actual results:
backup.groovy does not detect the change in the server.lock file. You can open the created server.lock file and check whether "F" is written as the first character.

Expected results:
The backup.groovy script detects the "F" written by the live.groovy script.

Comment 1 Miroslav Novak 2017-08-29 06:47:06 UTC
Created attachment 1319326 [details]
backup.groovy

Comment 2 Oneata Mircea Teodor 2017-08-29 07:53:10 UTC
Hello Miroslav,

This is an illegal clone; z-stream bugs are cloned from the y-stream.
Please add the RHEL 5 y stream flag to the bug.

Comment 3 Miroslav Novak 2017-08-29 09:13:30 UTC
@Oneata Sorry, I don't know what the correct process is, and I'm not sure what the RHEL 5 y stream flag is. Could you set the correct flags for me?

Thanks,
Mirek

Comment 4 Oneata Mircea Teodor 2017-08-29 09:17:37 UTC
(In reply to Miroslav Novak from comment #3)
> @Oneata Sorry, I don't know what the correct process is, and I'm not sure what
> the RHEL 5 y stream flag is. Could you set the correct flags for me?
> 
> Thanks,
> Mirek

Hi There,
Sorry, typo there: RHEL 7.5 as the y stream flag.

Comment 5 Steve Dickson 2017-08-29 15:51:18 UTC
It's my understanding that there are two NFS clients reading and writing to
the same file on a server. Both clients and the server are RHEL 7.4.

When the backup client is active, it holds a lock on the 2nd byte
of the file and then monitors the 1st byte of the file to see
when to shut down.

When the live client sees the 2nd byte is locked, it writes to
the 1st byte of the file, telling the backup to shut down.

In 7.3 the backup client was seeing the live client's write,
and in 7.4 it does not. Basically you got lucky with 7.3, because
the only way to guarantee synchronization between two NFS clients'
reads and writes to the same file is through file locking.

So the solution here is to use the same locking synchronization on
the 1st byte as is done on the 2nd byte.

Have the live client lock the 1st byte, write the byte, then
unlock the file, which will flush the byte back to the server.
Then have the backup client lock the 1st byte, read the byte, then
unlock the file.

If that does not work, there is a bug in NFS...
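For concreteness, a minimal Groovy sketch of the suggested locking scheme, assuming the reproducers use java.nio file channels; the method names below are hypothetical, not Artemis or reproducer code.

import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Live side: lock the 1st byte, write it, then unlock;
// releasing the lock flushes the dirty byte back to the server.
def signalShutdown(FileChannel ch) {
    def lock = ch.lock(0, 1, false)   // exclusive lock on byte 0
    try {
        ch.write(ByteBuffer.wrap('F'.bytes), 0)
    } finally {
        lock.release()
    }
}

// Backup side: lock the 1st byte, read it, then unlock;
// taking the lock makes the client revalidate its cached data for the file.
def readMarker(FileChannel ch) {
    def lock = ch.lock(0, 1, false)
    def buf = ByteBuffer.allocate(1)
    try {
        ch.read(buf, 0)
        return buf.get(0)
    } finally {
        lock.release()
    }
}

The backup would keep its existing lock on the 2nd byte and call something like readMarker() from its polling loop.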

Comment 7 Miroslav Novak 2017-08-30 07:19:20 UTC
Thanks Steve! I've tried your suggestion and it works. I believe this bugzilla can be closed as not a bug.

Comment 11 J. Bruce Fields 2017-09-07 14:09:15 UTC
I totally overlooked the "sync" options on those mount points.

If the NFS mount points on both clients involved were mounted with "sync", then the extra locking should be unnecessary.  From nfs(5):

 "If the sync option is specified on a mount point, any system call that
  writes data to files on that mount point causes that data to be flushed
  to the server before the system call returns control to user space."

If both clients are mounting with "sync", then the extra locking should make no difference, but Comment 7 says it did make a difference.

So, I'm stumped.  Are we positive about those mount options?

Comment 12 Miroslav Novak 2017-09-07 14:14:47 UTC
> Are we positive about those mount options?

Yes. Mount options are the same on both of the NFS clients.

Comment 13 J. Bruce Fields 2017-09-07 15:57:42 UTC
(In reply to J. Bruce Fields from comment #11)
> If both clients are mounting with "sync", then the extra locking should make
> no difference, but Comment 7 says it did make a difference.

Apologies, my mistake: sync doesn't turn off attribute or data caching.  So, writes on one client are being flushed to the server before the system call returns, but reads on the other client may still rely on cache.  You'd need "noac", not just sync, to eliminate the need for locking.

So, the behavior looks to me as expected, and I think we can close this as NOTABUG.

Apologies again for the confusion.

Comment 15 Miroslav Novak 2017-09-07 18:57:05 UTC
There is "noac" option as well. The goal was to avoid any caching or asynchronous behaviour in mount to keep state between NFS clients in sync.

Comment 16 J. Bruce Fields 2017-09-07 19:08:26 UTC
(In reply to Miroslav Novak from comment #15)
> There is "noac" option as well. The goal was to avoid any caching or
> asynchronous behaviour in mount to keep state between NFS clients in sync.

Argh, OK, thanks.

What's the server?  (And, if Linux, what filesystem are you exporting, and what are the mount options?)

Comment 17 Miroslav Novak 2017-09-07 20:33:34 UTC
Server is RHEL 7.4. Exported file system is xfs:
/dev/mapper/system-hornetq--nfs41                                                       xfs        10G   81M   10G   1% /mnt/system-hornetq/client-nfs41

But I'm not sure if this can have an effect, as I could see the same behaviour when the NFSv4 server was RHEL 6.9 and the exported file system was ext4. The NFS clients were RHEL 7.4 with the above mount options.

Comment 18 J. Bruce Fields 2017-09-07 21:19:30 UTC
(In reply to Miroslav Novak from comment #17)
> Server is RHEL 7.4. Exported file system is xfs:
> /dev/mapper/system-hornetq--nfs41                                           
> xfs        10G   81M   10G   1% /mnt/system-hornetq/client-nfs41
> 
> But I'm not sure if this can have an effect, as I could see the same behaviour
> when the NFSv4 server was RHEL 6.9 and the exported file system was ext4. The
> NFS clients were RHEL 7.4 with the above mount options.

I wondered if there might be a problem with NFSv4 change attribute resolution.  Even with "noac", the client probably still depends on the change attribute to know whether the file has changed, and older xfs will use a change attribute that's actually based on a last-modified time (ctime actually) with resolution about 1ms, which can lead the client to miss changes that occur in very quick succession--I'm not clear whether this test case might be doing that.

Could you check with "xfs_db -r -c version /dev/mapper/system-hornetq--nfs41" that V5 is set?  Also, if possible, it'd be useful to know whether MS_I_VERSION is actually set on that mount. See https://bugzilla.redhat.com/show_bug.cgi?id=1466981#c7 for an example of doing this with crash; I don't know another way.

Comment 19 Miroslav Novak 2017-09-08 08:25:37 UTC
Sorry, I cannot execute xfs_db in our QA lab as I don't have 'root' rights. Is there another way to check it?

Comment 20 Miroslav Novak 2017-09-11 12:28:44 UTC
I've asked one of the admins to check it; here is the output of "xfs_db -r -c version /dev/mapper/system-hornetq--nfs41":

versionnum [0xb4b4+0x8a] = V4,NLINK,DIRV2,ATTR,ALIGN,LOGV2,EXTFLG,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT

Comment 21 Miroslav Novak 2017-09-11 13:15:44 UTC
> I wondered if there might be a problem with NFSv4 change attribute
> resolution.  Even with "noac", the client probably still depends on the
> change attribute to know whether the file has changed, and older xfs will
> use a change attribute that's actually based on a last-modified time (ctime
> actually) with resolution about 1ms, which can lead the client to miss
> changes that occur in very quick succession--I'm not clear whether this test
> case might be doing that.

I think this is not the case, as backup.groovy reads the 1st byte of the server.lock file periodically (in a while loop). So if it misses an update once, it would get it on the next update.

Comment 23 RHEL Program Management 2021-01-15 07:41:39 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.