Bug 1557465

Summary: [nfs-ganesha] convert to rados_ng RecoveryBackend
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Jeff Layton <jlayton>
Component: CephFS Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED ERRATA QA Contact: Ramakrishnan Periyasamy <rperiyas>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: ceph-eng-bugs, ceph-qe-bugs, hnallurv, jlayton, john.spray, pdonnell, rperiyas, rraja, tserlin
Target Milestone: rc   
Target Release: 3.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: nfs-ganesha-2.5.5-100.el7cp Ubuntu: nfs-ganesha_2.5.5-100redhat1xenial Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1581884 (view as bug list) Environment:
Last Closed: 2018-09-26 18:19:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jeff Layton 2018-03-16 15:41:19 UTC
Ramana points out that we considered using the newer rados_ng backend in December, but decided to use the rados_kv one instead. I think we need to reconsider this.

rados_kv does not properly survive ganesha being restarted during the grace period. That's something that can easily happen during testing (just fail over and back in rapid succession).

Given that we aren't using the dbus interfaces to do node takeover or anything (right?), we should strongly consider merging the more resilient rados_ng driver and shipping with that enabled instead of the rados_kv driver.

Comment 3 Jeff Layton 2018-03-16 18:37:37 UTC
Initial backport is here:

    http://git.engineering.redhat.com/git/users/jlayton/nfs-ganesha.git/log/?h=rados_ng

It compiles, but needs more extensive testing.

Comment 4 Jeff Layton 2018-03-16 18:39:15 UTC
Note that I snuck a couple of other "nice to have" patches in there too. With 21 patches, I didn't see much reason to hold back.

Comment 5 Jeff Layton 2018-03-16 18:48:31 UTC
So this will add the rados_ng driver to ganesha, clean up grace period handling, etc. We'll need to change whatever is generating the config file to use:

    RecoveryBackend = rados_ng;

instead of "rados_kv".
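For reference, that setting lives in the NFSv4 block of ganesha.conf; a minimal sketch might look like the following (the pool name and ceph_conf path here are illustrative assumptions, not values mandated by this bz):

```
NFSv4 {
    RecoveryBackend = rados_ng;
}

RADOS_KV {
    # Illustrative values; use whatever pool/conf your deployment defines.
    ceph_conf = "/etc/ceph/ceph.conf";
    pool = "nfs-ganesha";
}
```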

Comment 7 Jeff Layton 2018-03-17 10:37:30 UTC
The storage format in the object is the same as rados_kv (it even uses a lot of the same low-level code), but it handles the transition to grace period and back more resiliently.

The problem with rados_kv is that it makes alterations to the recovery db while in the grace period. This means that if ganesha gets restarted while in the grace period, some records can be lost if they never get written to the new recovery db.

In practice, a recovery db only becomes valid at the moment that the grace period is lifted. Until then, the previous recovery db is authoritative no matter how many times the server reboots.

I'll just copypasta the comment block here to explain rados_ng:

 * recovery_rados_ng: a "safe by design" recovery backing store
 *
 * At startup, create a global write op, and set it up to clear out all of
 * the old keys. We then will spool up new client creation (and removals) to
 * that transaction during the grace period.
 *
 * When lifting the grace period, synchronously commit the transaction
 * to the kvstore. After that point, all client creation and removal is done
 * synchronously to the kvstore.
 *
 * This allows for better resilience when the server crashes during the grace
 * period. No changes are made to the backing store until the grace period
 * has been lifted.

The main reason I didn't just replace rados_kv in the code was that it has some IP-address takeover functionality that rados_ng doesn't have at this point. We aren't using any of that functionality here, so rados_ng is really a better choice.
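The "spool during grace, commit atomically on lift" idea above can be sketched in a few lines of Python. This is an illustrative model of the design, not ganesha's actual C/librados code; the class and method names are invented:

```python
class SafeRecoveryStore:
    """Model of the rados_ng approach: buffer all client-record changes
    made during the grace period and apply them to the backing store in
    one atomic commit when the grace period is lifted."""

    def __init__(self):
        self.kv = {}          # committed recovery db (stands in for the RADOS omap)
        self._pending = None  # spooled transaction, active only during grace

    def start_grace(self):
        # Begin a transaction whose first step clears the old keys; every
        # add/remove during grace is appended to it rather than applied.
        self._pending = [("clear", None)]

    def add_client(self, clientid):
        if self._pending is not None:
            self._pending.append(("set", clientid))  # spool: store untouched
        else:
            self.kv[clientid] = True                 # outside grace: synchronous write

    def remove_client(self, clientid):
        if self._pending is not None:
            self._pending.append(("del", clientid))
        else:
            self.kv.pop(clientid, None)

    def lift_grace(self):
        # Commit the whole transaction at once. Until this point the
        # previous recovery db remained intact and authoritative.
        if self._pending is None:
            return
        for op, cid in self._pending:
            if op == "clear":
                self.kv.clear()
            elif op == "set":
                self.kv[cid] = True
            else:
                self.kv.pop(cid, None)
        self._pending = None
```

If the server crashes or restarts before lift_grace(), nothing has touched the committed db, so the old recovery records survive intact; that is exactly the resilience rados_kv lacks.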

Comment 18 Ramakrishnan Periyasamy 2018-07-28 15:29:22 UTC
Hi Jeff,

Do we have any specific steps or tests to verify this bz?
Does this bz require an OpenStack Manila HA config setup to verify, or is a CephFS NFS config enough?

Comment 19 Jeff Layton 2018-07-29 13:21:05 UTC
No specific steps. If you set the backend to rados_ng and the server starts all the way up, then it should be active.

Comment 20 Ramakrishnan Periyasamy 2018-08-23 12:17:35 UTC
Moving this bug to verified state.

Configured the CephFS NFS setup with the below versions.
ceph version 12.2.5-39.el7cp
nfs-ganesha-2.5.5-100.el7cp.x86_64
nfs-ganesha-ceph-2.5.5-100.el7cp.x86_64

Configured ganesha.conf with "RecoveryBackend = rados_ng"

Mounted NFS and ran IO for more than 1 hr; restarted the NFS service 10 times and saw no interruption in client IO.

Comment 22 errata-xmlrpc 2018-09-26 18:19:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819