Ramana points out that we considered using the newer rados_ng backend in December, but decided to use the rados_kv one instead. I think we need to reconsider this. rados_kv does not properly survive ganesha being restarted during the grace period, which can easily happen during testing (just fail over and back in rapid succession). Given that we aren't using the dbus interfaces to do node takeover or anything (right?), we should strongly consider merging the more resilient rados_ng driver and shipping with that enabled instead of the rados_kv driver.
Initial backport is here: http://git.engineering.redhat.com/git/users/jlayton/nfs-ganesha.git/log/?h=rados_ng It compiles, but needs more extensive testing.
Note that I snuck a couple of other "nice to have" patches in there too. With 21 patches, I didn't see much reason to hold back.
So this will add the rados_ng driver to ganesha, clean up grace period handling, etc. We'll need to change whatever generates the config file to set "RecoveryBackend = rados_ng;" instead of "rados_kv".
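For reference, a minimal sketch of what the relevant ganesha.conf stanzas might look like. I'm assuming RecoveryBackend lives in the NFSv4 block as it does upstream; the pool and path values below are just examples, adjust for the local cluster:

    NFSv4 {
            # switch from the previous rados_kv backend
            RecoveryBackend = rados_ng;
    }

    RADOS_KV {
            # example values only
            ceph_conf = "/etc/ceph/ceph.conf";
            pool = "nfs-ganesha";
    }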
The storage format in the object is the same as rados_kv (it even uses a lot of the same low-level code), but it handles the transition into and out of the grace period more resiliently. The problem with rados_kv is that it makes alterations to the recovery db while the grace period is still in effect. That means that if ganesha gets restarted during the grace period, some records can be lost because they were never written to the new recovery db. In practice, a recovery db only becomes valid at the moment the grace period is lifted; until then, the previous recovery db is authoritative no matter how many times the server reboots.

I'll just copypasta the comment block here to explain rados_ng:

 * recovery_rados_ng: a "safe by design" recovery backing store
 *
 * At startup, create a global write op, and set it up to clear out all of
 * the old keys. We then will spool up new client creation (and removals) to
 * that transaction during the grace period.
 *
 * When lifting the grace period, synchronously commit the transaction
 * to the kvstore. After that point, all client creation and removal is done
 * synchronously to the kvstore.
 *
 * This allows for better resilience when the server crashes during the grace
 * period. No changes are made to the backing store until the grace period
 * has been lifted.

The main reason I didn't just replace rados_kv outright is that it has some IP-address takeover functionality that rados_ng doesn't have at this point. We aren't using any of that functionality here, so rados_ng is really the better choice.
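To make the "spool up and commit" idea concrete, here's a rough sketch of how that pattern looks with the librados C write-op API. This is not the actual ganesha code; the grace_* function names are mine, and error handling is pared down to the essentials:

    #include <rados/librados.h>
    #include <string.h>

    /* Illustrative only -- a pared-down version of the rados_ng pattern. */
    static rados_write_op_t grace_op;

    /* At startup: open a transaction that will clear out the old keys.
     * Nothing is sent to the OSDs yet. */
    static void grace_start(void)
    {
            grace_op = rados_create_write_op();
            rados_write_op_omap_clear(grace_op);
    }

    /* During the grace period: spool a new client record into the pending
     * transaction instead of writing it out immediately. */
    static void grace_add_client(const char *clientid, const char *blob)
    {
            const char *keys[] = { clientid };
            const char *vals[] = { blob };
            size_t lens[] = { strlen(blob) };

            rados_write_op_omap_set(grace_op, keys, vals, lens, 1);
    }

    /* When lifting the grace period: commit the whole transaction to the
     * recovery object in one atomic operation. If the server crashed any
     * time before this point, the old recovery db is still intact and
     * authoritative. */
    static int grace_commit(rados_ioctx_t ioctx, const char *oid)
    {
            int ret = rados_write_op_operate(grace_op, ioctx, oid, NULL, 0);

            rados_release_write_op(grace_op);
            return ret;
    }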
Hi Jeff, do we have any specific steps to verify this bz, and any tests? Does this bz require an OpenStack Manila HA setup to verify, or is a CephFS NFS config enough?
No specific steps. If you set the backend to rados_ng and the server starts all the way up, then it should be active.
Moving this bug to verified state. Configured a CephFS-NFS setup with the following versions:

ceph version 12.2.5-39.el7cp
nfs-ganesha-2.5.5-100.el7cp.x86_64
nfs-ganesha-ceph-2.5.5-100.el7cp.x86_64

Configured ganesha.conf with "RecoveryBackend = rados_ng", mounted NFS, and ran IO for more than 1 hour. Restarted the NFS service 10 times and saw no interruption in the client IO.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2819