Ramana points out that we considered using the newer rados_ng backend in December, but decided to use the rados_kv one instead. I think we need to reconsider this. rados_kv does not properly survive ganesha being restarted during the grace period, which can easily happen during testing (just fail over and back in rapid succession). Given that we aren't using the dbus interfaces to do node takeover or anything (right?), we should strongly consider merging the more resilient rados_ng driver and shipping with that enabled instead of the rados_kv driver.
Initial backport is here: http://git.engineering.redhat.com/git/users/jlayton/nfs-ganesha.git/log/?h=rados_ng It compiles, but needs more extensive testing.
Note that I snuck a couple of other "nice to have" patches in there too. With 21 patches, I didn't see much reason to hold back.
So this will add the rados_ng driver to ganesha, clean up grace period handling, etc. We'll need to change whatever generates the config file to set "RecoveryBackend = rados_ng;" instead of "rados_kv".
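For reference, a minimal sketch of what the relevant ganesha.conf stanzas might look like. I'm assuming RecoveryBackend lives in the NFSv4 block as it does upstream; the pool and path values below are just examples, adjust for the local cluster:

    NFSv4 {
            # switch from the previous rados_kv backend
            RecoveryBackend = rados_ng;
    }

    RADOS_KV {
            # example values only
            ceph_conf = "/etc/ceph/ceph.conf";
            pool = "nfs-ganesha";
    }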
The storage format in the object is the same as rados_kv (it even uses a lot of the same low-level code), but it handles the transition into and out of the grace period more resiliently. The problem with rados_kv is that it makes alterations to the recovery db while the grace period is still in effect. That means that if ganesha gets restarted during the grace period, some records can be lost because they were never written to the new recovery db. In practice, a recovery db only becomes valid at the moment the grace period is lifted; until then, the previous recovery db is authoritative no matter how many times the server reboots.

I'll just copypasta the comment block here to explain rados_ng:

 * recovery_rados_ng: a "safe by design" recovery backing store
 *
 * At startup, create a global write op, and set it up to clear out all of
 * the old keys. We then will spool up new client creation (and removals) to
 * that transaction during the grace period.
 *
 * When lifting the grace period, synchronously commit the transaction
 * to the kvstore. After that point, all client creation and removal is done
 * synchronously to the kvstore.
 *
 * This allows for better resilience when the server crashes during the grace
 * period. No changes are made to the backing store until the grace period
 * has been lifted.

The main reason I didn't just replace rados_kv outright is that it has some IP-address takeover functionality that rados_ng doesn't have at this point. We aren't using any of that functionality here, so rados_ng is really the better choice.
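To make the "spool up and commit" idea concrete, here's a rough sketch of how that pattern looks with the librados C write-op API. This is not the actual ganesha code; the grace_* function names are mine, and error handling is pared down to the essentials:

    #include <rados/librados.h>
    #include <string.h>

    /* Illustrative only -- a pared-down version of the rados_ng pattern. */
    static rados_write_op_t grace_op;

    /* At startup: open a transaction that will clear out the old keys.
     * Nothing is sent to the OSDs yet. */
    static void grace_start(void)
    {
            grace_op = rados_create_write_op();
            rados_write_op_omap_clear(grace_op);
    }

    /* During the grace period: spool a new client record into the pending
     * transaction instead of writing it out immediately. */
    static void grace_add_client(const char *clientid, const char *blob)
    {
            const char *keys[] = { clientid };
            const char *vals[] = { blob };
            size_t lens[] = { strlen(blob) };

            rados_write_op_omap_set(grace_op, keys, vals, lens, 1);
    }

    /* When lifting the grace period: commit the whole transaction to the
     * recovery object in one atomic operation. If the server crashed any
     * time before this point, the old recovery db is still intact and
     * authoritative. */
    static int grace_commit(rados_ioctx_t ioctx, const char *oid)
    {
            int ret = rados_write_op_operate(grace_op, ioctx, oid, NULL, 0);

            rados_release_write_op(grace_op);
            return ret;
    }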
Hi Jeff, do we have any specific steps to verify this bz, and any tests? Does this bz require an OpenStack Manila HA setup to verify, or is a CephFS NFS config enough?
No specific steps. If you set the backend to rados_ng and the server starts all the way up, then it should be active.
Moving this bug to verified state. Configured a CephFS-NFS setup with the following versions:

ceph version 12.2.5-39.el7cp
nfs-ganesha-2.5.5-100.el7cp.x86_64
nfs-ganesha-ceph-2.5.5-100.el7cp.x86_64

Configured ganesha.conf with "RecoveryBackend = rados_ng", mounted NFS, and ran IO for more than 1 hour. Restarted the NFS service 10 times and saw no interruption in the client IO.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2819