Bug 1285097
| Field | Value |
|---|---|
| Summary | updated nfs-utils package broke nfsdcltrack |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Frank Sorenson <fsorenso> |
| Component | nfs-utils |
| Assignee | Steve Dickson <steved> |
| Status | CLOSED ERRATA |
| QA Contact | Yongcheng Yang <yoyang> |
| Severity | high |
| Docs Contact | Marie Hornickova <mdolezel> |
| Priority | urgent |
| Version | 7.2 |
| CC | adrian.fischli, bcodding, bfields, bugzilla.redhat.com, chorn, dgilbert, dossow, dwysocha, eguan, evelu, gdubreui, green, igeorgex, ioan, jbnance, jiyin, j, knweiss, lslysz, luc.lalonde, mark2015, martin, mdolezel, me, miturria, mkolaja, pasteur, redhat.bugs, redhatbugs, redhat, rob.verduijn, sellis, steved, swhiteho, tlavigne, troels, vanhoof, wdh, yoguma |
| Target Milestone | rc |
| Keywords | Patch, Regression, TestCaseProvided, ZStream |
| Target Release | --- |
| Hardware | All |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | nfs-utils-1.3.0-0.23.el7 |
| Doc Type | Bug Fix |
| Story Points | --- |
| Clone Of | |
| | 1309625 (view as bug list) |
| Environment | |
| Last Closed | 2016-11-04 05:01:27 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1203710, 1295577, 1309625 |

Doc Text:
The update of the nfs-utils packages in Red Hat Enterprise Linux 7.2 added incomplete support for NFSv4.1 features. Consequently, the NFSv4 client-tracking callout program (nfsdcltrack) created an incorrect schema for the clients table, so file locks appeared to work but did not persist across a server restart. With this update, the underlying source code has been fixed, and nfsdcltrack can now enter the NFS client data into its database. As a result, NFS clients no longer lose the ability to reclaim their locks after an NFS server restart.
Description
Frank Sorenson, 2015-11-24 21:34:49 UTC
For those who can't see that Red Hat KB article, the solution there is simply to downgrade nfs-utils. Not sure why that gem is hidden behind the paywall.

Also, does anyone know whether this issue actually causes any problems with remote hosts mounting exported filesystems? I'm trying to track down several issues I've been having since the 7.2 update, but aside from the log spamming I'm not sure what this actually breaks.

(In reply to Jason Tibbitts from comment #2)
> Also, does anyone know if this issue actually cause any problems with remote
> hosts mounting exported filesystems? I'm trying to track down several
> issues I'm having since the 7.2 update but aside from the log spamming I'm
> not sure what this actually breaks.

There shouldn't be problems during the 'mount' or during most normal usage of the mounts. This program updates an on-disk (persistent) database that tracks the file locks the NFS clients have been granted. This becomes important in the event of an NFS server restart, as it enables the NFS clients to reclaim these locks for a period of time (while the server denies conflicting lock requests from other NFS clients). This bug prevents nfsdcltrack from entering the relevant NFS client data into the database, so in the event of an NFS server restart, NFS clients will be unable to reclaim these locks.

There are definitely issues with this version. In my case it's when using an IPv6 stack between the NFS server and clients. Mounting and listing work, but accessing a file's content doesn't: the command (such as cat) just freezes. Security is not the issue (tested with both SELinux and the firewall off). A workaround is to move back to the previous package version (tested with nfs-utils-1.3.0-0.8.el7.x86_64.rpm) and to make sure both sides, NFS server and clients, are downgraded.

I've also had this problem.
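The on-disk database the comment above describes is an SQLite file (shown later in this thread at /var/lib/nfs/nfsdcltrack/main.sqlite). As a quick diagnostic sketch, assuming only Python's stdlib sqlite3 module, the helper below (a hypothetical function, not part of nfs-utils) reports whether a given copy of that database already has the has_session column that the updated nfsdcltrack expects:

```python
import sqlite3

def clients_has_session(db_path):
    """Return True if the clients table in db_path has a has_session column."""
    con = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info yields one row per column; field 1 is the name.
        cols = [row[1] for row in con.execute("PRAGMA table_info(clients)")]
    finally:
        con.close()
    return "has_session" in cols

# Example (run against a copy, not the live database):
# clients_has_session("/var/lib/nfs/nfsdcltrack/main.sqlite")
```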
I have a share defined in /etc/exports as follows:

    /path host(ro,insecure)

After the upgrade to 7.2, that host (running Kodibuntu OS) could no longer mount the share; it got 'mount.nfs: access denied by server while mounting x'. Downgrading nfs-utils resolved the issue. I also have other shares defined in /etc/exports and they weren't affected, although they weren't defined with the insecure flag.

I don't see how either of these relates to this bugzilla. This bugzilla has nothing to do with either mounting or IPv6. It's about errors updating the database schema used by the nfsdcltrack utility for tracking file locks.

I can confirm that rolling back the nfs-utils package resolved the error and also allowed Ubuntu-based NFSv4 clients to connect again:

    yum downgrade nfs-utils
    Running transaction
      Installing : 1:nfs-utils-1.3.0-0.8.el7.x86_64    1/2
    warning: /etc/sysconfig/nfs created as /etc/sysconfig/nfs.rpmnew
      Cleanup    : 1:nfs-utils-1.3.0-0.21.el7.x86_64   2/2
      Verifying  : 1:nfs-utils-1.3.0-0.8.el7.x86_64    1/2
      Verifying  : 1:nfs-utils-1.3.0-0.21.el7.x86_64   2/2
    Removed:   nfs-utils.x86_64 1:1.3.0-0.21.el7
    Installed: nfs-utils.x86_64 1:1.3.0-0.8.el7

Prior to downgrading, with a RHEL 7.2 NFS server and an Ubuntu 14.04 LTS client:

    root@mythtv:/mnt# mount -t nfs -o vers=4 fileserver:/mnt/share /mnt/local -v
    mount.nfs: timeout set for Tue Jan 19 21:25:11 2016
    mount.nfs: trying text-based options 'vers=4,addr=192.168.0.10,clientaddr=192.168.0.17'
    mount.nfs: mount(2): Permission denied

Current kernel version of the test NFS server: 3.10.0-229.20.1.el7.x86_64. All packages are current excluding nfs-utils; SELinux enforcing; firewalld enabled. If I upgrade nfs-utils to 1.3.0-0.21.el7, then the Ubuntu 14.04 LTS client can't connect using NFSv4.

*** Bug 1298320 has been marked as a duplicate of this bug. ***

I also got this problem since I upgraded to 7.2. Restarting nfs produced this in the log:

    systemd: Starting NFS server and services...
    Jan 30 12:40:30 cantor nfsdcltrack[11205]: sqlite_query_reclaiming: unable to prepare select statement: no such column: has_session
    Jan 30 12:40:30 cantor kernel: NFSD: starting 90-second grace period

So then I tried:

    # sqlite3 /var/lib/nfs/nfsdcltrack/main.sqlite
    sqlite> .tables
    clients     parameters
    sqlite> .schema clients
    CREATE TABLE clients (id BLOB PRIMARY KEY, time INTEGER);
    sqlite> .schema parameters
    CREATE TABLE parameters (key TEXT PRIMARY KEY, value TEXT);

Just guessing where and what has_session ought to be:

    sqlite> alter table clients add column has_session TINYINT;
    sqlite> .schema clients
    CREATE TABLE clients (id BLOB PRIMARY KEY, time INTEGER, has_session TINYINT);
    sqlite> .exit

Now I no longer get the error:

    systemd: Starting NFS server and services...
    kernel: NFSD: starting 90-second grace period (net ffffffff81a25e00)
    systemd: Started NFS server and services.

And 'flock /myshare/bar sleep 30' followed by a server restart produces this:

    systemd: Starting NFS server and services...
    kernel: NFSD: starting 90-second grace period (net ffffffff81a25e00)
    systemd: Started NFS server and services.
    systemd: Starting Notify NFS peers of a restart...
    sm-notify[11325]: Version 1.3.0 starting
    sm-notify[11325]: Already notifying clients; Exiting!
    systemd: Started Notify NFS peers of a restart.

That's where I think it should have thrown the "insert statement prepare failed" error, but it seems happy. This could be the quick 'n dirty workaround if someone else would try and can confirm it works.

(In reply to Zenon Panoussis from comment #13)
> That's where I think it should have thrown the "insert statement prepare
> failed" error, but it seems happy. This could be the quick 'n dirty
> workaround if someone else would try and can confirm it works.

It works for me! Thanks for working that out.
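Condensing the comment above into one place, here is a minimal sketch of the failure and the workaround, using Python's stdlib sqlite3 against an in-memory database with the same schema. (The real database lives at /var/lib/nfs/nfsdcltrack/main.sqlite; back it up before altering it.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Schema as created by the affected nfs-utils build (no has_session column).
con.execute("CREATE TABLE clients (id BLOB PRIMARY KEY, time INTEGER)")

# A query that references has_session fails to prepare against this schema,
# matching the "no such column: has_session" error in the log above.
try:
    con.execute("SELECT id FROM clients WHERE has_session = 1")
except sqlite3.OperationalError as err:
    print(err)  # no such column: has_session

# Workaround from the comment above: add the missing column.
con.execute("ALTER TABLE clients ADD COLUMN has_session INTEGER")

# Now the same query prepares and runs.
rows = con.execute("SELECT id, time, has_session FROM clients").fetchall()
print(rows)  # [] (table is empty, but there is no schema error)
```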
Just FYI, this is the schema from a Fedora 23 machine (where the problem is not present):

    sqlite> .tables
    clients     parameters
    sqlite> .schema clients
    CREATE TABLE clients (id BLOB PRIMARY KEY, time INTEGER, has_session INTEGER);
    sqlite> .schema parameters
    CREATE TABLE parameters (key TEXT PRIMARY KEY, value TEXT);

(In reply to Jason Tibbitts from comment #16)
> Just FYI, this is the schema from a Fedora 23 machine (where the problem is
> not present):
> sqlite> .schema clients
> CREATE TABLE clients (id BLOB PRIMARY KEY, time INTEGER, has_session
> INTEGER);

Uhm, has_session is boolean, and for a moment I thought that using INTEGER is a bug on its own, but it turns out that the TINYINT in my previous comment simply shows my ignorance of sqlite. Its storage is always INTEGER, no matter what kind of integer you specify: https://www.sqlite.org/datatype3.html . But, precisely therefore, you can safely create the column as TINYINT or BIGINT or any other INT you want and the result will be exactly the same.

Just hit that issue too.

I have the same issue:

    May 12 17:49:21 nfs-server nfsdcltrack[10765]: sqlite_insert_client: insert statement prepare failed: table clients has 2 columns but 3 values were supplied

I have to force 'vers=4.0' on the clients for autofs mounts; otherwise clients cannot mount their home directories:

    * -fstype=nfs4,rw,sec=krb5,vers=4.0 nfs-server:/&

However, I don't know if this is related to this issue...
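The type-affinity point above is easy to demonstrate with Python's stdlib sqlite3 (a sketch for illustration, not part of the thread): whichever INT flavor the column is declared as, SQLite stores the value with INTEGER affinity, so TINYINT and INTEGER behave identically.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two tables differing only in the declared integer type of the column.
con.execute("CREATE TABLE a (v TINYINT)")
con.execute("CREATE TABLE b (v INTEGER)")
con.execute("INSERT INTO a VALUES (1)")
con.execute("INSERT INTO b VALUES (1)")

# typeof() reports the storage class actually used for the stored value.
print(con.execute("SELECT typeof(v) FROM a").fetchone()[0])  # integer
print(con.execute("SELECT typeof(v) FROM b").fetchone()[0])  # integer
```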
Could this be related?

    May 10 10:10:53 moe-180 kernel: ------------[ cut here ]------------
    May 10 10:10:53 moe-180 kernel: WARNING: at fs/nfsd/nfs4state.c:3853 nfsd4_process_open2+0xb72/0xf70 [nfsd]()
    May 10 10:10:53 moe-180 kernel: Modules linked in: fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables cts rpcsec_gss_krb5 nf_conntrack_ipv4 nf_defrag_ipv4 vmw_vsock_vmci_transport vsock xt_conntrack nf_conntrack iptable_filter coretemp kvm_intel kvm ppdev vmw_balloon sg pcspkr parport_pc vmw_vmci i2c_piix4 parport shpchp nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic pata_acpi vmwgfx crc32c_intel mptspi drm_kms_helper serio_raw ata_piix scsi_transport_spi ttm mptscsih mptbase drm vmxnet3 libata i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
    May 10 10:10:53 moe-180 kernel: CPU: 1 PID: 3708 Comm: nfsd Not tainted 3.10.0-327.13.1.el7.x86_64 #1
    May 10 10:10:53 moe-180 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    May 10 10:10:53 moe-180 kernel: 0000000000000000 00000000288c7f59 ffff88040d56fc28 ffffffff8163571c
    May 10 10:10:53 moe-180 kernel: ffff88040d56fc60 ffffffff8107b200 ffff8802bf459708 ffff880422e3f3e0
    May 10 10:10:53 moe-180 kernel: ffff880191afdd98 ffff88042719b600 0000000000000000 ffff88040d56fc70
    May 10 10:10:53 moe-180 kernel: Call Trace:
    May 10 10:10:53 moe-180 kernel: [<ffffffff8163571c>] dump_stack+0x19/0x1b
    May 10 10:10:53 moe-180 kernel: [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
    May 10 10:10:53 moe-180 kernel: [<ffffffff8107b34a>] warn_slowpath_null+0x1a/0x20
    May 10 10:10:53 moe-180 kernel: [<ffffffffa033be22>] nfsd4_process_open2+0xb72/0xf70 [nfsd]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa032b14a>] nfsd4_open+0x55a/0x850 [nfsd]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa032b917>] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa031712b>] nfsd_dispatch+0xbb/0x200 [nfsd]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa02b2183>] svc_process_common+0x453/0x6f0 [sunrpc]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa02b2523>] svc_process+0x103/0x170 [sunrpc]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa0316ab7>] nfsd+0xe7/0x150 [nfsd]
    May 10 10:10:53 moe-180 kernel: [<ffffffffa03169d0>] ? nfsd_destroy+0x80/0x80 [nfsd]
    May 10 10:10:53 moe-180 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
    May 10 10:10:53 moe-180 kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
    May 10 10:10:53 moe-180 kernel: [<ffffffff81645e18>] ret_from_fork+0x58/0x90
    May 10 10:10:53 moe-180 kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
    May 10 10:10:53 moe-180 kernel: ---[ end trace 37abbe18e83e49c4 ]---

(In reply to Luc Lalonde from comment #24)
> Could this be related:

That's unlikely to be related. You might be seeing bug 1300023, which should be fixed in kernel-3.10.0-351.el7.
Moving to VERIFIED per comment 26; we will continue to run the corresponding automated case in the future.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2383.html