Description of problem:
rpc test suite fails with "mknod for block device: failed".

Version-Release number of selected component (if applicable):
[root@dhcp43-110 ~]# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-1.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 4-node nfs-ganesha cluster, create a volume, and enable ganesha on it.
2. Mount the volume on a client and run the rpc test suite.
3. Observe that the rpc test suite fails with the "mknod for block device: failed" error in the logs.
4. The following is the mknod test executed by the rpc test suite:

    function test_mknod()
    {
        mknod -m 0666 $PFX/dir/block b 13 42;
        test "$(stat -c '%F %a %t %T' $PFX/dir/block)" == "block special file 666 d 2a" \
            || fail "mknod for block device"
        mknod -m 0666 $PFX/dir/char c 13 42;
        test "$(stat -c '%F %a %t %T' $PFX/dir/char)" == "character special file 666 d 2a" \
            || fail "mknod for character device"
        mknod -m 0666 $PFX/dir/fifo p;
        test "$(stat -c '%F %a' $PFX/dir/fifo)" == "fifo 666" || \
            fail "mknod for fifo"
    }

5. The test creates a block device with major number 13 and minor number 42, but on the mount point the device actually gets created with major and minor numbers 0, 0:

    [root@Client2 dir]# pwd
    /mnt/nfs1/run17188/coverage/dir
    [root@Client2 dir]# ls -l
    total 1
    brw-rw-rw-. 1 nobody nobody 0, 0 Oct  6 18:45 block

6. When stat is run on the block device, the test expects the major and minor numbers in hex ("d" and "2a") but gets "0 0" instead, which is why it fails:

    [root@Client2 dir]# stat -c '%F %a %t %T' /mnt/nfs1/run17188/coverage/dir/block
    block special file 666 0 0

7. Verified the same commands on a normal Linux machine without NFS, where they work fine:

    [root@Client2 ~]# mknod -m 0666 /root/block b 13 42
    [root@Client2 ~]# ls -l
    total 10800
    brw-rw-rw-. 1 root root 13, 42 Oct  6 18:30 block
    [root@Client2 ~]# stat -c '%F %a %t %T' /root/block
    block special file 666 d 2a

Actual results:
rpc test suite fails with "mknod for block device: failed".

Expected results:
The block device should get created with the specified major and minor numbers.

Additional info:
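For reference, stat's %t and %T format specifiers print the device major and minor numbers in hexadecimal, so major 13 and minor 42 are expected to show up as "d" and "2a". Below is a minimal standalone sketch of the same check outside the suite; the mount path /mnt/nfs1 and the file name blocktest are assumptions for illustration:

    # expected hex values for major 13, minor 42 -> prints "d 2a"
    printf '%x %x\n' 13 42

    # create the device node on the NFS mount and compare device numbers
    mknod -m 0666 /mnt/nfs1/blocktest b 13 42
    actual="$(stat -c '%t %T' /mnt/nfs1/blocktest)"
    if [ "$actual" = "d 2a" ]; then
        echo "device numbers preserved"
    else
        echo "device numbers lost: got '$actual'"
    fi
    rm -f /mnt/nfs1/blocktest

On the affected builds this should report the device numbers as lost ("0 0"), matching step 6 above.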
This seems to be an implementation gap in NFS-Ganesha and not specific to FSAL_GLUSTER. The device data passed by the client is not processed as part of the CREATE fop. This issue exists for all the FSALs which NFS-Ganesha supports (not sure about FSAL_PROXY) and for all protocol versions. Requesting Frank to provide his comments.
So stepping through in the debugger, the RAWDEV attribute is *not* passed as part of the create. Only the MAXWRITE attribute is given. I'll dig deeper to see what is actually being sent.
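One way to confirm what the client is actually sending on the wire (a sketch; the capture interface, mount path, and file name are assumptions, and it assumes tcpdump and tshark are available) is to trace the NFS traffic during the mknod and filter for the MKNOD (NFSv3, procedure 11) or CREATE (NFSv4, opcode 6) request, which is where the major/minor pair travels:

    # capture NFS traffic on port 2049 while issuing the mknod
    tcpdump -i any -s 0 -w /tmp/mknod.pcap port 2049 &
    tcpdump_pid=$!
    mknod -m 0666 /mnt/nfs1/blocktest b 13 42
    sleep 1
    kill $tcpdump_pid

    # NFSv3 MKNOD is procedure 11; NFSv4 CREATE is opcode 6
    tshark -r /tmp/mknod.pcap -Y 'nfs.procedure_v3 == 11 || nfs.opcode == 6' -V

If the device numbers are present in the request but not on disk, the drop is on the server side (the RAWDEV attribute not being passed through to the FSAL), consistent with what the debugger shows.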
Proposed fix upstream.
Thanks for the fix, Dan.
Curious what this "rpc test suite" is. Where can I find it?

Thanks,
Frank
Executed the rpc test suite with the latest build:

nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

Not hitting the error "mknod for block device: failed" any more, but getting the following error:

[root@dhcp47-49 test_nfsv4]# cat /var/tmp/rpc.log
flock: 200: Bad file descriptor
flock -s: failed.

[root@dhcp47-49 ~]# /opt/qa/tools/system_light/run.sh -w /mnt/test_nfsv4/ -t rpc -l /var/tmp/rpc.log
/opt/qa/tools/system_light/scripts
/opt/qa/tools/system_light
/root
/mnt/test_nfsv4/
/mnt
-----
/mnt/test_nfsv4/
/mnt/test_nfsv4//run26565/
Tests available:
arequal bonnie compile_kernel coverage dbench dd ffsb fileop fs_mark fsx fuse glusterfs glusterfs_build iozone locks ltp multiple_files openssl posix_compliance postmark read_large rpc syscallbench tiobench
===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/mnt/test_nfsv4/run26565
executing rpc
start: 12:34:16

real    0m7.959s
user    0m0.089s
sys     0m0.229s
end: 12:34:24
rpc failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/test_nfsv4//run26565/
rmdir: failed to remove ‘/mnt/test_nfsv4//run26565/’: Directory not empty
rmdir failed:Directory not empty

I am able to remove the directories manually after the test failure.

[root@dhcp47-49 test_nfsv4]# mount
10.70.44.92:/vol1_new on /mnt/test_nfsv4 type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.70.47.49,local_lock=none,addr=10.70.44.92)

ganesha-gfapi.log snippet:
--------------------------
[root@dhcp46-111 ~]# tail -f /var/log/ganesha-gfapi.log
[2016-11-18 07:03:38.290864] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir/file (hash=vol1_new-replicate-4/cache=vol1_new-replicate-4) => /run26431/coverage/dir/file2 (hash=vol1_new-replicate-3/cache=<nul>)
[2016-11-18 07:03:38.335783] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir/file2 (hash=vol1_new-replicate-3/cache=vol1_new-replicate-4) => /run26431/coverage/dir/file (hash=vol1_new-replicate-4/cache=<nul>)
[2016-11-18 07:03:38.377424] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir (hash=vol1_new-replicate-2/cache=vol1_new-replicate-0) => /run26431/coverage/dir2 (hash=vol1_new-replicate-2/cache=<nul>)
[2016-11-18 07:03:38.425084] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir2 (hash=vol1_new-replicate-2/cache=vol1_new-replicate-0) => /run26431/coverage/dir (hash=vol1_new-replicate-2/cache=<nul>)
[2016-11-18 07:04:16.334413] I [MSGID: 108031] [afr-common.c:2070:afr_local_discovery_cbk] 0-vol1_new-replicate-2: selecting local read_child vol1_new-client-4
[2016-11-18 07:04:16.336589] I [MSGID: 108031] [afr-common.c:2070:afr_local_discovery_cbk] 0-vol1_new-replicate-0: selecting local read_child vol1_new-client-0
[2016-11-18 07:04:16.337434] I [MSGID: 108031] [afr-common.c:2070:afr_local_discovery_cbk] 0-vol1_new-replicate-4: selecting local read_child vol1_new-client-8
[2016-11-18 07:04:19.754385] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir/file (hash=vol1_new-replicate-1/cache=vol1_new-replicate-1) => /run26565/coverage/dir/file2 (hash=vol1_new-replicate-0/cache=<nul>)
[2016-11-18 07:04:19.795166] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir/file2 (hash=vol1_new-replicate-0/cache=vol1_new-replicate-1) => /run26565/coverage/dir/file (hash=vol1_new-replicate-1/cache=<nul>)
[2016-11-18 07:04:19.834662] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir (hash=vol1_new-replicate-3/cache=vol1_new-replicate-0) => /run26565/coverage/dir2 (hash=vol1_new-replicate-3/cache=<nul>)
[2016-11-18 07:04:19.879988] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir2 (hash=vol1_new-replicate-3/cache=vol1_new-replicate-0) => /run26565/coverage/dir (hash=vol1_new-replicate-3/cache=<nul>)

Packet traces are located at:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1382267/
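The "flock: 200: Bad file descriptor" failure suggests the suite's shared-lock test opens the lock file on descriptor 200 before calling flock -s; that message is what flock(1) prints when the numbered descriptor is not a valid open file. A minimal sketch of that pattern (my reconstruction, not the suite's actual code; the lock file path is an assumption):

    # open the lock file on fd 200, then take a shared lock on it
    exec 200>/mnt/test_nfsv4/locktest
    flock -s 200 || echo "flock -s: failed."

    # ... work under the shared lock ...

    # closing the descriptor releases the lock
    exec 200>&-

Note that with the NFSv4 mount shown above (local_lock=none), flock requests are typically translated into byte-range locks served by the NFS server rather than handled locally, so the server-side lock path is also worth checking.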
The original issue reported in this bug is fixed. The "Directory not empty" error seems to be a different issue (possibly similar to bug 1381416). Please file a separate bug to track it. Thanks!
Since the original issue reported in this bug is fixed, moving this bug to verified state.

Verified build:
---------------
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html