Description of problem:
rpc test suite fails with "mknod for block device: failed".

Version-Release number of selected component (if applicable):
[root@dhcp43-110 ~]# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-1.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 4-node nfs-ganesha cluster, create a volume, and enable ganesha on it.
2. Mount the volume on a client and run the rpc test suite.
3. Observe that the rpc test suite fails with the "mknod for block device: failed" error in the logs.
4. The following is the mknod test executed by the rpc test suite:

    function test_mknod()
    {
        mknod -m 0666 $PFX/dir/block b 13 42;
        test "$(stat -c '%F %a %t %T' $PFX/dir/block)" == "block special file 666 d 2a" \
            || fail "mknod for block device"
        mknod -m 0666 $PFX/dir/char c 13 42;
        test "$(stat -c '%F %a %t %T' $PFX/dir/char)" == "character special file 666 d 2a" \
            || fail "mknod for character device"
        mknod -m 0666 $PFX/dir/fifo p;
        test "$(stat -c '%F %a' $PFX/dir/fifo)" == "fifo 666" || \
            fail "mknod for fifo"
    }

5. The test creates a block device with major number 13 and minor number 42, but on the mount point the device actually gets created with major and minor numbers 0, 0:

    [root@Client2 dir]# pwd
    /mnt/nfs1/run17188/coverage/dir
    [root@Client2 dir]# ls -l
    total 1
    brw-rw-rw-. 1 nobody nobody 0, 0 Oct  6 18:45 block

6. When stat is run on the block device, the test expects the major and minor numbers in hex ("d" and "2a") but gets "0 0" instead, which is why it fails:

    [root@Client2 dir]# stat -c '%F %a %t %T' /mnt/nfs1/run17188/coverage/dir/block
    block special file 666 0 0

7. Verified the same commands on a normal Linux machine without NFS, where they work fine:

    [root@Client2 ~]# mknod -m 0666 /root/block b 13 42
    [root@Client2 ~]# ls -l
    total 10800
    brw-rw-rw-. 1 root root 13, 42 Oct  6 18:30 block
    [root@Client2 ~]# stat -c '%F %a %t %T' /root/block
    block special file 666 d 2a

Actual results:
rpc test suite fails with "mknod for block device: failed".

Expected results:
The block device should get created with the specified major and minor numbers.

Additional info:
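For reference, stat's %t and %T format specifiers print the device major and minor numbers in hexadecimal, so major 13 and minor 42 are expected to show up as "d" and "2a". Below is a minimal standalone sketch of the same check outside the suite; the mount path /mnt/nfs1 and the file name blocktest are assumptions for illustration:

    # expected hex values for major 13, minor 42 -> prints "d 2a"
    printf '%x %x\n' 13 42

    # create the device node on the NFS mount and compare device numbers
    mknod -m 0666 /mnt/nfs1/blocktest b 13 42
    actual="$(stat -c '%t %T' /mnt/nfs1/blocktest)"
    if [ "$actual" = "d 2a" ]; then
        echo "device numbers preserved"
    else
        echo "device numbers lost: got '$actual'"
    fi
    rm -f /mnt/nfs1/blocktest

On the affected builds this should report the device numbers as lost ("0 0"), matching step 6 above.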
This seems to be an implementation gap in NFS-Ganesha and not specific to FSAL_GLUSTER. The device data passed by the client is not processed as part of the CREATE fop. This issue exists for all the FSALs which NFS-Ganesha supports (not sure about FSAL_PROXY) and for all protocol versions. Requesting Frank to provide his comments.
So stepping through in the debugger, the RAWDEV attribute is *not* passed as part of the create. Only the MAXWRITE attribute is given. I'll dig deeper to see what is actually being sent.
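One way to confirm what the client is actually sending on the wire (a sketch; the capture interface, mount path, and file name are assumptions, and it assumes tcpdump and tshark are available) is to trace the NFS traffic during the mknod and filter for the MKNOD (NFSv3, procedure 11) or CREATE (NFSv4, opcode 6) request, which is where the major/minor pair travels:

    # capture NFS traffic on port 2049 while issuing the mknod
    tcpdump -i any -s 0 -w /tmp/mknod.pcap port 2049 &
    tcpdump_pid=$!
    mknod -m 0666 /mnt/nfs1/blocktest b 13 42
    sleep 1
    kill $tcpdump_pid

    # NFSv3 MKNOD is procedure 11; NFSv4 CREATE is opcode 6
    tshark -r /tmp/mknod.pcap -Y 'nfs.procedure_v3 == 11 || nfs.opcode == 6' -V

If the device numbers are present in the request but not on disk, the drop is on the server side (the RAWDEV attribute not being passed through to the FSAL), consistent with what the debugger shows.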
Proposed fix upstream.
Thanks for the fix, Dan.
Curious what this "rpc test suite" is. Where can I find it?

Thanks,
Frank
Executed the rpc test suite with the latest build:

nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

Not hitting the error "mknod for block device: failed" any more, but getting the following error:

[root@dhcp47-49 test_nfsv4]# cat /var/tmp/rpc.log
flock: 200: Bad file descriptor
flock -s: failed.

[root@dhcp47-49 ~]# /opt/qa/tools/system_light/run.sh -w /mnt/test_nfsv4/ -t rpc -l /var/tmp/rpc.log
/opt/qa/tools/system_light/scripts
/opt/qa/tools/system_light
/root
/mnt/test_nfsv4/
/mnt
-----
/mnt/test_nfsv4/
/mnt/test_nfsv4//run26565/
Tests available:
arequal bonnie compile_kernel coverage dbench dd ffsb fileop fs_mark fsx fuse glusterfs glusterfs_build iozone locks ltp multiple_files openssl posix_compliance postmark read_large rpc syscallbench tiobench
===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/mnt/test_nfsv4/run26565
executing rpc
start: 12:34:16

real    0m7.959s
user    0m0.089s
sys     0m0.229s
end: 12:34:24
rpc failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/test_nfsv4//run26565/
rmdir: failed to remove ‘/mnt/test_nfsv4//run26565/’: Directory not empty
rmdir failed:Directory not empty

I am able to remove the directories manually after the test failure.

[root@dhcp47-49 test_nfsv4]# mount
10.70.44.92:/vol1_new on /mnt/test_nfsv4 type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.70.47.49,local_lock=none,addr=10.70.44.92)

ganesha-gfapi.log snippet:
--------------------------
[root@dhcp46-111 ~]# tail -f /var/log/ganesha-gfapi.log
[2016-11-18 07:03:38.290864] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir/file (hash=vol1_new-replicate-4/cache=vol1_new-replicate-4) => /run26431/coverage/dir/file2 (hash=vol1_new-replicate-3/cache=<nul>)
[2016-11-18 07:03:38.335783] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir/file2 (hash=vol1_new-replicate-3/cache=vol1_new-replicate-4) => /run26431/coverage/dir/file (hash=vol1_new-replicate-4/cache=<nul>)
[2016-11-18 07:03:38.377424] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir (hash=vol1_new-replicate-2/cache=vol1_new-replicate-0) => /run26431/coverage/dir2 (hash=vol1_new-replicate-2/cache=<nul>)
[2016-11-18 07:03:38.425084] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26431/coverage/dir2 (hash=vol1_new-replicate-2/cache=vol1_new-replicate-0) => /run26431/coverage/dir (hash=vol1_new-replicate-2/cache=<nul>)
[2016-11-18 07:04:16.334413] I [MSGID: 108031] [afr-common.c:2070:afr_local_discovery_cbk] 0-vol1_new-replicate-2: selecting local read_child vol1_new-client-4
[2016-11-18 07:04:16.336589] I [MSGID: 108031] [afr-common.c:2070:afr_local_discovery_cbk] 0-vol1_new-replicate-0: selecting local read_child vol1_new-client-0
[2016-11-18 07:04:16.337434] I [MSGID: 108031] [afr-common.c:2070:afr_local_discovery_cbk] 0-vol1_new-replicate-4: selecting local read_child vol1_new-client-8
[2016-11-18 07:04:19.754385] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir/file (hash=vol1_new-replicate-1/cache=vol1_new-replicate-1) => /run26565/coverage/dir/file2 (hash=vol1_new-replicate-0/cache=<nul>)
[2016-11-18 07:04:19.795166] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir/file2 (hash=vol1_new-replicate-0/cache=vol1_new-replicate-1) => /run26565/coverage/dir/file (hash=vol1_new-replicate-1/cache=<nul>)
[2016-11-18 07:04:19.834662] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir (hash=vol1_new-replicate-3/cache=vol1_new-replicate-0) => /run26565/coverage/dir2 (hash=vol1_new-replicate-3/cache=<nul>)
[2016-11-18 07:04:19.879988] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol1_new-dht: renaming /run26565/coverage/dir2 (hash=vol1_new-replicate-3/cache=vol1_new-replicate-0) => /run26565/coverage/dir (hash=vol1_new-replicate-3/cache=<nul>)

Packet traces are located at:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1382267/
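The "flock: 200: Bad file descriptor" failure suggests the suite's shared-lock test opens the lock file on descriptor 200 before calling flock -s; that message is what flock(1) prints when the numbered descriptor is not a valid open file. A minimal sketch of that pattern (my reconstruction, not the suite's actual code; the lock file path is an assumption):

    # open the lock file on fd 200, then take a shared lock on it
    exec 200>/mnt/test_nfsv4/locktest
    flock -s 200 || echo "flock -s: failed."

    # ... work under the shared lock ...

    # closing the descriptor releases the lock
    exec 200>&-

Note that with the NFSv4 mount shown above (local_lock=none), flock requests are typically translated into byte-range locks served by the NFS server rather than handled locally, so the server-side lock path is also worth checking.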
The original issue reported in this bug is fixed. The "Directory not empty" error seems to be a different issue (possibly similar to bug 1381416). Please file a separate bug to track it. Thanks!
Since the original issue reported in this bug is fixed, moving this bug to verified state.

Verified build:
---------------
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html