Bug 1713261

Summary: Posix compliance test is hung on v3 mount and "open_fd_count" goes to negative value
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Manisha Saini <msaini>
Component: nfs-ganesha Assignee: Frank Filz <ffilz>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.5 CC: dang, ffilz, grajoria, jthottan, mbenjamin, nchilaka, pasik, rhs-bugs, skoduri, storage-qa-internal, vdas
Target Milestone: --- Keywords: Regression
Target Release: RHGS 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nfs-ganesha-2.7.3-4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-30 12:15:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1696809    

Description Manisha Saini 2019-05-23 09:42:56 UTC
Description of problem:
================

The POSIX compliance test hangs when run over a v3 mount, and open_fd_count goes negative in ganesha.log.
The test passes on a v4.1 mount.

----------------
23/05/2019 14:13:00 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
23/05/2019 14:14:12 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[svc_4] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Permission denied
23/05/2019 14:14:12 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[svc_7] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with Permission denied
23/05/2019 14:14:12 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[svc_7] fsal_remove :FSAL :CRIT :Error closing fstest_94364d33890f770dbc0e6b6249066b70 before unlink: Permission denied.
23/05/2019 14:14:17 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[Gen_Fridge] fsal_close :FSAL :CRIT :open_fd_count is negative: -1
23/05/2019 14:14:17 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[Gen_Fridge] fsal_close :FSAL :CRIT :open_fd_count is negative: -2
23/05/2019 14:14:18 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[svc_15] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded, waking LRU thread.
23/05/2019 14:14:53 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[cache_lru] lru_run :INODE LRU :WARN :Futility count exceeded.  Client load is opening FDs faster than the LRU thread can close them.
23/05/2019 14:15:23 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-21498[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 893, old: 892; deq new: 892, old: 892
--------------


Version-Release number of selected component (if applicable):
===========

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-3.el7rhgs.x86_64
glusterfs-ganesha-6.0-3.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-3.el7rhgs.x86_64

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 Beta (Maipo)


How reproducible:
============
3/3

Steps to Reproduce:
============
1. Create a 4-node Ganesha cluster.
2. Create three 4x3 Distributed-Replicate volumes and export them via Ganesha.
3. Mount the volumes on the NFS client via v3 and run the POSIX compliance test on one of the volumes (mani1).


Actual results:
=============

The POSIX test hung on the v3 mount while running the "link/01.t ....... 1/8" test.
ganesha.log shows "open_fd_count" going negative.
The POSIX compliance test passes when run on a v4.1 mount.

----------
[root@dhcp47-152 mani1]# prove -r /home/ntfs-3g-pjd-fstest/
/home/ntfs-3g-pjd-fstest/tests/chflags/00.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/01.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/02.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/03.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/04.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/05.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/06.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/07.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/08.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/09.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/10.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/11.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/12.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chflags/13.t .... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/00.t ...... Failed 3/106 subtests 
/home/ntfs-3g-pjd-fstest/tests/chmod/01.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/02.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/03.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chmod/04.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/05.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chmod/06.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/07.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chmod/08.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/09.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/10.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chmod/11.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chown/00.t ...... Failed 1/171 subtests 
/home/ntfs-3g-pjd-fstest/tests/chown/01.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chown/02.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chown/03.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chown/04.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chown/05.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chown/06.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chown/07.t ...... ok     
/home/ntfs-3g-pjd-fstest/tests/chown/08.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chown/09.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/chown/10.t ...... ok   
/home/ntfs-3g-pjd-fstest/tests/ftruncate/00.t .. Failed 1/26 subtests 
/home/ntfs-3g-pjd-fstest/tests/ftruncate/13.t .. ok   
/home/ntfs-3g-pjd-fstest/tests/link/00.t ....... ok     
/home/ntfs-3g-pjd-fstest/tests/link/01.t ....... 1/8 
---------------


Expected results:
==============

The POSIX test should pass without any issues.

Additional info:

Will attach sosreport shortly.

Comment 4 Daniel Gryniewicz 2019-05-23 13:43:49 UTC
The Permission Denied in fsal_remove() is suspicious, but I don't see how it can lead to a negative FD count.  The FD is, in fact, marked closed, so it shouldn't be closed again.  Frank, any ideas?

Comment 5 Frank Filz 2019-05-23 21:31:58 UTC
From the log, it looks like there could be a race between the FSAL close methods (file_close is the Gluster implementation) on multiple threads. file_close checks openflags and then decides whether to actually close or return ERR_FSAL_NOT_OPENED; when ERR_FSAL_NOT_OPENED is returned, open_fd_count is not decremented.

Here is a fix for that race:

https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/455570/
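
To illustrate the race (a simplified sketch, not the actual FSAL_GLUSTER code; the struct, lock, and close_underlying() helper below are hypothetical): two threads can both observe the fd as open before either marks it closed, so both report a successful close and the caller decrements open_fd_count twice for a single open.

#include <pthread.h>

/* Hypothetical stand-ins for the real nfs-ganesha definitions. */
#define FSAL_O_CLOSED        0
#define ERR_FSAL_NOT_OPENED  1   /* arbitrary non-zero value for this sketch */

struct my_fd {
	pthread_mutex_t lock;        /* protects openflags */
	int openflags;               /* FSAL_O_CLOSED when not open */
	int glfd;                    /* underlying fd */
};

static void close_underlying(int glfd) { (void)glfd; /* stand-in for the real close */ }

/* Racy version: the check and the close are not serialized, so two
 * threads can both pass the openflags test and both return success. */
int file_close_racy(struct my_fd *fd)
{
	if (fd->openflags == FSAL_O_CLOSED)
		return ERR_FSAL_NOT_OPENED;  /* caller skips the decrement */

	close_underlying(fd->glfd);          /* a second thread may be here too */
	fd->openflags = FSAL_O_CLOSED;
	return 0;                            /* caller decrements open_fd_count */
}

/* Fixed version: the check and the transition to CLOSED happen atomically
 * under a lock, so only one thread ever reports a real close. */
int file_close_fixed(struct my_fd *fd)
{
	int rc;

	pthread_mutex_lock(&fd->lock);
	if (fd->openflags == FSAL_O_CLOSED) {
		rc = ERR_FSAL_NOT_OPENED;
	} else {
		close_underlying(fd->glfd);
		fd->openflags = FSAL_O_CLOSED;
		rc = 0;
	}
	pthread_mutex_unlock(&fd->lock);
	return rc;
}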

Comment 6 Soumya Koduri 2019-05-24 04:47:18 UTC
Also, there is a testcase that is hung; not sure if it is due to the above-mentioned issue with fsal_close. Will look into it and update.

Comment 7 Manisha Saini 2019-05-24 09:38:27 UTC
"open_fd_count is negative" messages are also observed while running cthon special test on V3 mount


Ganesha.log
---------------
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_14] fsal_close :FSAL :CRIT :open_fd_count is negative: -1
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_33] fsal_close :FSAL :CRIT :open_fd_count is negative: -2
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_29] fsal_close :FSAL :CRIT :open_fd_count is negative: -3
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_26] fsal_close :FSAL :CRIT :open_fd_count is negative: -4
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_6] fsal_close :FSAL :CRIT :open_fd_count is negative: -5
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_17] fsal_close :FSAL :CRIT :open_fd_count is negative: -6
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_30] fsal_close :FSAL :CRIT :open_fd_count is negative: -7
24/05/2019 13:30:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_13] fsal_close :FSAL :CRIT :open_fd_count is negative: -8

--------------



Later, when the cthon lock test was run after the special test, it hung on the v3 mount.



Ganesha.log
--------------
24/05/2019 15:04:39 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[svc_2] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded, waking LRU thread.
24/05/2019 15:05:09 : epoch 94200000 : dhcp47-199.lab.eng.blr.redhat.com : ganesha.nfsd-14262[cache_lru] lru_run :INODE LRU :WARN :Futility count exceeded.  Client load is opening FDs faster than the LRU thread can close them.
--------------



Client terminal snippet

-----
# cd /root/cthon04 && ./server -l -o vers=3 -p /mani2 -m /mnt/mani2 -N 1 dhcp47-199.lab.eng.blr.redhat.com
sh ./runtests  -l -t /mnt/mani2/dhcp47-152.test

Starting LOCKING tests: test directory /mnt/mani2/dhcp47-152.test (arg: -t)

Testing native post-LFS locking

Creating parent/child synchronization pipes.
-------

Comment 8 Soumya Koduri 2019-05-24 11:32:41 UTC
Thanks, Manisha, for sharing the setup. I checked the packet trace while the tests were hung. The client is trying to create a file, and the Ganesha server keeps responding with NFS3ERR_JUKEBOX, which makes it look like a hang.

From gdb,

Breakpoint 6, mdcache_lru_fds_available ()
    at /usr/src/debug/nfs-ganesha-2.7.3/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2142
2142		if (atomic_fetch_size_t(&open_fd_count) >= lru_state.fds_hard_limit) {
(gdb) p open_fd_count
$3 = 18446744073709551608
(gdb) p/x open_fd_count
$4 = 0xfffffffffffffff8
(gdb) p lru_state.fds_hard_limit
$5 = 1038090
(gdb) p mdcache_param.fd_limit_percent
$6 = 99
(gdb) p mdcache_param.fds_system_imposed
There is no member named fds_system_imposed.
(gdb) p lru_state.fds_system_imposed 
$7 = 1048576
(gdb) p (int64)open_fd_count
$8 = -8
(gdb) 
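
(For reference, the two gdb values above are the same 64-bit pattern read as unsigned and as signed. A standalone C sketch of that reinterpretation:)

#include <stdio.h>

int main(void)
{
	/* The counter has been decremented 8 times past zero. */
	size_t open_fd_count = (size_t)0 - 8;

	/* Read as unsigned (what atomic_fetch_size_t returns) it is huge,
	 * so any ">= fds_hard_limit" comparison succeeds. */
	printf("unsigned: %zu\n", open_fd_count);      /* 18446744073709551608 */
	printf("hex:      %#zx\n", open_fd_count);     /* 0xfffffffffffffff8 */

	/* Reinterpreted as signed, it is the real (negative) value. */
	printf("signed:   %lld\n", (long long)open_fd_count);  /* -8 */
	return 0;
}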


fsal_status_t mdcache_open2(struct fsal_obj_handle *obj_hdl,
                           struct state_t *state,
                           fsal_openflags_t openflags,
                           enum fsal_create_mode createmode,
                           const char *name,
                           struct attrlist *attrs_in,
                           fsal_verifier_t verifier,
                           struct fsal_obj_handle **new_obj,
                           struct attrlist *attrs_out,
                           bool *caller_perm_check)
{
...
...

        if (name) {
                if (!state && !mdcache_lru_fds_available()) {
                        /* This seems the best idea, let the
                         * client try again later after the reap.
                         */
                        return fsalstat(ERR_FSAL_DELAY, 0);
                }
..
...


bool mdcache_lru_fds_available(void) 
{
        if (atomic_fetch_size_t(&open_fd_count) >= lru_state.fds_hard_limit) {
                LogAtLevel(COMPONENT_CACHE_INODE_LRU,
                           atomic_fetch_uint32_t(&lru_state.fd_state)
                                                                != FD_LIMIT
                                ? NIV_CRIT
                                : NIV_DEBUG,
                           "FD Hard Limit Exceeded, waking LRU thread.");
                atomic_store_uint32_t(&lru_state.fd_state, FD_LIMIT);
                fridgethr_wake(lru_fridge);
                return false;
        }

...
...

We are not type-casting open_fd_count to a signed integer here, and since the value is negative, atomic_fetch_size_t returns a large unsigned value, resulting in mdcache failing the fop with an EDELAY error.

Dan/Frank,
Can we map open_fd_count to ssize_t here (like done in 'fsal_close') before comparing it with fds_hard_limit?
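
A minimal sketch of the signed check being asked about (a hypothetical helper, not the actual patch; as the next comment points out, the underlying fix is to keep the count from going negative in the first place):

#include <stddef.h>
#include <sys/types.h>

/* Treat the counter as signed before comparing, so a negative (wrapped)
 * count is not mistaken for "over the hard limit". */
static int fds_available(size_t open_fd_count, size_t fds_hard_limit)
{
	ssize_t count = (ssize_t)open_fd_count;

	if (count < 0)
		return 1;   /* underflowed counter: a bug, but not an FD-limit condition */

	return (size_t)count < fds_hard_limit;
}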

Comment 9 Daniel Gryniewicz 2019-05-24 12:13:10 UTC
Since the hang is a direct result of the open count going negative, we should fix that instead.

Comment 10 Soumya Koduri 2019-05-24 13:07:26 UTC
(In reply to Daniel Gryniewicz from comment #9)
> Since the hang is a direct result of the open count going negative, we
> should fix that instead.

Agreed. My question was whether we need to fix this as well, in addition to the open count. Or maybe add an assert whenever the count becomes < 0?

Comment 11 Daniel Gryniewicz 2019-05-24 13:37:22 UTC
I don't think we should just keep running.  I don't have a strong opinion whether we fail with a crit log (like now) or assert instead.  Failing with a log has the advantage that in-progress OPs can complete, so outstanding writes, for example, will continue.  However, it has the side effect of looking like a hang, and requires the admin to look at the logs to know what's wrong.  Asserting indicates an immediate problem, but also could cause corruption due to aborting outstanding OPs.
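
A sketch of the two options being weighed, using plain C11 atomics rather than ganesha's wrappers (hypothetical code, not the actual fsal_close):

#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/types.h>

static atomic_size_t open_fd_count;

static void account_for_close(void)
{
	/* atomic_fetch_sub returns the value before the decrement. */
	ssize_t after = (ssize_t)atomic_fetch_sub(&open_fd_count, 1) - 1;

	if (after < 0) {
		/* Option 1 (current behaviour): log at CRIT and keep running,
		 * so in-flight operations can still complete. */
		fprintf(stderr, "open_fd_count is negative: %zd\n", after);

		/* Option 2: abort immediately so the bug is caught at once,
		 * at the cost of killing outstanding operations. */
		/* assert(after >= 0); */
	}
}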

Comment 12 Soumya Koduri 2019-05-24 13:58:50 UTC
alright! thanks!

Comment 15 Manisha Saini 2019-06-13 08:00:41 UTC
Verified this with

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.7.3-4.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-4.el7rhgs.x86_64
glusterfs-ganesha-6.0-5.el7rhgs.x86_64
nfs-ganesha-2.7.3-4.el7rhgs.x86_64


# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 Beta (Maipo)

Ran the POSIX compliance test on a v3 mount. No hang was observed, and "open_fd_count" did not go into a negative value.


Test Summary Report
-------------------
/home/ntfs-3g-pjd-fstest/tests/chmod/00.t    (Wstat: 0 Tests: 106 Failed: 2)
  Failed tests:  31, 39
/home/ntfs-3g-pjd-fstest/tests/chown/00.t    (Wstat: 0 Tests: 171 Failed: 1)
  Failed test:  97
/home/ntfs-3g-pjd-fstest/tests/ftruncate/00.t (Wstat: 0 Tests: 26 Failed: 1)
  Failed test:  24
/home/ntfs-3g-pjd-fstest/tests/link/00.t     (Wstat: 0 Tests: 82 Failed: 1)
  Failed test:  56
/home/ntfs-3g-pjd-fstest/tests/mknod/03.t    (Wstat: 0 Tests: 12 Failed: 9)
  Failed tests:  1-3, 5-7, 9-11
/home/ntfs-3g-pjd-fstest/tests/truncate/00.t (Wstat: 0 Tests: 21 Failed: 1)
  Failed test:  15
/home/ntfs-3g-pjd-fstest/tests/unlink/00.t   (Wstat: 0 Tests: 55 Failed: 1)
  Failed test:  53
Files=210, Tests=2328, 84 wallclock secs ( 0.58 usr  0.18 sys +  9.84 cusr 12.36 csys = 22.96 CPU)
Result: FAIL

Comment 17 errata-xmlrpc 2019-10-30 12:15:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3252