Bug 763351 - (GLUSTER-1619) glusterfs nfs server crashed on dht+replica(2x2)
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Platform: All Linux
Priority: low
Severity: high
Assigned To: Shehjar Tikoo
Reported: 2010-09-16 03:13 EDT by shishir gowda
Modified: 2015-12-01 11:45 EST (History)

Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Attachments:
tcpdump showing the second rdirplus request (524.03 KB, application/octet-stream)
2010-09-17 04:51 EDT, Shehjar Tikoo

Description shishir gowda 2010-09-16 03:13:32 EDT
Stopped an rm -rf half way through, then ls hung. Saw that the glusterfs process running the nfs volume had crashed. Pasting the backtrace below.

The patches from 1608 have been applied.

Program terminated with signal 11, Segmentation fault.
#0  0x00007fd71514b580 in pthread_spin_lock () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd71514b580 in pthread_spin_lock () from /lib/libpthread.so.0
#1  0x00007fd7159d4eda in fd_unref (fd=0x7fd70f4e6098) at fd.c:467
#2  0x00007fd71342ec88 in afr_local_cleanup (local=0x7fd7040bf468, this=0x10d2ff8) at afr-common.c:367
#3  0x00007fd7133f803f in afr_fstat_cbk (frame=0x7fd71434af30, cookie=0x0, this=0x10d2ff8, op_ret=0, op_errno=0, buf=0x7fff7b4ec1c0)
    at afr-inode-read.c:351
#4  0x00007fd7136615f4 in client3_1_fstat_cbk (req=0x7fd70e3c894c, iov=0x7fd70e3c898c, count=1, myframe=0x7fd71434bc14)
    at client3_1-fops.c:1006
#5  0x00007fd71578d197 in rpc_clnt_handle_reply (clnt=0x10e6598, pollin=0x7fd7040614c8) at rpc-clnt.c:737
#6  0x00007fd71578d4fa in rpc_clnt_notify (trans=0x10e67b8, mydata=0x10e65c8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fd7040614c8)
    at rpc-clnt.c:850
#7  0x00007fd71578aa96 in rpc_transport_notify (this=0x10e67b8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fd7040614c8)
    at rpc-transport.c:1134
#8  0x00007fd70f022b4b in socket_event_poll_in (this=0x10e67b8) at socket.c:1579
#9  0x00007fd70f022efe in socket_event_handler (fd=6, idx=0, data=0x10e67b8, poll_in=1, poll_out=0, poll_err=0) at socket.c:1693
#10 0x00007fd7159d7fab in event_dispatch_epoll_handler (event_pool=0x10c8b18, events=0x10e9308, i=0) at event.c:812
#11 0x00007fd7159d81bb in event_dispatch_epoll (event_pool=0x10c8b18) at event.c:876
#12 0x00007fd7159d8523 in event_dispatch (event_pool=0x10c8b18) at event.c:984
#13 0x00000000004063b0 in main (argc=5, argv=0x7fff7b4ec768) at glusterfsd.c:1391
Comment 1 Amar Tumballi 2010-09-16 06:49:13 EDT
[fd.c:539:fd_create] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(__nfs3_dir_open_and_resume+0x84) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_opendir+0x6d) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir+0x71) (created)

[fd.c:362:_fd_ref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_opendir+0x6d) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir+0x71) (-->/usr/local/lib/libglusterfs.so.0(fd_create+0x1ce) (1)

[fd.c:362:_fd_ref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir_cbk+0xae) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_cbk+0x92) (-->/usr/local/lib/libglusterfs.so.0(fd_ref+0x97) (2)

[fd.c:474:fd_unref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x2ef) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_call_state_wipe+0xc0) (1)

[fd.c:362:_fd_ref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_readdir_open_resume+0x74) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_and_resume+0xae) (-->/usr/local/lib/libglusterfs.so.0(fd_lookup+0xb1) (2)

[fd.c:474:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285) (1)

[fd.c:362:_fd_ref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_readdir_open_resume+0x74) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_and_resume+0xae) (-->/usr/local/lib/libglusterfs.so.0(fd_lookup+0xb1) (2)

[fd.c:474:fd_unref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x2ef) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_call_state_wipe+0xc0) (1)

[fd.c:474:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285) (0)

[fd.c:480:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285) (destroyed)

After this destroy, nfs3_call_state_wipe() is called again:

#1  0x00007f6f7f577fc8 in fd_unref (fd=0x7f6f792f503c) at ../../../libglusterfs/src/fd.c:469
#2  0x00007f6f7c30ef75 in nfs3_call_state_wipe (cs=0x7f6f7802003c) at ../../../../../xlators/nfs/server/src/nfs3.c:361
#3  0x00007f6f7c31abdd in nfs3svc_readdir_fstat_cbk (frame=0x7f6f7dcc4400, cookie=0x10078b8, this=0x1008ad8, op_ret=0,
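
For illustration, here is a minimal, self-contained C sketch of the failure mode the trace shows. It is not GlusterFS code; the struct layout and function bodies are simplified stand-ins.

/* Sketch only: once the last fd_unref() drops the refcount to zero and
 * destroys the fd, any further fd_unref() spins on a freed
 * pthread_spinlock_t -- the pthread_spin_lock() frame at the top of the
 * backtrace above. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct fd {
        pthread_spinlock_t lock;
        int                refcount;
};

static struct fd *
fd_create (void)
{
        struct fd *fd = calloc (1, sizeof (*fd));

        pthread_spin_init (&fd->lock, PTHREAD_PROCESS_PRIVATE);
        fd->refcount = 1;
        return fd;
}

static void
fd_unref (struct fd *fd)
{
        int refs;

        pthread_spin_lock (&fd->lock);  /* use-after-free if fd destroyed */
        refs = --fd->refcount;
        pthread_spin_unlock (&fd->lock);

        if (refs == 0) {                /* the "(destroyed)" step above */
                pthread_spin_destroy (&fd->lock);
                free (fd);
        }
}

int
main (void)
{
        struct fd *fd = fd_create ();

        fd_unref (fd);  /* refcount 1 -> 0: fd is destroyed */
        /* fd_unref (fd);  a second unref, like the one issued by
         *                 nfs3_call_state_wipe() after the EOF unref,
         *                 would spin on freed memory and crash */
        printf ("single unref is fine; the commented second one is the bug\n");
        return 0;
}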
Comment 2 Amar Tumballi 2010-09-16 07:05:45 EDT
Checked with only afr, and this is the behavior:

[fd.c:539:fd_create] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(__nfs3_dir_open_and_resume+0x84) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_opendir+0x6d) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir+0x71) (created)

[fd.c:362:_fd_ref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_opendir+0x6d) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir+0x71) (-->/usr/local/lib/libglusterfs.so.0(fd_create+0x1ce) (1)

[fd.c:362:_fd_ref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir_cbk+0xae) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_cbk+0x92) (-->/usr/local/lib/libglusterfs.so.0(fd_ref+0x97) (2)

[fd.c:474:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285) (1)

[fd.c:474:fd_unref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x2ef) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_call_state_wipe+0xc0) (0)

[fd.c:474:fd_unref] (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x2ef) (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_call_state_wipe+0xc0) (destroyed)

This path doesn't crash.
Comment 3 Amar Tumballi 2010-09-16 07:30:37 EDT
How is EOF decided for readdir in nfs?

With the patch below, the extra fd_unref() issue does not occur.

--- a/xlators/nfs/server/src/nfs3.c
+++ b/xlators/nfs/server/src/nfs3.c
@@ -3904,7 +3904,7 @@ nfs3err: (nfs3svc_readdir_fstat_cbk)
 
         if (is_eof) {
                 gf_log (GF_NFS3, GF_LOG_TRACE, "EOF REF: %d", cs->fd->refcount);
-                fd_unref (cs->fd);
+                //fd_unref (cs->fd);
         }
 
         gf_log (GF_NFS3, GF_LOG_TRACE, "CS WIPE REF: %d", cs->fd->refcount);



I suspect this shows up only over DHT because of the way dht does readdirs; need to confirm.
Comment 4 Shehjar Tikoo 2010-09-17 01:35:33 EDT
(In reply to comment #3)
> How is EOF decided for readdir in nfs?
> 
> With the patch below, the extra fd_unref() issue does not occur.
> 
> --- a/xlators/nfs/server/src/nfs3.c
> +++ b/xlators/nfs/server/src/nfs3.c
> @@ -3904,7 +3904,7 @@ nfs3err: (nfs3svc_readdir_fstat_cbk)
> 
>          if (is_eof) {
>                  gf_log (GF_NFS3, GF_LOG_TRACE, "EOF REF: %d",
> cs->fd->refcount);
> -                fd_unref (cs->fd);
> +                //fd_unref (cs->fd);
>          }
> 
>          gf_log (GF_NFS3, GF_LOG_TRACE, "CS WIPE REF: %d", cs->fd->refcount);
> 
> 
> 
> I suspect this shows up only over DHT because of the way dht does
> readdirs; need to confirm.

The eof is propagated upwards from posix_do_readdir, where, just before the stack unwind, you can see:

        if ((!readdir (dir) && (errno == 0)))
                op_errno = ENOENT;

This is a hack to let gnfs tell nfs clients that the end of the directory was reached, so they don't send any further requests. The hack depends on op_ret > 0 (dirents were returned) AND op_errno == ENOENT (set in posix and handled accordingly, in dht_do_readdir I think).

I can see how this may result in an extra unref if I get a spurious ENOENT. Besides that, the whole fd handling for reading directories in nfs looks a bit unclean to me. Looking into it.
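
For illustration, the convention just described boils down to the following self-contained sketch; readdir_is_eof is a made-up name, and only the op_ret/op_errno rule is taken from the comment above:

/* Sketch of the EOF convention: storage/posix returns entries (op_ret > 0)
 * together with op_errno == ENOENT to signal end-of-directory, and the
 * readdir callback turns that into the is_eof flag of the NFS3 reply.
 * This paraphrases the logic; it is not the actual gnfs source. */
#include <errno.h>
#include <stdio.h>

static int
readdir_is_eof (int op_ret, int op_errno)
{
        /* dirents were returned AND posix set ENOENT: end of directory */
        return ((op_ret > 0) && (op_errno == ENOENT));
}

int
main (void)
{
        /* a mid-directory reply: entries returned, op_errno untouched */
        printf ("mid-dir reply: is_eof = %d\n", readdir_is_eof (32, 0));
        /* the final reply: the last entries plus the ENOENT marker */
        printf ("final reply:   is_eof = %d\n", readdir_is_eof (8, ENOENT));
        return 0;
}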
Comment 5 Shehjar Tikoo 2010-09-17 02:23:46 EDT
(In reply to comment #1)
> [fd.c:539:fd_create]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(__nfs3_dir_open_and_resume+0x84)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_opendir+0x6d)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir+0x71)
> (created)
> 
> [fd.c:362:_fd_ref]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_opendir+0x6d)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir+0x71)
> (-->/usr/local/lib/libglusterfs.so.0(fd_create+0x1ce) (1)
> 

FD is created on first readdir request.


> [fd.c:362:_fd_ref]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_inode_opendir_cbk+0xae)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_cbk+0x92)
> (-->/usr/local/lib/libglusterfs.so.0(fd_ref+0x97) (2)
> 

fd is refed to keep it around till end of directory is reached.


> [fd.c:474:fd_unref]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x2ef)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_call_state_wipe+0xc0)
> (1)
> 

The first readdir request has been served, so the ref taken for this call in nfs3_call_state is unrefed. The ref taken to keep the fd around till EOD is still there.


> [fd.c:362:_fd_ref]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_readdir_open_resume+0x74)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_and_resume+0xae)
> (-->/usr/local/lib/libglusterfs.so.0(fd_lookup+0xb1) (2)
> 

Second readdir request comes in. fd is found in fd-cache and used. Only one ref is increased to signify that the cached fd is in use by the current readdir call.


> [fd.c:474:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285)
> (1)
> 

The second readdir call ends, but notice that the unref is not in nfs3_call_state_wipe here but in 
nfs3svc_readdir_fstat_cbk, which means the if block:

 if (is_eof) {                                                           
                gf_log (GF_NFS3, GF_LOG_TRACE, "EOF REF: %d", cs->fd->refcount);
                fd_unref (cs->fd);                                              
        }   

That means end-of-dir was reached with this readdir call. Mark this as point (a).


> [fd.c:362:_fd_ref]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_readdir_open_resume+0x74)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_dir_open_and_resume+0xae)
> (-->/usr/local/lib/libglusterfs.so.0(fd_lookup+0xb1) (2)

Here's the catch: before the unref in nfs3_call_state_wipe, which comes right after the unref above, a new readdir call arrives, and the fd is retrieved from the cache using fd_lookup. This may have happened in another thread, while the current thread is the one processing the callback.

> 
> [fd.c:474:fd_unref]
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x2ef)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3_call_state_wipe+0xc0)
> (1)
> 

The fd is unrefed; this is the unref that is supposed to happen after the fd_unref for an end-of-dir readdir at point (a).


> [fd.c:474:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285)
> (0)
> 


Here another fd unref is performed when the end of the directory is reached, in the same code block as at point (a). The problem could be that a readdirplus request is being retransmitted with the last offset, resulting in a second EOD notification from storage/posix.


> [fd.c:480:fd_unref] (-->/usr/local/lib/libglusterfs.so.0(+0x23206)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_fop_fstat_cbk+0xbb)
> (-->/usr/local/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x285)
> (destroyed)
> 


> after this destroy, call_state_wipe(), is called again
> 
> #1  0x00007f6f7f577fc8 in fd_unref (fd=0x7f6f792f503c) at
> ../../../libglusterfs/src/fd.c:469
> #2  0x00007f6f7c30ef75 in nfs3_call_state_wipe (cs=0x7f6f7802003c) at
> ../../../../../xlators/nfs/server/src/nfs3.c:361
> #3  0x00007f6f7c31abdd in nfs3svc_readdir_fstat_cbk (frame=0x7f6f7dcc4400,
> cookie=0x10078b8, this=0x1008ad8, op_ret=0,
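
Reading the refcounts in the comment 1 trace in order makes the imbalance easy to see: create plus initial ref (1), ref in nfs3_dir_open_cbk to hold the fd till end-of-dir (2), wipe unref for the first readdir (1), fd_lookup ref for the second readdir (2), EOF unref in the second readdir's fstat callback (1), fd_lookup ref for the third readdir (the retransmitted rdirplus confirmed in comment 6) (2), wipe unref for the second readdir (1), a second EOF unref (0, destroyed). The third call's nfs3_call_state_wipe then unrefs the freed fd and crashes in pthread_spin_lock. In short, each EOF reply costs one unref beyond the per-call ref, so two EOF replies on the same fd release one ref too many.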
Comment 6 Shehjar Tikoo 2010-09-17 04:50:27 EDT
As summarised in the previous comment, there are in fact two rdirplus requests coming in, and both result in an EOF notification, forcing gnfs to unref one extra time for the second rdirplus request.

This behaviour is shown in the attached tcpdump; view it through wireshark. The behaviour was seen on kernel 2.6.32.24. I believe it to be the result of a change in the kernel nfs client, which sends one more rdirplus request even after is_eof was set to one in a previous rdirplus reply.
Comment 7 Shehjar Tikoo 2010-09-17 04:51:04 EDT
Created attachment 310
Comment 8 Vijay Bellur 2010-09-17 06:24:04 EDT
PATCH: http://patches.gluster.com/patch/4841 in master (nfs3: Unref & unbind dir fd with inode lock on EOF)
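
A hedged sketch of the idea in the patch title, using made-up stand-in types rather than the actual GlusterFS structures (see the patch URL above for the real change): take the inode lock around both the unref and the unbind, so a concurrent fd_lookup() from a retransmitted readdir either finds the fd while its ref is still held, or does not find it at all.

/* Illustrative stand-ins only -- not the actual patch. */
#include <pthread.h>
#include <stdlib.h>

struct inode;

struct fd {
        struct inode *inode;
        struct fd    *next;
        int           refcount;
};

struct inode {
        pthread_mutex_t  lock;  /* stand-in for the GlusterFS inode lock */
        struct fd       *fds;   /* fds that fd_lookup() can find */
};

/* Remove the fd from the inode's fd list so a later fd_lookup() can no
 * longer return it. Caller must hold inode->lock. */
static void
__fd_unbind (struct fd *fd)
{
        struct fd **pp = &fd->inode->fds;

        while (*pp && *pp != fd)
                pp = &(*pp)->next;
        if (*pp)
                *pp = fd->next;
}

/* On EOF: drop the directory ref and unbind in one critical section, so a
 * concurrent lookup either sees a live, bound fd or nothing at all. */
static void
fd_unref_unbind (struct fd *fd)
{
        int refs;

        pthread_mutex_lock (&fd->inode->lock);
        {
                refs = --fd->refcount;
                __fd_unbind (fd);
        }
        pthread_mutex_unlock (&fd->inode->lock);

        if (refs == 0)
                free (fd);
}

int
main (void)
{
        static struct inode inode = { PTHREAD_MUTEX_INITIALIZER, NULL };
        struct fd *fd = calloc (1, sizeof (*fd));

        fd->inode    = &inode;
        fd->refcount = 1;
        inode.fds    = fd;      /* bind: lookups can now find the fd */

        fd_unref_unbind (fd);   /* refcount 1 -> 0: unbind and free */
        return 0;
}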
Comment 9 Shehjar Tikoo 2010-09-26 23:24:02 EDT
Re-opening because of a crash reported by Harsha, caused by a regression from the fd_unref_unbind patch.
Comment 10 Shehjar Tikoo 2010-09-28 03:06:01 EDT
Closing. The crash seen by Harsha had a different cause, as shown in bz 1717.
