Bug 763691 (GLUSTER-1959)

Summary: Inode ref NULL and segfault in protocol/client
Product: [Community] GlusterFS
Component: nfs
Version: nfs-beta
Status: CLOSED NOTABUG
Severity: medium
Priority: low
Reporter: Harshavardhana <fharshav>
Assignee: Junaid <junaid>
CC: amarts, cww, gluster-bugs, shehjart, vagarwal, vijay
Hardware: All
OS: Linux
Regression: RTP
Mount Type: nfs
Doc Type: Bug Fix

Description Harshavardhana 2010-10-14 17:24:59 EDT
#0  0x0000003ff7c08dc1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x000000390d622e0c in inode_unref (inode=0x7f0bd4c248c0) at inode.c:393
#2  0x000000390d6143d2 in loc_wipe (loc=0x129597b0) at xlator.c:986
#3  0x00007f0c2b1aab9e in client_local_wipe (local=0x129597b0) at client-protocol.c:167
#4  0x00007f0c2b1bde88 in client_lookup_cbk (frame=0x12959730, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>)
    at client-protocol.c:4760
#5  0x00007f0c2b1aa70a in protocol_client_pollin (this=0x2172440, trans=0x220b190) at client-protocol.c:6435
#6  0x00007f0c2b1b0d28 in notify (this=0x0, event=<value optimized out>, data=0x220b190) at client-protocol.c:6554
#7  0x000000390d614903 in xlator_notify (xl=0x2172440, event=2, data=0x220b190) at xlator.c:919
#8  0x00007f0c29d311d8 in socket_event_handler (fd=<value optimized out>, idx=185, data=0x220b190, poll_in=1, poll_out=0, poll_err=<value optimized out>)
    at socket.c:831
#9  0x000000390d6304fd in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>)
    at event.c:804
#10 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:867
#11 0x0000000000404272 in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1494


--------------

(gdb) fr 1
#1  0x000000390d622e0c in inode_unref (inode=0x7f0bd4c248c0) at inode.c:393
393             pthread_mutex_lock (&table->lock);
(gdb) p *inode
$2 = {table = 0x7f0bd414df00, lock = -725601888, nlookup = 139688790870016, generation = 139688790870400, in_attic = 0, ref = 0, ino = 35206400,
  ia_type = 720942560, fd_list = {next = 0x100000000, prev = 0x7f0bd4c248c0}, dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0},
  list = {next = 0x0, prev = 0x0}, _ctx = 0x45}
(gdb) fr 2
#2  0x000000390d6143d2 in loc_wipe (loc=0x129597b0) at xlator.c:986
986                     inode_unref (loc->inode);
(gdb) p *loc
$3 = {path = 0xe8d7a80 "/client14_3", name = 0xe8d7a81 "client14_3", ino = 0, inode = 0x7f0bd4c248c0, parent = 0x21c8510}
(gdb)
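
For illustration only (a sketch against the fields visible in the dump, not the actual GlusterFS source): loc->inode points at memory whose contents are garbage (the lock is negative, nlookup is implausibly large, ref is already 0), which looks like the inode was destroyed or its memory reused while this code path still held the pointer; inode_unref() then dereferences a junk table pointer in pthread_mutex_lock (&table->lock). A defensive wipe along the following lines would at least refuse to unref an obviously stale inode. The struct fields are taken from the gdb output above; everything else is hypothetical.

/* Sketch only: not the real loc_wipe()/inode_unref(). The struct layouts
 * are trimmed to the fields visible in the gdb dump (types simplified);
 * the guard itself is hypothetical. */
#include <stddef.h>

struct _inode_table;                        /* opaque for this sketch */

typedef struct _inode {
        struct _inode_table *table;         /* 0x7f0bd414df00 in the dump */
        long                 lock;          /* -725601888 in the dump */
        unsigned long        nlookup;
        int                  ref;           /* already 0 in the dump */
        /* remaining fields elided */
} inode_t;

typedef struct {
        const char         *path;
        const char         *name;
        unsigned long long  ino;
        inode_t            *inode;
        inode_t            *parent;
} loc_t;

/* Stub standing in for libglusterfs's inode_unref() so the sketch builds. */
static inode_t *
inode_unref (inode_t *inode)
{
        if (inode->ref > 0)
                inode->ref--;
        return inode;
}

/* Hypothetical defensive wipe: skip the unref when the inode is clearly
 * stale (no table, refcount already zero) instead of crashing later in
 * pthread_mutex_lock (&table->lock). */
static void
loc_wipe_guarded (loc_t *loc)
{
        if (loc->inode && loc->inode->table && loc->inode->ref > 0)
                inode_unref (loc->inode);
        loc->inode = NULL;

        if (loc->parent && loc->parent->table && loc->parent->ref > 0)
                inode_unref (loc->parent);
        loc->parent = NULL;
}

int
main (void)
{
        loc_t loc = { .path = "/client14_3", .name = "client14_3",
                      .inode = NULL, .parent = NULL };
        loc_wipe_guarded (&loc);            /* no crash even with NULL inodes */
        return 0;
}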
Comment 1 Shehjar Tikoo 2010-10-18 23:58:57 EDT
Are we continuing to support nfs-beta? I thought the decision was to the contrary, considering the manual rebasing that is required for almost every patch that needs to be backported to nfs-beta.
Comment 2 Harshavardhana 2010-10-25 19:09:59 EDT
(In reply to comment #1)
> Are we continuing to support nfs-beta? I thought the decision was to the
> contrary, considering the manual rebasing that is required for almost every
> patch that needs to be backported to nfs-beta.

I will try to reproduce this with 3.1, since we will be continuing to test 3.1 with a ucarp and distribute setup.

I want to keep this open while I try to reproduce it with 3.1; if it doesn't show up by Nov 15th, then you can happily close it.
Comment 3 Junaid 2010-10-26 03:29:06 EDT
Hi Harsha,

Can you please tell me the steps to reproduce this bug?
Comment 4 Harshavardhana 2010-10-26 08:35:40 EDT
(In reply to comment #3)
> Hi Harsha,
> 
> Can you please tell me the steps to reproduce this bug?

You need SAN-based hardware with multiple LUNs exported. I reproduced this issue with 96 LUNs and 8 nodes, each sharing 12 unique LUNs.
Comment 5 Harshavardhana 2010-10-26 10:43:38 EDT
> You need SAN-based hardware with multiple LUNs exported. I reproduced this
> issue with 96 LUNs and 8 nodes, each sharing 12 unique LUNs.

Also, a reminder that this can be reproduced with a corrupted backend: mess around with an ext3 or ext4 volume and make it go corrupt. During lookup you should be able to see this. It is a case of adding proper checks inside our code so that it does not fail even with a corrupted backend, and instead gives a proper error message on the client side.

The case here is loc->inode->lock going negative, which should not happen.
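
As a sketch of what such a check could look like (hypothetical code, not from the GlusterFS tree; only the ia_type idea is taken from the field seen in the inode dump above): validate what the backend hands back during lookup and turn anything implausible into a clean EIO for the client, instead of linking a bogus inode that later blows up in loc_wipe()/inode_unref().

/* Hypothetical sanity check on a lookup reply; illustrative only. */
#include <errno.h>
#include <stdio.h>

/* File-type values modelled after GlusterFS's ia_type field; anything
 * outside the known range is treated as a sign of a corrupted backend. */
typedef enum {
        IA_INVAL = 0,
        IA_IFREG,
        IA_IFDIR,
        IA_IFLNK,
        IA_IFBLK,
        IA_IFCHR,
        IA_IFIFO,
        IA_IFSOCK
} ia_type_t;

/* Return 0 when the reply looks sane, otherwise an errno the caller can
 * propagate to the client as a proper error instead of crashing later. */
static int
validate_lookup_reply (int op_ret, int op_errno, int ia_type)
{
        if (op_ret < 0)
                return op_errno ? op_errno : EIO;   /* backend reported failure */
        if (ia_type <= IA_INVAL || ia_type > IA_IFSOCK)
                return EIO;                         /* implausible file type */
        return 0;
}

int
main (void)
{
        /* A garbage ia_type like the 720942560 in the dump becomes EIO. */
        printf ("rc=%d\n", validate_lookup_reply (0, 0, 720942560));
        return 0;
}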
Comment 6 Amar Tumballi 2011-01-20 01:24:28 EST
Not seen anymore, and we are not going to fix anything in the nfs-beta version.