Bug 829172

Summary: reopen_fd_count is becoming -ve because of stale fdctx
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: protocol Assignee: bugs <bugs>
Status: CLOSED EOL QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: pre-release CC: bugs, gluster-bugs, vagarwal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-22 15:40:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pranith Kumar K 2012-06-06 06:59:00 UTC
Description of problem:
A reopen of an fd is performed after a brick comes online. If the reopen is for an fd that is already marked as released and the reopen fails, reopen_fd_count is decremented, but the fdctx corresponding to the released fd is still added to the saved_fds list. That stale fdctx remains there forever, which drives reopen_fd_count negative on a subsequent 'CHILD_DOWN' followed by 'CHILD_UP'. Because the counter is a uint64_t, "negative" shows up as a wrap-around to a very large value.

Please see the following gdb logs:

Here the reopen_fd_count becomes zero.
(gdb) s
decrement_reopen_fd_count (this=0x22768f0, conf=0x22beb70) at client-lk.c:591
591	        uint64_t fd_count = 0;
(gdb) n
593	        LOCK (&conf->rec_lock);
(gdb) 
595	                fd_count = --(conf->reopen_fd_count);
(gdb) 
597	        UNLOCK (&conf->rec_lock);
(gdb) 
599	        if (fd_count == 0) {
(gdb) 
600	                gf_log (this->name, GF_LOG_INFO,
(gdb) 
602	                client_set_lk_version (this);
(gdb) 
603	                client_notify_parents_child_up (this);
(gdb) 
606	        return fd_count;
(gdb) 
607	}

Breakpoint 10, clnt_release_reopen_fd_cbk (req=0x7f5f5f16733c, iov=0x7f5f5f16737c, count=1, myframe=0x7f5f677a4894)
    at client-handshake.c:595
595	        xlator_t       *this   = NULL;
(gdb) n
596	        call_frame_t   *frame  = NULL;
(gdb) 
597	        clnt_conf_t    *conf   = NULL;
(gdb) 
598	        clnt_fd_ctx_t  *fdctx  = NULL;
(gdb) 
600	        frame  = myframe;
(gdb) 
601	        this   = frame->this;
(gdb) 
602	        fdctx  = (clnt_fd_ctx_t *) frame->local;
(gdb) 
603	        conf   = (clnt_conf_t *) this->private;
(gdb) 
605	        clnt_fd_lk_reacquire_failed (this, fdctx, conf);
(gdb) 
607	        decrement_reopen_fd_count (this, conf);
(gdb) s
decrement_reopen_fd_count (this=0x22768f0, conf=0x22beb70) at client-lk.c:591
591	        uint64_t fd_count = 0;
(gdb) n
593	        LOCK (&conf->rec_lock);
(gdb) 
595	                fd_count = --(conf->reopen_fd_count);
(gdb) 
597	        UNLOCK (&conf->rec_lock);
(gdb) p fd_count
$18 = 18446744073709551615
(gdb) p conf->reopen_fd_count 
$19 = 18446744073709551615
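
The value printed above is what an unsigned 64-bit counter holds after being decremented once more than it was incremented. A minimal sketch of that arithmetic follows. The names demo_conf_t and demo_decrement are hypothetical stand-ins, not GlusterFS code, and the rec_lock locking seen in the trace is left out because it does not affect the result.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one clnt_conf_t field that matters here. */
typedef struct {
        uint64_t reopen_fd_count;
} demo_conf_t;

/*
 * Same pre-decrement as line 595 of the trace, without the lock.
 * Unsigned arithmetic in C is modulo 2^64, so decrementing 0 yields
 * UINT64_MAX (18446744073709551615) instead of -1.
 */
static uint64_t
demo_decrement (demo_conf_t *conf)
{
        uint64_t fd_count = 0;

        fd_count = --(conf->reopen_fd_count);

        return fd_count;
}
```

Starting from a count of 1, the first call returns 0, which corresponds to the point where the parents are notified of CHILD_UP in the first trace. A second call, like the one triggered from clnt_release_reopen_fd_cbk, returns 18446744073709551615, so a later "fd_count == 0" check would not fire again until the counter is reset.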


Comment 1 Junaid 2012-06-06 07:19:17 UTC
The check for this is in place when lock self-healing is enabled. The above condition was not checked because lock self-healing was expected to be always on. Since it is now optional, we must handle this case.

Comment 3 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
pre-release version is ambiguous and about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.