Bug 784182

Summary: [54b8d503dd23e72ed3076988c52e689f3554ebc8]: glusterfs server crashed in posix_opendir
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: posix Assignee: Kaushal <kaushal>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: amarts, gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-03-12 12:32:31 UTC Type: ---

Description Raghavendra Bhat 2012-01-24 06:41:26 UTC
Description of problem:
Setup: a pure replicate volume with 2 replicas. Quota was enabled and a limit was set on the root of the volume. ping_pong was started on the FUSE client, and then a replace-brick was issued. The source brick crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.hyperspace.mnt-sda7'.
Program terminated with signal 11, Segmentation fault.
#0  __opendir (name=0x0) at ../sysdeps/unix/opendir.c:86
86	../sysdeps/unix/opendir.c: No such file or directory.
	in ../sysdeps/unix/opendir.c
(gdb) bt
#0  __opendir (name=0x0) at ../sysdeps/unix/opendir.c:86
#1  0x00007fc0a7365599 in posix_opendir (frame=0x7fc0a97ef85c, this=0x213a020, loc=0x7fc0a94c2084, fd=0x7fc0a464704c)
    at ../../../../../xlators/storage/posix/src/posix.c:568
#2  0x00007fc0a7152144 in posix_acl_opendir (frame=0x7fc0a97ef7b0, this=0x213b560, loc=0x7fc0a94c2084, fd=0x7fc0a464704c)
    at ../../../../../xlators/system/posix-acl/src/posix-acl.c:1067
#3  0x00007fc0a6f394bf in pl_opendir (frame=0x7fc0a97ef704, this=0x213c760, loc=0x7fc0a94c2084, fd=0x7fc0a464704c)
    at ../../../../../xlators/features/locks/src/posix.c:388
#4  0x00007fc0a6d247aa in iot_opendir_wrapper (frame=0x7fc0a97ef5ac, this=0x213d900, loc=0x7fc0a94c2084, fd=0x7fc0a464704c)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1468
#5  0x00007fc0ab2271b1 in call_resume_wind (stub=0x7fc0a94c204c) at ../../../libglusterfs/src/call-stub.c:2306
#6  0x00007fc0ab22e278 in call_resume (stub=0x7fc0a94c204c) at ../../../libglusterfs/src/call-stub.c:3853
#7  0x00007fc0a6d1b7c0 in iot_worker (data=0x214f6e0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:138
#8  0x00007fc0aa99bd8c in start_thread (arg=0x7fc0a544f700) at pthread_create.c:304
#9  0x00007fc0aa6e704d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()
(gdb) f 1
#1  0x00007fc0a7365599 in posix_opendir (frame=0x7fc0a97ef85c, this=0x213a020, loc=0x7fc0a94c2084, fd=0x7fc0a464704c)
    at ../../../../../xlators/storage/posix/src/posix.c:568
warning: Source file is more recent than executable.
568	        if (!real_path)
(gdb) l
563	        VALIDATE_OR_GOTO (fd, out);
564	
565	        SET_FS_ID (frame->root->uid, frame->root->gid);
566	        MAKE_INODE_HANDLE (real_path, this, loc, NULL);
567
568	        dir = opendir (real_path);
569	
(gdb) 

MAKE_INODE_HANDLE checks whether the gfid of loc is null. If it is, the macro logs an error and breaks out without constructing real_path, so real_path is still NULL when opendir is called, which leads to the crash.


#define MAKE_INODE_HANDLE(rpath, this, loc, iatt_p) do {                \
        if (uuid_is_null (loc->gfid)) {                                 \
                gf_log (this->name, GF_LOG_ERROR,                       \
			"null gfid for path %s", loc->path);            \
                break;                                                  \
	}             
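
The failure mode described above can be reproduced in isolation. What follows is only a minimal sketch under stated assumptions, not the GlusterFS code: demo_loc, demo_gfid_is_null, DEMO_MAKE_HANDLE and demo_opendir are hypothetical stand-ins, the macro simply copies loc->path instead of building a gfid-based handle path, and returning EINVAL is an illustrative choice. It shows a do/while(0) macro that breaks out early and leaves the path NULL, and a caller that checks the path before passing it to opendir.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for loc_t; only the fields this sketch needs. */
struct demo_loc {
        unsigned char gfid[16];
        const char   *path;
};

/* Returns 1 when all 16 gfid bytes are zero, 0 otherwise. */
static int demo_gfid_is_null (const unsigned char *gfid)
{
        static const unsigned char zero[16];
        return memcmp (gfid, zero, sizeof (zero)) == 0;
}

/* Same shape as the macro in the report: on a null gfid it breaks out
 * of the do/while(0) and leaves rpath untouched (still NULL).
 * Assumption: the real macro builds a handle path; here we just copy. */
#define DEMO_MAKE_HANDLE(rpath, loc) do {                       \
        if (demo_gfid_is_null ((loc)->gfid))                    \
                break;                                          \
        (rpath) = (loc)->path;                                  \
} while (0)

/* Returns 0 on success, -1 with errno set on failure. */
static int demo_opendir (const struct demo_loc *loc)
{
        const char *real_path = NULL;
        DIR        *dir       = NULL;

        assert (loc != NULL);

        DEMO_MAKE_HANDLE (real_path, loc);

        /* Guard: never hand a NULL path to opendir(). */
        if (!real_path) {
                errno = EINVAL; /* illustrative error code */
                return -1;
        }

        dir = opendir (real_path);
        if (!dir)
                return -1;

        closedir (dir);
        return 0;
}
```

With a zeroed gfid, demo_opendir fails cleanly with -1 instead of reaching opendir with a NULL argument. With a non-zero gfid and an existing directory path, it returns 0.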

Additional info:

Log messages showing that the gfid of loc is null:
[2012-01-24 11:56:23.300618] E [name.c:151:client_fill_address_family] 0-mirror-replace-brick: transport.address-family not specified and not able to determine the same from other options (remote-host:(null) and transport.unix.connect-path:(null))
[2012-01-24 11:56:25.231941] I [server-handshake.c:540:server_setvolume] 0-mirror-server: accepted client from 127.0.0.1:1016 (version: 3git)
[2012-01-24 11:56:25.239802] I [afr-common.c:1839:afr_set_root_inode_on_first_lookup] 0-mirror-pump: added root inode
[2012-01-24 11:56:25.240475] E [posix.c:2380:posix_getxattr] 0-mirror-posix: null gfid for path /
[2012-01-24 11:56:25.240528] I [client.c:1159:client_setxattr] 0-mirror-replace-brick: client rpc init command
[2012-01-24 11:56:25.240556] I [mem-pool.c:573:mem_pool_destroy] 0-mirror-replace-brick: size=588 max=0 total=0
[2012-01-24 11:56:25.240725] I [mem-pool.c:573:mem_pool_destroy] 0-mirror-replace-brick: size=124 max=0 total=0
[2012-01-24 11:56:25.242628] I [client.c:1937:notify] 0-mirror-replace-brick: parent translators are ready, attempting connect on transport
[2012-01-24 11:56:25.243091] I [pump.c:1598:pump_command_reply] 0-mirror-pump: Command succeeded
[2012-01-24 11:56:25.243270] I [client-handshake.c:1085:select_server_supported_programs] 0-mirror-replace-brick: Using Program GlusterFS 3git, Num (1298437), Version (310)
[2012-01-24 11:56:25.246011] I [client-handshake.c:917:client_setvolume_cbk] 0-mirror-replace-brick: Connected to 127.0.0.1:24012, attached to remote volume '/mnt/sda8/export4'.
[2012-01-24 11:56:25.246042] I [afr-common.c:3473:afr_notify] 0-mirror-pump: subvol 1 came up, start crawl
[2012-01-24 11:56:25.246601] E [posix.c:119:posix_lookup] 0-mirror-posix: null gfid for path /
[2012-01-24 11:56:25.246655] E [posix.c:133:posix_lookup] 0-mirror-posix: lstat on (null) failed: Invalid argument
[2012-01-24 11:56:25.247107] E [posix.c:566:posix_opendir] 0-mirror-posix: null gfid for path /

Comment 1 Kaushal 2012-02-28 05:13:29 UTC
Doesn't seem to be happening anymore on master. Can you confirm?

Comment 2 Amar Tumballi 2012-03-12 09:46:39 UTC
Please update these bugs with respect to 3.3.0qa27; we need to work on them as per the target milestone set.

Comment 3 Kaushal 2012-03-12 12:32:31 UTC

*** This bug has been marked as a duplicate of bug 784176 ***