Bug 791087

Summary: [glusterfs-3.3.0qa22]: glusterfs server crashed due to an assert in mq_get_xattr (marker) since gfid was NULL
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: quota Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED WORKSFORME QA Contact: Raghavendra Bhat <rabhat>
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: gluster-bugs, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-20 06:26:54 UTC Type: ---

Description Raghavendra Bhat 2012-02-16 07:05:36 UTC
Description of problem:
Created a replicate volume with replica count 2, with one FUSE mount and one NFS mount. Started running the sanity script on the FUSE mount and other tools on the NFS mount. While the tests were running, added 2 more bricks (making the setup a 2x2 distributed-replicate volume) and rebalanced the volume. Enabled quota and profiling, ran volume set operations, and brought some bricks down and back up to trigger self-heal.

The glusterfs server (brick process) on one of the peers crashed on an assertion because loc->gfid was NULL (all zeros). This is the backtrace:

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.10.1.11.144.export-'.
Program terminated with signal 6, Aborted.
#0  0x000000390f432905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x000000390f432905 in raise () from /lib64/libc.so.6
#1  0x000000390f4340e5 in abort () from /lib64/libc.so.6
#2  0x000000390f42b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000390f42ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fad979badc4 in mq_get_xattr (frame=0x7fad9f95ed70, cookie=0x7fada05c7a60, this=0x18abfd0, op_ret=0, op_errno=0)
    at ../../../../../xlators/features/marker/src/marker-quota.c:1165
#5  0x00007fad97bd9bab in iot_inodelk_cbk (frame=0x7fada05c7a60, cookie=0x7fada053e0a4, this=0x18aad70, op_ret=0, op_errno=0)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1978
#6  0x00007fad97df73dc in pl_common_inodelk (frame=0x7fada053e0a4, this=0x18a9b60, volume=0x7fad5ce858b0 "mirror-marker", inode=0x448f50c, 
    cmd=7, flock=0x7fad9f37c334, loc=0x7fad9f37c2ec, fd=0x0) at ../../../../../xlators/features/locks/src/inodelk.c:653
#7  0x00007fad97df745a in pl_inodelk (frame=0x7fada053e0a4, this=0x18a9b60, volume=0x7fad5ce858b0 "mirror-marker", loc=0x7fad9f37c2ec, 
    cmd=7, flock=0x7fad9f37c334) at ../../../../../xlators/features/locks/src/inodelk.c:663
#8  0x00007fad97bd9e10 in iot_inodelk_wrapper (frame=0x7fada05c7a60, this=0x18aad70, volume=0x7fad5ce858b0 "mirror-marker", 
    loc=0x7fad9f37c2ec, cmd=7, lock=0x7fad9f37c334) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1987
#9  0x00007fada18e3fff in call_resume_wind (stub=0x7fad9f37c2ac) at ../../../libglusterfs/src/call-stub.c:2419
#10 0x00007fada18eb31c in call_resume (stub=0x7fad9f37c2ac) at ../../../libglusterfs/src/call-stub.c:3938
#11 0x00007fad97bcc8cd in iot_worker (data=0x18b5a00) at ../../../../../xlators/performance/io-threads/src/io-threads.c:138
#12 0x000000390fc077e1 in start_thread () from /lib64/libpthread.so.0
#13 0x000000390f4e577d in clone () from /lib64/libc.so.6
(gdb) f 4
#4  0x00007fad979badc4 in mq_get_xattr (frame=0x7fad9f95ed70, cookie=0x7fada05c7a60, this=0x18abfd0, op_ret=0, op_errno=0)
    at ../../../../../xlators/features/marker/src/marker-quota.c:1165
1165            GF_UUID_ASSERT (local->loc.gfid);
(gdb) p local->loc
$1 = {path = 0x7fad5ce85850 "/playground/linux-2.6.31.1/include/linux", name = 0x7fad5ce85873 "linux", inode = 0x448f50c, 
  parent = 0x448da0c, gfid = '\000' <repeats 15 times>, pargfid = "t\347\n\265n\303H\270\270 \364\347߸~@"}
(gdb) p *local->loc.inode
$2 = {table = 0x18c23a0, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 2, ia_type = IA_INVAL, fd_list = {next = 0x448f53c, 
    prev = 0x448f53c}, dentry_list = {next = 0x448f54c, prev = 0x448f54c}, hash = {next = 0x448f55c, prev = 0x448f55c}, list = {
    next = 0x448da6c, prev = 0x448e38c}, _ctx = 0x448f5c0}
(gdb)  l
1160            }
1161
1162            if (uuid_is_null (local->loc.gfid))
1163                    uuid_copy (local->loc.gfid, local->loc.inode->gfid);
1164
1165            GF_UUID_ASSERT (local->loc.gfid);
1166
1167            STACK_WIND (frame, mq_check_n_set_inode_xattr, FIRST_CHILD(this),
1168                        FIRST_CHILD(this)->fops->lookup, &local->loc, xattr_req);
1169
(gdb) 

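The gfid of the inode itself is also NULL (all zeros), so the uuid_copy() fallback at line 1163 copies zeros and the GF_UUID_ASSERT at line 1165 aborts the process. The inode also shows ia_type = IA_INVAL and nlookup = 0, which suggests it had not yet been linked by a successful lookup.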
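A minimal standalone sketch of the check at marker-quota.c lines 1162-1165 shows why the fallback cannot help here: when both loc.gfid and inode->gfid are all zeros, copying one into the other still leaves a null gfid. The types `gfid_t`, `struct inode_sketch` and `struct loc_sketch` and the helper names are hypothetical stand-ins for the real glusterfs `uuid_t`, `inode_t` and `loc_t`. Returning -1 instead of asserting is an assumption made for illustration, not what the qa22 code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char gfid_t[16];            /* stand-in for uuid_t */

/* Hypothetical stand-ins for inode_t and loc_t: only the fields used here. */
struct inode_sketch { gfid_t gfid; };
struct loc_sketch   { gfid_t gfid; struct inode_sketch *inode; };

/* True when all 16 bytes of the gfid are zero. */
static bool gfid_is_null(const gfid_t g)
{
    static const gfid_t zero;                /* zero-initialised */
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

/* Mirrors lines 1162-1165: if loc's gfid is null, copy the inode's.
 * Returns 0 if loc now holds a usable gfid, or -1 in exactly the case
 * where the original code reaches GF_UUID_ASSERT and aborts. */
static int loc_fill_gfid(struct loc_sketch *loc)
{
    assert(loc != NULL);
    if (gfid_is_null(loc->gfid) && loc->inode != NULL)
        memcpy(loc->gfid, loc->inode->gfid, sizeof(gfid_t));
    return gfid_is_null(loc->gfid) ? -1 : 0;
}
```

With the values from the core (both gfids zero), loc_fill_gfid() returns -1; the caller could then fail the operation instead of taking the whole brick down.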





Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa22

How reproducible:


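Steps to Reproduce:
1. Create a replicate volume with replica count 2, with one FUSE mount and one NFS mount.
2. On the FUSE mount, run the sanity script (including a kernel untar); in parallel, run other tools on the NFS mount, including rm -rf of the untarred kernel directory.
3. While the tests are running, add 2 more bricks (making the volume 2x2 distributed-replicate) and rebalance the volume.
4. Enable quota and profiling, run volume set operations, and bring some bricks down and back up to trigger self-heal.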
  
Actual results:

The glusterfs server crashed.

Expected results:

The glusterfs server should not crash; loc->gfid should not be NULL when mq_get_xattr() reaches the assertion.

Additional info:
gluster volume info
 
Volume Name: mirror
Type: Distributed-Replicate
Volume ID: 770f045a-6e32-44e6-be2b-a9c9fb827dcc
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.11.130:/export-xfs/mirror
Brick2: 10.1.11.131:/export-xfs/mirror
Brick3: 10.1.11.144:/export-xfs/mirror
Brick4: 10.1.11.145:/export-xfs/mirror
Options Reconfigured:
geo-replication.indexing: on
performance.io-cache: off
performance.client-io-threads: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.limit-usage: /playground:33GB
features.quota: on
performance.write-behind: off

[2012-02-15 08:49:32.335513] I [server3_1-fops.c:318:server_entrylk_cbk] 0-mirror-server: 1100578: ENTRYLK (null) (--) ==> -1 (No such file or directory)
[2012-02-15 08:49:32.336154] I [server3_1-fops.c:318:server_entrylk_cbk] 0-mirror-server: 1100580: ENTRYLK (null) (--) ==> -1 (No such file or directory)
[2012-02-15 08:49:32.340796] I [server3_1-fops.c:318:server_entrylk_cbk] 0-mirror-server: 1100587: ENTRYLK (null) (--) ==> -1 (No such file or directory)
[2012-02-15 08:49:32.341161] I [server3_1-fops.c:318:server_entrylk_cbk] 0-mirror-server: 1100588: ENTRYLK (null) (--) ==> -1 (No such file or directory)
[2012-02-15 08:49:32.507749] I [server3_1-fops.c:318:server_entrylk_cbk] 0-mirror-server: 1100593: ENTRYLK (null) (--) ==> -1 (No such file or directory)
[2012-02-15 08:49:32.508353] I [server3_1-fops.c:318:server_entrylk_cbk] 0-mirror-server: 1100594: ENTRYLK (null) (--) ==> -1 (No such file or directory)
[2012-02-15 08:49:32.583330] E [marker-quota-helper.c:230:mq_dict_set_contribution] (-->/usr/local/lib/glusterfs/3.3.0qa22/xlator/debug/io-stats.so(io_stats_lookup+0x28c) [0x7fad977959d9] (-->/usr/local/lib/glusterfs/3.3.0qa22/xlator/features/marker.so(marker_lookup+0x1a3) [0x7fad979b426d] (-->/usr/local/lib/glusterfs/3.3.0qa22/xlator/features/marker.so(mq_req_xattr+0x123) [0x7fad979bf21b]))) 0-marker: invalid argument: loc->parent
[2012-02-15 08:49:32.583576] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-02-15 08:49:32.632551] W [posix-handle.c:527:posix_handle_soft] 0-mirror-posix: symlink ../../74/e7/74e70ab5-6ec3-48b8-b820-f4e7dfb87e40/linux -> /export-xfs/mirror/.glusterfs/05/e8/05e8225c-d6c0-4043-8c71-ac16596ac87c failed (File exists)
[2012-02-15 08:49:32.632583] E [posix.c:908:posix_mkdir] 0-mirror-posix: setting gfid on /export-xfs/mirror/.glusterfs/d6/ee/d6eea5b4-bac6-4b7d-8fb7-6c6b962a35ec/include/linux failed
[2012-02-15 08:49:33.122634] W [inode.c:866:inode_lookup] (-->/usr/local/lib/glusterfs/3.3.0qa22/xlator/features/marker.so(marker_lookup_cbk+0x23d) [0x7fad979b3fcd] (-->/usr/local/lib/glusterfs/3.3.0qa22/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x262) [0x7fad97790c4a] (-->/usr/local/lib/glusterfs/3.3.0qa22/xlator/protocol/server.so(server_lookup_cbk+0x5a2) [0x7fad97566ef8]))) 0-mirror-server: inode not found
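The pargfid that gdb printed for the crashing loc in frame 4 is just the 16 raw bytes of a UUID. Decoding it gives 74e70ab5-6ec3-48b8-b820-f4e7dfb87e40, the same parent gfid that appears in the posix_handle_soft warning at 08:49:32.632551. That suggests (but does not prove) that the crashing loc is the include/linux directory whose gfid handle could not be set up. The helper names below are hypothetical, and the `.glusterfs/XX/YY/<gfid>` layout is taken from the paths in the log above rather than from the posix xlator source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bytes of local->loc.pargfid as printed by gdb in frame 4:
 *   t \347 \n \265 n \303 H \270 \270 ' ' \364 \347 <glyph> ~ @
 * The non-ASCII glyph before '~' is gdb rendering the byte pair
 * 0xdf 0xb8 as a single UTF-8 character. */
static const unsigned char pargfid[16] = {
    0x74, 0xe7, 0x0a, 0xb5, 0x6e, 0xc3, 0x48, 0xb8,
    0xb8, 0x20, 0xf4, 0xe7, 0xdf, 0xb8, 0x7e, 0x40
};

/* Format 16 bytes in canonical 8-4-4-4-12 lowercase form (37 bytes incl. NUL). */
static void gfid_to_str(const unsigned char g[16], char out[37])
{
    static const char hex[] = "0123456789abcdef";
    int pos = 0;

    assert(out != NULL);
    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out[pos++] = '-';
        out[pos++] = hex[g[i] >> 4];
        out[pos++] = hex[g[i] & 0x0f];
    }
    out[pos] = '\0';
}

/* Brick-relative handle path in the layout seen in the log:
 * .glusterfs/<byte 0>/<byte 1>/<full gfid>. Returns snprintf's result. */
static int gfid_handle_path(const unsigned char g[16], char *buf, size_t len)
{
    char s[37];

    gfid_to_str(g, s);
    return snprintf(buf, len, ".glusterfs/%02x/%02x/%s", g[0], g[1], s);
}
```

Applied to pargfid, gfid_to_str() yields 74e70ab5-6ec3-48b8-b820-f4e7dfb87e40 and gfid_handle_path() yields .glusterfs/74/e7/74e70ab5-6ec3-48b8-b820-f4e7dfb87e40, the directory under which the symlink for "linux" failed with "File exists".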

Comment 1 Raghavendra Bhat 2012-02-16 07:07:33 UTC
The kernel untar on the FUSE mount and rm -rf of the untarred kernel directory from the NFS client were running in parallel.

Comment 2 Amar Tumballi 2012-03-12 09:46:55 UTC
Please update these bugs w.r.t. 3.3.0qa27; we need to work on them as per the target milestone set.

Comment 3 Raghavendra Bhat 2012-04-20 06:26:54 UTC
This bug has not been observed again. Please reopen it if it recurs.

Comment 4 Anand Avati 2012-05-14 21:45:11 UTC
CHANGE: http://review.gluster.com/3323 (features/marker: use the gfid from the stat structure instead of inode) merged in master by Anand Avati (avati)
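The title of change 3323 says the marker now takes the gfid from the stat structure instead of the inode. A minimal sketch of that idea, assuming the relevant callback receives a stat buffer (in glusterfs, a `struct iatt` with an `ia_gfid` field): prefer the gfid the brick just returned in the stat buffer, and fall back to the inode only if that is null. The gfid in the stat buffer should be populated even when the in-memory inode has not been linked yet, as in the core above. The types and `pick_gfid()` are hypothetical stand-ins; this is not the merged patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned char gfid_t[16];            /* stand-in for uuid_t */

/* Hypothetical stand-ins: only the gfid fields matter for this sketch. */
struct iatt_sketch  { gfid_t ia_gfid; };     /* models struct iatt */
struct inode_sketch { gfid_t gfid; };

/* True when all 16 bytes of the gfid are zero. */
static bool gfid_is_null(const gfid_t g)
{
    static const gfid_t zero;                /* zero-initialised */
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

/* Choose the gfid for a loc: the stat buffer first, then the inode.
 * Returns 0 on success, or -1 if neither source has a gfid, so the
 * caller can fail the operation instead of asserting. */
static int pick_gfid(gfid_t out, const struct iatt_sketch *stbuf,
                     const struct inode_sketch *inode)
{
    assert(out != NULL);
    if (stbuf != NULL && !gfid_is_null(stbuf->ia_gfid)) {
        memcpy(out, stbuf->ia_gfid, sizeof(gfid_t));
        return 0;
    }
    if (inode != NULL && !gfid_is_null(inode->gfid)) {
        memcpy(out, inode->gfid, sizeof(gfid_t));
        return 0;
    }
    return -1;
}
```

Preferring the stat buffer means the result no longer depends on whether the inode has been linked into the inode table, which is the state the crashing inode appears to have been in.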