Bug 1459400
Summary: | brick process crashes while running bug-1432542-mpx-restart-crash.t in a loop | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Mohit Agrawal <moagrawa>
Component: | core | Assignee: | Mohit Agrawal <moagrawa>
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | rhgs-3.3 | CC: | amukherj, nchilaka, rhs-bugs, storage-qa-internal
Target Milestone: | --- | |
Target Release: | RHGS 3.3.0 | |
Hardware: | x86_64 | |
OS: | All | |
Whiteboard: | brick-multiplexing | |
Fixed In Version: | glusterfs-3.8.4-28 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1459402 | Environment: |
Last Closed: | 2017-09-21 04:45:37 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1468514 | |
Bug Blocks: | 1417151, 1459402 | |
Description
Mohit Agrawal
2017-06-07 03:50:52 UTC
Hi,

Below is the core pattern generated by the brick process at the time of the crash:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    09:18:46 Program terminated with signal 11, Segmentation fault.
    09:18:46 #0  0x00007f8fcab1bcdf in index_get_gfid_type (opaque=0x7f8fac49bcb0) at /home/jenkins/root/workspace/regression-test-with-multiplex/xlators/features/index/src/index.c:1632
    09:18:46 1632          list_for_each_entry (entry, &args->entries->list, list) {
    09:18:46
    09:18:46 Thread 64 (Thread 0x7f8fd4700700 (LWP 6861)):
    09:18:46 #0  0x00007f8fde222a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    09:18:46 No symbol table info available.
    09:18:46 #1  0x00007f8fdef91ba0 in syncenv_task (proc=0xce4d70) at /home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/syncop.c:603
    09:18:46         env = 0xce49b0
    09:18:46         task = 0x0
    09:18:46         sleep_till = {tv_sec = 1496642513, tv_nsec = 0}
    09:18:46         ret = 0
    09:18:46 #2  0x00007f8fdef91e42 in syncenv_processor (thdata=0xce4d70) at /home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/syncop.c:695
    09:18:46         env = 0xce49b0
    09:18:46         proc = 0xce4d70
    09:18:46         task = 0x7f8fcc3d7670
    09:18:46 #3  0x00007f8fde21eaa1 in start_thread () from /lib64/libpthread.so.0
    09:18:46 No symbol table info available.
    09:18:46 #4  0x00007f8fddb86bcd in clone () from /lib64/libc.so.6
    09:18:46 No symbol table info available.
    09:18:46 .....
    ......
    ......
    09:18:46 Thread 1 (Thread 0x7f8fd5101700 (LWP 6860)):
    09:18:46 #0  0x00007f8fcab1bcdf in index_get_gfid_type (opaque=0x7f8fac49bcb0) at /home/jenkins/root/workspace/regression-test-with-multiplex/xlators/features/index/src/index.c:1632
    09:18:46         entry = 0x0
    09:18:46         this = 0x7f8fa0a24da0
    09:18:46         args = 0x7f8fac49bcb0
    09:18:46         loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
    09:18:46         iatt = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
    09:18:46         ret = 0
    09:18:46 #1  0x00007f8fdef91355 in synctask_wrap () at /home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/syncop.c:375
    09:18:46         task = 0x7f8fa4782e50
    09:18:46 #2  0x00007f8fddae1760 in ?? () from /lib64/libc.so.6
    09:18:46 No symbol table info available.
    09:18:46 #3  0x0000000000000000 in ?? ()
    09:18:46 No symbol table info available.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

After analysing the crash, it appears the brick process crashed because a thread was not cleaned up appropriately in the index xlator: Thread 1 is still executing index_get_gfid_type() while iterating args->entries->list, with entry = 0x0. After updating the index_worker code as well as the notify code in the index xlator, the brick process no longer crashes.

Regards
Mohit Agrawal

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/108648/
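For context, the cleanup pattern described above looks roughly like the sketch below. This is only an illustration under assumed names (idx_private_t, idx_worker, idx_stop_worker), not the code from the downstream patch:

```c
/*
 * Minimal sketch of the teardown pattern: the index xlator's worker thread
 * must be told to stop and then joined *before* the per-brick state it uses
 * is released, otherwise (as in the backtrace) a still-running thread
 * dereferences freed memory. All names here are hypothetical.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    pthread_t       worker;   /* background worker thread       */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            down;     /* set when the brick is detached */
} idx_private_t;

/* Worker loop: wakes up on the condition variable and exits cleanly
 * once 'down' is set, instead of running past the brick's lifetime. */
static void *
idx_worker(void *data)
{
    idx_private_t *priv = data;

    pthread_mutex_lock(&priv->lock);
    while (!priv->down) {
        /* ... process pending index entries here ... */
        pthread_cond_wait(&priv->cond, &priv->lock);
    }
    pthread_mutex_unlock(&priv->lock);
    return NULL;
}

/* Teardown path (e.g. driven from the xlator's notify on detach):
 * signal the worker, join it, and only then free the private data. */
static void
idx_stop_worker(idx_private_t *priv)
{
    pthread_mutex_lock(&priv->lock);
    priv->down = true;
    pthread_cond_signal(&priv->cond);
    pthread_mutex_unlock(&priv->lock);

    pthread_join(priv->worker, NULL); /* wait for the thread to finish */
    free(priv);                       /* safe to free only after join  */
}
```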
ON_QA validation: had 50 1x3 vols on a 6-node brick-mux setup (all bricks hosted only on the first 3 nodes). Started and stopped the volumes in a loop 100 times with IO going on on one volume:

    for j in {1..100};do echo "#########################################";date;for i in $(gluster v list);do gluster v stop $i --mode=script;done;ps -ef|grep glusterfsd;echo "############ end of loop $j ########################" ;for k in $(gluster v list);do gluster v start $k;done;done

After about 7 iterations, I hit a glusterfsd crash as reported in BZ#1468514. Hence blocked until BZ#1468514 is fixed.

on_qa validation, 3.8.4-34 test version

Ran the command in comment#8 while I was doing IO for only one volume (refer to https://bugzilla.redhat.com/show_bug.cgi?id=1468514#c21).

Moving to verified as I didn't hit any crash for about 150 loops. However, I did hit the same crash at around loop#170, which is mentioned in BZ#468514. As I didn't hit any other crash, I am moving to verified.

(In reply to nchilaka from comment #9)
> on_qa validation, 3.8.4-34 test version
>
> Ran the command in comment#8 while I was doing IO for only one volume
> (refer to https://bugzilla.redhat.com/show_bug.cgi?id=1468514#c21).
>
> Moving to verified as I didn't hit any crash for about 150 loops.
> However, I did hit the same crash at around loop#170, which is mentioned
> in BZ#468514. As I didn't hit any other crash, I am moving to verified.

Sorry, it is BZ#1468514.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774