Bug 1643919 - glustershd crashed with segmentation fault while doing inservice upgrade
Summary: glustershd crashed with segmentation fault while doing inservice upgrade
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Target Milestone: ---
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-29 12:45 UTC by Nag Pavan Chilakam
Modified: 2023-09-14 04:41 UTC
CC: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-03 07:07:12 UTC
Embargoed:



Description Nag Pavan Chilakam 2018-10-29 12:45:36 UTC
Description of problem:
========================
Hit a glustershd crash while doing an in-service upgrade.
Below is the backtrace (BT):

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/16/3c2dc43405427478788bad0afd537a7acf7a13
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gl'.
Program terminated with signal 11, Segmentation fault.
#0  client3_3_lookup_cbk (req=0x7f25c4032dd0, iov=0x7f25c4032e10, count=<optimized out>, myframe=0x7f25c4037fb0) at client-rpc-fops.c:2807
2807	        inode = local->loc.inode;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  client3_3_lookup_cbk (req=0x7f25c4032dd0, iov=0x7f25c4032e10, count=<optimized out>, myframe=0x7f25c4037fb0) at client-rpc-fops.c:2807
#1  0x00007f26172d7960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f26040ce3c0, pollin=pollin@entry=0x7f25fc0059c0) at rpc-clnt.c:778
#2  0x00007f26172d7d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f26040ce3f0, event=<optimized out>, data=0x7f25fc0059c0) at rpc-clnt.c:971
#3  0x00007f26172d3a73 in rpc_transport_notify (this=this@entry=0x7f26040ce600, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f25fc0059c0) at rpc-transport.c:538
#4  0x00007f260c0f7566 in socket_event_poll_in (this=this@entry=0x7f26040ce600, notify_handled=<optimized out>) at socket.c:2315
#5  0x00007f260c0f9b0c in socket_event_handler (fd=36, idx=26, gen=4, data=0x7f26040ce600, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#6  0x00007f261756d7e4 in event_dispatch_epoll_handler (event=0x7f2608e53e80, event_pool=0x55820271c210) at event-epoll.c:583
#7  event_dispatch_epoll_worker (data=0x7f260406fcf0) at event-epoll.c:659
#8  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f2615c36ead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 26 (Thread 0x7f26179f5780 (LWP 6541)):
#0  0x00007f261636ff47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f261756de18 in event_dispatch_epoll (event_pool=0x55820271c210) at event-epoll.c:746
#2  0x0000558202235247 in main (argc=13, argv=<optimized out>) at glusterfsd.c:2550

Thread 25 (Thread 0x7f258f7fe700 (LWP 345)):
#0  0x00007f26163754ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2616370de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f2616370cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2615c75552 in _dl_addr () from /lib64/libc.so.6
#4  0x00007f2615c4d585 in backtrace_symbols_fd () from /lib64/libc.so.6
#5  0x00007f2617514c8b in gf_backtrace_fillframes (
    buf=buf@entry=0x7f25e982d6c0 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4267
#6  0x00007f261751c565 in gf_backtrace_save (
    buf=buf@entry=0x7f25e982d6c0 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4323
#7  0x00007f2617548b0a in synctask_yield (task=task@entry=0x7f25e982d230) at syncop.c:336
#8  0x00007f261754c784 in __syncbarrier_wait (waitfor=2, barrier=0x7f259c07d5b0) at syncop.c:1134
#9  syncbarrier_wait (barrier=barrier@entry=0x7f259c07d5b0, waitfor=waitfor@entry=2) at syncop.c:1155
#10 0x00007f2609bc4ec6 in afr_selfheal_uninodelk (frame=0x7f259c043a20, this=this@entry=0x7f2604051030, inode=<optimized out>, dom=0x7f2604050bc0 "arbo-replicate-1", off=off@entry=9223372036854775806, size=size@entry=0, 
    locked_on=locked_on@entry=0x7f25e9a2ebf0 "\001\001") at afr-self-heal-common.c:2066
#11 0x00007f2609bd04bd in afr_selfheal_metadata (frame=frame@entry=0x7f259c043a20, this=this@entry=0x7f2604051030, inode=<optimized out>) at afr-self-heal-metadata.c:451
#12 0x00007f2609bc9021 in afr_selfheal_do (frame=frame@entry=0x7f259c043a20, this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25e9a2ee80 "g|j\210\217\212M%\201\350\262Mȁ\363\377") at afr-self-heal-common.c:2540
#13 0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25e9a2ee80 "g|j\210\217\212M%\201\350\262Mȁ\363\377") at afr-self-heal-common.c:2586
#14 0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f2604061e40, child=0, gfid=gfid@entry=0x7f25e9a2ee80 "g|j\210\217\212M%\201\350\262Mȁ\363\377") at afr-self-heald.c:334
#15 0x00007f2609bd1491 in afr_shd_index_heal (subvol=0x7f26040481f0, entry=<optimized out>, parent=0x7f25dcff8de0, data=0x7f2604061e40) at afr-self-heald.c:431
#16 0x00007f261756e012 in _dir_scan_job_fn (data=0x7f25e8007da0) at syncop-utils.c:262
#17 0x00007f2617548bb0 in synctask_wrap () at syncop.c:375
#18 0x00007f2615b81010 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 24 (Thread 0x7f2574ff9700 (LWP 450)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202727b50) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202727b50) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f258e7fc700 (LWP 347)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x5582027255d0) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x5582027255d0) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f258ffff700 (LWP 344)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x5582027246d0) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x5582027246d0) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f25757fa700 (LWP 449)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202727790) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202727790) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f258dffb700 (LWP 348)):
#0  0x00007f26163754ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2616370de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f2616370cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2615c75552 in _dl_addr () from /lib64/libc.so.6
#4  0x00007f2615c4d585 in backtrace_symbols_fd () from /lib64/libc.so.6
#5  0x00007f2617514c8b in gf_backtrace_fillframes (
    buf=buf@entry=0x7f25e962b980 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4267
#6  0x00007f261751c565 in gf_backtrace_save (
    buf=buf@entry=0x7f25e962b980 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4323
#7  0x00007f2617548b0a in synctask_yield (task=task@entry=0x7f25e962b4f0) at syncop.c:336
#8  0x00007f261754c784 in __syncbarrier_wait (waitfor=2, barrier=0x7f25bc03eda0) at syncop.c:1134
#9  syncbarrier_wait (barrier=barrier@entry=0x7f25bc03eda0, waitfor=waitfor@entry=2) at syncop.c:1155
#10 0x00007f2609bc4ec6 in afr_selfheal_uninodelk (frame=0x7f25bc027960, this=this@entry=0x7f2604051030, inode=<optimized out>, dom=0x7f2604050bc0 "arbo-replicate-1", off=off@entry=9223372036854775806, size=size@entry=0, 
    locked_on=locked_on@entry=0x7f25e982ceb0 "\001\001") at afr-self-heal-common.c:2066
#11 0x00007f2609bd04bd in afr_selfheal_metadata (frame=frame@entry=0x7f25bc027960, this=this@entry=0x7f2604051030, inode=<optimized out>) at afr-self-heal-metadata.c:451
#12 0x00007f2609bc9021 in afr_selfheal_do (frame=frame@entry=0x7f25bc027960, this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25e982d140 "\nWs\\ޥE*\205\376\305uO\n", <incomplete sequence \332>) at afr-self-heal-common.c:2540
#13 0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25e982d140 "\nWs\\ޥE*\205\376\305uO\n", <incomplete sequence \332>) at afr-self-heal-common.c:2586
#14 0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f2604061e40, child=0, gfid=gfid@entry=0x7f25e982d140 "\nWs\\ޥE*\205\376\305uO\n", <incomplete sequence \332>) at afr-self-heald.c:334
#15 0x00007f2609bd1491 in afr_shd_index_heal (subvol=0x7f26040481f0, entry=<optimized out>, parent=0x7f25dcff8de0, data=0x7f2604061e40) at afr-self-heald.c:431
#16 0x00007f261756e012 in _dir_scan_job_fn (data=0x7f25e8008260) at syncop-utils.c:262
#17 0x00007f2617548bb0 in synctask_wrap () at syncop.c:375
#18 0x00007f2615b81010 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 19 (Thread 0x7f25da7f4700 (LWP 9636)):
#0  0x00007f2616372965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261756e903 in syncop_mt_dir_scan (frame=frame@entry=0x7f25f40070a0, subvol=subvol@entry=0x7f26040345d0, loc=loc@entry=0x7f25da7f3de0, pid=pid@entry=-6, data=data@entry=0x7f260409f400, 
    fn=fn@entry=0x7f2609bd1370 <afr_shd_index_heal>, xdata=xdata@entry=0x7f25f40015d0, max_jobs=48, max_qlen=1024) at syncop-utils.c:420
#2  0x00007f2609bd195f in afr_shd_index_sweep (healer=healer@entry=0x7f260409f400, vgfid=vgfid@entry=0x7f2609beef69 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
#3  0x00007f2609bd1a03 in afr_shd_index_sweep_all (healer=healer@entry=0x7f260409f400) at afr-self-heald.c:504
#4  0x00007f2609bd1b3b in afr_shd_index_healer (data=0x7f260409f400) at afr-self-heald.c:584
#5  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f260a860700 (LWP 6563)):
#0  0x00007f2615c37483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f261756d6b2 in event_dispatch_epoll_worker (data=0x5582027634c0) at event-epoll.c:649
#2  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f258cff9700 (LWP 350)):
#0  0x00007f26163754ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2616370de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f2616370cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2615c75552 in _dl_addr () from /lib64/libc.so.6
#4  0x00007f2615c4d585 in backtrace_symbols_fd () from /lib64/libc.so.6
#5  0x00007f2617514c8b in gf_backtrace_fillframes (
    buf=buf@entry=0x7f25e9022b30 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4267
#6  0x00007f261751c565 in gf_backtrace_save (
    buf=buf@entry=0x7f25e9022b30 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4323
#7  0x00007f2617548b0a in synctask_yield (task=task@entry=0x7f25e90226a0) at syncop.c:336
#8  0x00007f261754c784 in __syncbarrier_wait (waitfor=2, barrier=0x7f2598069330) at syncop.c:1134
#9  syncbarrier_wait (barrier=barrier@entry=0x7f2598069330, waitfor=waitfor@entry=2) at syncop.c:1155
#10 0x00007f2609bc4ec6 in afr_selfheal_uninodelk (frame=0x7f25980017c0, this=this@entry=0x7f2604051030, inode=<optimized out>, dom=0x7f2604050bc0 "arbo-replicate-1", off=off@entry=9223372036854775806, size=size@entry=0, 
    locked_on=locked_on@entry=0x7f25e8c1f0b0 "\001\001") at afr-self-heal-common.c:2066
#11 0x00007f2609bd04bd in afr_selfheal_metadata (frame=frame@entry=0x7f25980017c0, this=this@entry=0x7f2604051030, inode=<optimized out>) at afr-self-heal-metadata.c:451
#12 0x00007f2609bc9021 in afr_selfheal_do (frame=frame@entry=0x7f25980017c0, this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25e8c1f340 "\350\003\035\223d\253B\212\222\302~x;\277\302A") at afr-self-heal-common.c:2540
#13 0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25e8c1f340 "\350\003\035\223d\253B\212\222\302~x;\277\302A") at afr-self-heal-common.c:2586
#14 0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f2604061e40, child=0, gfid=gfid@entry=0x7f25e8c1f340 "\350\003\035\223d\253B\212\222\302~x;\277\302A") at afr-self-heald.c:334
#15 0x00007f2609bd1491 in afr_shd_index_heal (subvol=0x7f26040481f0, entry=<optimized out>, parent=0x7f25dcff8de0, data=0x7f2604061e40) at afr-self-heald.c:431
#16 0x00007f261756e012 in _dir_scan_job_fn (data=0x7f25e8c1fc40) at syncop-utils.c:262
#17 0x00007f2617548bb0 in synctask_wrap () at syncop.c:375
#18 0x00007f2615b81010 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7f258d7fa700 (LWP 349)):
#0  0x00007f26163754ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2616370de6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f2616370cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2615c75552 in _dl_addr () from /lib64/libc.so.6
#4  0x00007f2615c4d585 in backtrace_symbols_fd () from /lib64/libc.so.6
#5  0x00007f2617514c8b in gf_backtrace_fillframes (
    buf=buf@entry=0x7f25f10329c0 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4267
#6  0x00007f261751c565 in gf_backtrace_save (
    buf=buf@entry=0x7f25f10329c0 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4323
#7  0x00007f2617548b0a in synctask_yield (task=task@entry=0x7f25f1032530) at syncop.c:336
#8  0x00007f261754c784 in __syncbarrier_wait (waitfor=3, barrier=0x7f25b00519c0) at syncop.c:1134
#9  syncbarrier_wait (barrier=barrier@entry=0x7f25b00519c0, waitfor=waitfor@entry=3) at syncop.c:1155
#10 0x00007f2609bcc059 in afr_selfheal_data_open (this=this@entry=0x7f260402dd60, inode=<optimized out>, fd=fd@entry=0x7f2562200390) at afr-self-heal-data.c:845
#11 0x00007f2609bc905e in afr_selfheal_do (frame=frame@entry=0x7f25ac092880, this=this@entry=0x7f260402dd60, gfid=gfid@entry=0x7f25622004d0 "\355%\222=\265*E\376\222\006\371\273kZ\227", <incomplete sequence \316>)
    at afr-self-heal-common.c:2529
#12 0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f260402dd60, gfid=gfid@entry=0x7f25622004d0 "\355%\222=\265*E\376\222\006\371\273kZ\227", <incomplete sequence \316>) at afr-self-heal-common.c:2586
#13 0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f26040b3920, child=1, gfid=gfid@entry=0x7f25622004d0 "\355%\222=\265*E\376\222\006\371\273kZ\227", <incomplete sequence \316>) at afr-self-heald.c:334
#14 0x00007f2609bd1491 in afr_shd_index_heal (subvol=0x7f26040295e0, entry=<optimized out>, parent=0x7f25d97f1de0, data=0x7f26040b3920) at afr-self-heald.c:431
#15 0x00007f261756e012 in _dir_scan_job_fn (data=0x7f25f1043650) at syncop-utils.c:262
#16 0x00007f2617548bb0 in synctask_wrap () at syncop.c:375
#17 0x00007f2615b81010 in ?? () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()

Thread 15 (Thread 0x7f25dcff9700 (LWP 8666)):
#0  0x00007f2616372965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754c7c3 in __syncbarrier_wait (waitfor=3, barrier=0x7f25e8615630) at syncop.c:1138
#2  syncbarrier_wait (barrier=barrier@entry=0x7f25e8615630, waitfor=waitfor@entry=3) at syncop.c:1155
#3  0x00007f2609bc5352 in afr_selfheal_inodelk (frame=0x7f25e86181e0, this=this@entry=0x7f2604051030, inode=<optimized out>, dom=<optimized out>, off=off@entry=9223372036854775806, size=size@entry=0, 
    locked_on=locked_on@entry=0x7f25dcff8820 "") at afr-self-heal-common.c:1967
#4  0x00007f2609bd0418 in afr_selfheal_metadata (frame=frame@entry=0x7f25e86181e0, this=this@entry=0x7f2604051030, inode=<optimized out>) at afr-self-heal-metadata.c:409
#5  0x00007f2609bc9021 in afr_selfheal_do (frame=frame@entry=0x7f25e86181e0, this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25dcff8ab0 "\257\355\240i\376\326L)\206w-j+\005u\035") at afr-self-heal-common.c:2540
#6  0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f2604051030, gfid=gfid@entry=0x7f25dcff8ab0 "\257\355\240i\376\326L)\206w-j+\005u\035") at afr-self-heal-common.c:2586
#7  0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f2604061e40, child=0, gfid=gfid@entry=0x7f25dcff8ab0 "\257\355\240i\376\326L)\206w-j+\005u\035") at afr-self-heald.c:334
#8  0x00007f2609bd1491 in afr_shd_index_heal (subvol=subvol@entry=0x7f26040481f0, entry=entry@entry=0x7f25fc0e6a40, parent=parent@entry=0x7f25dcff8de0, data=data@entry=0x7f2604061e40) at afr-self-heald.c:431
#9  0x00007f261756eb31 in syncop_mt_dir_scan (frame=frame@entry=0x7f25e8001ef0, subvol=subvol@entry=0x7f26040481f0, loc=loc@entry=0x7f25dcff8de0, pid=pid@entry=-6, data=data@entry=0x7f2604061e40, 
    fn=fn@entry=0x7f2609bd1370 <afr_shd_index_heal>, xdata=xdata@entry=0x7f25e8003aa0, max_jobs=48, max_qlen=1024) at syncop-utils.c:407
#10 0x00007f2609bd195f in afr_shd_index_sweep (healer=healer@entry=0x7f2604061e40, vgfid=vgfid@entry=0x7f2609beef69 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
#11 0x00007f2609bd1a03 in afr_shd_index_sweep_all (healer=healer@entry=0x7f2604061e40) at afr-self-heald.c:504
#12 0x00007f2609bd1b3b in afr_shd_index_healer (data=0x7f2604061e40) at afr-self-heald.c:584
#13 0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f25d97f2700 (LWP 9639)):
#0  0x00007f2616372965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754decb in syncop_readdir (subvol=subvol@entry=0x7f26040295e0, fd=0x7f25f0001ef0, size=size@entry=131072, off=off@entry=2597965, entries=entries@entry=0x7f25d97f1cb0, xdata_in=xdata_in@entry=0x7f25f0016090, 
    xdata_out=xdata_out@entry=0x0) at syncop.c:1388
#2  0x00007f261756e873 in syncop_mt_dir_scan (frame=frame@entry=0x7f25f0000a70, subvol=subvol@entry=0x7f26040295e0, loc=loc@entry=0x7f25d97f1de0, pid=pid@entry=-6, data=data@entry=0x7f26040b3920, 
    fn=fn@entry=0x7f2609bd1370 <afr_shd_index_heal>, xdata=xdata@entry=0x7f25f0016090, max_jobs=48, max_qlen=1024) at syncop-utils.c:384
#3  0x00007f2609bd195f in afr_shd_index_sweep (healer=healer@entry=0x7f26040b3920, vgfid=vgfid@entry=0x7f2609beef69 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
#4  0x00007f2609bd1a03 in afr_shd_index_sweep_all (healer=healer@entry=0x7f26040b3920) at afr-self-heald.c:504
#5  0x00007f2609bd1b3b in afr_shd_index_healer (data=0x7f26040b3920) at afr-self-heald.c:584
#6  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f260eb0e700 (LWP 6542)):
#0  0x00007f2616375e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f261751dc96 in gf_timer_proc (data=0x558202723ab0) at timer.c:174
#2  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f2575ffb700 (LWP 448)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x5582027273d0) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x5582027273d0) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f260db0c700 (LWP 6545)):

#0  0x00007f2615bfde2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f2615bfdcc4 in sleep () from /lib64/libc.so.6
#2  0x00007f261753850d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f260e30d700 (LWP 6544)):
#0  0x00007f2616376361 in sigwait () from /lib64/libpthread.so.0
#1  0x000055820223852b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2137
#2  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f260cb0a700 (LWP 6547)):

#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202724a90) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202724a90) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f25767fc700 (LWP 447)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202727010) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202727010) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f25d57ea700 (LWP 20212)):

#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202725990) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202725990) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f258effd700 (LWP 346)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202725210) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202725210) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f25777fe700 (LWP 351)):
#0  0x00007f2616372d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754b188 in syncenv_task (proc=proc@entry=0x558202726890) at syncop.c:603
#2  0x00007f261754c050 in syncenv_processor (thdata=0x558202726890) at syncop.c:695
#3  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f25d87f0700 (LWP 9649)):
#0  0x00007f2616372965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f261754c7c3 in __syncbarrier_wait (waitfor=3, barrier=0x7f25f92635a0) at syncop.c:1138
#2  syncbarrier_wait (barrier=barrier@entry=0x7f25f92635a0, waitfor=waitfor@entry=3) at syncop.c:1155
#3  0x00007f2609bc3b42 in afr_selfheal_unlocked_discover_on (frame=frame@entry=0x7f25f926a820, inode=inode@entry=0x7f25f9276110, gfid=gfid@entry=0x7f25d87efab0 "\233\016:[L\224L\344\254\305\336`&\016]\207PG\r\374%\177", 
    replies=replies@entry=0x7f25d87eef30, discover_on=<optimized out>) at afr-self-heal-common.c:1842
#4  0x00007f2609bc3c44 in afr_selfheal_unlocked_discover (frame=frame@entry=0x7f25f926a820, inode=inode@entry=0x7f25f9276110, gfid=gfid@entry=0x7f25d87efab0 "\233\016:[L\224L\344\254\305\336`&\016]\207PG\r\374%\177", 
    replies=replies@entry=0x7f25d87eef30) at afr-self-heal-common.c:1861
#5  0x00007f2609bc79ee in afr_selfheal_unlocked_inspect (frame=frame@entry=0x7f25f926a820, this=this@entry=0x7f2604025090, gfid=gfid@entry=0x7f25d87efab0 "\233\016:[L\224L\344\254\305\336`&\016]\207PG\r\374%\177", 
    link_inode=link_inode@entry=0x7f25d87ef968, data_selfheal=data_selfheal@entry=0x7f25d87ef958, metadata_selfheal=metadata_selfheal@entry=0x7f25d87ef95c, entry_selfheal=entry_selfheal@entry=0x7f25d87ef960)
    at afr-self-heal-common.c:2274
#6  0x00007f2609bc8ef6 in afr_selfheal_do (frame=frame@entry=0x7f25f926a820, this=this@entry=0x7f2604025090, gfid=gfid@entry=0x7f25d87efab0 "\233\016:[L\224L\344\254\305\336`&\016]\207PG\r\374%\177") at afr-self-heal-common.c:2516
#7  0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f2604025090, gfid=gfid@entry=0x7f25d87efab0 "\233\016:[L\224L\344\254\305\336`&\016]\207PG\r\374%\177") at afr-self-heal-common.c:2586
#8  0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f26040c7e40, child=0, gfid=gfid@entry=0x7f25d87efab0 "\233\016:[L\224L\344\254\305\336`&\016]\207PG\r\374%\177") at afr-self-heald.c:334
#9  0x00007f2609bd1491 in afr_shd_index_heal (subvol=subvol@entry=0x7f260401e4b0, entry=entry@entry=0x7f25fc11e790, parent=parent@entry=0x7f25d87efde0, data=data@entry=0x7f26040c7e40) at afr-self-heald.c:431
#10 0x00007f261756eb31 in syncop_mt_dir_scan (frame=frame@entry=0x7f25f8009380, subvol=subvol@entry=0x7f260401e4b0, loc=loc@entry=0x7f25d87efde0, pid=pid@entry=-6, data=data@entry=0x7f26040c7e40, 
    fn=fn@entry=0x7f2609bd1370 <afr_shd_index_heal>, xdata=xdata@entry=0x7f25f8001ef0, max_jobs=48, max_qlen=1024) at syncop-utils.c:407
#11 0x00007f2609bd195f in afr_shd_index_sweep (healer=healer@entry=0x7f26040c7e40, vgfid=vgfid@entry=0x7f2609beef69 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
#12 0x00007f2609bd1a03 in afr_shd_index_sweep_all (healer=healer@entry=0x7f26040c7e40) at afr-self-heald.c:504
#13 0x00007f2609bd1b3b in afr_shd_index_healer (data=0x7f26040c7e40) at afr-self-heald.c:584
#14 0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f2615c36ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f2576ffd700 (LWP 446)):
#0  0x00007f2615c2da00 in writev () from /lib64/libc.so.6
#1  0x00007f26175398e5 in sys_writev (fd=<optimized out>, iov=<optimized out>, iovcnt=<optimized out>) at syscall.c:302
#2  0x00007f260c0f6137 in __socket_rwv (this=this@entry=0x7f26040a5a60, vector=<optimized out>, count=<optimized out>, pending_vector=pending_vector@entry=0x7f25c4017bc0, pending_count=pending_count@entry=0x7f25c4017bc8, 
    bytes=bytes@entry=0x0, write=write@entry=1) at socket.c:550
#3  0x00007f260c0f6aa2 in __socket_writev (pending_count=<optimized out>, pending_vector=<optimized out>, count=<optimized out>, vector=<optimized out>, this=0x7f26040a5a60) at socket.c:666
#4  __socket_ioq_churn_entry (this=this@entry=0x7f26040a5a60, entry=entry@entry=0x7f25c4017aa0, direct=direct@entry=1) at socket.c:1115
#5  0x00007f260c0f721c in socket_submit_request (this=0x7f26040a5a60, req=<optimized out>) at socket.c:3611
#6  0x00007f26172d80ee in rpc_clnt_submit (rpc=0x7f26040a5820, prog=prog@entry=0x7f260a05ee40 <clnt3_3_fop_prog>, procnum=procnum@entry=27, cbkfn=cbkfn@entry=0x7f2609e28c60 <client3_3_lookup_cbk>, proghdr=proghdr@entry=0x7f25f6c72490, 
    proghdrcount=<optimized out>, progpayload=progpayload@entry=0x0, progpayloadcount=progpayloadcount@entry=0, iobref=iobref@entry=0x7f25c400abe0, frame=frame@entry=0x7f25c401a620, rsphdr=0x0, rsphdr_count=rsphdr_count@entry=0, 
    rsp_payload=rsp_payload@entry=0x0, rsp_payload_count=rsp_payload_count@entry=0, rsp_iobref=rsp_iobref@entry=0x0) at rpc-clnt.c:1672
#7  0x00007f2609e16272 in client_submit_request (this=this@entry=0x7f26040345d0, req=req@entry=0x7f25f6c72770, frame=frame@entry=0x7f25c401a620, prog=0x7f260a05ee40 <clnt3_3_fop_prog>, procnum=procnum@entry=27, 
    cbkfn=cbkfn@entry=0x7f2609e28c60 <client3_3_lookup_cbk>, iobref=iobref@entry=0x0, rsphdr=rsphdr@entry=0x0, rsphdr_count=rsphdr_count@entry=0, rsp_payload=rsp_payload@entry=0x0, rsp_payload_count=rsp_payload_count@entry=0, 
    rsp_iobref=0x0, xdrproc=0x7f26170b9980 <xdr_gfs3_lookup_req>) at client.c:313
#8  0x00007f2609e34d76 in client3_3_lookup (frame=0x7f25c401a620, this=0x7f26040345d0, data=<optimized out>) at client-rpc-fops.c:3394
#9  0x00007f2609e0da40 in client_lookup (frame=0x7f25c401a620, this=<optimized out>, loc=<optimized out>, xdata=<optimized out>) at client.c:538
#10 0x00007f2609bc3b10 in afr_selfheal_unlocked_discover_on (frame=frame@entry=0x7f25c406a590, inode=inode@entry=0x7f25c404c740, gfid=gfid@entry=0x7f25f6c73710 "\005#7\251ښI\260\222\202mN\226", replies=replies@entry=0x7f25f6c72b90, 
    discover_on=0x7f260403cac0 "\001\001\001", <incomplete sequence \360\255\272>) at afr-self-heal-common.c:1842
#11 0x00007f2609bc3c44 in afr_selfheal_unlocked_discover (frame=frame@entry=0x7f25c406a590, inode=inode@entry=0x7f25c404c740, gfid=gfid@entry=0x7f25f6c73710 "\005#7\251ښI\260\222\202mN\226", replies=replies@entry=0x7f25f6c72b90)
    at afr-self-heal-common.c:1861
#12 0x00007f2609bc79ee in afr_selfheal_unlocked_inspect (frame=frame@entry=0x7f25c406a590, this=this@entry=0x7f26040369f0, gfid=gfid@entry=0x7f25f6c73710 "\005#7\251ښI\260\222\202mN\226", link_inode=link_inode@entry=0x7f25f6c735c8, 
    data_selfheal=data_selfheal@entry=0x7f25f6c735b8, metadata_selfheal=metadata_selfheal@entry=0x7f25f6c735bc, entry_selfheal=entry_selfheal@entry=0x7f25f6c735c0) at afr-self-heal-common.c:2274
#13 0x00007f2609bc8ef6 in afr_selfheal_do (frame=frame@entry=0x7f25c406a590, this=this@entry=0x7f26040369f0, gfid=gfid@entry=0x7f25f6c73710 "\005#7\251ښI\260\222\202mN\226") at afr-self-heal-common.c:2516
#14 0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f26040369f0, gfid=gfid@entry=0x7f25f6c73710 "\005#7\251ښI\260\222\202mN\226") at afr-self-heal-common.c:2586
#15 0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f260409f400, child=2, gfid=gfid@entry=0x7f25f6c73710 "\005#7\251ښI\260\222\202mN\226") at afr-self-heald.c:334
#16 0x00007f2609bd1491 in afr_shd_index_heal (subvol=0x7f26040345d0, entry=<optimized out>, parent=0x7f25da7f3de0, data=0x7f260409f400) at afr-self-heald.c:431
#17 0x00007f261756e012 in _dir_scan_job_fn (data=0x7f25f543c1a0) at syncop-utils.c:262
#18 0x00007f2617548bb0 in synctask_wrap () at syncop.c:375
#19 0x00007f2615b81010 in ?? () from /lib64/libc.so.6
#20 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f256ffff700 (LWP 487)):

#0  0x00007f2615c2da00 in writev () from /lib64/libc.so.6
#1  0x00007f2615c4d56a in backtrace_symbols_fd () from /lib64/libc.so.6
#2  0x00007f2617514c8b in gf_backtrace_fillframes (
    buf=buf@entry=0x7f25f5469330 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4267
#3  0x00007f261751c565 in gf_backtrace_save (
    buf=buf@entry=0x7f25f5469330 "(--> /lib64/libglusterfs.so.0(synctask_yield+0x2a)[0x7f2617548b0a] (--> /lib64/libglusterfs.so.0(syncbarrier_wait+0x74)[0x7f261754c784] (--> /usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x"...) at common-utils.c:4323
#4  0x00007f2617548b0a in synctask_yield (task=task@entry=0x7f25f5468ea0) at syncop.c:336
#5  0x00007f261754c784 in __syncbarrier_wait (waitfor=2, barrier=0x7f25b8056f00) at syncop.c:1134
#6  syncbarrier_wait (barrier=barrier@entry=0x7f25b8056f00, waitfor=waitfor@entry=2) at syncop.c:1155
#7  0x00007f2609bc4ec6 in afr_selfheal_uninodelk (frame=0x7f25b8033c60, this=this@entry=0x7f26040369f0, inode=<optimized out>, dom=0x7f2604036590 "basevol-11-replicate-0", off=off@entry=9223372036854775806, size=size@entry=0, 
    locked_on=locked_on@entry=0x7f2569a00100 "\001") at afr-self-heal-common.c:2066
#8  0x00007f2609bd04bd in afr_selfheal_metadata (frame=frame@entry=0x7f25b8033c60, this=this@entry=0x7f26040369f0, inode=<optimized out>) at afr-self-heal-metadata.c:451
#9  0x00007f2609bc9021 in afr_selfheal_do (frame=frame@entry=0x7f25b8033c60, this=this@entry=0x7f26040369f0, gfid=gfid@entry=0x7f2569a00390 "\315`q\027\\\305Lc\233\025.Y\251\271\326", <incomplete sequence \371>)
    at afr-self-heal-common.c:2540
#10 0x00007f2609bc90c5 in afr_selfheal (this=this@entry=0x7f26040369f0, gfid=gfid@entry=0x7f2569a00390 "\315`q\027\\\305Lc\233\025.Y\251\271\326", <incomplete sequence \371>) at afr-self-heal-common.c:2586
#11 0x00007f2609bd1239 in afr_shd_selfheal (healer=healer@entry=0x7f260409f400, child=2, gfid=gfid@entry=0x7f2569a00390 "\315`q\027\\\305Lc\233\025.Y\251\271\326", <incomplete sequence \371>) at afr-self-heald.c:334
#12 0x00007f2609bd1491 in afr_shd_index_heal (subvol=0x7f26040345d0, entry=<optimized out>, parent=0x7f25da7f3de0, data=0x7f260409f400) at afr-self-heald.c:431
#13 0x00007f261756e012 in _dir_scan_job_fn (data=0x7f25f40158c0) at syncop-utils.c:262
#14 0x00007f2617548bb0 in synctask_wrap () at syncop.c:375
#15 0x00007f2615b81010 in ?? () from /lib64/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f2608e54700 (LWP 7217)):
#0  client3_3_lookup_cbk (req=0x7f25c4032dd0, iov=0x7f25c4032e10, count=<optimized out>, myframe=0x7f25c4037fb0) at client-rpc-fops.c:2807
#1  0x00007f26172d7960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f26040ce3c0, pollin=pollin@entry=0x7f25fc0059c0) at rpc-clnt.c:778
#2  0x00007f26172d7d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f26040ce3f0, event=<optimized out>, data=0x7f25fc0059c0) at rpc-clnt.c:971
#3  0x00007f26172d3a73 in rpc_transport_notify (this=this@entry=0x7f26040ce600, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f25fc0059c0) at rpc-transport.c:538
#4  0x00007f260c0f7566 in socket_event_poll_in (this=this@entry=0x7f26040ce600, notify_handled=<optimized out>) at socket.c:2315
#5  0x00007f260c0f9b0c in socket_event_handler (fd=36, idx=26, gen=4, data=0x7f26040ce600, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#6  0x00007f261756d7e4 in event_dispatch_epoll_handler (event=0x7f2608e53e80, event_pool=0x55820271c210) at event-epoll.c:583
#7  event_dispatch_epoll_worker (data=0x7f260406fcf0) at event-epoll.c:659
#8  0x00007f261636edd5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f2615c36ead in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
==========================
In-service upgrade from 3.4.0-async/rhel7.5 to 3.4.1-stage/rhel7.6


How reproducible:
=============
Hit it once.


Steps to Reproduce:
=================
1. 6-node cluster with brick-mux enabled.
2. Created 6 1x3 volumes named basevol-{10..15}, each mounted on one dedicated FUSE client, pumping I/O: e.g. untar, appending the untar output to a file called "log", along with some file ops like creates or ls using dd.
3. Created a 2x(2+1) arbiter volume with one brick on each of the nodes.
4. Set the mtsh value (shd max threads) to 48 for faster healing.
5. Subscribed to stage (for both kernel and rhgs updates).
6. Upgraded 2 nodes at a time (in a similar time frame), i.e. {n1,n4}, {n2,n5} and {n3,n6}, making sure nodes from the same replica set were not upgraded in parallel.
7. Successfully upgraded n1,n4,n2,n5 to 3.12.2-24 (the rhgs version pushed to stage at the time). This took close to 2 days, as the network was slow for downloading packages, hence a lot of healing was required.
8. Now upgraded n3,n6 to the latest rhgs; note this is 3.12.2-25.
9. Now started glusterd, but stopped it after some time (about 2-5 min) as a reboot was required for the new kernel to take effect.
10. Now rebooted n3,n6.
11. Healing triggered; all seemed to be good.
12. Some files were pending heal on one of the volumes. When I tried to trigger a manual index heal on this volume, it failed with a staging error saying some shd was down.
Found an shd core on n4.
Note: based on the stat of the core file, the shd core seems to have been generated during the reboot of n3,n6.

Actual results:


Expected results:


Additional info:

Comment 2 Nag Pavan Chilakam 2018-10-29 12:47:32 UTC
Tail of the shd log:
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-10-29 08:46:08
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f261750fdfd]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2617519ec4]
/lib64/libc.so.6(+0x36280)[0x7f2615b6f280]
/usr/lib64/glusterfs/3.12.2/xlator/protocol/client.so(+0x22ce3)[0x7f2609e28ce3]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f26172d7960]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a3)[0x7f26172d7d03]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f26172d3a73]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x7566)[0x7f260c0f7566]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x9b0c)[0x7f260c0f9b0c]
/lib64/libglusterfs.so.0(+0x897e4)[0x7f261756d7e4]
/lib64/libpthread.so.0(+0x7dd5)[0x7f261636edd5]
/lib64/libc.so.6(clone+0x6d)[0x7f2615c36ead]

Comment 7 Nag Pavan Chilakam 2018-10-30 07:21:56 UTC
[root@dhcp35-140 glusterfs]# date; gluster v heal basevol-12  info
Tue Oct 30 12:08:45 IST 2018
Brick dhcp35-184.lab.eng.blr.redhat.com:/gluster/brick12/basevol-12
Status: Connected
Number of entries: 0

Brick dhcp35-83.lab.eng.blr.redhat.com:/gluster/brick12/basevol-12
<gfid:0555fe14-760a-4a78-a2be-47043c396cd9> 
<gfid:8dfb2208-a889-46a8-85e9-e86a1ccf16e9> 
<gfid:35fc46ef-fb2c-4cf0-bf4a-e287e6e675e1> 
<gfid:08c28d80-fffd-4f43-9a10-f4d8566ef119> 
Status: Connected
Number of entries: 4

Brick dhcp35-127.lab.eng.blr.redhat.com:/gluster/brick12/basevol-12
<gfid:0555fe14-760a-4a78-a2be-47043c396cd9> 
<gfid:8dfb2208-a889-46a8-85e9-e86a1ccf16e9> 
<gfid:35fc46ef-fb2c-4cf0-bf4a-e287e6e675e1> 
<gfid:08c28d80-fffd-4f43-9a10-f4d8566ef119> 
Status: Connected
Number of entries: 4

Comment 8 Prasad Desala 2018-10-30 08:42:16 UTC
I have also hit this issue while upgrading gluster nodes from 3.4.0-async/rhel7.5 to 3.4.1-stage/rhel7.6.

This backtrace seems to be the same as the one in the description. If this is a different issue, please let me know and I will file a new BZ for separate tracking.

(gdb) bt
#0  0x00007f3e191288f2 in afr_selfheal_lock_cbk (frame=<optimized out>, cookie=<optimized out>, this=0x0, op_ret=0, op_errno=0, xdata=0x0) at afr-self-heal-common.c:1887
#1  0x00007f3e1938bb36 in client3_3_inodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f3dfc004760) at client-rpc-fops.c:1515
#2  0x00007f3e26a49960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f3e1403a9b0, pollin=pollin@entry=0x7f3e0c0034f0) at rpc-clnt.c:778
#3  0x00007f3e26a49d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f3e1403a9e0, event=<optimized out>, data=0x7f3e0c0034f0) at rpc-clnt.c:971
#4  0x00007f3e26a45a73 in rpc_transport_notify (this=this@entry=0x7f3e1403ab80, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f3e0c0034f0)
    at rpc-transport.c:538
#5  0x00007f3e1b869566 in socket_event_poll_in (this=this@entry=0x7f3e1403ab80, notify_handled=<optimized out>) at socket.c:2315
#6  0x00007f3e1b86bb0c in socket_event_handler (fd=11, idx=5, gen=10639, data=0x7f3e1403ab80, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#7  0x00007f3e26cdf7e4 in event_dispatch_epoll_handler (event=0x7f3e13ffee80, event_pool=0x5639ca76aa30) at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x7f3e140361c0) at event-epoll.c:659
#9  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3e253a8ead in clone () from /lib64/libc.so.6


(gdb) t a a bt

Thread 10 (Thread 0x7f3e1da7f700 (LWP 4853)):
#0  0x00007f3e25ae8361 in sigwait () from /lib64/libpthread.so.0
#1  0x00005639ca50052b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2137
#2  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f3e1c27c700 (LWP 4858)):
#0  0x00007f3e25ae4d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e26cbd188 in syncenv_task (proc=proc@entry=0x5639ca7732b0) at syncop.c:603
#2  0x00007f3e26cbe050 in syncenv_processor (thdata=0x5639ca7732b0) at syncop.c:695
#3  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f3e1ca7d700 (LWP 4857)):
#0  0x00007f3e25ae4d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e26cbd188 in syncenv_task (proc=proc@entry=0x5639ca772ef0) at syncop.c:603
#2  0x00007f3e26cbe050 in syncenv_processor (thdata=0x5639ca772ef0) at syncop.c:695
#3  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f3e1d27e700 (LWP 4856)):
#0  0x00007f3e2536fe2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f3e2536fcc4 in sleep () from /lib64/libc.so.6
#2  0x00007f3e26caa50d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f3e1e280700 (LWP 4851)):
#0  0x00007f3e25ae7e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f3e26c8fc96 in gf_timer_proc (data=0x5639ca7722d0) at timer.c:174
#2  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f3e037fe700 (LWP 6222)):
#0  0x00007f3e25ae4965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e26cbe7c3 in __syncbarrier_wait (waitfor=3, barrier=0x7f3df800d3f0) at syncop.c:1138
#2  syncbarrier_wait (barrier=barrier@entry=0x7f3df800d3f0, waitfor=waitfor@entry=3) at syncop.c:1155
#3  0x00007f3e1912cb42 in afr_selfheal_unlocked_discover_on (frame=frame@entry=0x7f3df8216b30, inode=inode@entry=0x7f3df8216e50, 
    gfid=gfid@entry=0x7f3df8216e58 "tU\306P W@\211\264\223%/\235v\310j", replies=replies@entry=0x7f3e037fcf40, discover_on=<optimized out>) at afr-self-heal-common.c:1842
#4  0x00007f3e1912cc44 in afr_selfheal_unlocked_discover (frame=frame@entry=0x7f3df8216b30, inode=inode@entry=0x7f3df8216e50, 
    gfid=gfid@entry=0x7f3df8216e58 "tU\306P W@\211\264\223%/\235v\310j", replies=replies@entry=0x7f3e037fcf40) at afr-self-heal-common.c:1861
#5  0x00007f3e191387e6 in __afr_selfheal_metadata_prepare (frame=0x7f3df8216b30, this=this@entry=0x7f3e14018b40, inode=0x7f3df8216e50, 
    locked_on=locked_on@entry=0x7f3e037fd820 "\001\001", sources=0x7f3e037fd8a0 "", sinks=0x7f3e037fd880 "", healed_sinks=healed_sinks@entry=0x7f3e037fd860 "", 
    undid_pending=undid_pending@entry=0x7f3e037fd840 "", replies=replies@entry=0x7f3e037fcf40, pflag=pflag@entry=0x0) at afr-self-heal-metadata.c:332
#6  0x00007f3e1913945e in afr_selfheal_metadata (frame=frame@entry=0x7f3df8216b30, this=this@entry=0x7f3e14018b40, inode=<optimized out>) at afr-self-heal-metadata.c:417
#7  0x00007f3e19132021 in afr_selfheal_do (frame=frame@entry=0x7f3df8216b30, this=this@entry=0x7f3e14018b40, 
    gfid=gfid@entry=0x7f3e037fdab0 "tU\306P W@\211\264\223%/\235v\310j") at afr-self-heal-common.c:2540
#8  0x00007f3e191320c5 in afr_selfheal (this=this@entry=0x7f3e14018b40, gfid=gfid@entry=0x7f3e037fdab0 "tU\306P W@\211\264\223%/\235v\310j") at afr-self-heal-common.c:2586
#9  0x00007f3e1913a239 in afr_shd_selfheal (healer=healer@entry=0x7f3e14028600, child=0, gfid=gfid@entry=0x7f3e037fdab0 "tU\306P W@\211\264\223%/\235v\310j")
    at afr-self-heald.c:334
#10 0x00007f3e1913a491 in afr_shd_index_heal (subvol=subvol@entry=0x7f3e1400fb70, entry=entry@entry=0x7f3e0c068320, parent=parent@entry=0x7f3e037fdde0, 
    data=data@entry=0x7f3e14028600) at afr-self-heald.c:431
#11 0x00007f3e26ce0b31 in syncop_mt_dir_scan (frame=frame@entry=0x7f3df800aa00, subvol=subvol@entry=0x7f3e1400fb70, loc=loc@entry=0x7f3e037fdde0, pid=pid@entry=-6, 
    data=data@entry=0x7f3e14028600, fn=fn@entry=0x7f3e1913a370 <afr_shd_index_heal>, xdata=xdata@entry=0x7f3df800ae30, max_jobs=1, max_qlen=1024) at syncop-utils.c:407
#12 0x00007f3e1913a95f in afr_shd_index_sweep (healer=healer@entry=0x7f3e14028600, vgfid=vgfid@entry=0x7f3e19157f69 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
#13 0x00007f3e1913aa03 in afr_shd_index_sweep_all (healer=healer@entry=0x7f3e14028600) at afr-self-heald.c:504
#14 0x00007f3e1913ab3b in afr_shd_index_healer (data=0x7f3e14028600) at afr-self-heald.c:584
#15 0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f3e19dc9700 (LWP 4874)):
#0  0x00007f3e253a9483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f3e26cdf6b2 in event_dispatch_epoll_worker (data=0x5639ca7b4080) at event-epoll.c:649
#2  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3e117fa700 (LWP 6215)):
#0  0x00007f3e25ae4965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e26cbe7c3 in __syncbarrier_wait (waitfor=3, barrier=0x7f3e04218b20) at syncop.c:1138
#2  syncbarrier_wait (barrier=barrier@entry=0x7f3e04218b20, waitfor=waitfor@entry=3) at syncop.c:1155
#3  0x00007f3e1912f707 in afr_selfheal_unentrylk (frame=0x7f3e04012ff0, this=this@entry=0x7f3e14016590, inode=<optimized out>, dom=0x7f3e140156b0 "distrep-replicate-0", 
    name=name@entry=0x0, locked_on=<optimized out>, xdata=xdata@entry=0x0) at afr-self-heal-common.c:2177
#4  0x00007f3e19136817 in afr_selfheal_entry_dirent (frame=frame@entry=0x7f3e04012ff0, this=this@entry=0x7f3e14016590, fd=fd@entry=0x7f3e0400cb90, 
    name=name@entry=0x7f3e0c010c98 "sysfs-devices-firmware_node", parent_idx_inode=<optimized out>, subvol=subvol@entry=0x7f3e1400d810, full_crawl=<optimized out>)
    at afr-self-heal-entry.c:628
#5  0x00007f3e191371ee in afr_selfheal_entry_do_subvol (frame=frame@entry=0x7f3e04002850, this=this@entry=0x7f3e14016590, fd=fd@entry=0x7f3e0400cb90, child=child@entry=2)
    at afr-self-heal-entry.c:742
#6  0x00007f3e19137ed8 in afr_selfheal_entry_do (sources=<optimized out>, healed_sinks=0x7f3e117f9730 "", source=<optimized out>, fd=0x7f3e0400cb90, this=0x7f3e14016590, 
    frame=<optimized out>) at afr-self-heal-entry.c:908
#7  __afr_selfheal_entry (frame=frame@entry=0x7f3e04002850, this=this@entry=0x7f3e14016590, fd=fd@entry=0x7f3e0400cb90, locked_on=<optimized out>)
    at afr-self-heal-entry.c:1002
#8  0x00007f3e1913837b in afr_selfheal_entry (frame=frame@entry=0x7f3e04002850, this=this@entry=0x7f3e14016590, inode=0x7f3e0400da60) at afr-self-heal-entry.c:1112
#9  0x00007f3e19131fb0 in afr_selfheal_do (frame=frame@entry=0x7f3e04002850, this=this@entry=0x7f3e14016590, 
    gfid=gfid@entry=0x7f3e117f9ab0 "}5'[\334&E\267\220̳\331\021\232K+08\023\024>\177") at afr-self-heal-common.c:2543
#10 0x00007f3e191320c5 in afr_selfheal (this=this@entry=0x7f3e14016590, gfid=gfid@entry=0x7f3e117f9ab0 "}5'[\334&E\267\220̳\331\021\232K+08\023\024>\177")
    at afr-self-heal-common.c:2586
#11 0x00007f3e1913a239 in afr_shd_selfheal (healer=healer@entry=0x7f3e1402fc40, child=0, gfid=gfid@entry=0x7f3e117f9ab0 "}5'[\334&E\267\220̳\331\021\232K+08\023\024>\177")
    at afr-self-heald.c:334
#12 0x00007f3e1913a491 in afr_shd_index_heal (subvol=subvol@entry=0x7f3e14008ba0, entry=entry@entry=0x7f3e14133a70, parent=parent@entry=0x7f3e117f9de0, 
    data=data@entry=0x7f3e1402fc40) at afr-self-heald.c:431
#13 0x00007f3e26ce0b31 in syncop_mt_dir_scan (frame=frame@entry=0x7f3e0400e700, subvol=subvol@entry=0x7f3e14008ba0, loc=loc@entry=0x7f3e117f9de0, pid=pid@entry=-6, 
    data=data@entry=0x7f3e1402fc40, fn=fn@entry=0x7f3e1913a370 <afr_shd_index_heal>, xdata=xdata@entry=0x7f3e040014b0, max_jobs=1, max_qlen=1024) at syncop-utils.c:407
#14 0x00007f3e1913a95f in afr_shd_index_sweep (healer=healer@entry=0x7f3e1402fc40, vgfid=vgfid@entry=0x7f3e19157f69 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
#15 0x00007f3e1913aa03 in afr_shd_index_sweep_all (healer=healer@entry=0x7f3e1402fc40) at afr-self-heald.c:504
#16 0x00007f3e1913ab3b in afr_shd_index_healer (data=0x7f3e1402fc40) at afr-self-heald.c:584
#17 0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f3e27165780 (LWP 4846)):
#0  0x00007f3e25ae1f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f3e26cdfe18 in event_dispatch_epoll (event_pool=0x5639ca76aa30) at event-epoll.c:746
#2  0x00005639ca4fd247 in main (argc=13, argv=<optimized out>) at glusterfsd.c:2550

Thread 1 (Thread 0x7f3e13fff700 (LWP 5318)):
#0  0x00007f3e191288f2 in afr_selfheal_lock_cbk (frame=<optimized out>, cookie=<optimized out>, this=0x0, op_ret=0, op_errno=0, xdata=0x0) at afr-self-heal-common.c:1887
#1  0x00007f3e1938bb36 in client3_3_inodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f3dfc004760) at client-rpc-fops.c:1515
#2  0x00007f3e26a49960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f3e1403a9b0, pollin=pollin@entry=0x7f3e0c0034f0) at rpc-clnt.c:778
#3  0x00007f3e26a49d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f3e1403a9e0, event=<optimized out>, data=0x7f3e0c0034f0) at rpc-clnt.c:971
#4  0x00007f3e26a45a73 in rpc_transport_notify (this=this@entry=0x7f3e1403ab80, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f3e0c0034f0)
    at rpc-transport.c:538
#5  0x00007f3e1b869566 in socket_event_poll_in (this=this@entry=0x7f3e1403ab80, notify_handled=<optimized out>) at socket.c:2315
#6  0x00007f3e1b86bb0c in socket_event_handler (fd=11, idx=5, gen=10639, data=0x7f3e1403ab80, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#7  0x00007f3e26cdf7e4 in event_dispatch_epoll_handler (event=0x7f3e13ffee80, event_pool=0x5639ca76aa30) at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x7f3e140361c0) at event-epoll.c:659
#9  0x00007f3e25ae0dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3e253a8ead in clone () from /lib64/libc.so.6

Steps:
======
1) On a 3-node (s1,s2,s3) setup (rhel 7.5 + rhgs 3.4.0), created one distrep volume and one pure distribute volume.
Note: brick-mux is not enabled.
2) Volumes were started and FUSE mounted on a client.
3) From the client, created some data (linux kernel untar) on the pure distribute volume and, after the untar completed, stopped that volume.
4) On the distrep (2x3) volume, started a linux kernel untar (IO running throughout the upgrade process) and began the in-service upgrade to rhel 7.6 + rhgs 3.4.1.
5) In-service upgrade on s1 and s2 was successful.
6) Started the in-service upgrade on s3. After the package update, rebooted the node as there was a change in kernel version.
7) Once s3 was back online, checked gluster v status from s3 and saw that glustershd on s1 was not running; glustershd on s1 had crashed.

Comment 17 Karthik U S 2018-10-31 15:24:01 UTC
Reason for the crash:
In both crashes, when we reach client3_3_<func>_cbk the value of frame->local is empty (NULL), leading the shd to crash.
The reason for it becoming empty is unknown at the moment; it needs some more debugging and code reading to find out.
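
For illustration, here is a minimal C sketch of why the crash happens at a line like client-rpc-fops.c:2807 ("inode = local->loc.inode;") when frame->local is NULL. This is not the actual gluster source; the types and the guard below are simplified placeholders, and the real fix still requires finding out why local became NULL in the first place.

/* Simplified sketch of the crash pattern (hypothetical types, not the
 * real client xlator structures). */
struct loc_sketch   { void *inode; };
struct local_sketch { struct loc_sketch loc; };
struct frame_sketch { struct local_sketch *local; };

static int
lookup_cbk_sketch (struct frame_sketch *frame)
{
        struct local_sketch *local = frame->local;
        void *inode = NULL;

        if (!local) {
                /* Hypothetical defensive guard: bail out instead of
                 * dereferencing a NULL local. */
                return -1;
        }

        /* Equivalent of the faulting statement at client-rpc-fops.c:2807;
         * with local == NULL this is a NULL-pointer dereference (SIGSEGV). */
        inode = local->loc.inode;
        (void) inode;
        return 0;
}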

Comment 20 Karthik U S 2018-11-05 10:33:57 UTC
Current status:

Ravi also went through the core files but did not find anything conclusive there to root-cause (RC) this. We both tried to reproduce it locally, but everything worked fine for both of us. We are working on this and will keep this bug updated.

Thanks,
Karthik

Comment 21 Ravishankar N 2018-11-06 10:29:39 UTC
Hi Milind/ Mohit

If a client receives both a disconnect event and a fop cbk response in parallel, is there a chance that we will unwind the frame twice, i.e. once for the fop cbk and once as part of unwinding all saved frames with ENOTCONN?

In the crash that was hit on Nag's setup, the entire frame's contents (ref_count, wind_to, unwind_from, etc.) are zero:
----------------------------------------------------------------------------
Program terminated with signal 11, Segmentation fault.
#0  client3_3_lookup_cbk (req=0x7f25c4032dd0, iov=0x7f25c4032e10, count=<optimized out>, myframe=0x7f25c4037fb0) at client-rpc-fops.c:2807

(gdb) p *frame
$1 = {root = 0x7f26040c7710, parent = 0xaa4c9c9d74728ffb, frames = {next = 0x40ac2b62ca4f8890, prev = 0x0}, local = 0x0, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0,
        __owner = 0, __nusers = 0, __kind = 3, __spins = 0, __elision = 0, __list = {__prev = 0x7f25c4038008, __next = 0x7f25c4038008}},
      __size = '\000' <repeats 16 times>, "\003\000\000\000\000\000\000\000\b\200\003\304%\177\000\000\b\200\003\304%\177\000", __align = 0}}, cookie = 0x7f25c4038018, complete = (unknown: 3288563736), op = 32549, begin = {
    tv_sec = 139800179081256, tv_usec = 139800179081256}, end = {tv_sec = 139800112163464, tv_usec = 139799440980552}, wind_from = 0x7f25c40633e0 "", wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}

----------------------------------------------------------------------------
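
To make the question concrete, below is a purely illustrative C sketch of the suspected double-completion race: a reply path and a disconnect path both completing the same pending request, so whichever runs second operates on a frame whose local has already been wiped. All struct and function names here are invented for illustration; this is not the gluster RPC code.

#include <errno.h>

struct local_s { void *inode; };
struct frame_s { struct local_s *local; };   /* call frame */
struct saved_s { struct frame_s *frame; };   /* saved (pending) request */

/* Completing a request wipes frame->local (as the client xlator does
 * after unwinding). */
static void
complete (struct saved_s *s, int op_errno)
{
        (void) op_errno;
        s->frame->local = NULL;
}

/* Path 1: reply callback dereferences frame->local, like
 * client3_3_lookup_cbk at client-rpc-fops.c:2807. */
void
on_reply (struct saved_s *s)
{
        struct local_s *local = s->frame->local;
        void *inode = local->inode;   /* SIGSEGV if the disconnect path ran first */
        (void) inode;
        complete (s, 0);
}

/* Path 2: disconnect handler bails out every pending request with ENOTCONN. */
void
on_disconnect (struct saved_s *s)
{
        complete (s, ENOTCONN);
}

Comment 27 below explains why, in the actual RPC client, these two paths are serialized on a connection lock, which is why this remains only a hypothesis.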

Comment 22 Ravishankar N 2018-11-06 12:46:10 UTC
@RPC devs:

There was a similar shd crash (frame corruption) hit by an upstream user on glusterfs 3.12.3. They have some analysis at https://lists.gluster.org/pipermail/gluster-users/2018-October/035177.html, but from my code reading that seems to be inaccurate, as call_frame and saved_frame are different, and every saved_frame that gets allocated in rpc_clnt_submit() gets freed at rpc_clnt_handle_reply(). Please correct me if I am wrong.

Comment 25 zhou lin 2019-01-28 08:49:41 UTC
I also hit the same glustershd core dump as Prasad Desala's. Although this is not easy to reproduce, it still appears occasionally. The operation I performed was scaling out glusterfs client nodes, and the glustershd core dump appeared on the sn2 node.
My stack is like the following:
[New LWP 5538]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s sn-2.local --volfile-id gluster/glustershd -p /var/run/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f0fcdfdcf91 in afr_selfheal_lock_cbk (frame=0x7f0f98012040, cookie=0x2, this=0x7f0fc8026850, op_ret=0, op_errno=0, xdata=0x0) at afr-self-heal-common.c:1886
1886         afr-self-heal-common.c: No such file or directory.
[Current thread is 1 (Thread 0x7f0fcec91700 (LWP 5543))]
Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.8.1_11_g99e9ca6-RCP2.wf28.x86_64
(gdb) bt
#0  0x00007f0fcdfdcf91 in afr_selfheal_lock_cbk (frame=0x7f0f98012040, cookie=0x2, this=0x7f0fc8026850, op_ret=0, op_errno=0, xdata=0x0) at afr-self-heal-common.c:1886
#1  0x00007f0fce255e6d in client3_3_inodelk_cbk (req=0x7f0f9801d0d0, iov=0x7f0f9801d110, count=1, myframe=0x7f0f98008970) at client-rpc-fops.c:1510
#2  0x00007f0fd41d8db1 in rpc_clnt_handle_reply (clnt=0x7f0fc8076e30, pollin=0x7f0fc80bf6b0) at rpc-clnt.c:782
#3  0x00007f0fd41d934f in rpc_clnt_notify (trans=0x7f0fc8077060, mydata=0x7f0fc8076e60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f0fc80bf6b0) at rpc-clnt.c:975
#4  0x00007f0fd41d5319 in rpc_transport_notify (this=0x7f0fc8077060, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f0fc80bf6b0) at rpc-transport.c:538
#5  0x00007f0fcf11c34d in socket_event_poll_in (this=0x7f0fc8077060, notify_handled=_gf_true) at socket.c:2328
#6  0x00007f0fcf11c992 in socket_event_handler (fd=22, idx=13, gen=1, data=0x7f0fc8077060, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#7  0x00007f0fd44835d4 in event_dispatch_epoll_handler (event_pool=0x25d2730, event=0x7f0fcec90e84) at event-epoll.c:565
#8  0x00007f0fd44838ab in event_dispatch_epoll_worker (data=0x261a040) at event-epoll.c:635
#9  0x00007f0fd31cf5da in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0fd2aa5e8f in clone () from /lib64/libc.so.6

Comment 27 Milind Changire 2019-02-20 06:12:14 UTC
(In reply to Ravishankar N from comment #22)
> @RPC devs:
> 
> There was a similar shd crash (frame corruption) hit by upstream user on
> glusterfs3.12.3: They have some analysis @
> https://lists.gluster.org/pipermail/gluster-users/2018-October/035177.html ,
> but from my code-reading, that seems to be inaccurate, as call_frame and
> saved_frame are different and every saved_frame that gets allocated in
> rpc_clnt_submit() gets freed at rpc_clnt_handle_reply().  Please correct me
> if I am wrong.

Ravi, your observation is correct.
I wonder how a disconnect and a reply would arrive in parallel. But, assuming they do, the processing is serialized on a connection lock.
If the disconnect path gets the connection lock first, then all saved frames for that connection are unwound, the saved-frames set is destroyed, and a new saved-frames object is assigned to the connection object.
If a call-reply path gets the connection lock first, then the frame is retrieved from the saved-frames set and unwound as usual.

Also, the frame->local object is wiped immediately after the client protocol handler returns from the unwind.
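
A rough, self-contained C sketch of the serialization described above (the type names, lock, and helpers are placeholders, not the actual libgfrpc symbols): both paths take the same per-connection lock, and whichever path runs first removes the request from the saved-frames set, so the other path simply does not find it and the request is unwound exactly once.

#include <pthread.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct saved_frame_s { struct saved_frame_s *next; int xid; };

struct connection_s {
        pthread_mutex_t       lock;
        struct saved_frame_s *saved;   /* pending requests on this connection */
};

/* Stand-in for STACK_UNWIND + frame cleanup: completes a request once. */
static void
unwind (struct saved_frame_s *sf, int op_errno)
{
        printf ("xid %d unwound with errno %d\n", sf->xid, op_errno);
        free (sf);
}

/* Reply path: detach the matching saved frame under the connection lock. */
void
handle_reply (struct connection_s *c, int xid)
{
        struct saved_frame_s **pp, *sf = NULL;

        pthread_mutex_lock (&c->lock);
        for (pp = &c->saved; *pp; pp = &(*pp)->next) {
                if ((*pp)->xid == xid) {
                        sf = *pp;
                        *pp = sf->next;          /* remove from the set */
                        break;
                }
        }
        pthread_mutex_unlock (&c->lock);

        if (sf)
                unwind (sf, 0);  /* not found => disconnect already drained it */
}

/* Disconnect path: drain the whole set under the same lock. */
void
handle_disconnect (struct connection_s *c)
{
        struct saved_frame_s *list, *next;

        pthread_mutex_lock (&c->lock);
        list = c->saved;
        c->saved = NULL;                 /* connection gets a fresh, empty set */
        pthread_mutex_unlock (&c->lock);

        for (; list; list = next) {
                next = list->next;
                unwind (list, ENOTCONN);   /* bail out each pending request */
        }
}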

Comment 28 zhou lin 2019-05-16 05:24:10 UTC
This glustershd core dump issue still comes up in my environment occasionally. If it is difficult to analyze, is there any convenient way to start up the glustershd process, e.g. some gluster CLI command?

Comment 29 Ravishankar N 2019-05-16 05:43:52 UTC
(In reply to zhou lin from comment #28)
> is there any convenient way to start up glustershd process? E.g some gluster cli command?

`gluster volume start $volname force` will restart the shd

Comment 35 Yaniv Kaul 2020-01-03 07:07:12 UTC
(In reply to Ravishankar N from comment #32)
> (In reply to Sunil Kumar Acharya from comment #31)
> > What are the next steps on this Bug?
> Looking at the core did not give us enough clue as to why the crash
> happened. If we get a reproducer that is consistent, it will help us debug
> further by adding logs etc.

Closing. If there's a consistent reproducer on a recent release, please re-open.

Comment 36 Red Hat Bugzilla 2023-09-14 04:41:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

