Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 762535 (GLUSTER-803)

Summary:	[3.0.4rc2] Crash in afr_up_down_flush_post_post_op
Product:	[Community] GlusterFS	Reporter:	Anush Shetty <anush>
Component:	replicate	Assignee:	Pavan Vilas Sondur <pavan>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	low
Version:	mainline	CC:	aavati, amarts, gluster-bugs, rabhat, vijay
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix

Description Anush Shetty 2010-04-05 13:23:57 UTC

On a replicate setup with 2 servers, I ran the following:

Screen1: dd writing a 1G file
Screen2: rm -rf * in a loop
Screen3: killed server1, then brought server1 back up

This caused a crash:

(gdb) bt
#0  strrchr () at ../sysdeps/x86_64/strrchr.S:33
#1  0x00007fa36d549832 in afr_up_down_flush_post_post_op (frame=0x7fa3652919f0, this=0x179bd10) at afr-open.c:362
#2  0x00007fa36d54bd48 in afr_changelog_post_op_cbk (frame=0x7fa3652919f0, cookie=0x7fa360732750, this=0x179bd10, op_ret=0, op_errno=22, 
    xattr=0x7fa3607895b0) at afr-transaction.c:697
#3  0x00007fa36d7924a7 in client_fxattrop_cbk (frame=0x7fa360732750, hdr=0x7fa3607a0d60, hdrlen=197, iobuf=0x0) at client-protocol.c:3857
#4  0x00007fa36d79a6d6 in protocol_client_interpret (this=0x179b7f0, trans=0x179f260, hdr_p=0x7fa3607a0d60 "", hdrlen=197, iobuf=0x0)
    at client-protocol.c:6529
#5  0x00007fa36d79b457 in protocol_client_pollin (this=0x179b7f0, trans=0x179f260) at client-protocol.c:6827
#6  0x00007fa36d79ba50 in notify (this=0x179b7f0, event=2, data=0x179f260) at client-protocol.c:6946
#7  0x00007fa36e94dd2a in xlator_notify (xl=0x179b7f0, event=2, data=0x179f260) at xlator.c:924
#8  0x00007fa36c2bd45a in socket_event_poll_in (this=0x179f260) at socket.c:731
#9  0x00007fa36c2bd78d in socket_event_handler (fd=11, idx=0, data=0x179f260, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#10 0x00007fa36e9739ee in event_dispatch_epoll_handler (event_pool=0x1795320, events=0x17a2560, i=1) at event.c:804
#11 0x00007fa36e973be0 in event_dispatch_epoll (event_pool=0x1795320) at event.c:867
#12 0x00007fa36e973eff in event_dispatch (event_pool=0x1795320) at event.c:975
#13 0x0000000000406869 in main (argc=8, argv=0x7fffcf2dea28) at glusterfsd.c:1413

Comment 1 Amar Tumballi 2010-04-20 08:20:15 UTC

Here we need to handle the case of 'inode_path()' returning NULL. That should fix the issue. The 3.1.x releases will have a proper NULL pointer check and should not result in such crashes.
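
A minimal sketch of the kind of guard described here, assuming the crash comes from a NULL path string reaching strrchr() (frame #0 of the backtrace). The helper name safe_basename is hypothetical and is not taken from the GlusterFS source; it only illustrates checking the pointer before parsing it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a GlusterFS function): return a pointer to
 * the last component of `path`, or NULL when no path is available.
 */
const char *
safe_basename (const char *path)
{
        const char *slash = NULL;

        /* Assumption: a NULL path means the lookup failed, e.g. because
         * the file was unlinked while an fd on it was still open. */
        if (path == NULL)
                return NULL;

        /* strrchr() is only safe to call once path is known non-NULL. */
        slash = strrchr (path, '/');

        /* No '/' at all: the whole string is the last component. */
        return slash ? slash + 1 : path;
}
```

A caller would then treat a NULL result as "nothing to do for this file" and skip the remaining work instead of dereferencing the pointer.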

Comment 2 Anand Avati 2010-05-31 09:39:36 UTC

PATCH: http://patches.gluster.com/patch/3305 in release-3.0 (cluster/afr: Handle open-fds of unlinked files during a possible self heal gracefully.)

Comment 3 Raghavendra Bhat 2010-06-11 09:07:53 UTC

Checked with glusterfs-3.0.5rc6.

dd 1G file
rm -rf * in a loop on the mount point
Brought the server up and down several times (using while true ;). It did not crash.

3.0.4 and 3.0.4rc2 crashed with the above procedure.