Got the same crash while running https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.pranithk-laptop.tmp-0'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4ad769841e in rpcsvc_get_listener (svc=0x1ddd9e0, port=65535, trans=0x1dee6e0) at rpcsvc.c:1238
1238            if (listener->trans == trans) {
Missing separate debuginfos, use: debuginfo-install glibc-2.15-37.fc17.x86_64 keyutils-libs-1.5.5-2.fc17.x86_64 krb5-libs-1.10-5.fc17.x86_64 libaio-0.3.109-5.fc17.x86_64 libcom_err-1.42-4.fc17.x86_64 libgcc-4.7.0-5.fc17.x86_64 libselinux-2.1.10-3.fc17.x86_64 openssl-1.0.0j-1.fc17.x86_64 zlib-1.2.5-6.fc17.x86_64
(gdb) bt
#0  0x00007f4ad769841e in rpcsvc_get_listener (svc=0x1ddd9e0, port=65535, trans=0x1dee6e0) at rpcsvc.c:1238
#1  0x00007f4ad769715e in rpcsvc_notify (trans=0x1dfb070, mydata=0x1ddd9e0, event=RPC_TRANSPORT_CLEANUP, data=0x0) at rpcsvc.c:651
#2  0x00007f4ad769be1c in rpc_transport_unref (this=0x1dfb070) at rpc-transport.c:476
#3  0x00007f4ad4729f90 in socket_event_handler (fd=7, idx=3, data=0x1dfb070, poll_in=1, poll_out=0, poll_err=16) at socket.c:2106
#4  0x00007f4ad7915806 in event_dispatch_epoll_handler (event_pool=0x1ddcea0, events=0x1df9a10, i=1) at event-epoll.c:384
#5  0x00007f4ad79159eb in event_dispatch_epoll (event_pool=0x1ddcea0) at event-epoll.c:445
#6  0x00007f4ad78ed40d in event_dispatch (event_pool=0x1ddcea0) at event.c:113
#7  0x000000000040857f in main (argc=19, argv=0x7fff46640108) at glusterfsd.c:1893
http://review.gluster.org/4230 root-causes the issue as a race between freeing the RPC object and handling the socket's POLL_ERR event. The patch may take a couple of review iterations before it is accepted; once it is merged upstream, it will be backported to 2.0.z.
CHANGE: http://review.gluster.org/4230 (rpc: check the ctx->listener before accessing rpcsvc object) merged in master by Anand Avati (avati)
https://code.engineering.redhat.com/gerrit/#/c/1763/ has been submitted for review.
Verified with glusterfs-3.4.0.18rhs-1.

Test steps
==========
1. Ran the test case: https://tcms.engineering.redhat.com/case/243098/?from_plan=7656
2. Created a distributed volume with 3 bricks and FUSE mounted it on 2 clients. From the clients, ran intense IO (creating multiple files using dd), while on the RHS server side kicked off continuous graph changes (by repeatedly setting/unsetting a volume option, e.g. toggling stat-prefetch on/off) and collected sosreports from the RHS machines.

No crashes or errors were observed.