Bug 761778 (GLUSTER-46) - git-cloned glusterfsd crashes on simple client-server usage
Summary: git-cloned glusterfsd crashes on simple client-server usage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-46
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Vikas Gorur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-24 09:44 UTC by Basavanagowda Kanur
Modified: 2009-11-16 11:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
client log (9.42 KB, text/plain)
2009-06-24 06:44 UTC, Basavanagowda Kanur
backtrace (1.06 KB, text/plain)
2009-06-24 06:44 UTC, Basavanagowda Kanur
Client volume configuration file (480 bytes, text/plain)
2009-06-24 06:44 UTC, Basavanagowda Kanur
Client volume configuration file (211 bytes, application/octet-stream)
2009-06-24 06:45 UTC, Basavanagowda Kanur
Server volume configuration file (392 bytes, text/plain)
2009-06-24 06:46 UTC, Basavanagowda Kanur
Server volume configuration file 1 (392 bytes, text/plain)
2009-06-24 06:46 UTC, Basavanagowda Kanur
Server volume configuration file 2 (392 bytes, text/plain)
2009-06-24 06:48 UTC, Basavanagowda Kanur
Server log (7.74 KB, text/plain)
2009-06-24 06:48 UTC, Basavanagowda Kanur

Description Basavanagowda Kanur 2009-06-24 06:44:25 UTC
Created attachment 15 [details]
po-mode.el file, goes into /usr/share/emacs/site-lisp/

Comment 1 Basavanagowda Kanur 2009-06-24 06:44:47 UTC
Created attachment 16 [details]
Patch to apply to any Postgres version in order for it to work in Ultrasparc/Linux

Comment 2 Basavanagowda Kanur 2009-06-24 06:45:56 UTC
Created attachment 17 [details]
Patch to fix bug7366 in initscripts-4.68-1

Comment 3 Basavanagowda Kanur 2009-06-24 06:46:29 UTC
Created attachment 18 [details]
/etc/conf.modules

Comment 4 Basavanagowda Kanur 2009-06-24 06:46:53 UTC
Created attachment 19 [details]
Printout of FDISK partition table - expert mode

Comment 5 Basavanagowda Kanur 2009-06-24 06:48:26 UTC
Created attachment 20 [details]
New spec file

Comment 6 Basavanagowda Kanur 2009-06-24 06:48:48 UTC
Created attachment 21 [details]
My (combined) patch.  I can split it if desired.

Comment 7 Basavanagowda Kanur 2009-06-24 09:44:00 UTC
[Migrated from savannah BTS] - bug 25749 [https://savannah.nongnu.org/bugs/?25749]

Mon 02 Mar 2009 02:44:41 PM GMT, original submission:

I'm using glusterfs git clone, commit fd524dda532a05cb2485935212d1a66f4130256c.

I've set up simple client-server installation and it crashes.
Test scenario:
- start glusterfsd on server
- start glusterfs on client (mounting /home/import)
- `touch /home/import/a` on the client to crash the server

Server crashes with "Segmentation fault (core dumped)"

Logs from running both server and client with --debug option and configuration are attached.
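
The test scenario above can be sketched as a small script. This is only an illustration: the volfile paths are hypothetical placeholders for the attached server and client configuration files, and the `run` helper is an assumption added so the commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Reproduction sketch. The volfile paths below are hypothetical
# placeholders for the attached volume configuration files.
SERVER_VOL="${SERVER_VOL:-/etc/glusterfs/server.vol}"
CLIENT_VOL="${CLIENT_VOL:-/etc/glusterfs/client.vol}"
MOUNT_POINT="${MOUNT_POINT:-/home/import}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# On the server: start the daemon in the foreground with debug logging.
server_step() {
    run glusterfsd -f "$SERVER_VOL" --debug
}

# On the client: mount the volume, then create a file on it.
# To capture client debug logs, add --debug to the mount command and run
# the touch from a second shell, since --debug keeps glusterfs in the
# foreground.
client_step() {
    run glusterfs -f "$CLIENT_VOL" "$MOUNT_POINT"
    run touch "$MOUNT_POINT/a"
}
```

Calling `server_step` on the server and then `client_step` on the client, both with `DRY_RUN=0`, would follow the three steps listed in the report.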

Version information:
root@glusterfs1:~# glusterfs --version
glusterfs 2.0.0tla built on Feb 27 2009 17:42:44
Repository revision: glusterfs--mainline--3.0--patch-928
--------------------------------------------------------------------------------
Mon 02 Mar 2009 02:57:26 PM GMT, comment #1 by 	Vikas Gorur <vikasgp>:

Piotr,

Thanks for reporting this. Could you please run glusterfsd under gdb and send us the backtrace when it segfaults?

You can get the backtrace by typing 'bt' in gdb.
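
A minimal sketch of collecting that backtrace non-interactively, assuming gdb is installed on the server. The volfile path is a hypothetical placeholder, and the function only prints the command line so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical volfile path; substitute the real server volume file.
SERVER_VOL="${SERVER_VOL:-/etc/glusterfs/server.vol}"

# Build a batch-mode gdb invocation: "run" starts glusterfsd in the
# foreground (-N), and once the process stops (for example on SIGSEGV)
# "bt" prints the backtrace before gdb exits.
gdb_bt_cmd() {
    echo "gdb -batch -ex run -ex bt --args glusterfsd -f $SERVER_VOL -N"
}

gdb_bt_cmd
```

The same result is available interactively by starting `gdb --args glusterfsd -f <volfile> -N`, typing `run`, triggering the crash from the client, and then typing `bt`.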

--------------------------------------------------------------------------------
Tue 03 Mar 2009 08:13:46 AM GMT, comment #2 by 	Piotr Findeisen <findepi>:

#0 0xb7d2c1ec in pthread_mutex_lock@plt () from /lib/glusterfs/2.0.0tla/xlator/features/posix-locks.so
#1 0xb7d2ebaf in pl_flush (frame=0x8054a90, this=0x8053c58, fd=0x808dec8) at posix.c:287
#2 0xb7d1a810 in server_flush (frame=0x8056540, bound_xl=0x8053c58, hdr=0x8056428, hdrlen=48, buf=0x0, buflen=0) at server-protocol.c:4307
#3 0xb7d13a62 in protocol_server_interpret (this=0x80541c8, trans=0x8055ec0, hdr_p=0x8056428 "", hdrlen=48, buf=0x0, buflen=0) at server-protocol.c:7722
#4 0xb7d13d43 in protocol_server_pollin (this=0x80541c8, trans=0x8055ec0) at server-protocol.c:7991
#5 0xb7d13daf in notify (this=0x80541c8, event=2, data=0x8055ec0) at server-protocol.c:8040
#6 0xb75094ab in socket_event_poll_in (this=0x8055ec0) at socket.c:699
#7 0xb750956b in socket_event_handler (fd=9, idx=1, data=0x8055ec0, poll_in=1, poll_out=0, poll_err=0) at socket.c:798
#8 0xb7f07f8a in event_dispatch_epoll (event_pool=0x804f150) at event.c:804
#9 0xb7f06d61 in event_dispatch (event_pool=0x0) at event.c:975
#10 0x0804b406 in main (argc=4, argv=0xbfe24424) at glusterfsd.c:1136

--------------------------------------------------------------------------------
Tue 03 Mar 2009 10:56:09 AM GMT, comment #3 by 	Vikas Gorur <vikasgp>:

Piotr,

I'm unable to reproduce this with the same configuration, and our testing team cannot reproduce it either. Does it happen every time? Any other clues you can give us? Would it be possible for you to give us remote access?

--------------------------------------------------------------------------------

Tue 03 Mar 2009 11:03:30 AM GMT, comment #4 by 	Piotr Findeisen <findepi>:

Maybe this is somehow related to OpenVZ. We're using it and testing GlusterFS inside...

Anyway I have run versions 2.0.0rc2, 1.4.0qa92, 1.3.12 (all under OpenVZ) and none of them crashed. 1.4.0qa92 even had AFR working without problems.

--------------------------------------------------------------------------------
Tue 03 Mar 2009 11:05:40 AM GMT, comment #5 by 	Vikas Gorur <vikasgp>:

Could you please try running it on just a regular machine (outside OpenVZ) so we can corner the bug?

--------------------------------------------------------------------------------
Tue 03 Mar 2009 12:54:53 PM GMT, comment #6 by 	Piotr Findeisen <findepi>:

I've set up another VPS installation and the bug disappeared. The difference is that new VPSes are 64-bit systems and old were 32-bit. The host is 64-bit.
(Don't ask me how this could ever work :) But as I said it worked for older versions)

I will try once more on regular 32-bit machine w/o any virtualization.

--------------------------------------------------------------------------------
Tue 03 Mar 2009 12:59:40 PM GMT, comment #7 by 	Vikas Gorur <vikasgp>:

Thanks for diagnosing. We'll try to figure out why that could have happened.

--------------------------------------------------------------------------------
Tue 03 Mar 2009 01:03:18 PM GMT, comment #8 by 	Piotr Findeisen <findepi>:

Test on 32-bit (localhost only) passed.

Please note that, while complicated, testing under some virtualization is a must for me. We use OpenVZ as a kind of sandboxing and we don't have that many physical servers now to put every service separately. Hopefully, we can have glusterfsd running on 64-bit VPSes. Still we need 32-bit clients (lack of 64-bit support for app X, Y, Z...).

--------------------------------------------------------------------------------
Tue 03 Mar 2009 01:12:41 PM GMT, comment #9 by 	Piotr Findeisen <findepi>:

I can give you ssh access to root account on VPSes with exact instructions how to reproduce bug there -- via email.

--------------------------------------------------------------------------------
Mon 09 Mar 2009 08:49:46 AM GMT, comment #10 by Piotr Findeisen <findepi>:

It seems that I haven't made glusterfs correctly on all machines (VPSes). Eh, stupid me...

However, when I set up AFR over the servers, writing anything crashes the second one. Configs and crash backtrace are attached. Changing "type cluster/afr" to "type cluster/replicate" on the client side doesn't help.

Comment 8 Vijay Bellur 2009-11-16 08:20:03 UTC
Fixed with current version.

