Bug 1115648 - Server Crashes on EL5/32-bit
Summary: Server Crashes on EL5/32-bit
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.5.1
Hardware: i686
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-02 20:04 UTC by Russell Purinton
Modified: 2016-07-25 12:09 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-23 09:22:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
nfs.log and core dump (2.62 MB, application/gzip)
2014-07-02 20:04 UTC, Russell Purinton

Description Russell Purinton 2014-07-02 20:04:45 UTC
Created attachment 914288 [details]
nfs.log and core dump

Description of problem:

When using GlusterFS 3.5.1 on 32-bit EL5, the gluster server crashes when data is written or read.

List of servers:
n1,n2,n3   gluster brick servers for volume "nas"
s11,s12,s21,s22,s31,s32    gluster brick servers for volume "san"
f1,f2,f3   EL6 64-bit servers running gluster but no bricks (doing NFS to Fuse)
x1,x2,x3   EL5 32-bit servers running gluster but no bricks (doing NFS to Fuse)

If I mount via f1, everything works fine.
If I mount via x1, I can mount and browse files, but upon any read or write the Gluster process on x1 crashes, the mount breaks, and dependent services fail. Gluster records the crash trace in the log and writes a core.xxxx file in the root directory.

Version-Release number of selected component (if applicable):

3.5.1


How reproducible:
Every time. However, it does not seem to occur on EL6 64-bit.


Steps to Reproduce:
1.  Read or Write Data from a mount
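The step above can be sketched as shell commands. Hostnames and the volume name come from the server list in the description; the mount point and test file are hypothetical:

```shell
# Mount the "nas" volume over FUSE from the EL5/32-bit server x1
# (the mount point /mnt/nas is an assumed path).
mount -t glusterfs x1:/nas /mnt/nas

# Browsing works fine...
ls -l /mnt/nas

# ...but any read or write crashes the gluster process on x1:
dd if=/dev/zero of=/mnt/nas/testfile bs=1M count=1   # write path
dd if=/mnt/nas/testfile of=/dev/null bs=1M           # read path
```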

Actual results:
Gluster crashes; no data is transferred.


Expected results:
Gluster does not crash and data is transferred.


Additional info:
This may be due to the use of multiple performance xlators or to the configuration of the striped/replicated volumes. Originally I experienced this with NFS mounts and thought it was NFS-related, but I have since noticed it happens with FUSE mounts too.

Comment 1 Niels de Vos 2014-09-09 08:04:01 UTC
This is likely an issue related to distribute/dht; it happens with both FUSE and NFS.

From the attached logs:

package-string: glusterfs 3.5.1
/usr/sbin/glusterfs(glusterfsd_print_trace+0x1a)[0x804b74a]
[0xb775a400]
/usr/lib/glusterfs/3.5.1/xlator/cluster/distribute.so[0xb3d246ac]
/usr/lib/glusterfs/3.5.1/xlator/performance/write-behind.so(wb_stat+0x244)[0xb3d1a0c4]
/usr/lib/libglusterfs.so.0(default_stat+0x64)[0xb76d1a64]
/usr/lib/libglusterfs.so.0(default_stat+0x64)[0xb76d1a64]
/usr/lib/glusterfs/3.5.1/xlator/performance/io-threads.so(iot_stat_wrapper+0x109)[0xb3cecd19]
/usr/lib/libglusterfs.so.0(call_resume+0x13c)[0xb76e9eec]
/usr/lib/glusterfs/3.5.1/xlator/performance/io-threads.so(iot_worker+0x14a)[0xb3cf419a]
/lib/libpthread.so.0[0xb766a912]
/lib/libc.so.6(clone+0x5e)[0xb74977ce]
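Since the trace above resolves only to offsets inside distribute.so, the attached core dump is needed to pin down the faulting frame. A minimal sketch of inspecting it with gdb follows; the core filename is hypothetical (the description says a core.xxxx file lands in the root directory), and debuginfo-install comes from yum-utils on EL5:

```shell
# Install matching debuginfo so backtrace frames resolve to source lines.
debuginfo-install glusterfs

# Print a full backtrace and register state non-interactively
# (/core.12345 stands in for the actual core.xxxx file).
gdb -batch -ex 'bt full' -ex 'info registers' \
    /usr/sbin/glusterfs /core.12345
```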


Could you let us know what the s* and n* architectures are? I suspect they are 64-bit, but a confirmation would be good.

Comment 2 Russell Purinton 2014-09-22 14:46:27 UTC
Hello!  s* and n* were 64-bit EL6

I am using a different configuration now. I will set up a sandbox and retest.

Comment 3 Niels de Vos 2014-12-20 23:34:36 UTC
I tried this on a 32-bit CentOS6 client with a 64-bit CentOS7 server, all running the latest 3.5 release. I could not reproduce the problem with simple creation and reading of files ranging from small (a few characters) to medium (a CentOS minimal-installation .iso).

The volume looks like this:

Volume Name: bz1115648
Type: Striped-Replicate
Volume ID: 57b8f1e4-be8a-4fed-b47c-b6908f018672
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: vm100-010.example.com:/bricks/bz1115648-a/data
Brick2: vm100-010.example.com:/bricks/bz1115648-b/data
Brick3: vm100-010.example.com:/bricks/bz1115648-c/data
Brick4: vm100-010.example.com:/bricks/bz1115648-d/data
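A volume with that layout can be created along these lines. The hostname and brick paths come from the output above; the stripe and replica counts follow from "Number of Bricks: 1 x 2 x 2 = 4" (note that stripe volumes were still supported in the 3.5.x CLI):

```shell
# 1 distribute subvolume x 2 stripe x 2 replica = 4 bricks.
gluster volume create bz1115648 stripe 2 replica 2 \
    vm100-010.example.com:/bricks/bz1115648-a/data \
    vm100-010.example.com:/bricks/bz1115648-b/data \
    vm100-010.example.com:/bricks/bz1115648-c/data \
    vm100-010.example.com:/bricks/bz1115648-d/data

gluster volume start bz1115648
```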


Please let us know if you can reproduce it, and explain the additional steps that I have missed.

