Created attachment 914288 [details]
nfs.log and core dump

Description of problem:
When using GlusterFS 3.5.1 on 32-bit EL5, the gluster server crashes when data is written or read.

List of servers:
n1, n2, n3: gluster brick servers for volume "nas"
s11, s12, s21, s22, s31, s32: gluster brick servers for volume "san"
f1, f2, f3: EL6 64-bit servers running gluster but no bricks (doing NFS to FUSE)
x1, x2, x3: EL5 32-bit servers running gluster but no bricks (doing NFS to FUSE)

If I make a mount to f1, all works great.
If I make a mount to x1, I can mount and browse files, but upon a read or write the gluster process crashes on x1, the mount becomes broken and things die. Gluster records the crash trace in the log and puts a core.xxxx file in the root directory.

Version-Release number of selected component (if applicable):
3.5.1

How reproducible:
Every time. However, it does not seem to occur on EL6 64-bit.

Steps to Reproduce:
1. Read or write data from a mount

Actual results:
Gluster crashes, no data transferred

Expected results:
Gluster does not crash, data is transferred

Additional info:
May be due to the use of multiple performance xlators or the configuration of the stripe/replica volumes. Originally I experienced this doing NFS mounts and thought it was NFS related, but I have since noticed it happens with FUSE mounts too.
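For clarity, the failing reproduction can be sketched as a minimal shell sequence. The mount point, the volume name "nas", and the test file names are assumptions based on the description above, not commands taken from the reporter's setup:

```shell
# Hypothetical sketch: server name x1, volume "nas", and /mnt/nas are assumptions.
# Mount the volume via FUSE through the 32-bit EL5 server x1.
mount -t glusterfs x1:/nas /mnt/nas

# Browsing the mount works fine:
ls -l /mnt/nas

# ...but any actual data transfer crashes the gluster process on x1:
dd if=/dev/zero of=/mnt/nas/testfile bs=1M count=1   # write: crash
cat /mnt/nas/testfile > /dev/null                    # read: crash
```

After the crash, the mount point becomes unusable until the gluster process is restarted and the volume is remounted.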
This is likely an issue related to distribute/dht. It happens with both FUSE and NFS. From the attached logs:

package-string: glusterfs 3.5.1
/usr/sbin/glusterfs(glusterfsd_print_trace+0x1a)[0x804b74a]
[0xb775a400]
/usr/lib/glusterfs/3.5.1/xlator/cluster/distribute.so[0xb3d246ac]
/usr/lib/glusterfs/3.5.1/xlator/performance/write-behind.so(wb_stat+0x244)[0xb3d1a0c4]
/usr/lib/libglusterfs.so.0(default_stat+0x64)[0xb76d1a64]
/usr/lib/libglusterfs.so.0(default_stat+0x64)[0xb76d1a64]
/usr/lib/glusterfs/3.5.1/xlator/performance/io-threads.so(iot_stat_wrapper+0x109)[0xb3cecd19]
/usr/lib/libglusterfs.so.0(call_resume+0x13c)[0xb76e9eec]
/usr/lib/glusterfs/3.5.1/xlator/performance/io-threads.so(iot_worker+0x14a)[0xb3cf419a]
/lib/libpthread.so.0[0xb766a912]
/lib/libc.so.6(clone+0x5e)[0xb74977ce]

Could you let us know what the s* and n* architectures are? I suspect they are 64-bit, but a confirmation would be good.
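A full symbolic backtrace from the attached core dump would help pin down the faulting frame inside distribute.so. Something like the following could be used on x1 (the debuginfo package name and the "core.xxxx" placeholder file name are assumptions; the actual core file name includes the PID):

```shell
# Install debug symbols so raw addresses resolve to function names and lines
# (package name may vary by distribution).
debuginfo-install glusterfs

# Open the core against the crashing binary and dump all thread backtraces.
# Replace core.xxxx with the actual core file from the root directory.
gdb -batch -ex 'thread apply all bt full' /usr/sbin/glusterfs /core.xxxx
```

The frames around dht in the output should show whether this is, for example, a NULL dereference or a 32-bit truncation issue in the dht stat path.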
Hello! s* and n* were 64-bit EL6. I am using a different configuration now. I will set up a sandbox and retest.
I tried this on a 32-bit CentOS 6 client with a 64-bit CentOS 7 server, all running the latest 3.5 release. I could not reproduce this problem with some simple creating/reading of small (a few characters) to medium (CentOS minimal installation .iso) files. The volume looks like this:

Volume Name: bz1115648
Type: Striped-Replicate
Volume ID: 57b8f1e4-be8a-4fed-b47c-b6908f018672
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: vm100-010.example.com:/bricks/bz1115648-a/data
Brick2: vm100-010.example.com:/bricks/bz1115648-b/data
Brick3: vm100-010.example.com:/bricks/bz1115648-c/data
Brick4: vm100-010.example.com:/bricks/bz1115648-d/data

Please let us know if you can reproduce it, and explain any additional steps that I may have missed.
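For anyone retesting, a volume with the layout shown above (1 x 2 x 2 = 4, striped-replicate) can be created with something like the following. The hostnames and brick paths are taken from the volume info; the force flag and exact option spelling are assumptions for a sandbox setup:

```shell
# Create a striped-replicated volume matching "Number of Bricks: 1 x 2 x 2 = 4".
gluster volume create bz1115648 stripe 2 replica 2 \
    vm100-010.example.com:/bricks/bz1115648-a/data \
    vm100-010.example.com:/bricks/bz1115648-b/data \
    vm100-010.example.com:/bricks/bz1115648-c/data \
    vm100-010.example.com:/bricks/bz1115648-d/data

gluster volume start bz1115648
```

All four bricks on one host is fine for reproduction purposes, though the CLI may warn about it.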