1341243 – fsx crash in 3.8rc2 while running as non-root user

Status:	CLOSED EOL
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	nfs
Version:	3.8
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Niels de Vos

Reported:	2016-05-31 14:10 UTC by Vijay Bellur
Modified:	2017-11-07 10:39 UTC
CC List:	3 users
Last Closed:	2017-11-07 10:39:47 UTC
Regression:	---
Mount Type:	---
Documentation:	---


Description Vijay Bellur 2016-05-31 14:10:28 UTC

Description of problem:


fsx crashes on an NFS mount when run as a non-privileged user with the following command line:

$ fsx -RW fsxy

Correct content saved for comparison
(maybe hexdump "fsxy" vs "fsxy.fsxgood")
Segmentation fault (core dumped)
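
The hexdump comparison that fsx suggests can be narrowed down with a short script. The following is a minimal sketch in Python, not part of fsx; the function name `first_difference` and the chunk size are made up for illustration. It reports the first byte offset at which two files (for example "fsxy" and "fsxy.fsxgood") differ, and treats a length mismatch as a difference at the end of the shorter file.

```python
import sys


def first_difference(path_a, path_b, chunk_size=65536):
    """Return the first byte offset where the two files differ, or None."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other: the lengths differ.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended and all content matched.
                return None
            offset += len(a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python first_diff.py fsxy fsxy.fsxgood
    print(first_difference(sys.argv[1], sys.argv[2]))
```

The returned offset can then be passed to hexdump (for example with its `-s` skip option) to inspect only the region around the mismatch.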

gluster server configuration:

Volume Name: repl
Type: Distributed-Replicate
Volume ID: 86805ef8-b2f6-41d7-8f0d-65785de1674f
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gprfs029-10ge:/bricks/b01/data1
Brick2: gprfs030-10ge:/bricks/b01/data1
Brick3: gprfs031-10ge:/bricks/b01/data1
Brick4: gprfs029-10ge:/bricks/b01/data2
Brick5: gprfs030-10ge:/bricks/b01/data2
Brick6: gprfs031-10ge:/bricks/b01/data2
Options Reconfigured:
performance.readdir-ahead: on
transport.address-family: inet
network.ping-timeout: 20
nfs.disable: off


Version-Release number of selected component (if applicable): 3.8.0rc2


How reproducible:

Intermittent. Run the command described above; the crash has happened 3 times over the last 8-10 days.


Actual results:

fsx dumps core


Expected results:

fsx should not dump core and should run indefinitely.


Additional info:

The attached NFS log contains messages about a possible split-brain and GETATTR failures. Not sure whether they are related.

Comment 1 Vijay Bellur 2016-05-31 14:31:20 UTC

Unable to add attachments to Bugzilla at the moment. Will try later.

Comment 2 Vijay Bellur 2016-06-14 15:45:40 UTC

Not sure why this bug is assigned to me, as I am not actively looking into fixing this problem :).

Here are some more strace details from the time of failure:

1330455 lseek(3, 31537, SEEK_SET)       = 31537
1330455 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41834) 
= 41834
1330455 fstat(3, {st_mode=S_IFREG|0664, st_size=250822, ...}) = 0
1330455 lseek(3, 0, SEEK_END)           = 250822
1330455 ftruncate(3, 226593)            = 0
1330455 fstat(3, {st_mode=S_IFREG|0664, st_size=250822, ...}) = 0
1330455 lseek(3, 0, SEEK_END)           = 250822
1330455 write(1, "Size error: expected 0x37521 sta"..., 55) = 55
1330455 write(1, "LOG DUMP (39834221 total operati"..., 38) = 38


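It does look like we obtained a wrong attribute value for size, and that is likely the problem: ftruncate(3, 226593) succeeded, and 226593 is the 0x37521 that fsx expected, yet the fstat and lseek(SEEK_END) that follow still report the old size of 250822.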
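
The sequence in the trace can be replayed outside fsx to check whether the mount reports a stale size after a truncate. The sketch below is an assumption about how to isolate the problem, not a confirmed reproducer; the function name `size_after_truncate` is illustrative, and the default sizes are taken from the strace above. It extends a file to 250822 bytes, truncates it to 226593 (0x37521), and returns what fstat and lseek(SEEK_END) report afterwards.

```python
import os


def size_after_truncate(path, old_size=250822, new_size=0x37521):
    """Replay the ftruncate/fstat/lseek sequence seen in the strace.

    Returns (st_size, end_offset) as reported after the truncate.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o664)
    try:
        # Bring the file to the size it had just before the failure.
        os.ftruncate(fd, old_size)
        # Mirror the fstat that preceded the truncate, so attributes are cached.
        os.fstat(fd)
        # Corresponds to ftruncate(3, 226593) in the trace.
        os.ftruncate(fd, new_size)
        # Corresponds to the fstat and lseek(3, 0, SEEK_END) that followed.
        st_size = os.fstat(fd).st_size
        end_offset = os.lseek(fd, 0, os.SEEK_END)
        return st_size, end_offset
    finally:
        os.close(fd)
```

On a mount that behaves correctly both values equal 226593. Running this in a loop against a file on the NFS mount, as the same non-root user, and flagging any other result would show whether the stale attribute can be seen without fsx. A single pass may well succeed, since the log dump above reports 39834221 total operations before the failure.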

The NFS client is on a machine running RHEL Server release 7.1:

$ uname -r
3.10.0-229.el7.x86_64

Comment 3 Niels de Vos 2016-09-12 05:38:12 UTC

All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 4 Niels de Vos 2017-11-07 10:39:47 UTC

This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
