Bug 961929 - NFS - Crash during bonnie++
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64 Linux
Priority: high
Severity: high
Assigned To: Anand Subramanian
QA Contact: Ben Turner
Reported: 2013-05-10 15:16 EDT by Ben Turner
Modified: 2016-01-03 23:40 EST (History)
6 users

Doc Type: Bug Fix
Last Closed: 2013-09-23 18:39:44 EDT
Type: Bug

Attachments
- server that crashed sos (500.62 KB, application/x-xz), 2013-05-10 15:35 EDT, Ben Turner
- Other server sos (489.79 KB, application/x-xz), 2013-05-10 15:35 EDT, Ben Turner
Description Ben Turner 2013-05-10 15:16:35 EDT
Description of problem:

When running bonnie++ on my NFS-mounted 4x1 volume, bonnie hangs. In the nfs log I see:

[2013-05-10 15:34:40.440975] I [client-handshake.c:1468:client_setvolume_cbk] 0-testvol-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2013-05-10 15:34:40.441174] I [client-handshake.c:450:client_set_lk_version_cbk] 0-testvol-client-3: Server lk version = 1
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
.
.
.
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-05-10 15:36:19
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.5rhs

[2013-05-10 15:37:35.448865] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-testvol-client-1: server 10.8.0.23:49152 has not responded in the last 42 seconds, disconnecting.
[2013-05-10 16:06:21.631308] E [rpc-clnt.c:207:call_bail] 0-testvol-client-1: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x29901x sent = 2013-05-10 15:36:19.843401. timeout = 1800
[2013-05-10 16:06:21.631340] W [client-rpc-fops.c:866:client3_3_writev_cbk] 0-testvol-client-1: remote operation failed: Transport endpoint is not connected

Version-Release number of selected component (if applicable):

glusterfs-3.4.0.5rhs-1.el6rhs.x86_64

How reproducible:

I have run this twice; it has been reproducible both times so far.

Steps to Reproduce:
1. Create a 4x1 volume.
2. Mount via NFS on a RHEL 6.3 client
3. Run bonnie++
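The steps above can be sketched as a shell session. This is an illustrative reproducer, not from the original report: the server names, mount point, and mount options are assumptions (the brick paths mirror the volume status output later in this bug).

```shell
# On a gluster server: create and start a 4x1 (pure distribute) volume.
# Hostnames and brick paths are hypothetical.
gluster volume create testvol \
    server1:/bricks/testvol_brick0 server2:/bricks/testvol_brick1 \
    server1:/bricks/testvol_brick2 server2:/bricks/testvol_brick3
gluster volume start testvol

# On the client: mount over NFSv3 and run bonnie++.
mount -t nfs -o vers=3 server1:/testvol /mnt/testvol
cd /mnt/testvol && bonnie++ -u root
```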
  
Actual results:

Hang on client side, crash server side.

Expected results:

Normal operation.

Additional info:
Comment 2 Ben Turner 2013-05-10 15:35:30 EDT
Created attachment 746339 [details]
server that crashed sos
Comment 3 Ben Turner 2013-05-10 15:35:54 EDT
Created attachment 746340 [details]
Other server sos
Comment 4 Ben Turner 2013-05-10 15:55:41 EDT
I couldn't find a core anywhere on this system.  When I ran gluster volume status I saw:

[root@storage-qe10 core]# gluster volume status
Status of volume: testvol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick storage-qe10.lab.eng.rdu2.redhat.com:/bricks/test
vol_brick0						49152	Y	6723
Brick storage-qe11.lab.eng.rdu2.redhat.com:/bricks/test
vol_brick1						49152	Y	5418
Brick storage-qe10.lab.eng.rdu2.redhat.com:/bricks/test
vol_brick2						49153	Y	6732
Brick storage-qe11.lab.eng.rdu2.redhat.com:/bricks/test
vol_brick3						49153	Y	5435
NFS Server on localhost					2049	Y	6742
NFS Server on 6614425e-2a78-41f6-9604-8af602113ba6	2049	Y	5445
 
There are no active volume tasks

I went to look at the process in GDB and saw:

[root@storage-qe10 core]# gdb -p 6742
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 6742
/usr/sbin/glusterfsd (deleted): No such file or directory.
(gdb) bt
#0  0xf8fbb06e in ?? ()
Cannot access memory at address 0xe1adfd98

I didn't see that PID in ps either.
Comment 5 Pranith Kumar K 2013-05-13 02:35:49 EDT
hi Ben,
   I tried this with a 3x1 distribute volume. It worked fine. I tried it with 3.4.0.6rhs. Here are the results:
root - /mnt/r2 
11:37:24 :) ⚡ bonnie++ -u root -s 4098 -r 0
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pranithk-lapt 4098M  1928  97 100855   6 50034  12  4886  98 108903  10  5419  45
Latency             14061us     200ms   11236ms   35341us     100ms    4015us
Version  1.96       ------Sequential Create------ --------Random Create--------
pranithk-laptop     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   216   2   289   2   311   2   216   2   982   3   772   3
Latency             37881us    3440ms    2292ms   85868us   11270us   37172us
1.96,1.96,pranithk-laptop,1,1368431974,4098M,,1928,97,100855,6,50034,12,4886,98,108903,10,5419,45,16,,,,,216,2,289,2,311,2,216,2,982,3,772,3,14061us,200ms,11236ms,35341us,100ms,4015us,37881us,3440ms,2292ms,85868us,11270us,37172us

Let me try to re-create with 3.4.0.5rhs.

Pranith.
Comment 6 Pranith Kumar K 2013-05-13 03:17:51 EDT
Works with 3.4.0.5rhs as well.
Ben,
   Since the bug is not reproducible for me, I went through the logs. According to the logs, there are around 7394 frames in transit at the time of the crash. We must figure out how frames with type and op of 0 get into the system.

pranithk@pranithk-laptop - ~/Downloads/sos/961929/storage-qe10-2013051015191368213569/var/log/glusterfs
12:37:30 :) ⚡ grep "frame : type(0) op(0)" nfs.log | wc -l
7394
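(As an aside, grep -c gives the same count directly. A minimal sketch against a synthetic stand-in log; the real file is the nfs.log from the attached sosreport.)

```shell
# Count "type(0) op(0)" pending-frame lines; a tiny synthetic log
# stands in for the real nfs.log here.
cat > /tmp/nfs-sample.log <<'EOF'
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(34)
EOF
grep -c "frame : type(0) op(0)" /tmp/nfs-sample.log   # prints 2
```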

The second thing we need to figure out is why the nfs processes are still running even after a crash. According to .cmd_log_history, the only commands executed are:
[2013-05-10 15:34:37.406987]  : volume start testvol : SUCCESS
[2013-05-10 19:01:56.460244]  : volume status : SUCCESS
[2013-05-10 19:01:56.462865]  : volume status : SUCCESS

So I think we may be missing other steps in the bug description that lead to nfs still running.

Are you running this whole test as part of some automation?

Pranith.
Comment 7 Ben Turner 2013-05-13 10:40:10 EDT
Yes, I am hitting this in the FS Sanity nightly runs. I hit it in automation both times I tested before opening the BZ; I haven't attempted a manual reproducer yet. I will run everything manually this morning and document what I see.
Comment 8 santosh pradhan 2013-05-13 11:24:27 EDT

This sounds very similar to BZ 961198, for which a fix has already been posted downstream by Rajesh. The issue is explained in the fix:

https://code.engineering.redhat.com/gerrit/#/c/7503/

Thanks,
Santosh
Comment 9 santosh pradhan 2013-05-13 11:28:13 EDT

If there is a core file available, load it into gdb. If the stack trace looks similar to the one below (a heap/arena fastbin corruption issue), then this is most likely a duplicate:

Program received signal SIGSEGV, Segmentation fault.
0x00000031b7a7be5b in malloc_consolidate (av=av@entry=0x31b7db1740 <main_arena>) at malloc.c:4143
4143            unlink(av, p, bck, fwd);
(gdb) where
#0  0x00000031b7a7be5b in malloc_consolidate (av=av@entry=0x31b7db1740 <main_arena>) at malloc.c:4143
#1  0x00000031b7a7d09e in _int_malloc (av=av@entry=0x31b7db1740 <main_arena>, bytes=bytes@entry=6388) at malloc.c:3422
#2  0x00000031b7a802d8 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3220
#3  0x00007f7a0b81e703 in __gf_default_calloc (cnt=1, size=6388) at mem-pool.h:75
#4  0x00007f7a0b81eb1f in __gf_calloc (nmemb=1, size=6388, type=48) at mem-pool.c:104
#5  0x00007f7a0b81f6b0 in mem_get (mem_pool=0x20cfb90) at mem-pool.c:419
#6  0x00007f7a0b5c42f1 in rpcsvc_request_create (svc=0x20cfad0, trans=0x212a230, msg=0x2f33670) at rpcsvc.c:361
#7  0x00007f7a0b5c483f in rpcsvc_handle_rpc_call (svc=0x20cfad0, trans=0x212a230, msg=0x2f33670) at rpcsvc.c:504
#8  0x00007f7a0b5c4d49 in rpcsvc_notify (trans=0x212a230, mydata=0x20cfad0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f33670) at rpcsvc.c:644
#9  0x00007f7a0b5c95d8 in rpc_transport_notify (this=0x212a230, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f33670) at rpc-transport.c:497
#10 0x00007f7a083d8fa3 in socket_event_poll_in (this=0x212a230) at socket.c:2119
#11 0x00007f7a083d943b in socket_event_handler (fd=18, idx=9, data=0x212a230, poll_in=1, poll_out=0, poll_err=0) at socket.c:2231
#12 0x00007f7a0b848bba in event_dispatch_epoll_handler (event_pool=0x20b0ea0, events=0x20cdd30, i=0) at event-epoll.c:384
#13 0x00007f7a0b848d9f in event_dispatch_epoll (event_pool=0x20b0ea0) at event-epoll.c:445
#14 0x00007f7a0b81e591 in event_dispatch (event_pool=0x20b0ea0) at event.c:113
#15 0x0000000000408608 in main (argc=11, argv=0x7fffae3fbf88) at glusterfsd.c:1902
Comment 10 Ben Turner 2013-05-13 13:00:02 EDT
I hit this last night on a 6x2 volume on the latest bits:

$ rpm -q glusterfs
glusterfs-3.4.0.6rhs-1.el6rhs.x86_64

[root@storage-qe03 /]# gluster volume info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 27c4afa8-fadc-4ce5-88e2-8b326d49c46b
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
Brick2: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
Brick3: storage-qe05.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
Brick4: storage-qe08.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
Brick5: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
Brick6: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
Brick7: storage-qe05.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
Brick8: storage-qe08.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
Brick9: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
Brick10: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
Brick11: storage-qe05.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
Brick12: storage-qe08.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11

I talked to Saurabh and we may be seeing the same thing as:

https://bugzilla.redhat.com/show_bug.cgi?id=961198

It looks like there is a patch attached to that BZ; could we get that patch included in the next downstream build to see if it resolves this issue as well?
Comment 11 Amar Tumballi 2013-05-14 01:53:35 EDT
@Ben, yes, the patch went into 3.4.0.7rhs. Can you please verify whether that build works for you?
Comment 12 Ben Turner 2013-05-14 09:52:12 EDT
I tested this last night on the 3.4.0.7 bits and I can confirm that the crash appears to be fixed, but performance is really poor. Normally my bonnie++ runs take 30-40 minutes on gigabit LAN; the current run has been going for over 16 hours. I looked on all nodes this morning and I don't see any crashes. When I straced the bonnie PID I saw that it was still working, just slowly. I tried to quantify this with dd and didn't see terribly bad performance:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024k count=1000; sync`
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 18.7489 s, 55.9 MB/s

real	0m19.050s
user	0m0.001s
sys	0m0.731s

1 GB in 19 seconds works out to about 53 MB/second, which is not terribly bad. I thought maybe bonnie was using a smaller block size, so I tried:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024 count=1000000; sync`
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 18.9923 s, 53.9 MB/s

real	0m19.191s
user	0m0.072s
sys	0m1.406s

That is really close, even with the small block size. Given how long bonnie was taking to finish, I would expect single-threaded I/O to be down in the kilobytes per second; that doesn't seem to be the case here. Were there any other upstream fixes related to NFS performance that need to be pulled in along with the patch in 3.4.0.7?
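As a sanity check on the throughput arithmetic above (a sketch; note that dd reports decimal megabytes, 10^6 bytes, while the "about 53 MB/second" figure matches binary mebibytes):

```shell
# dd wrote 1048576000 bytes (1000 MiB) in roughly 19 seconds.
bytes=1048576000
secs=19
echo "$(( bytes / secs / 1000000 )) MB/s"   # decimal, as dd reports (prints 55)
echo "$(( bytes / secs / 1048576 )) MiB/s"  # binary (prints 52)
```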
Comment 13 Amar Tumballi 2013-05-14 10:34:15 EDT
Ben, can we open another bug for the performance issue? Since this one is a crash issue, can we mark the crash as VERIFIED? Let's tackle the performance issue in a different bug (hence marking this ON_QA).

Anyway, for the performance, run 'gluster volume set <VOL> eager-locking disable' and try it once more.
Comment 14 Ben Turner 2013-05-14 10:47:07 EDT
No problem, Amar. Saurabh opened a new BZ for the performance issue: BZ #962510. Marking this BZ as verified on 3.4.0.7.
Comment 15 Scott Haines 2013-09-23 18:39:44 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
