Description of problem:
When running bonnie++ on my NFS-mounted 4x1 volume I see bonnie++ hang. In the nfs log I see:

[2013-05-10 15:34:40.440975] I [client-handshake.c:1468:client_setvolume_cbk] 0-testvol-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2013-05-10 15:34:40.441174] I [client-handshake.c:450:client_set_lk_version_cbk] 0-testvol-client-3: Server lk version = 1
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
.
.
.
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-05-10 15:36:19
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.5rhs

[2013-05-10 15:37:35.448865] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-testvol-client-1: server 10.8.0.23:49152 has not responded in the last 42 seconds, disconnecting.
[2013-05-10 16:06:21.631308] E [rpc-clnt.c:207:call_bail] 0-testvol-client-1: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x29901x sent = 2013-05-10 15:36:19.843401. timeout = 1800
[2013-05-10 16:06:21.631340] W [client-rpc-fops.c:866:client3_3_writev_cbk] 0-testvol-client-1: remote operation failed: Transport endpoint is not connected

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.5rhs-1.el6rhs.x86_64

How reproducible:
I have run this twice; it has been reproducible both times thus far.

Steps to Reproduce:
1. Create a 4x1 volume.
2. Mount it via NFS on a RHEL 6.3 client.
3. Run bonnie++.

Actual results:
Hang on the client side, crash on the server side.

Expected results:
Normal operation.

Additional info:
Created attachment 746339 [details]
sosreport from the server that crashed
Created attachment 746340 [details]
sosreport from the other server
I couldn't find a core anywhere on this system. When I ran "gluster volume status" I saw:

[root@storage-qe10 core]# gluster volume status
Status of volume: testvol
Gluster process                                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0       49152   Y       6723
Brick storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1       49152   Y       5418
Brick storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2       49153   Y       6732
Brick storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3       49153   Y       5435
NFS Server on localhost                                         2049    Y       6742
NFS Server on 6614425e-2a78-41f6-9604-8af602113ba6              2049    Y       5445

There are no active volume tasks

I went to look at the process in GDB and saw:

[root@storage-qe10 core]# gdb -p 6742
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 6742
/usr/sbin/glusterfsd (deleted): No such file or directory.
(gdb) bt
#0  0xf8fbb06e in ?? ()
Cannot access memory at address 0xe1adfd98

I didn't see that PID in ps either.
hi Ben,
    I tried this with a 3x1 distribute volume on 3.4.0.6rhs. It worked fine. Here are the results:

root - /mnt/r2
11:37:24 :) ⚡ bonnie++ -u root -s 4098 -r 0
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pranithk-lapt 4098M  1928  97 100855   6 50034  12  4886  98 108903  10  5419  45
Latency             14061us     200ms   11236ms   35341us     100ms    4015us
Version  1.96       ------Sequential Create------ --------Random Create--------
pranithk-laptop     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   216   2   289   2   311   2   216   2   982   3   772   3
Latency             37881us    3440ms    2292ms   85868us   11270us   37172us
1.96,1.96,pranithk-laptop,1,1368431974,4098M,,1928,97,100855,6,50034,12,4886,98,108903,10,5419,45,16,,,,,216,2,289,2,311,2,216,2,982,3,772,3,14061us,200ms,11236ms,35341us,100ms,4015us,37881us,3440ms,2292ms,85868us,11270us,37172us

Let me try to re-create it with 3.4.0.5rhs.

Pranith.
Works with 3.4.0.5rhs as well.

Ben,
    Since the bug is not re-creatable for me, I went through the logs. According to the logs, there were around 7394 frames in transit at the time of the crash. We need to figure out how frames with type and op both 0 get into the system:

pranithk@pranithk-laptop - ~/Downloads/sos/961929/storage-qe10-2013051015191368213569/var/log/glusterfs
12:37:30 :) ⚡ grep "frame : type(0) op(0)" nfs.log | wc -l
7394

The second thing we need to figure out is why the nfs processes are still running even after a crash. According to .cmd_log_history, the only commands executed are:

[2013-05-10 15:34:37.406987]  : volume start testvol : SUCCESS
[2013-05-10 19:01:56.460244]  : volume status : SUCCESS
[2013-05-10 19:01:56.462865]  : volume status : SUCCESS

So I think maybe we are missing other steps in the bug description which led to nfs still running. Are you running this whole test as part of some automation?

Pranith.
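The frame count above is just a grep over the crash dump's "pending frames" section. A self-contained sketch of the same check, using a stand-in log (sample.log here is hypothetical, standing in for /var/log/glusterfs/nfs.log):

```shell
# Count the unclassified pending frames ("type(0) op(0)") in a log.
# sample.log stands in for the real nfs.log from the sosreport.
cat > sample.log <<'EOF'
frame : type(0) op(0)
frame : type(1) op(13)
frame : type(0) op(0)
EOF
grep -c "frame : type(0) op(0)" sample.log   # prints 2
```

Run against the real nfs.log, the same grep reproduces the 7394 figure quoted above.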
Yes, I am hitting this in the FS Sanity nightly runs. I hit it in automation both times I tested before opening the BZ; I haven't attempted a manual reproducer yet. I will run everything manually this morning and document what I am seeing.
This sounds very similar to BZ 961198, for which a fix has already been posted downstream by Rajesh. The issue is explained in the fix:

https://code.engineering.redhat.com/gerrit/#/c/7503/

Thanks,
Santosh
If there is a core file available, load the core into gdb. If the stack trace looks similar to this one (a heap/arena fast-bin corruption issue), then it must be a duplicate:

Program received signal SIGSEGV, Segmentation fault.
0x00000031b7a7be5b in malloc_consolidate (av=av@entry=0x31b7db1740 <main_arena>) at malloc.c:4143
4143        unlink(av, p, bck, fwd);
(gdb) where
#0  0x00000031b7a7be5b in malloc_consolidate (av=av@entry=0x31b7db1740 <main_arena>) at malloc.c:4143
#1  0x00000031b7a7d09e in _int_malloc (av=av@entry=0x31b7db1740 <main_arena>, bytes=bytes@entry=6388) at malloc.c:3422
#2  0x00000031b7a802d8 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3220
#3  0x00007f7a0b81e703 in __gf_default_calloc (cnt=1, size=6388) at mem-pool.h:75
#4  0x00007f7a0b81eb1f in __gf_calloc (nmemb=1, size=6388, type=48) at mem-pool.c:104
#5  0x00007f7a0b81f6b0 in mem_get (mem_pool=0x20cfb90) at mem-pool.c:419
#6  0x00007f7a0b5c42f1 in rpcsvc_request_create (svc=0x20cfad0, trans=0x212a230, msg=0x2f33670) at rpcsvc.c:361
#7  0x00007f7a0b5c483f in rpcsvc_handle_rpc_call (svc=0x20cfad0, trans=0x212a230, msg=0x2f33670) at rpcsvc.c:504
#8  0x00007f7a0b5c4d49 in rpcsvc_notify (trans=0x212a230, mydata=0x20cfad0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f33670) at rpcsvc.c:644
#9  0x00007f7a0b5c95d8 in rpc_transport_notify (this=0x212a230, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f33670) at rpc-transport.c:497
#10 0x00007f7a083d8fa3 in socket_event_poll_in (this=0x212a230) at socket.c:2119
#11 0x00007f7a083d943b in socket_event_handler (fd=18, idx=9, data=0x212a230, poll_in=1, poll_out=0, poll_err=0) at socket.c:2231
#12 0x00007f7a0b848bba in event_dispatch_epoll_handler (event_pool=0x20b0ea0, events=0x20cdd30, i=0) at event-epoll.c:384
#13 0x00007f7a0b848d9f in event_dispatch_epoll (event_pool=0x20b0ea0) at event-epoll.c:445
#14 0x00007f7a0b81e591 in event_dispatch (event_pool=0x20b0ea0) at event.c:113
#15 0x0000000000408608 in main (argc=11, argv=0x7fffae3fbf88) at glusterfsd.c:1902
I hit this last night on a 6x2 volume on the latest bits:

$ rpm -q glusterfs
glusterfs-3.4.0.6rhs-1.el6rhs.x86_64

[root@storage-qe03 /]# gluster volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 27c4afa8-fadc-4ce5-88e2-8b326d49c46b
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
Brick2: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
Brick3: storage-qe05.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
Brick4: storage-qe08.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
Brick5: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
Brick6: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
Brick7: storage-qe05.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
Brick8: storage-qe08.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
Brick9: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
Brick10: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
Brick11: storage-qe05.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
Brick12: storage-qe08.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11

I talked to Saurabh and we may be seeing the same thing as:

https://bugzilla.redhat.com/show_bug.cgi?id=961198

It looks like there is a patch attached to that BZ; could we get that patch included in the next downstream build to see if it resolves this issue as well?
@Ben, yes, the patch made it into 3.4.0.7rhs. Can you please verify whether that build works for you?
I tested this last night on the 3.4.0.7 bits and I can confirm that the crash appears to be fixed, but performance is really poor. Normally my bonnie++ runs take 30-40 minutes on gigabit LAN; the current run has been going for over 16 hours. I looked on all nodes this morning and I don't see any crashes. When I straced the PID of bonnie++ I saw that it was still working, just slowly.

I tried to quantify this with dd and I didn't see terribly bad performance:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024k count=1000; sync`
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 18.7489 s, 55.9 MB/s

real    0m19.050s
user    0m0.001s
sys     0m0.731s

1 GB in 19 seconds works out to about 53 MB/s, which is not terribly bad. I thought maybe bonnie++ was using a smaller block size, so I tried:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024 count=1000000; sync`
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 18.9923 s, 53.9 MB/s

real    0m19.191s
user    0m0.072s
sys     0m1.406s

That is really close, even with the small block size. Given how long bonnie++ is taking to finish, I would expect single-threaded I/O to be in the kilobytes per second, and that doesn't seem to be the case here. Were there any other upstream fixes related to NFS performance that need to be plucked along with the patch in 3.4.0.7?
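As a sanity check on the arithmetic above, throughput is just bytes written over wall-clock seconds. A rough sketch using the first dd run's figures (19 s is the approximate `real` time including the trailing sync; integer division, whole MB/s):

```shell
# 1048576000 bytes written in ~19 s of wall-clock time, in whole MB/s.
bytes=1048576000
secs=19
echo "$((bytes / secs / 1000000)) MB/s"   # prints "55 MB/s"
```

This agrees with dd's own 55.9 MB/s figure; the lower ~53 MB/s estimate comes from rounding 1 GB down to 10^9 bytes.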
Ben, can we open another bug for the perf issue? Since this bug tracks the crash, can we mark the crash as VERIFIED and tackle the perf issue in a different bug? (Hence marking this ON_QA.) Anyway, for the performance, do 'gluster volume set <VOL> eager-locking disable' and try it once.
No problem, Amar. Saurabh opened the new BZ for the performance issue: BZ #962510. Marking this BZ as verified on 3.4.0.7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html