Description of problem:
I/O errors seen while doing the equivalent of "ls -lR" in parallel on multiple directory trees in a Gluster volume.

Version-Release number of selected component (if applicable):
RHS 3.1 - glusterfs-3.7dev-0.1017.git7fb85e3.el6.x86_64
RHEL6.6 - kernel-2.6.32-504.3.2.el6.x86_64

How reproducible:
Not reproducible at will; it seems to be triggered by doing a parallel readdir on all clients. I suspect it's a race condition in erasure coding exposed in some way by multi-threaded epoll. We can lower client.event-threads to 1 and see if it still happens.

For an example:
http://pbench.perf.lab.eng.bos.redhat.com/results/gprfc032-10ge/glfs37-ec-100kfpt-2015-04-25/256-KB--fpt-1024-fpd-2-thr-op-ls-l-2015-04-25-12-47/r-2015-04-25-12-47.log

Steps to Reproduce:
1. Create and mount a Gluster 3.7 erasure-coded volume on 6 hosts with 12 bricks/host.
2. Use the smallfile command below to create per-thread directory trees with 1000 files/dir.
3. Use the smallfile command to do the equivalent of "ls -lR" on the directory trees.

Actual results:
EIO returned by some threads.

Expected results:
No EIO errors.

Additional info:
A ton of these messages in the client mount logs:

[2015-04-26 13:52:31.504545] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)

Here's the command that produces the error:

[root@gprfc032-10ge smallfile-master]# /root/smallfile-master/smallfile_cli.py --host-set ../svrs.list --top /mnt/ec42/smf --files 102400 --files-per-dir 1024 --file-size 256 --threads 1 --operation ls-l
smallfile version 2.3
                  hosts in test : ['gprfs022', 'gprfs024', 'gprfs037', 'gprfs038', 'gprfs039', 'gprfs040']
          top test directory(s) : ['/mnt/ec42/smf']
                      operation : ls-l
                   files/thread : 102400
                        threads : 1
  record size (KB, 0 = maximum) : 0
                 file size (KB) : 256
         file size distribution : fixed
                  files per dir : 1024
                   dirs per dir : 10
     threads share directories? : N
                filename prefix :
                filename suffix :
    hash file number into dir.? : N
            fsync after modify? : N
 pause between files (microsec) : 0
           finish all requests? : Y
                     stonewall? : Y
        measure response times? : N
                   verify read? : Y
                       verbose? : False
                 log to stderr? : False
                  ext.attr.size : 0
                 ext.attr.count : 0
      permute host directories? : N
       remote program directory : /root/smallfile-master
      network thread sync. dir. : /mnt/ec42/smf/network_shared
starting all threads by creating starting gate file /mnt/ec42/smf/network_shared/starting_gate.tmp
host = gprfs022, thread = 00, elapsed sec. = 82.641262, total files = 97314, total_records = 0, status = ok
host = gprfs024, thread = 00, elapsed sec. = -1430055080.684132, total files = 0, total_records = 0, status = ERR: Input/output error
host = gprfs037, thread = 00, elapsed sec. = 80.640801, total files = 73794, total_records = 0, status = ok
host = gprfs038, thread = 00, elapsed sec. = 78.833245, total files = 102400, total_records = 0, status = ok
host = gprfs039, thread = 00, elapsed sec. = 77.121790, total files = 96334, total_records = 0, status = ok
host = gprfs040, thread = 00, elapsed sec. = 81.461536, total files = 78890, total_records = 0, status = ok
total threads = 6
total files = 448732
73.04% of requested files processed, minimum is 70.00
82.641262 sec elapsed time
5429.878370 files/sec

Here's what happened in the thread that saw the error:

[root@gprfs024 ~]# tail /var/tmp/invoke_logs-00.log
00 2015-04-26 09:31:20,681 - INFO - do_workload: opname=ls-l iterations=102400 top_dirs=['/mnt/ec42/smf'] src_dirs=['/mnt/ec42/smf/file_srcdir/gprfs024/thrd_00'] dest_dirs=['/mnt/ec42/smf/file_dstdir/gprfs024/thrd_00'] network_dir=/mnt/ec42/smf/network_shared shared=False record_sz_kb=0 total_sz_kb=256 filesize_distr=-1 files_per_dir=1024 dirs_per_dir=10 dirs_on_demand=False xattr_size=0 xattr_count=0 starting_gate=/mnt/ec42/smf/network_shared/starting_gate.tmp prefix= suffix= hash_to_dir=False fsync=False stonewall=True files_between_checks=20 verify_read=True incompressible=False finish_all_rq=True rsp_times=False tid=00 loglevel=20 filenum=0 filenum_final=0 rq=0 rq_final=0 start=None end=None elapsed=0 host=gprfs024 status=1 abort=False log_to_stderr=False verbose=False
00 2015-04-26 09:31:23,554 - ERROR - [Errno 5] Input/output error: '/mnt/ec42/smf/file_srcdir/gprfs024/thrd_00/d_000'
Traceback (most recent call last):
  File "/root/smallfile-master/smallfile.py", line 1285, in do_workload
    func(self)  # call the do_ function for that workload type
  File "/root/smallfile-master/smallfile.py", line 1048, in do_ls_l
    dir_contents.extend(os.listdir(next_dir))
OSError: [Errno 5] Input/output error: '/mnt/ec42/smf/file_srcdir/gprfs024/thrd_00/d_000'
00 2015-04-26 09:31:25,280 - ERROR - invocation did not complete cleanly
00 2015-04-26 09:31:25,280 - INFO - recorded throughput after 1 files

And here are the log messages in the client log confirming that the error was in Gluster, not smallfile:

[root@gprfs039 ~]# tail -f /var/log/glusterfs/mnt-ec42.log | grep -v ' I '
[2015-04-26 13:24:05.804919] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-10: Mismatching xdata in answers of 'LOOKUP'
[2015-04-26 13:24:05.805044] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-10: Mismatching xdata in answers of 'LOOKUP'
[2015-04-26 13:24:05.805192] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-26 13:24:05.805423] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-10: Mismatching xdata in answers of 'LOOKUP'
[2015-04-26 13:24:05.805568] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-26 13:24:05.805625] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-11: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=1D, bad=22)
[2015-04-26 13:24:05.809082] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-2: Mismatching xdata in answers of 'LOOKUP'
[2015-04-26 13:24:05.809198] W [MSGID: 109005] [dht-selfheal.c:1771:dht_selfheal_directory] 0-ec42-dht: Directory selfheal failed : 3 subvolumes have unrecoverable errors. path = /smf/file_srcdir/gprfs039/thrd_01/d_001/d_000/d_000, gfid = e7c4c580-11bf-47c1-9f23-e2cb92e6cee4

And I did see this:

The message "I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-ec42-dht: Mismatching layouts for /smf/file_srcdir/gprfs039/thrd_00/d_001/d_000/d_000, gfid = 00000000-0000-0000-0000-000000000000" repeated 2 times between [2015-04-26 13:32:41.082975] and [2015-04-26 13:32:41.084756]

[root@gprfs024 ~]# tail -f /var/log/glusterfs/mnt-ec42.log | grep -v ' I '
[2015-04-26 13:39:47.077979] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 11298875: LOOKUP() /smf/file_srcdir/gprfs024/thrd_00/d_001/d_000 => -1 (Input/output error)
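To test the multi-thread epoll theory, the client-side event threads can be dropped back to 1. A sketch of the commands, assuming the volume name ec42 taken from the logs above (run on a node in the trusted pool; `gluster volume get` requires glusterfs 3.7+):

```
# Check the current value, then reduce client-side epoll threads to 1
gluster volume get ec42 client.event-threads
gluster volume set ec42 client.event-threads 1
```

If the EIO errors stop with a single event thread, that would support the race-condition hypothesis; raising the value back restores the original parallelism.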
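For context, the traceback shows the failing workload is essentially a recursive os.listdir() walk run in parallel. A minimal self-contained sketch of that pattern (not smallfile code; it runs against a throwaway local tree instead of the Gluster mount, and the tree shape here is made up for illustration):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def ls_l_tree(top):
    """Recursively list every directory under top, like 'ls -lR'.
    An EIO from the filesystem would surface here as OSError
    (errno 5) from os.listdir, as in the traceback above."""
    seen = 0
    pending = [top]
    while pending:
        d = pending.pop()
        for name in os.listdir(d):  # the call that returned EIO
            path = os.path.join(d, name)
            seen += 1
            if os.path.isdir(path):
                pending.append(path)
    return seen

# Build a tiny stand-in tree (2 dirs x 3 files each) instead of /mnt/ec42/smf.
top = tempfile.mkdtemp()
for i in range(2):
    d = os.path.join(top, "d_%03d" % i)
    os.mkdir(d)
    for j in range(3):
        open(os.path.join(d, "f_%03d" % j), "w").close()

# Run the listing from several threads at once, as smallfile does across
# client hosts; on the EC volume this concurrency is what triggered EIO.
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(ls_l_tree, [top] * 4))
print(results)  # each thread sees 2 dirs + 6 files = 8 entries
```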
This bug is already fixed in glusterfs-3.7.1-1. Moving it to ON_QA.
Verified this on glusterfs-3.7.1-7 and didn't see the issue. Marking this as fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html