1250704 – Random errors when reading multiple files in parallel on disperse volume

Summary: Random errors when reading multiple files in parallel on disperse volume

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Sub Component:
Version:	3.7.3
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Pranith Kumar K
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-08-05 19:16 UTC by Eivind Sarto
Modified:	2016-01-27 07:18 UTC
CC List:	8 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-01-27 07:18:21 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
/var/log/glusterfs/voltest.log after mount from client (11.28 KB, text/plain) 2015-08-19 16:43 UTC, Eivind Sarto
client log from latest fio run (79.06 KB, text/plain) 2015-08-20 18:23 UTC, Eivind Sarto

Description Eivind Sarto 2015-08-05 19:16:03 UTC

Description of problem:
Reading multiple files in parallel from a disperse volume produces random read errors (EIO).  If I create a set of files and attempt to read each of them at the same time, I get random read errors.  The errors occur on different files and at different locations each time I run the test.  These errors happen with 4 or more parallel readers.

Tried to run test with and without direct-io.  Does not matter.
Tried to slow the rate of the reader threads.  See fewer errors, but still fails.
Test does not fail with only 1-3 reader threads.
Test does not fail with other types of volumes (distributed-replicated).


Version-Release number of selected component (if applicable):
3.7.3

How reproducible:


Steps to Reproduce:
1.  Create a disperse volume
# gluster vol info
Volume Name: volec
Type: Disperse
Volume ID: 1a849e84-a9c2-4a08-9950-ac948f6b3d8d
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: science03:/brick_hdd_0/gv
Brick2: science04:/brick_hdd_0/gv
Brick3: science05:/brick_hdd_0/gv
Brick4: science06:/brick_hdd_0/gv
Brick5: science07:/brick_hdd_0/gv
Brick6: science09:/brick_hdd_0/gv
Options Reconfigured:
performance.readdir-ahead: on

2. Mount volume from a client node (separate from the gluster server nodes).
3. Create 10 * 64M files (test.0 .. test.9)
4. Run fio with following job file:
[global]
ioengine=sync
size=64m
rw=read
bs=4k
#rate_iops=100
directory=/volec
thread
group_reporting
numjobs=10
filename_format=test.$jobnum
[test]
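
For anyone without fio at hand, the access pattern of the job file above (one thread per file, sequential 4k reads) can be approximated with a short script. This is only a rough sketch, not a replacement for fio: the function names `read_file` and `parallel_read` are made up for illustration, and the demo at the bottom runs against a scratch directory rather than the real mount.

```python
import os
import tempfile
import threading


def read_file(path, block_size, errors):
    """Read one file sequentially; record the first OSError (e.g. EIO) and stop."""
    offset = 0
    try:
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                offset += len(chunk)
    except OSError as exc:
        # offset is the number of bytes read so far, i.e. where the failing read started
        errors.append((path, offset, exc.errno))


def parallel_read(directory, num_files, block_size=4096):
    """Start one reader thread per file test.0 .. test.<num_files - 1>."""
    errors = []
    threads = [
        threading.Thread(
            target=read_file,
            args=(os.path.join(directory, f"test.{i}"), block_size, errors),
        )
        for i in range(num_files)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    # Demo on a scratch directory; pass the FUSE mount point (e.g. /volec)
    # to parallel_read() to exercise the disperse volume instead.
    with tempfile.TemporaryDirectory() as scratch:
        for i in range(10):
            with open(os.path.join(scratch, f"test.{i}"), "wb") as f:
                f.write(b"\0" * 65536)
        print(parallel_read(scratch, 10))
```

Each entry in the returned list is a `(path, offset, errno)` tuple, which maps onto the "read offset=..." lines that fio prints when a read fails.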

Actual results:
fio: io_u error on file /volec/test.1: Input/output error: read offset=2609152, buflen=4096
fio: pid=4988, err=5/file:io_u.c:1575, func=io_u error, error=Input/output error
fio: io_u error on file /volec/test.3: Input/output error: read offset=3215360, buflen=4096
fio: pid=4986, err=5/file:io_u.c:1575, func=io_u error, error=Input/output error
fio: io_u error on file /volec/test.0: Input/output error: read offset=3657728, buflen=4096
fio: pid=4981, err=5/file:io_u.c:1575, func=io_u error, error=Input/output error
fio: io_u error on file /volec/test.6: Input/output error: read offset=20516864, buflen=4096
fio: pid=4983, err=5/file:io_u.c:1575, func=io_u error, error=Input/output error



Expected results:
No errors


Additional info:
See the following messages in the client log.

/var/log/glusterfs/volec.log:
[2015-08-05 18:19:36.250244] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2F, remaining=1, good=2E, bad=10)
[2015-08-05 18:19:36.252131] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2F, remaining=2, good=2D, bad=10)
[2015-08-05 18:19:36.253987] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2F, remaining=4, good=2B, bad=10)
[2015-08-05 18:19:36.257672] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2F, remaining=8, good=27, bad=10)
[2015-08-05 18:19:36.259767] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2F, remaining=8, good=27, bad=10)
[2015-08-05 18:19:36.265674] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-volec-disperse-0: Heal failed [Transport endpoint is not connected]
The message "W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-volec-disperse-0: Heal failed [Transport endpoint is not connected]" repeated 6 times between [2015-08-05 18:19:36.265674] and [2015-08-05 18:19:36.294575]
[2015-08-05 18:19:36.295111] W [MSGID: 122035] [ec-common.c:462:ec_child_select] 0-volec-disperse-0: Executing operation with some subvolumes unavailable (11)
[2015-08-05 18:19:36.295680] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2E, remaining=0, good=2E, bad=11)
[2015-08-05 18:19:36.301504] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-volec-disperse-0: Heal failed [Transport endpoint is not connected]
[2015-08-05 18:19:36.328919] W [MSGID: 122035] [ec-common.c:462:ec_child_select] 0-volec-disperse-0: Executing operation with some subvolumes unavailable (10)
[2015-08-05 18:19:36.330502] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-volec-disperse-0: Operation failed on some subvolumes (up=3F, mask=2F, remaining=8, good=27, bad=10)
The message "W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-volec-disperse-0: Heal failed [Transport endpoint is not connected]" repeated 13 times between [2015-08-05 18:19:36.301504] and [2015-08-05 18:19:36.362407]
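
The `up`, `mask`, `good` and `bad` fields in the `ec_check_status` lines are hexadecimal bitmasks. The sketch below decodes them into brick numbers, assuming that bit 0 corresponds to Brick1 in the `gluster vol info` ordering; that mapping is an assumption, not something confirmed in this report. The helper names `bricks_in_mask` and `decode_status` are hypothetical.

```python
import re


def bricks_in_mask(hex_mask):
    """Return the 1-based brick numbers whose bits are set in a hex bitmask."""
    value = int(hex_mask, 16)
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]


def decode_status(log_line):
    """Extract up/mask/good/bad from an ec_check_status line and decode each mask."""
    fields = re.findall(r"\b(up|mask|good|bad)=([0-9A-Fa-f]+)", log_line)
    return {name: bricks_in_mask(value) for name, value in fields}


line = ("0-volec-disperse-0: Operation failed on some subvolumes "
        "(up=3F, mask=2F, remaining=1, good=2E, bad=10)")
print(decode_status(line))
```

Under that assumption, `up=3F` means all six bricks are connected and `bad=10` singles out the fifth brick, while the later `bad=11` adds the first brick as well.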

Comment 1 Eivind Sarto 2015-08-11 22:02:27 UTC

Some additional info.
I tried to run the same test using latest version of fio (which now supports libgfapi).  When setting 'ioengine=gfapi' in the fio job file, the random read errors no longer happen.  But, the errors still occur if I read through FUSE.

Comment 2 Xavi Hernandez 2015-08-19 10:21:23 UTC

I've tried to reproduce the problem but I couldn't. I've executed the test several times and I haven't seen any failure.

Your log entries show that self-heal is working. Is it possible you are experiencing some communication problems during the tests ?

Comment 3 Eivind Sarto 2015-08-19 14:37:44 UTC

I had not yet received my 10G switch, so I was doing some functionality testing using a 1G switch.  Each of the six nodes has an SSD as a brick, so I am probably saturating the 1G link from the client to the cluster.
Maybe that is the difference?
But, I cannot run the test a single time without seeing several read errors.
I also see similar errors if I run 10 * dd in parallel, reading different files.

I will be setting up a 10G network later this week.  I will then rerun the test.

Comment 4 Xavi Hernandez 2015-08-19 15:14:09 UTC

A 1 Gbit switch can limit the throughput, but it should not cause these problems unless it's also causing network errors.

I've also tried with 10 parallel dd's several times without any problem.

Could you repeat the same test locally ? (i.e. all bricks and clients on the same server). This will remove any possible network issue and will let us focus on the real problem.

Comment 5 Eivind Sarto 2015-08-19 16:13:30 UTC

I don't have enough drives/slots to run client and all bricks on same node.

But, I do not see any errors if I run same test on a 6-node distribute or distribute-replicate volume.  I only get errors with a disperse volume.  So I don't think it is a network problem.

All nodes (including the client) are running the same OS:
  # cat /etc/redhat-release
  CentOS Linux release 7.1.1503 (Core)
And the gluster rpms are from the gluster-epel.repo

I'll dig a bit deeper today and see if I can find anything unusual and update this bug with anything I find.

Comment 6 Eivind Sarto 2015-08-19 16:43:35 UTC

Created attachment 1064929
/var/log/glusterfs/voltest.log after mount from client

Comment 7 Eivind Sarto 2015-08-19 16:46:01 UTC

I attached the client log file right after I mounted the EC volume.  The log complains about "Server and Client lk-version numbers are not same".  I don't know why this is happening, or if it is relevant.

Comment 8 Xavi Hernandez 2015-08-20 08:36:36 UTC

I don't see any suspicious activity in the log. That message is normal.

I'm still unable to reproduce the problem. Could you try to repeat the test after disabling some features ? This might help to identify the problem:

gluster volume set <volname> performance.quick-read off
gluster volume set <volname> performance.io-cache off
gluster volume set <volname> performance.write-behind off
gluster volume set <volname> performance.stat-prefetch off
gluster volume set <volname> performance.read-ahead off
gluster volume set <volname> performance.readdir-ahead off
gluster volume set <volname> performance.open-behind off

Also, if the problem persists, could you attach the full log ?

Thanks

Comment 9 Eivind Sarto 2015-08-20 18:03:19 UTC

I am still getting read errors after turning performance parameters off.
Could there be a bug in CentOS 7.1 FUSE?
I do not see any read errors when I use fio-ioengine=gfapi.
I do not see any errors reading through FUSE with distributed or distributed-replicated volumes.

I will attach the full log from the client.  It shows the read errors.

[root@science03 ~]# gluster volume info voltest
Volume Name: voltest
Type: Disperse
Volume ID: 4b04975b-b667-413a-a4d8-d39bbe5bbb3e
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: science03:/brick_ssd_1/gv
Brick2: science04:/brick_ssd_1/gv
Brick3: science05:/brick_ssd_1/gv
Brick4: science06:/brick_ssd_1/gv
Brick5: science07:/brick_ssd_1/gv
Brick6: science09:/brick_ssd_1/gv
Options Reconfigured:
performance.open-behind: off
performance.read-ahead: off
performance.stat-prefetch: off
performance.write-behind: off
performance.io-cache: off
performance.quick-read: off
performance.readdir-ahead: off

[root@science08 ~]# fio job.test
test: (g=0): rw=read, bs=64K-64K/64K-64K/64K-64K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 8 threads
fio: io_u error on file /voltest/test.0: Input/output error: read offset=47575040, buflen=4096
fio: pid=18783, err=5/file:io_u.c:1575, func=io_u error, error=Input/output error
fio: io_u error on file /voltest/test.1: Input/output error: read offset=987037696, buflen=65536
fio: pid=18790, err=5/file:io_u.c:1575, func=io_u error, error=Input/output error
Jobs: 3 (f=3): [X(1),_(1),R(2),_(2),R(1),X(1)] [96.6% done] [81600KB/0KB/0KB /s] [1275/0/0 iops
Jobs: 3 (f=3): [X(1),_(1),R(2),_(2),R(1),X(1)] [98.9% done] [66176KB/0KB/0KB /s] [1034/0/0 iops
Jobs: 1 (f=1): [X(1),_(1),R(1),_(4),X(1)] [100.0% done] [41280KB/0KB/0KB /s] [645/0/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=8): err= 5 (file:io_u.c:1575, func=io_u error, error=Input/output error): pid=18783: Thu Aug 20 10:51:19 2015
  read : io=7130.6MB, bw=83996KB/s, iops=1312, runt= 86929msec
    clat (usec): min=6, max=892821, avg=5231.39, stdev=21733.34
     lat (usec): min=6, max=892821, avg=5231.54, stdev=21733.36
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   10], 10.00th=[   11], 20.00th=[   12],
     | 30.00th=[   13], 40.00th=[   15], 50.00th=[   16], 60.00th=[   22],
     | 70.00th=[   80], 80.00th=[ 8384], 90.00th=[12480], 95.00th=[22912],
     | 99.00th=[53504], 99.50th=[150528], 99.90th=[317440], 99.95th=[399360],
     | 99.99th=[561152]
    bw (KB  /s): min=  101, max=29401, per=14.96%, avg=12566.91, stdev=6000.23
    lat (usec) : 10=4.11%, 20=54.33%, 50=4.02%, 100=11.25%, 250=0.92%
    lat (usec) : 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.11%, 10=9.86%, 20=9.73%, 50=4.50%
    lat (msec) : 100=0.59%, 250=0.38%, 500=0.16%, 750=0.02%, 1000=0.01%
  cpu          : usr=0.04%, sys=0.65%, ctx=35928, majf=0, minf=145
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=114102/w=0/d=0, short=r=11/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=7130.6MB, aggrb=83996KB/s, minb=83996KB/s, maxb=83996KB/s, mint=86929msec, maxt=86929msec

[root@science08 ~]# cat job.test
[global]
#direct=1
ioengine=sync
size=1024m
rw=read
bs=64k
directory=/voltest
thread
group_reporting
numjobs=8
filename_format=test.$jobnum
[test]

Comment 10 Eivind Sarto 2015-08-20 18:23:26 UTC

Created attachment 1065356
client log from latest fio run

The log was > 20MB so I only attached first part of it.  It clearly shows errors.  Errors begin right after first occurrence of "XDR decoding failed".  That doesn't sound like it should ever happen.

Comment 11 Xavi Hernandez 2015-08-20 18:48:09 UTC

That's interesting. These XDR errors are definitely the cause of the I/O errors you are seeing. Your volume can tolerate up to 2 brick failures. Each XDR error causes ec to consider the corresponding brick bad or inconsistent and to ignore it for future requests.

As can be seen in the logs, the first two XDR errors do not cause any failure. However, the third error causes the read to return an I/O error.

To solve this situation, self-heal is started to repair the brick or bricks that are reporting errors. However if bricks fail faster than what self-heal can repair, and more than 2 bricks are considered bad at the same time, ec cannot return any valid data.
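
The threshold described above can be illustrated with a toy model. It is a deliberately simplified sketch, not ec's actual logic: it assumes a brick stays bad once marked (self-heal is ignored) and that a read fails as soon as more bricks are bad than the redundancy allows. The function name `simulate_brick_errors` is made up.

```python
def simulate_brick_errors(failing_bricks, redundancy=2):
    """Mark bricks bad one at a time and report whether a read can still succeed."""
    bad_mask = 0
    outcomes = []
    for brick in failing_bricks:
        bad_mask |= 1 << brick  # brick is a 0-based index
        bad_count = bin(bad_mask).count("1")
        outcomes.append("ok" if bad_count <= redundancy else "EIO")
    return outcomes


# Three different bricks of a 4+2 volume hit an XDR error before any heal completes.
print(simulate_brick_errors([4, 0, 2]))  # ['ok', 'ok', 'EIO']
```

Repeated errors on the same brick do not change the outcome in this model; only the number of distinct bad bricks matters.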

This XDR problem is very weird. Do you see anything relevant in the brick logs ?

Can you double-check that all bricks and clients are using the exact same version and that there isn't more than one installed copy of the executables (for example, one installed from a repository and another compiled by hand)?

Comment 12 Eivind Sarto 2015-08-20 19:03:02 UTC

I see following error in all the server logs:

# grep " E " /var/log/glusterfs/bricks/brick_ssd_1-gv.log
[2015-08-20 18:35:33.820916] E [MSGID: 121050] [ctr-helper.c:256:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
[2015-08-20 18:35:35.330438] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x48) [0x7fed5690dc78] -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x74) [0x7fed5690cff4] -->/lib64/libglusterfs.so.0(dict_copy_with_ref+0x9c) [0x7fed6b292fdc] ) 0-dict: invalid argument: dict [Invalid argument]

If I run the fio job with numjobs=1 or 2, I do not get any errors.
Is it possible that there is some kind of race in disperse xlator?

I installed gluster from the LATEST repo.  Never compiled sources.

Comment 13 Xavi Hernandez 2015-08-21 08:09:36 UTC

Even with those errors on the bricks, that doesn't explain why the answer sent by the server cannot be decoded on the client.

Even if disperse sends bad data, the answer should be valid (i.e. correctly decoded) and contain an error. The XDR error means that the bricks are sending garbage or the answer is getting corrupted in the middle.

Can you check whether the Ethernet interface reports any errors? For example, with an ifconfig command.

It would really be interesting to do the same test without using network. If you have at least one ssd with several GB, you can create a local volume with multiple bricks on the same ssd using this command:

gluster volume create voltest disperse 6 redundancy 2 science03:/brick_ssd_1/gv{1..6} force

After that, you can mount the volume on the same server and repeat the test locally. A test with 10 files of 1GB each in this configuration will need 15 GB of storage. I hope you have this space available on your ssd.
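
The 15 GB figure follows from the 4+2 layout: every 4 units of data are stored as 6 fragments, so raw usage is the logical size times 6/4. A minimal sketch of that arithmetic, ignoring filesystem and metadata overhead (the function name `raw_size_gb` is only illustrative):

```python
from fractions import Fraction


def raw_size_gb(logical_gb, bricks, redundancy):
    """Raw space used across all bricks of a disperse volume for a given logical size."""
    data_fragments = bricks - redundancy
    return logical_gb * Fraction(bricks, data_fragments)


# 10 files of 1 GB each on a "disperse 6 redundancy 2" volume
print(raw_size_gb(10, bricks=6, redundancy=2))  # 15
```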

Comment 14 Eivind Sarto 2015-08-21 15:55:43 UTC

There are no network errors.  But, even if there were errors it should not have mattered since it is TCP - bad frames will be dropped and re-transmitted.
And I can run the same test all day long on a 6-node distribute volume without any errors.

But, I will try the local test anyway.

Comment 15 Eivind Sarto 2015-08-21 18:51:41 UTC

I have 2 SSDs on a single node. I created 3 bricks on each SSD and a 6-brick disperse volume with redundancy=2.  When I mount this volume locally and run the same test, I cannot reproduce the problem.  I don't think corrupted network packets were the cause of the previous failures.  It has to be something else.

My test servers need to be re-installed with a different Linux distro.  When they come back up, I will install gluster 3.7.3 and try again.
If the errors go away, we can probably close this bug or mark it unreproducible.

Comment 16 Xavi Hernandez 2015-08-22 17:08:16 UTC

I'm not saying that the cause is a lost packet or a corrupted frame. As you say, a TCP connection should be immune to this kind of error (to some degree). However, the error message in the client log says that something bad has been received. This may or may not be a bug (it could be a hardware problem, for example). That's why I asked you to test it locally. This can allow us to focus on the real problem and identify it faster.

Comment 17 Eivind Sarto 2015-08-24 23:41:41 UTC

My test system was re-installed with Ubuntu 15.04 (vivid) and I installed gluster 3.7.3 from the ubuntu PPA.
I only have access to 4 servers, each with 2 SSDs.  I tried the following EC configurations: 6+2, 4+2, 4+1, 3+1, while using one of the servers as a client.

All configurations failed with same XDR message when running same parallel read test.
There are no ethernet errors.
Other types of gluster volumes (distributed, replicated, etc.) do not exhibit any errors, so I don't think we can blame it on the hardware (Supermicro X10DRU-i+).

Comment 18 Xavi Hernandez 2015-08-25 08:01:23 UTC

Does it also happen with a 2+1 (total 3 bricks) configuration ? I want to identify the smallest configuration that has the problem to reduce noise.

The number of parallel reads that cause the problem varies with the configuration ?

If possible, it would also be interesting to test some configurations:

* All bricks and clients on same server (this seems to work from a previous test)
* All bricks on same server and the client in another one
* Bricks on different servers, the client on one of them
* Bricks on different servers, the client in another one

Seeing which configurations fail and what bricks fail on each configuration could give us additional information.

Meanwhile I'll try to do some more tests to see if I'm able to reproduce it.

Comment 19 Eivind Sarto 2015-08-25 15:17:41 UTC

Already tried some of these tests before.
* Bricks on different servers, the client on one of them: Fails
* Bricks on different servers, the client in another one: Fails

* The number of parallel reads that cause the problem varies with the configuration ?

I have seen the test fail with as few as 3 readers.  It gets harder to reproduce with fewer readers.  So, it kind of looks like some sort of race.

I will try the different configurations you suggested, and I will also attempt to add some debug output to the code to try to isolate exactly where it fails.

Comment 20 Eivind Sarto 2015-08-25 18:07:46 UTC

I can reproduce the problem with a (2+1) configuration.  It will fail when the client is on a separate server and when the client is on one of the brick servers.  It also appears that it is slightly easier to reproduce when the client is on a separate server.
On a (2+1) I could reproduce failure with as few as 4 readers when the client is on a separate server.  When the client is on a brick server, I can reproduce with 5 readers.
For bigger volumes (more servers/bricks) I have seen failures with as few as 3 readers.
General trend: More bricks/servers cause more failures.  More readers cause more failures.

Smells like a race.

Comment 21 Eivind Sarto 2015-08-25 22:33:43 UTC

I set all log-levels to INFO.
# gluster vol get voltest all |grep log-level
diagnostics.brick-log-level             INFO
diagnostics.client-log-level            INFO
diagnostics.brick-sys-log-level         INFO
diagnostics.client-sys-log-level        INFO

There are no messages in the brick logs after the client detects the bad XDR.

I also added some debug to the client side.
In client-rpc-fops.c:client3_3_readv_cbk(), the gfs3_read_rsp has the following values when XDR decoding returns an error: op_ret=65536, op_error=0.
Note: I poisoned these two values to 99 before calling xdr_to_generic().

Maybe some memory is being used after free?  On either client or server.
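
The poisoning trick can be shown with a toy decoder. This is not gluster's XDR code or the real gfs3_read_rsp layout: it assumes a made-up response of two big-endian 32-bit integers followed by one 64-bit stand-in field, and uses the field names as written in this comment. The point is only the inference: if the sentinels were overwritten, decoding got past those fields before it failed.

```python
import struct

POISON = 99  # sentinel written before decoding, as in the debugging described above


def decode_read_rsp(buf):
    """Toy decoder: returns (success, rsp). Fields keep POISON if never decoded."""
    rsp = {"op_ret": POISON, "op_error": POISON}
    try:
        rsp["op_ret"] = struct.unpack_from(">i", buf, 0)[0]
        rsp["op_error"] = struct.unpack_from(">i", buf, 4)[0]
        struct.unpack_from(">Q", buf, 8)  # stand-in for the remaining fields
    except struct.error:
        return False, rsp
    return True, rsp


# Truncated after the first two fields: decoding fails, but the sentinels are gone.
print(decode_read_rsp(struct.pack(">ii", 65536, 0)))
# Empty buffer: decoding fails immediately and both fields still hold 99.
print(decode_read_rsp(b""))
```

In this model, op_ret=65536 (which matches the 64k block size of the fio job) with the second field at 0 would mean the start of the reply decoded cleanly and the failure happened further into the message.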

Comment 22 Backer 2015-08-30 11:40:01 UTC

We see the following messages in the rebalance log.

The rebalancer logs these messages continuously. Heal info doesn't show any corrupted files in the disperse-7 and disperse-18 subvolumes. All bricks are up and running.

Multiple parallel writes are happening during rebalancing.

[2015-08-30 10:50:34.371308] I [MSGID: 122058] [ec-heal.c:2259:ec_heal_do] 0-glustertest-disperse-7: /Packages/Features/MPEG/F/FORCED-TO-FIGHT_FTR_S_HN-XX_IN-XX_51_HD_RIC_OV/FORCED-TO-FIGHT_FTR-V2_S_HN-XX_IN-XX_51_HD_20140203_RIC_OV/FORCED-TO-FIGHT_HINDI_WRAP_MPEG_31012014-reel-1-mpeg2.mxf: name heal successful on 3FF
[2015-08-30 10:50:34.372792] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-glustertest-disperse-18: Heal failed [Transport endpoint is not connected]
[2015-08-30 10:50:34.376271] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-glustertest-disperse-18: Heal failed [Transport endpoint is not connected]
[2015-08-30 10:50:34.378274] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-glustertest-disperse-7: Heal failed [Transport endpoint is not connected]
[2015-08-30 10:50:34.378311] W [MSGID: 122035] [ec-common.c:462:ec_child_select] 0-glustertest-disperse-18: Executing operation with some subvolumes unavailable (20)
[2015-08-30 10:50:34.378827] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-glustertest-disperse-18: Operation failed on some subvolumes (up=3FF, mask=3DF, remaining=1, good=3DE, bad=20)
[2015-08-30 10:50:34.381450] I [MSGID: 122058] [ec-heal.c:2259:ec_heal_do] 0-glustertest-disperse-18: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf: name heal successful on 3FF
[2015-08-30 10:50:34.382036] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-glustertest-disperse-7: Heal failed [Transport endpoint is not connected]
[2015-08-30 10:50:34.383412] W [MSGID: 122035] [ec-common.c:462:ec_child_select] 0-glustertest-disperse-7: Executing operation with some subvolumes unavailable (20)
[2015-08-30 10:50:34.384032] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-glustertest-disperse-7: Operation failed on some subvolumes (up=3FF, mask=3DF, remaining=80, good=35F, bad=20)


Heal info for disperse-7

Brick glustertestdn001:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn002:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn003:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn004:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn005:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn006:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn007:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn008:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn009:/media/disk8/brick8/
Number of entries: 0

Brick glustertestdn010:/media/disk8/brick8/
Number of entries: 0


Heal info for disperse-18


Brick glustertestdn001:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn002:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn003:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn004:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn005:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn006:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn007:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn008:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn009:/media/disk19/brick19/
Number of entries: 0

Brick glustertestdn010:/media/disk19/brick19/
Number of entries: 0

Comment 23 Backer 2015-08-30 13:40:05 UTC

After manually deleting the corrupted file on disperse-18 so that the rebalance could continue with other files, the rebalancer is still checking the same file and logs the following messages continuously. The rebalance log shows "0-glustertest-disperse-18: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf: name heal successful on 3FF" even though we deleted the file. We don't understand this message.


[2015-08-30 13:21:10.366911] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-glustertest-disperse-18: Heal failed [Transport endpoint is not connected]
[2015-08-30 13:21:10.368576] W [MSGID: 122035] [ec-common.c:462:ec_child_select] 0-glustertest-disperse-18: Executing operation with some subvolumes unavailable (20)
[2015-08-30 13:21:10.369243] W [MSGID: 122053] [ec-common.c:166:ec_check_status] 0-glustertest-disperse-18: Operation failed on some subvolumes (up=3FF, mask=3DF, remaining=80, good=35F, bad=20)
[2015-08-30 13:21:10.371006] I [MSGID: 122058] [ec-heal.c:2259:ec_heal_do] 0-glustertest-disperse-18: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf: name heal successful on 3FF
[2015-08-30 13:21:10.372338] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-180: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372476] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-189: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372525] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-184: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372524] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-183: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372588] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-181: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372621] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-186: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372622] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-182: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372652] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-188: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372644] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-185: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.372696] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-glustertest-client-187: remote operation failed. Path: /Packages/Features/MPEG/F/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_RIM_OV/FOR-LOVES-SAKE_FTR_S_JAP-EN_IN-XX_20_HD_20121022_RIM_OV/FOR-LOVES-SAKE-ENC-ENC-reel-1-mpeg2.mxf (e8991656-65a0-4d8c-bdd3-fad0f4153ac4) [No such file or directory]
[2015-08-30 13:21:10.373995] W [MSGID: 122002] [ec-common.c:122:ec_heal_report] 0-glustertest-disperse-18: Heal failed [Transport endpoint is not connected]

Comment 24 Backer 2015-08-31 08:58:24 UTC

Hi Xavier,

We are not able to read any files from the disperse-0 to disperse-35 subvolumes; the reads hang.
But we are able to read files from the disperse-36 to disperse-71 subvolumes.

About the cluster setup

The distributed-disperse volume has been created across nodes. Each node has 36 drives and the disperse configuration is 8+2. So disperse-0 to disperse-35 have been created across node1 to node10, and disperse-36 to disperse-71 across node11 to node20. The hard disks on node1 to node10 are almost full, so we are trying to rebalance the data from disperse sets 0-35 to disperse sets 36-71.

When we tried to read a file from a client, it put the following messages in the client log.

[2015-08-31 07:07:41.250555] I [MSGID: 109045] [dht-common.c:1665:dht_lookup_everywhere_cbk] 0-glustertest-dht: attempting deletion of stale linkfile /Packages/Features/DCI/J/JAATS-IN-GOLMAAL_FTR_S_PUN-en_CAN-XX_51_2K_OC_RIM_OV/JAATS-IN-GOLMAAL_FTR_S_PUN-en_CAN-XX_51_2K_OC_20130518_RIM_OV/JAAT-IN-GOLMAAL_R5-6-7-8-9_CSC_J2K-reel-5-jp2k.mxf on glustertest-disperse-34 (hashed subvol is glustertest-disperse-14)
[2015-08-31 07:07:41.251942] I [MSGID: 109069] [dht-common.c:990:dht_lookup_unlink_cbk] 0-glustertest-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for /Packages/Features/DCI/J/JAATS-IN-GOLMAAL_FTR_S_PUN-en_CAN-XX_51_2K_OC_RIM_OV/JAATS-IN-GOLMAAL_FTR_S_PUN-en_CAN-XX_51_2K_OC_20130518_RIM_OV/JAAT-IN-GOLMAAL_R5-6-7-8-9_CSC_J2K-reel-5-jp2k.mxf

Comment 25 Xavi Hernandez 2015-09-04 16:57:56 UTC

I'm unable to do any extensive testing until October, sorry.

It seems weird that some bricks are totally unavailable. Have you enabled insecure ports in glusterd and on the volume?

I suppose that this rebalance test has nothing to do with the multiple parallel reads you wrote at the beginning, right ?

Comment 26 Pranith Kumar K 2015-09-07 04:40:02 UTC

Xavi,
      I can take a look at this one while you are away.

Backer,
     considering we are both in the India timezone, do you mind coming to #gluster today? Let's try to at least come up with steps that lead to this problem, so that it can be fixed quickly. I have some meetings today. If you let me know the time, I can try to move some meetings to talk about this issue.

Pranith

Comment 27 Pranith Kumar K 2016-01-27 07:18:21 UTC

Closing this bug based on discussions with Backer.

Hi Pranith,

Currently, we are not facing this issue. If we face the issue again, I will let you know or open a new ticket.

Regards,
Backer

On Tue, Dec 29, 2015 at 2:13 PM, Pranith Kumar Karampuri <pkarampu> wrote:

    hi,
        Are you still experiencing this issue? I am wondering what is a good next step for this bug. Please let me know how to proceed on this issue.

Pranith
