Bug 970224 - Under heavy load, Grid Engine array jobs fail; write permission
Product: GlusterFS
Classification: Community
Component: access-control
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Nagaprasad Sathyanarayana
Depends On:
Reported: 2013-06-03 14:38 EDT by Harry Mangalam
Modified: 2016-02-17 19:19 EST
See Also:
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2014-12-14 14:40:31 EST
Type: Bug

Description Harry Mangalam 2013-06-03 14:38:57 EDT
Description of problem:
We have a ~2500-core academic cluster that is used at close to saturation.
The main data store is a Gluster 3.3 filesystem on 4 nodes with 8 bricks (340TB) over QDR IB.  All servers are 8xOpteron/32GB systems with 3ware 9750 SAS controllers. They all run SL6.2 and are stable, with load holding steady at about 2.

Version-Release number of selected component (if applicable):
Gluster 3.3
SciLinux 6.2 on servers, CentOS 6.3 on clients
"Son of GridEngine 8.1.3"

How reproducible:
Frequently under heavy load, with array jobs that frequently write and read small files; otherwise not at all.
Many of our users run large array jobs under SGE, and especially during runs
with LOTS of IO we VERY occasionally (~50 times since last June, according to
the brick logs) see these kinds of errors, each of which causes that
particular element of the array job to fail.

> The error below being reported by Grid Engine says:
> user "root" 03/21/2013 15:29:23 [507:26777]: error: can't open output file "/gl/bio/krthornt/WTCCC/autosomal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103": Permission denied
> 03/21/2013 15:29:23 [400:25458]: wait3

Looking thru all the server logs (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log)
reveals nothing about this error, but the brick logs yield this set of lines
referencing that file at the correct time:

/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:18.667171] W [posix-handle.c:461:posix_handle_hard] 0-gl-posix: link analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 -> /raid1/.glusterfs/5a/0e/5a0e87a6-e35d-4368-841e-b45802fecc4e failed (File

/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:18.667249] E [posix.c:1730:posix_create] 0-gl-posix: setting gfid on analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 failed

/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:19.241602] I [server3_1-fops.c:1538:server_open_cbk] 0-gl-server: 644765: OPEN mal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 (5a0e87a6-e35d-4368-841e-b45802fecc4e) ==> -1 (Permission denied)

/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:19.520455] I [server3_1-fops.c:1538:server_open_cbk] 0-gl-server: 644970: OPEN mal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 (5a0e87a6-e35d-4368-841e-b45802fecc4e) ==> -1 (Permission denied)
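
For counting how often this pattern shows up across the brick logs, something like the following could be used. It is only a sketch: the regular expressions are inferred from the four log lines quoted above, not from any documented Gluster log format, and the names classify, summarize and the category labels are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the brick log lines quoted above:
#   [timestamp] LEVEL [file.c:line:function] translator: message
# A leading "path:" prefix (as added by grep) is tolerated because
# search() is used instead of match().
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<xlator>\S+): "
    r"(?P<msg>.*)"
)

# Hypothetical category labels for the three kinds of message seen above.
PATTERNS = {
    "gfid_set_failed": re.compile(r"setting gfid on .* failed"),
    "open_denied": re.compile(r"OPEN .*==> -1 \(Permission denied\)"),
    "hardlink_failed": re.compile(r"link .* -> .* failed"),
}


def classify(line):
    """Return (timestamp, category) for a matching log line, else None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    for name, pattern in PATTERNS.items():
        if pattern.search(m.group("msg")):
            return m.group("ts"), name
    return None


def summarize(lines):
    """Count matching lines per category over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        hit = classify(line)
        if hit is not None:
            counts[hit[1]] += 1
    return counts
```

Calling summarize() on an open brick log file (for example one of the raid1.log files above) returns a Counter keyed by category. It assumes each log entry sits on a single line.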

Steps to Reproduce:
Difficult to reproduce due to collective requirements of heavy loads (on server, on networks), specific workflow and read/write profile.  Wanted to file the bug so I can update it with more info as it becomes available.  I don't think gluster can do anything about this right now, unless a dev is working on a similar report that I can't find.
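
As a rough starting point for a reproducer, the small-file write/read profile could be approximated with a sketch like the one below. This is an assumption-heavy stand-in, not the actual workload: the real failures involve SGE array tasks on many nodes, while this runs threads on one client. The function names, the file naming and the default counts are all hypothetical.

```python
import errno
import os
from concurrent.futures import ThreadPoolExecutor


def write_then_read(path, payload=b"x" * 512):
    """Create one small file, read it back, and report any failure.

    Returns None on success, "mismatch" if the data read back differs,
    or the symbolic errno name (e.g. "EACCES") if an OSError is raised.
    """
    try:
        with open(path, "wb") as fh:
            fh.write(payload)
        with open(path, "rb") as fh:
            ok = fh.read() == payload
        return None if ok else "mismatch"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def stress(target_dir, n_files=1000, workers=32):
    """Write and read n_files small files concurrently in target_dir.

    Returns a dict mapping each failing path to its failure reason;
    an empty dict means every file was written and read back intact.
    """
    paths = [os.path.join(target_dir, f"stress.o{i}") for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(write_then_read, paths))
    return {p: r for p, r in zip(paths, results) if r is not None}
```

Pointing stress() at a scratch directory on the gluster mount while the cluster is busy would show whether plain creations alone can trigger the "Permission denied" result; an "EACCES" entry in the returned dict would correspond to the error above.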


Actual results:

Failure of the Grid Engine array job, interrupting ~400 jobs, each of which must write and read 1000s of files.

Expected results:
10,000+ files per array job written and read successfully.  This is generally the case, but a few times a week the prerequisites of this failure are met and a large array job fails.

Additional info: Will try the failing job on the Fraunhofer Filesystem to see if it completes; if so, will re-run it on gluster to verify that it fails.
Comment 1 shishir gowda 2013-06-04 05:46:50 EDT
Does the work load involve renames or links?

What is the type of volume? And can you please provide logs from the clients?
(If nfs mount, nfs server logs; if fuse, client logs.)

Also, can you please provide the exact version being used?
If possible, can you check with release 3.3.2.qa3 to see if the issue is fixed?
Comment 2 Harry Mangalam 2013-06-04 12:39:47 EDT
Re: comment 1:
- No renames or links in any of the cases I've checked.  They are std file creations.

- the volume type is:
Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.write-behind-window-size: 1024MB
performance.flush-behind: on
performance.cache-size: 268435456
nfs.disable: on
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

- the exact version of the clients is:
$ glusterfs --version
glusterfs 3.3.1 built on Oct 11 2012 21:49:36

The version on the servers is:
glusterfs 3.3.0 built on May 31 2012 11:16:28 

> If possible, can you check with release 3.3.2.qa3 to see if the issue is fixed?

That will take a bit more time, but I can start that process.

- the logs will take a bit of time to collect and post.  I'll post when they are ready.
Comment 3 Harry Mangalam 2013-06-04 14:43:54 EDT
The partial client logs (only the relevant bits) are at:  <http://pastie.org/8005913> 

compute-2-7 has an enormous number of error messages from past errors (~4GB).  Trying to figure that out now as well.

the entire logs (except from compute-2-7) are gzipped together at:
Comment 4 Harry Mangalam 2013-06-04 14:49:39 EDT
May not have been clear from previous notes (altho the gluster options note it):
nfs.disable: on

We are only using the fuse mounts, not NFS.  Nevertheless, we're still getting NFS error messages associated with a lot of these errors.

[2013-05-23 17:50:43.632112] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-0: remote operation failed: Stale NFS file handle. Path: /cbcl/mengfant/Cmap/CMAP/LM1/run_train_svm_cmap.sh (be46f737-1fc5-48b3-9b75-8eb19f201272)
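
A side note on the wording, offered as an observation rather than a diagnosis: "Stale NFS file handle" is the message text that many C libraries attach to the ESTALE errno, so it can appear on FUSE-only mounts as well. A minimal sketch to see what a given client prints for ESTALE (the helper name describe_estale is hypothetical, and the exact text depends on the platform's C library):

```python
import errno
import os


def describe_estale():
    """Return (errno number, local message text) for ESTALE."""
    return errno.ESTALE, os.strerror(errno.ESTALE)


if __name__ == "__main__":
    number, text = describe_estale()
    print(f"ESTALE = {number}: {text}")
```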
Comment 5 Harry Mangalam 2013-06-04 16:37:38 EDT
If you want the complete log for compute-2-7, it's here:

137MB -> 4GB uncompressed.
Comment 7 Niels de Vos 2014-11-27 09:54:27 EST
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify if this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will get automatically closed.
