Bug 1010416 - Errors in gluster/swift/common/DiskFile.py during catalyst run
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Hardware: x86_64 Linux
Priority: unspecified Severity: medium
Assigned To: Bug Updates Notification Mailing List
QA Contact: Sudhir D
Reported: 2013-09-20 13:58 EDT by Nick Dokos
Modified: 2013-11-27 15:50 EST (History)
Doc Type: Bug Fix
Last Closed: 2013-11-27 15:50:02 EST
Type: Bug

Attachments
Tarball of client logs and ERROR lines of server logs. (6.97 MB, application/x-gzip)
2013-09-20 13:58 EDT, Nick Dokos

Description Nick Dokos 2013-09-20 13:58:24 EDT
Created attachment 800596
Tarball of client logs and ERROR lines of server logs.

Description of problem:

* Setup

Two clients (gprfc026, 028) running the catalyst workload: first PUT, then GET
4.78 million files, in groups of 10000, with the whole thing repeated three times.

Six servers (gprfs009-016, minus 013, which kicked the bucket, and its partner 014)
in the standard configuration: 12 disks in RAID6, one gluster volume, two replicas.
All six run the gluster server and client components and the swift services.

* Results

The clients were started on 2013/09/11 at 16:58 local time (UTC-5). They ran
for a few hours, getting errors during the first PUT run in each case:
run3 on gprfc026 and run1 on gprfc028 (the discrepancy in the run names was
a careless error on my part, with no deeper significance). See the
gprfc0{26,28}/gl-run{3,1}-PUT-{error,progress}.log files for the details. Note
that these client logs use UTC, so they show times of 21:58 and later.
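
Because the client logs are in UTC and the server times are local (UTC-5), lining the two up means shifting one side by five hours. A minimal sketch of that conversion follows; the function name and the fixed offset are assumptions for illustration (the offset is taken from the statement above, and syslog stamps carry no year, so 2013 is supplied by hand).

```python
from datetime import datetime, timedelta, timezone

# Assumption: server timestamps are local time at a fixed UTC-5 offset,
# as stated in this report. Syslog stamps omit the year, so it is passed in.
SERVER_TZ = timezone(timedelta(hours=-5))


def server_local_to_utc(stamp, year=2013):
    """Convert a syslog-style stamp such as 'Sep 11 18:05:34' to aware UTC."""
    local = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return local.replace(tzinfo=SERVER_TZ).astimezone(timezone.utc)


# Client start time quoted above: 16:58 local.
start_utc = server_local_to_utc("Sep 11 16:58:00")
print(start_utc.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

The same helper can be applied to any server-side error time before searching the client logs for the matching request.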

The server logs (009 had no errors) show errors mostly clumped around two
different times, 17:58:10 and 18:05:34. The exception is 010, which got all
its errors at 17:31:14.

| Server | Number of errors | Time     |
|--------+------------------+----------|
|    009 |                0 |          |
|    010 |                8 | 17:31:14 |
|    011 |                2 | 17:58:09 |
|    011 |                8 | 18:05:34 |
|    011 |                1 | 20:41:17 |
|    012 |               11 | 17:58:10 |
|    015 |                4 | 17:58:11 |
|    016 |                2 | 17:58:09 |
|    016 |                7 | 18:05:34 |
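
A per-server, per-timestamp tally like the one above can be produced from the ERROR lines with a short script. The sketch below is only an illustration: the regular expression assumes the syslog line shape of the object-server entries in this report, and the sample lines are made up for the demonstration rather than taken from the attached tarball.

```python
import re
from collections import Counter

# Assumed line shape, modelled on the object-server entries in this report:
#   "Sep 11 18:05:34 gprfs016 object-server ERROR __call__ error with ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+object-server\s+ERROR\b"
)


def tally_errors(lines):
    """Count ERROR lines per (host, timestamp) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("stamp"))] += 1
    return counts


# Illustrative input only (hypothetical, not the real log contents).
sample = [
    "Sep 11 18:05:34 gprfs016 object-server ERROR __call__ error with ",
    "Sep 11 18:05:34 gprfs016 object-server ERROR __call__ error with ",
    "Traceback (most recent call last):",
]

for (host, stamp), count in sorted(tally_errors(sample).items()):
    print(host, stamp, count)
```

Traceback continuation lines do not match the pattern, so each error is counted once by its leading ERROR line.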

The complaint is the same in each case except for the path: sometimes it is the
container directory, sometimes a file underneath it:

Sep 11 18:05:34 gprfs016 object-server ERROR __call__ error with 
PUT /vol0/0/AUTH_vol0/gprfc026.pmcDef.pw24.omc1.ow24.dcs2MB.ncs2MB.daw02.cw02.d8192.mtu9k.untuned.clth64.tpf/run3/insightdemo12/docs/20111114_0/job33242/0001/3324200000050.htm : 
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 928, in __call__
    res = method(req)
  File "/usr/lib/python2.6/site-packages/swift/common/utils.py", line 1558, in wrapped
    return func(*a, **kw)
  File "/usr/lib/python2.6/site-packages/swift/common/utils.py", line 520, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/gluster/swift/obj/server.py", line 63, in PUT
    return server.ObjectController.PUT(self, request)
  File "/usr/lib/python2.6/site-packages/swift/common/utils.py", line 1558, in wrapped
    return func(*a, **kw)
  File "/usr/lib/python2.6/site-packages/swift/common/utils.py", line 520, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 647, in PUT
    with file.mkstemp() as fd:
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.6/site-packages/gluster/swift/common/DiskFile.py", line 755, in mkstemp
  File "/usr/lib/python2.6/site-packages/gluster/swift/common/DiskFile.py", line 430, in _create_dir_object
    ret, newmd = make_directory(cur_path, self.uid, self.gid, md)
  File "/usr/lib/python2.6/site-packages/gluster/swift/common/DiskFile.py", line 149, in _make_directory_unlocked
DiskFileError: _make_directory_unlocked: os.mkdir failed because path 
already exists, and a subsequent os.stat on that same path failed 
([Errno 2] No such file or directory: '/mnt/gluster-object/vol0/gprfc026.pmcDef.pw2

It's not clear to me why the path is truncated in the nested error message (the
last line of the example above).
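
The error message describes a specific sequence: os.mkdir fails because the path already exists, and an immediately following os.stat on the same path fails with ENOENT. The sketch below shows one way a caller could tolerate that sequence by retrying a few times. It is not the gluster-swift implementation and not a proposed fix; the function name, retry count and delay are hypothetical, and the report does not establish what caused the path to appear and disappear.

```python
import errno
import os
import time


def make_directory_tolerant(path, retries=3, delay=0.1):
    """Create `path`, tolerating an EEXIST-then-ENOENT sequence.

    Returns True if this call created the directory and False if it was
    already there. Hypothetical helper, not gluster-swift code.
    """
    for _ in range(retries):
        try:
            os.mkdir(path)
            return True
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise
        # mkdir reported that the path exists; confirm with stat.
        try:
            os.stat(path)
            return False
        except OSError as err:
            if err.errno != errno.ENOENT:
                raise
        # The path vanished between mkdir and stat: wait, then try again.
        time.sleep(delay)
    raise OSError(errno.ENOENT, "path kept vanishing between mkdir and stat", path)
```

On a local filesystem the retry branch should almost never run; it matters only when two views of the same directory can briefly disagree.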

Somehow, the errors cleared up eventually and the runs that started around
18:58 (23:58 UTC) finished with no more errors of any kind.  The results were
consistent with what Peter Portante obtained earlier with these bits.

The client logs and the ERROR lines from the server logs are in the
attached tarball.

Version-Release number of selected component (if applicable):


How reproducible:

Unknown - I am about to start a run with more clients. If I see the same problem, I will update the BZ accordingly.

Steps to Reproduce:

Actual results:

Backtrace (see the description above)

Expected results:

No backtrace

Additional info:

Client logs and selections from the server logs in the attached tarball.
Comment 2 Nick Dokos 2013-11-27 15:50:02 EST
I have run catalyst now many times without ever seeing this problem again. I'll close this BZ and reopen it if/when I see the problem again.
