Bug 1144428 - dist-geo-rep: Session going into faulty with "Cannot allocate memory" backtrace when pause, rename and resume is performed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Kotresh HR
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On:
Blocks: 1146823 1147422 1159190 1162694
 
Reported: 2014-09-19 11:13 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC (History)
CC List: 12 users

Fixed In Version: glusterfs-3.6.0.31-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, when geo-replication synchronized directory renames, a file's blob was sent to the gfid-access translator for directory entry creation. The translator rejected it as "Invalid blob length" with errno ENOMEM, and geo-replication went faulty with a "Cannot allocate memory" backtrace. With this fix, during renames, if the source is not present on the slave, direct entry creation on the slave is done only for files and not for directories, so geo-replication successfully synchronizes directory renames to the slave without the ENOMEM backtrace.
Clone Of:
: 1146823 (view as bug list)
Environment:
Last Closed: 2015-01-15 13:40:10 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0038 0 normal SHIPPED_LIVE Red Hat Storage 3.0 enhancement and bug fix update #3 2015-01-15 18:35:28 UTC

Description M S Vishwanath Bhat 2014-09-19 11:13:07 UTC
Description of problem:
The session goes faulty with an "OSError: [Errno 12] Cannot allocate memory" backtrace in the logs. The operations performed were: sync existing data -> pause the session -> rename all the files -> resume the session.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.28-1.el6rhs.x86_64

How reproducible:
Hit only once. Not sure I will be able to reproduce again.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 dist-rep master and a 2x2 dist-rep slave volume.
2. Create and sync some 5k files in some directory structure.
3. Pause the session.
4. Rename all the files.
5. Resume the session (a driver sketch for this sequence follows these steps).
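
A hedged driver sketch for the pause -> rename -> resume sequence, written in Python against the gluster CLI; the volume name "master" and slave URL "nirvana::slave" are taken from the status output below and are assumptions for illustration:

# Hedged reproduction sketch: drive the geo-rep pause/resume steps through
# the gluster CLI. Volume name "master" and slave URL "nirvana::slave" are
# taken from the status output below; adjust for your setup.
import subprocess

def georep(action):
    # Runs: gluster volume geo-replication master nirvana::slave <action>
    subprocess.check_call(
        ["gluster", "volume", "geo-replication", "master", "nirvana::slave", action]
    )

georep("pause")
# ... rename all the files (and directories) on the master mount point here ...
georep("resume")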

Actual results:
The session went faulty:

MASTER NODE                 MASTER VOL    MASTER BRICK      SLAVE               STATUS     CHECKPOINT STATUS    CRAWL STATUS        
-----------------------------------------------------------------------------------------------------------------------------
ccr.blr.redhat.com          master        /bricks/brick0    nirvana::slave      faulty     N/A                  N/A                 
metallica.blr.redhat.com    master        /bricks/brick1    acdc::slave         Passive    N/A                  N/A                 
beatles.blr.redhat.com      master        /bricks/brick3    rammstein::slave    Passive    N/A                  N/A                 
pinkfloyd.blr.redhat.com    master        /bricks/brick2    led::slave          faulty     N/A                  N/A                 


The backtrace in the master logs:

[2014-09-19 16:19:53.933645] I [master(/bricks/brick2):1225:crawl] _GMaster: slave's time: (1411061833, 0)
[2014-09-19 16:20:33.653033] E [repce(/bricks/brick2):207:__call__] RepceClient: call 18787:139727562630912:1411123833.64 (entry_ops) failed on peer with OSError
[2014-09-19 16:20:33.653924] E [syncdutils(/bricks/brick2):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1324, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 524, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1236, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 927, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 891, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:20:33.657620] I [syncdutils(/bricks/brick2):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.663028] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-09-19 16:20:33.663907] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.795839] I [monitor(monitor):222:monitor] Monitor: worker(/bricks/brick2) died in startup phase


This is a remote backtrace propagated to the master via RPC. The actual backtrace in the slave logs is:

[2014-09-19 16:27:45.780600] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in entry_ops
    [ENOENT, ESTALE, EINVAL])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 470, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:27:45.794786] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
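
For context, the lsetxattr/raise_oserr frames above are a thin wrapper around the libc call. A minimal sketch of that pattern (an assumption for illustration, not the shipped libcxattr.py) shows how a failing setxattr surfaces as the OSError seen here:

# Minimal sketch (assumption, not the shipped libcxattr.py) of how a failing
# lsetxattr(2) call is turned into the OSError seen in the traceback above.
import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def lsetxattr(path, attr, value):
    # The gfid-access translator rejects a bad blob ("Invalid blob length")
    # and the error reaches the client as ENOMEM, i.e. errno 12.
    ret = libc.lsetxattr(path.encode(), attr.encode(), value, len(value), 0)
    if ret == -1:
        errn = ctypes.get_errno()
        raise OSError(errn, os.strerror(errn))  # [Errno 12] Cannot allocate memory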


Expected results:
There should be no backtraces and no faulty sessions.

Additional info:
The slave volume had cluster.hash-range-gfid enabled.

I will create a sosreport for further analysis.

Comment 4 Alok 2014-09-22 08:55:56 UTC
Not for 3.0. Dev needs to do an RCA and provide inputs.

Comment 5 M S Vishwanath Bhat 2014-09-24 12:35:36 UTC
This happens both with pause/rename/resume and with plain renames while the session is in progress. Note that the renames include directory renames.

Comment 6 shilpa 2014-09-25 06:57:50 UTC
Reproduced the issue. Renamed a directory while the geo-rep session was active; the session went into a faulty state, but the slave bricks were updated with the rename anyway. Further additions of new files still synced even though the geo-rep session showed faulty, and when these new entries are renamed, the old directory name remains on the slave bricks.

[root@lightning mastervol]# mkdir test
[root@lightning mastervol]# mkdir test21t1
[root@lightning mastervol]# ls
 test  test21t1
[root@lightning mastervol]# mv test notest



[root@Tim mastervol]# gluster v geo mastervol 10.70.42.197::slavevol status detail
 
MASTER NODE              MASTER VOL    MASTER BRICK                    SLAVE                     STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED   
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Tim.blr.redhat.com       mastervol     /bricks/brick1/mastervol_b1     10.70.42.254::slavevol    faulty     N/A                  N/A             9272           1                0                0                  0               
Tim.blr.redhat.com       mastervol     /bricks/brick2/mastervol_b5     10.70.42.254::slavevol    faulty     N/A                  N/A             12763          1                0                0                  0               
Tim.blr.redhat.com       mastervol     /bricks/brick3/mastervol_b9     10.70.42.254::slavevol    faulty     N/A                  N/A             12709          1                0                0                  0               
Tim.blr.redhat.com       mastervol     /bricks/brick3/mastervol_b11    10.70.42.197::slavevol    faulty     N/A                  N/A             7696           1                0                0                  0               
purple                   mastervol     /bricks/brick1/mastervol_b3     10.70.42.197::slavevol    faulty     N/A                  N/A             8              0                0                0                  0               
purple                   mastervol     /bricks/brick2/mastervol_b7     10.70.42.197::slavevol    faulty     N/A                  N/A             2              1                0                0                  0               
green                    mastervol     /bricks/brick1/mastervol_b4     10.70.42.151::slavevol    faulty     N/A                  N/A             0              0                0                0                  0               
green                    mastervol     /bricks/brick2/mastervol_b8     10.70.42.151::slavevol    faulty     N/A                  N/A             0              0                0                0                  0               
Javier.blr.redhat.com    mastervol     /bricks/brick1/mastervol_b2     10.70.42.97::slavevol     Passive    N/A                  N/A             0              0                0                0                  0               
Javier.blr.redhat.com    mastervol     /bricks/brick2/mastervol_b6     10.70.42.97::slavevol     Passive    N/A                  N/A             0              0                0                0                  0               
Javier.blr.redhat.com    mastervol     /bricks/brick3/mastervol_b10    10.70.42.97::slavevol     Passive    N/A                  N/A             0              0                0                0                  0               
Javier.blr.redhat.com    mastervol     /bricks/brick3/mastervol_b12    10.70.42.151::slavevol    Passive    N/A                  N/A          


[root@lightning mastervol]# mv test21t1 test21t2
[root@lightning mastervol]# ls -l
total 1048576
-rw-r--r-- 1 root root 1073741824 Sep 25 00:15 BB
drwxr-xr-x 2 root root         36 Sep 25 01:03 notest
drwxr-xr-x 2 root root         36 Sep 25 01:03 test21t2

The rename was successfully synced. However, the old name was still retained on the slave bricks.


Slave brick:

/bricks/brick3/slavevol_b10:
total 0
drwxr-xr-x 2 root root 6 Sep 25 01:03 notest
drwxr-xr-x 2 root root 6 Sep 25 01:03 test21t1
drwxr-xr-x 2 root root 6 Sep 25 17:49 test21t2

Master brick:

/bricks/brick2/mastervol_b8:
total 0
drwxr-xr-x 2 root root 6 Sep 25 01:03 notest
drwxr-xr-x 2 root root 6 Sep 25 01:03 test21t2



Logs from slavevol.log:

[2014-09-25 12:00:33.436995] E [repce(/bricks/brick2/mastervol_b5):207:__call__] RepceClient: call 32568:140035883935488:1411626633.43 (entry_ops) failed on peer with OSError
[2014-09-25 12:00:33.437428] E [syncdutils(/bricks/brick2/mastervol_b5):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1343, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 524, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1174, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 927, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 891, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory

Comment 7 Kotresh HR 2014-09-25 12:36:28 UTC
Root Cause and Fix:

Directory renames are captured in the changelogs of all the
distributed bricks. gsyncd processes these changelogs on each
brick in parallel. The first changelog to be processed succeeds;
all subsequent ones stat the 'src' and, if it is not present,
try to create the entry freshly on the slave. This should be
done only for files and not for directories. When this code path
was hit, a regular file's blob was sent as the directory's blob,
and the gfid-access translator errored out with 'Invalid blob
length' and errno ENOMEM. The fix is to not create the entry
freshly if it is a directory.
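
A hedged sketch of that rename handling (illustrative names, not the actual resource.py entry_ops code): when the rename source is already gone on the slave, the entry is recreated only for regular files, and directories are skipped.

# Hedged sketch of the fix described above; function and parameter names are
# illustrative placeholders, not the actual resource.py entry_ops code.
import stat

def handle_rename(entry, source_exists, create_entry_with_blob):
    """entry carries the changelog record: 'gfid', 'entry1' (new name) and
    'stat' (mode bits from the master); the two callables are placeholders."""
    if source_exists(entry["gfid"]):
        return  # a worker on another brick already applied this rename

    if stat.S_ISDIR(entry["stat"]["mode"]):
        # Do not create a directory with a file blob: gfid-access rejects it
        # as "Invalid blob length" and returns ENOMEM. Skip; the winning
        # worker's rename has already put the directory in place.
        return

    # Regular files: safe to create the renamed entry directly on the slave.
    create_entry_with_blob(entry["gfid"], entry["entry1"], entry["stat"]["mode"])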

The fix will be sent upstream once http://review.gluster.org/#/c/8761/
is merged, as it is dependent on it.

Comment 8 Kotresh HR 2014-09-26 07:43:14 UTC
Patch sent Upstream:
http://review.gluster.org/#/c/8865/

Comment 9 Kotresh HR 2014-10-16 05:04:47 UTC
Upstream Patch:
http://review.gluster.org/#/c/8865/

Downstream Patch:
https://code.engineering.redhat.com/gerrit/#/c/34634/

Comment 11 shilpa 2014-11-21 08:53:19 UTC
Verified on 3.0.3 glusterfs-3.6.0.33-1.el6rhs.x86_64
The file renames successfully synced after pause/resume and no errors were found. Geo-rep status looked good.
     
Slave files before and after pause/resume

# ls /mnt/slave/
file    file18  file27  file36  file45  file9
file1   file19  file28  file37  file46  glusterfs-3.5.2.tar.gz
file10  file2   file29  file38  file47  linux-3.12.tar.bz2
file11  file20  file3   file39  file48  test.txt
file12  file21  file30  file4   file49  text.txt
file13  file22  file31  file40  file5
file14  file23  file32  file41  file50
file15  file24  file33  file42  file6
file16  file25  file34  file43  file7
file17  file26  file35  file44  file8


# ls /mnt/slave/
file     file-18  file-27  file-36  file-45  file-9
file-1   file-19  file-28  file-37  file-46  glusterfs-3.5.2.tar.gz
file-10  file-2   file-29  file-38  file-47  linux-3.12.tar.bz2
file-11  file-20  file-3   file-39  file-48  test.txt
file-12  file-21  file-30  file-4   file-49  text.txt
file-13  file-22  file-31  file-40  file-5
file-14  file-23  file-32  file-41  file-50
file-15  file-24  file-33  file-42  file-6
file-16  file-25  file-34  file-43  file-7
file-17  file-26  file-35  file-44  file-8

Comment 12 Aravinda VK 2014-11-21 08:59:19 UTC
(In reply to shilpa from comment #11)
> Verified on 3.0.3 glusterfs-3.6.0.33-1.el6rhs.x86_64
> The file renames successfully synced after pause/resume and no errors were
> found. Geo-rep status looked good.
>      
This bug fixes an issue with directory renames; file renames alone will not trigger this bug. Were directory renames tested?

Comment 13 shilpa 2014-11-21 09:25:01 UTC
In response to comment #12:
Tested both directory and file renames. Both work; did not find any issues.

Comment 14 Shalaka 2015-01-09 09:31:35 UTC
Please review and sign off on the edited doc text.

Comment 15 Kotresh HR 2015-01-12 09:09:31 UTC
Doc text looks fine to me.

Comment 17 errata-xmlrpc 2015-01-15 13:40:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

