Bug 984511

Summary: Dist-geo-rep: static renames result in a few error logs in the geo-rep log file; consequently the status becomes faulty for some time.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA QA Contact: Vijaykumar Koppad <vkoppad>
Severity: medium Docs Contact:
Priority: high    
Version: 2.1 CC: aavati, amarts, bbandari, csaba, rhs-bugs, sdharane
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.14rhs-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-23 22:38:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vijaykumar Koppad 2013-07-15 12:12:50 UTC
Description of problem: Static renames (i.e. renaming all the files while geo-rep is stopped, then starting geo-rep after the renames) result in a few error logs in the geo-rep log file; consequently the status becomes faulty for some time.

These are the logs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2013-07-15 17:18:00.404345] E [resource(/bricks/brick1):204:errlog] Popen: command "rsync -avR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-XF4WPr/gsycnd-ssh-%r@%h:%p --compress root.43.141:/proc/27762/cwd" returned with 12, saying:
[2013-07-15 17:18:00.405149] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/16a7c480-372a-4ad7-9057-0c809804ce13 failed verification -- update retained (will try again).
[2013-07-15 17:18:00.405565] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/1ffd6699-6696-4d05-9f80-84ebbda3c832 failed verification -- update retained (will try again).
[2013-07-15 17:18:00.406018] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/401584bb-e92f-4ca4-8a70-1e49a388c91e failed verification -- update retained (will try again).
[2013-07-15 17:18:00.406420] E [resource(/bricks/brick1):207:logerr] Popen: rsync> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
[2013-07-15 17:18:00.406646] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/73cefe17-6bf6-4b8a-9073-91c2cff3a88d failed verification -- update retained (will try again).
[2013-07-15 17:18:00.406879] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/7b7eb105-92a6-4cf7-b8bb-ac867aeaf50c failed verification -- update retained (will try again).
[2013-07-15 17:18:00.407242] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/b4bc8201-b541-4f52-8a81-c891dbbcdddd failed verification -- update retained (will try again).
[2013-07-15 17:18:00.407605] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/b82d6ea9-883b-4f24-9363-5102fd490a7a failed verification -- update retained (will try again).
[2013-07-15 17:18:00.408001] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/baf3610f-1cd2-437a-921f-44d249cc8ffe failed verification -- update retained (will try again).
[2013-07-15 17:18:00.408221] E [resource(/bricks/brick1):207:logerr] Popen: rsync> WARNING: .gfid/bbf6804d-5b17-4f55-b7a9-b5b58f18e010 failed verification -- update retained (will try again).
[2013-07-15 17:18:00.408412] E [resource(/bricks/brick1):207:logerr] Popen: rsync> rsync: lseek of "/proc/27762/cwd/.gfid/bfd5a4ab-547a-4041-b283-1aac143632dd" returned 11783, not 21583: Success (0)
[2013-07-15 17:18:00.408603] E [resource(/bricks/brick1):207:logerr] Popen: rsync> rsync error: error in file IO (code 11) at receiver.c(274) [receiver=3.0.6]
[2013-07-15 17:18:00.408816] E [resource(/bricks/brick1):207:logerr] Popen: rsync> rsync: connection unexpectedly closed (148281 bytes received so far) [sender]
[2013-07-15 17:18:00.409065] E [resource(/bricks/brick1):207:logerr] Popen: rsync> rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
[2013-07-15 17:18:00.410257] I [syncdutils(/bricks/brick1):158:finalize] <top>: exiting.
[2013-07-15 17:18:00.422024] I [monitor(monitor):81:set_state] Monitor: new state: faulty


Version-Release number of selected component (if applicable): 3.4.0.12rhs.beta4-1.el6rhs.x86_64


How reproducible: Observed it once.


Steps to Reproduce:
1. Create and start a geo-rep relationship between the master (DIST-REP) and the slave.
2. Create files using the command: ./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 <MNT_PNT>
3. Let them sync to the slave.
4. Stop the geo-rep session.
5. Rename all the files using the command: ./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 --fop=rename <MNT_PNT>
6. Start the geo-rep session.
7. Check the geo-rep log files.
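
For the last step, a small script can make the check less error-prone than reading the log by eye. The following is only a rough sketch, assuming every log entry sits on one line in the shape seen in the excerpt above ([timestamp] LEVEL [module(worker):line:function] Source: message). The names LINE_RE and summarize are made up for this illustration and are not part of gsyncd.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt in this report:
# [timestamp] LEVEL [module(worker):lineno:func] Source: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<where>[^\]]+)\]\s+(?P<source>[^:]+):\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Return (level counts, error entries, monitor state changes).

    Lines that do not match the assumed shape are skipped.
    """
    levels = Counter()
    errors = []   # (timestamp, message) for every E-level entry
    states = []   # (timestamp, state) for every "new state:" entry
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\n"))
        if m is None:
            continue
        levels[m["level"]] += 1
        if m["level"] == "E":
            errors.append((m["ts"], m["msg"]))
        if m["msg"].startswith("new state:"):
            states.append((m["ts"], m["msg"].split(":", 1)[1].strip()))
    return levels, errors, states
```

To use it, open the geo-rep log file for the session (its location depends on the setup) and pass the file object to summarize(). A non-empty error list, or a "faulty" entry among the state changes, corresponds to the behaviour reported here.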


Actual results: The renames result in some errors in the geo-rep log file, and consequently the status becomes faulty for some time.


Expected results: Any errors should be handled gracefully and should not result in a faulty state.


Additional info:

Comment 2 Vijaykumar Koppad 2013-08-07 09:18:04 UTC
Verified on glusterfs-3.4.0.17rhs-1.el6rhs.x86_64.

Comment 3 Scott Haines 2013-09-23 22:38:44 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html