Red Hat Bugzilla – Bug 867345
geo-rep failed to sync a large (GB-sized) file through an ssh session.
Last modified: 2014-08-24 20:49:52 EDT
Description of problem: A file on the order of 1 GB fails to sync to the slave of a geo-rep session running through ssh. At the same time, three other geo-rep sessions were running from the same volume, so that every possible type of geo-rep session was in use: through ssh to a volume, through ssh to a file path, through gluster to a volume, and to a local file.
These are the DEBUG logs:
[2012-10-17 08:31:09.21575] D [repce:190:__call__] RepceClient: call 5734:139746862941952:1350442869.02 keep_alive -> 66
[2012-10-17 08:31:09.977252] W [master:786:regjob] _GMaster: failed to sync ./file_1G
[2012-10-17 08:31:16.274018] W [master:786:regjob] _GMaster: failed to sync ./file_1G
[2012-10-17 08:31:16.274231] D [master:660:crawl] _GMaster: ... crawl #979 done, took 14.937212 seconds
[2012-10-17 08:31:17.277674] D [master:615:volinfo_state_machine] <top>: (None, 426253f7) << (None, 426253f7) -> (None, 426253f7)
[2012-10-17 08:31:17.277858] D [master:696:crawl] _GMaster: entering .
[2012-10-17 08:31:17.278656] D [repce:175:push] RepceClient: call 5734:139747139553024:1350442877.28 xtime('.', '426253f7-b423-4a69-91c7-53e736e17d00') ...
[2012-10-17 08:31:17.280658] D [repce:190:__call__] RepceClient: call 5734:139747139553024:1350442877.28 xtime -> (1350442787, 818281)
[2012-10-17 08:31:17.283113] D [repce:175:push] RepceClient: call 5734:139747139553024:1350442877.28 entries('.',) ...
[2012-10-17 08:31:17.286344] D [repce:190:__call__] RepceClient: call 5734:139747139553024:1350442877.28 entries -> ['.file_1G.b65DJ1']
[2012-10-17 08:31:17.286490] D [repce:175:push] RepceClient: call 5734:139747139553024:1350442877.29 purge('.', set(['.file_1G.b65DJ1'])) ...
[2012-10-17 08:31:17.288861] D [repce:190:__call__] RepceClient: call 5734:139747139553024:1350442877.29 purge -> None
[2012-10-17 08:31:17.290184] D [master:778:crawl] _GMaster: syncing ./file_1G ...
[2012-10-17 08:31:17.355895] D [resource:526:rsync] SSH: files: ./file_1G
[2012-10-17 08:31:25.858294] W [master:786:regjob] _GMaster: failed to sync ./file_1G
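The excerpt shows the same "failed to sync ./file_1G" warning on crawl after crawl. When scanning a longer geo-rep log for this pattern, a small parser makes the repetition easy to count. This is a minimal sketch: the line format is inferred only from the excerpt above, and `count_sync_failures` is a hypothetical helper, not part of geo-rep.

```python
import re
from collections import Counter

# Line shape inferred from the DEBUG excerpt (an assumption, not a spec):
# [YYYY-MM-DD HH:MM:SS.ffffff] LEVEL [module:lineno:function] Source: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<module>\w+):(?P<lineno>\d+):(?P<func>\w+)\]\s+"
    r"(?P<source>[^:]+):\s+(?P<msg>.*)$"
)

PREFIX = "failed to sync "


def count_sync_failures(lines):
    """Count 'failed to sync <path>' warnings (level W) per path."""
    failures = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # not a log record in the assumed format
        msg = m.group("msg")
        if m.group("level") == "W" and msg.startswith(PREFIX):
            failures[msg[len(PREFIX):]] += 1
    return failures
```

For a full log, open the file in a `with` block and pass the file object to `count_sync_failures`; a count for `./file_1G` that keeps rising across crawls matches the behavior reported here.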
Version-Release number of selected component (if applicable): RHS-2.0.z u3
Steps to Reproduce:
1. Start a geo-rep session through ssh between the master (dist-replicate) and the slave (dist-rep).
2. Create all the other types of geo-rep session mentioned above on the same volume.
3. Create a 1GB sparse file.
4. Check for the file on the slave mount point.
5. The data fails to sync, even though the file is synced to all the other slaves.
6. The geo-rep log file may contain messages similar to the ones above.
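Step 3 does not say how the sparse file was created. One way to do it from Python is to extend an empty file with `truncate`, which on filesystems that support sparse files allocates no data blocks for the extended range. This is a minimal sketch: `make_sparse_file` is a hypothetical helper, and 1 GiB (2^30 bytes) is an assumed reading of "1GB".

```python
import os


def make_sparse_file(path, size_bytes=1 << 30):
    """Create `path` as a sparse file of `size_bytes` (default 1 GiB).

    Opening with "wb" truncates any existing file to 0 bytes; truncate()
    then extends it to the requested size without writing data, leaving a
    hole on filesystems that support sparse files.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    return os.path.getsize(path)
```

For example, calling it on a path under the master mount point creates the test file; the mount path itself is not given in this report.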
Actual results: Large file failed to sync
Expected results: File should sync.
For reference, this was the detailed setup when the problem occurred.
There is a master machine and a slave machine.
The master machine has one volume, called master (dist-rep).
The slave machine has two volumes, called slave (dist-rep) and slave_gfs (dist-stripe).
MASTER    SLAVE                           STATUS
master    file:///root/slave_local        OK
master    ssh://<slave>:/mnt/slave_ssh    OK
master    gluster://<slave>:slave_gfs     OK
master    ssh://<slave>::slave            OK
Later I found that the file did get synced, but only after about 50 minutes.
That is still very bad. As soon as a file is created, at least its entry should appear on the slave, or the rsync temp file should be updated. Neither was the case.
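To watch the rsync temp file, you first have to find it on the slave. In the DEBUG log above, the `entries` call on the slave returns `.file_1G.b65DJ1`, and the next call purges that entry. That name has the shape of rsync's default temporary file name: a leading dot, the original name, a dot, and six random characters. This is a minimal sketch assuming that naming convention; `is_rsync_tempfile` and `find_rsync_tempfiles` are hypothetical helpers.

```python
import re


def is_rsync_tempfile(entry, name):
    """True if `entry` looks like rsync's default temp name for `name`.

    Assumed shape: "." + name + "." + six letters or digits,
    e.g. ".file_1G.b65DJ1" for "file_1G".
    """
    pattern = r"\." + re.escape(name) + r"\.[A-Za-z0-9]{6}"
    return re.fullmatch(pattern, entry) is not None


def find_rsync_tempfiles(entries, name):
    """Filter a directory listing down to candidate rsync temp files."""
    return [e for e in entries if is_rsync_tempfile(e, name)]
```

Passing `os.listdir()` of the slave directory as `entries` gives the candidate temp files for `file_1G`.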
"failed to sync" is not necessarily bad, but it is at least a warning sign. It should not appear if the setup is static.
However, what do you mean by "there should be update on the rsync temp file"? Did it stay the same size over a period of time?
The rsync temp file stayed at 0 bytes for a long time.
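A temp file stuck at 0 bytes can be confirmed by sampling its size over time rather than checking once. This is a minimal sketch: `watch_size` and `looks_stalled` are hypothetical helpers, and the default interval and sample count are arbitrary choices, not values from this report.

```python
import os
import time


def watch_size(path, interval=10.0, samples=6):
    """Sample the size of `path` every `interval` seconds.

    Returns the observed sizes in order; a missing file is recorded as None.
    """
    sizes = []
    for i in range(samples):
        try:
            sizes.append(os.path.getsize(path))
        except FileNotFoundError:
            sizes.append(None)
        if i < samples - 1:
            time.sleep(interval)
    return sizes


def looks_stalled(sizes):
    """True if the file was present in every sample and its size never changed."""
    present = [s for s in sizes if s is not None]
    return len(present) == len(sizes) and len(set(present)) == 1
```

For a temp file that rsync should still be writing, `looks_stalled(watch_size(temp_path))` returning True over a minute or more matches the "stayed at 0 bytes" observation; `temp_path` stands for whatever path the temp file has on the slave.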
This could be a setup problem (NTP, sync delays).
Reopen if needed.