Description of problem:
geo-rep status goes to faulty with the rsync error "rsync> inflate returned -3". This is the backtrace:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-12-19 12:29:55.408256] I [monitor(monitor):81:set_state] Monitor: new state: Stable
[2013-12-19 12:29:56.605798] E [resource(/bricks/master_brick1):204:errlog] Popen: command "rsync -avR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-vC_AbZ/gsycnd-ssh-%r@%h:%p --compress root.42.171:/proc/4762/cwd" returned with 12, saying:
[2013-12-19 12:29:56.607383] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> WARNING: .gfid/d3328146-ba07-4a5a-96a5-d3cab590a49e failed verification -- update retained (will try again).
[2013-12-19 12:29:56.607791] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> WARNING: .gfid/da1bd1f4-2f29-4073-9254-5d3ba6fee8af failed verification -- update retained (will try again).
[2013-12-19 12:29:56.608031] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> WARNING: .gfid/db74a16f-7bcb-4163-8ca9-62f65d392682 failed verification -- update retained (will try again).
[2013-12-19 12:29:56.608367] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> inflate returned -3 (0 bytes)
[2013-12-19 12:29:56.608613] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync error: error in rsync protocol data stream (code 12) at token.c(547) [receiver=3.0.6]
[2013-12-19 12:29:56.608886] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync: connection unexpectedly closed (8086 bytes received so far) [sender]
[2013-12-19 12:29:56.609395] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
[2013-12-19 12:29:56.610018] I [syncdutils(/bricks/master_brick1):159:finalize] <top>: exiting.
[2013-12-19 12:29:56.616862] I [monitor(monitor):81:set_state] Monitor: new state: faulty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable): glusterfs-3.4.0.51geo-1

How reproducible:
Didn't try to reproduce.

Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. Run the following script on the master:

for i in {1..10}
do
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K /mnt/master/
    sleep 100
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=chmod /mnt/master/
    sleep 100
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=chown /mnt/master/
    sleep 100
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=chgrp /mnt/master/
    sleep 100
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=symlink /mnt/master/
    sleep 100
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=hardlink /mnt/master/
    sleep 100
    ./crefi.py -n 100 --multi -b 10 -d 10 --random --max=2K --min=1K --fop=truncate /mnt/master/
    sleep 500
done

3. In parallel, on one of the master nodes, run this:

while : ; do ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 100 ; ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 1000; done

Actual results:
geo-rep status keeps going to the faulty state with rsync errors.

Expected results:
It shouldn't go to faulty. Even if it does, it should recover.
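For reference, the Stable/faulty transitions can be watched from any master node with the geo-rep status command. A minimal sketch; the volume names "master" and "slave" and the host "slavehost" are placeholders for the actual session:

# show the current state of the geo-rep session (look for Stable vs. faulty)
gluster volume geo-replication master slavehost::slave status
# or poll it while the reproduction script and ssh-kill loop are running
watch -n 10 gluster volume geo-replication master slavehost::slave status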
Additional info:
Did some basic debugging; it looks like it could be an issue during compression, but the exact cause is still not clear.
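If compression is the suspect, one way to test that hypothesis would be to check whether the --compress flag seen in the rsync command above comes from the session's rsync options, and rerun without it. A sketch only, not verified on this build; "master", "slavehost" and "slave" are placeholder names, and the exact config key that injects --compress may differ:

# dump the session config and look for an rsync-options entry carrying --compress
gluster volume geo-replication master slavehost::slave config
# if it does, reset rsync-options to its default (the '!' prefix resets a key)
# and check whether the "inflate returned -3" errors still occur
gluster volume geo-replication master slavehost::slave config '!rsync-options'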
This happened again with the build glusterfs-libs-3.4.0.57rhs-1. It was observed while syncing hardlinks in the hybrid crawl. Backtrace in the geo-rep log file:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-24 16:56:56.270279] I [monitor(monitor):81:set_state] Monitor: new state: Stable
[2014-01-24 16:56:56.609118] E [resource(/bricks/master_brick1):204:errlog] Popen: command "rsync -avR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-ZLm0kb/061fc87d252b63093ab9bfb765588973.sock --compress root.43.174:/proc/2044/cwd" returned with 12, saying:
[2014-01-24 16:56:56.609867] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync: open "/proc/2044/cwd/.gfid/04472653-c5fb-4226-b6e2-8067bad81234" failed: No data available (61)
[2014-01-24 16:56:56.610153] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
[2014-01-24 16:56:56.610444] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> WARNING: .gfid/100924af-70ad-4025-bcf1-d3bafc14dcc9 failed verification -- update retained (will try again).
[2014-01-24 16:56:56.610729] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> inflate returned -3 (0 bytes)
[2014-01-24 16:56:56.610985] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync: connection unexpectedly closed (5419 bytes received so far) [sender]
[2014-01-24 16:56:56.611249] E [resource(/bricks/master_brick1):207:logerr] Popen: rsync> rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
[2014-01-24 16:56:56.611910] I [syncdutils(/bricks/master_brick1):192:finalize] <top>: exiting.
[2014-01-24 16:56:56.623743] I [monitor(monitor):81:set_state] Monitor: new state: faulty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Log snippet from the corresponding client log for GFID 04472653-c5fb-4226-b6e2-8067bad81234, which hit an error on the master (see the log snippet above):

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
6aae5daa288>/hardlink_to_files
[2014-01-24 11:52:55.101724] I [fuse-bridge.c:3516:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:0ed073ac-86fa-48b1-80c5-86aae5daa288>/hardlink_to_files with gfid (16316676-7dbf-4376-9c52-3382888aedd6): File exists
[2014-01-24 11:52:55.101756] W [fuse-bridge.c:1628:fuse_err_cbk] 0-glusterfs-fuse: 31812: MKDIR() <gfid:0ed073ac-86fa-48b1-80c5-86aae5daa288>/hardlink_to_files => -1 (File exists)
[2014-01-24 11:53:03.658170] E [dht-helper.c:777:dht_migration_complete_check_task] 0-slave-dht: <gfid:04472653-c5fb-4226-b6e2-8067bad81234>: failed to get the 'linkto' xattr No data available
[2014-01-24 11:53:03.658287] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 32339: STAT() <gfid:04472653-c5fb-4226-b6e2-8067bad81234> => -1 (No data available)
[2014-01-24 11:53:03.666448] E [dht-helper.c:777:dht_migration_complete_check_task] 0-slave-dht: <gfid:04472653-c5fb-4226-b6e2-8067bad81234>: failed to get the 'linkto' xattr No data available
[2014-01-24 11:53:08.070778] W [client-rpc-fops.c:1170:client3_3_fgetxattr_cbk] 0-slave-client-2: remote operation failed: No data available
[2014-01-24 11:53:08.071455] W [client-rpc-fops.c:1170:client3_3_fgetxattr_cbk] 0-slave-client-2: remote operation failed: No data available
[2014-01-24 11:53:08.072415] W [client-rpc-fops.c:1170:client3_3_fgetxattr_cbk] 0-slave-client-3: remote operation failed: No data available
[2014-01-24 11:53:08.072482] E [dht-helper.c:777:dht_migration_complete_check_task] 0-slave-dht: (null): failed to get the 'linkto' xattr No data available
[2014-01-24 11:53:08.073145] W [client-rpc-fops.c:1170:client3_3_fgetxattr_cbk] 0-slave-client-3: remote operation failed: No data available
[2014-01-24 11:53:08.073282] E [dht-helper.c:777:dht_migration_complete_check_task] 0-slave-dht: (null): failed to get the 'linkto' xattr No data available
[2014-01-24 11:53:08.087516] E [dht-helper.c:777:dht_migration_complete_check_task] 0-slave-dht: <gfid:365036ca-5992-4156-bd63-187bfbe10ad5>: failed to get the 'linkto' xattr No data available
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
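For the 'linkto' xattr failures above, one possible follow-up check is to inspect the DHT xattrs of the affected GFID directly on the slave bricks. A sketch only; the brick path /bricks/slave_brick1 is a placeholder, and the .glusterfs path is derived from the GFID (first two, then next two hex characters):

# dump all xattrs of the GFID's backend file on each slave brick
getfattr -d -m . -e hex /bricks/slave_brick1/.glusterfs/04/47/04472653-c5fb-4226-b6e2-8067bad81234
# the DHT link-to attribute, if present, is named trusted.glusterfs.dht.linkto
getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/slave_brick1/.glusterfs/04/47/04472653-c5fb-4226-b6e2-8067bad81234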
rsync inflate errors were not seen during regression tests of RHGS 2.1.6, 3.0.4 and 3.1. Closing this bug. Please reopen if this bug is found again.