Description of problem:
The history crawl hung while syncing truncates from the master to the slave. Below are the details for one of the files whose md5sums mismatch.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
[root@targarean ~]# md5sum /mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
588696944bb4b1c2e854f6f2785f3655 /mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
[root@targarean ~]# md5sum /mnt/slave/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
000da1878965d368bace7356f4e228c0 /mnt/slave/./thread5/level08/level18/53b163a1%%FA6QRKVKKU

The md5sum is clearly different for that file on the slave.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# getfattr -n glusterfs.gfid.string /mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
getfattr: Removing leading '/' from absolute path names
# file: mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
glusterfs.gfid.string="ffe74d71-980c-4793-b71b-2327eed92751"

Grepping for that GFID in the working directory gives this:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# grep -r "ffe74d71-980c-4793-b71b-2327eed92751" *
0457c276b8f3b3b0677547937419a2fe/.history/.processing/CHANGELOG.1404149968:D ffe74d71-980c-4793-b71b-2327eed92751
0457c276b8f3b3b0677547937419a2fe/.history/.processing/CHANGELOG.1404149968:M ffe74d71-980c-4793-b71b-2327eed92751 NULL
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404140148:M ffe74d71-980c-4793-b71b-2327eed92751 SETATTR
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404136120:M ffe74d71-980c-4793-b71b-2327eed92751 SETATTR
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404145820:E ffe74d71-980c-4793-b71b-2327eed92751 LINK 34cc4748-f133-405c-9f21-5eee34c2e100%2F53b1926c%25%25FXYXIBWMJ2
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404145820:M ffe74d71-980c-4793-b71b-2327eed92751 NULL
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404133846:E ffe74d71-980c-4793-b71b-2327eed92751 CREATE 33188 0 0 7068405a-1d62-42ba-ae74-ef46a0a63d36%2F53b163a1%25%25FA6QRKVKKU
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404133846:M ffe74d71-980c-4793-b71b-2327eed92751 NULL
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404133846:D ffe74d71-980c-4793-b71b-2327eed92751
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404138133:M ffe74d71-980c-4793-b71b-2327eed92751 SETATTR

Even after almost 12 hours, the changelog containing that entry is still in .processing:

# gluster v geo master 10.70.43.111::slave status

MASTER NODE                 MASTER VOL    MASTER BRICK                 SLAVE                  STATUS             CHECKPOINT STATUS    CRAWL STATUS
------------------------------------------------------------------------------------------------------------------------------------------------------
targarean.blr.redhat.com    master        /bricks/brick1/master_b1     10.70.43.131::slave    Initializing...    N/A                  N/A
targarean.blr.redhat.com    master        /bricks/brick2/master_b5     10.70.43.131::slave    Initializing...    N/A                  N/A
targarean.blr.redhat.com    master        /bricks/brick3/master_b9     10.70.43.131::slave    Initializing...    N/A                  N/A
spiderman.blr.redhat.com    master        /bricks/brick1/master_b4     10.70.43.165::slave    Passive            N/A                  N/A
spiderman.blr.redhat.com    master        /bricks/brick2/master_b8     10.70.43.165::slave    Passive            N/A                  N/A
spiderman.blr.redhat.com    master        /bricks/brick3/master_b12    10.70.43.165::slave    Passive            N/A                  N/A
stark.blr.redhat.com        master        /bricks/brick1/master_b3     10.70.42.236::slave    Active             N/A                  Changelog Crawl
stark.blr.redhat.com        master        /bricks/brick2/master_b7     10.70.42.236::slave    Active             N/A                  Changelog Crawl
stark.blr.redhat.com        master        /bricks/brick3/master_b11    10.70.42.236::slave    Active             N/A                  Changelog Crawl
shaktiman.blr.redhat.com    master        /bricks/brick1/master_b2     10.70.43.111::slave    Passive            N/A                  N/A
shaktiman.blr.redhat.com    master        /bricks/brick2/master_b6     10.70.43.111::slave    Passive            N/A                  N/A
shaktiman.blr.redhat.com    master        /bricks/brick3/master_b10    10.70.43.111::slave    Passive            N/A                  N/A
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.22-1.el6rhs

How reproducible:
Didn't try to reproduce.

Steps to Reproduce:
1. Create and start a geo-rep relationship between the master and the slave.
2. Stop geo-rep and create data on the master using the command
   "refi -T 10 -n 10 --multi -d 10 -b 10 --random --max=10K --min=1K /mnt/master"
3. Start geo-rep and let it sync.
4. Stop geo-rep and truncate all the files created, using the command
   "refi -T 10 -n 10 --multi -d 10 -b 10 --random --max=10K --min=1K --fop=truncate /mnt/master"
5. Start geo-rep and let it sync to the slave.

Actual results:
The md5sums of many files mismatch between the master and the slave.

Expected results:
There should be no md5sum mismatch between the master and the slave.

Additional info:
It looks like this could be an effect of Bug 1112582; here too there were 4 zombie python processes.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
ps ax | grep python
  772 ?      Sl     0:00 python t.compare-arequal.py 10.70.43.8::master 10.70.43.111::slave
  878 ?      Z      0:00 [python] <defunct>
  879 ?      Z      0:00 [python] <defunct>
 1084 ?      S     19:15 /usr/bin/python /usr/share/vdsm-reg/vdsm-reg-setup
 1583 ?      S      2:03 /usr/bin/python /usr/sbin/tuned -d -c /etc/tuned.conf
 1809 ?      S<l    0:00 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
 3718 pts/6  R+     0:00 grep python
26063 ?      Ssl    0:08 /usr/bin/python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9 --monitor -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave
26084 ?      Ssl    0:09 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9 -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --local-path /bricks/brick2/master_b5 --agent --rpc-fd 9,12,11,10
26085 ?      Ssl    0:09 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9 -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --local-path /bricks/brick1/master_b1 --agent --rpc-fd 7,16,15,8
26086 ?      Z      0:00 [python] <defunct>
26087 ?      Z      0:00 [python] <defunct>
26088 ?      Ssl    0:25 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9 -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --local-path /bricks/brick3/master_b9 --agent --rpc-fd 7,11,10,9
26089 ?      Sl     3:07 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9 -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --feedback-fd 12 --local-path /bricks/brick3/master_b9 --local-id .%2Fbricks%2Fbrick3%2Fmaster_b9 --rpc-fd 10,9,7,11 --resource-remote ssh://root.43.131:gluster://localhost:slave
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
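For quick triage on other setups, here is a minimal sketch that counts zombie python processes like the four <defunct> entries in the listing above. The helper name is mine, not part of glusterfs; it only inspects ps output.

```shell
# Hypothetical helper (not part of glusterfs): count zombie python
# processes, like the [python] <defunct> entries in the ps listing above.
count_python_zombies() {
    # stat=/comm= suppress the header; a leading 'Z' in STAT means zombie
    ps ax -o stat=,comm= | awk '$1 ~ /^Z/ && $2 ~ /python/' | wc -l
}

count_python_zombies    # prints the current count of python zombies
```

A nonzero count after the workers should have exited would point at the same leak as in bug 1112582.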
As per comment #1, this is a duplicate of BZ 1112582. Closing this bug; please reopen if the issue still exists.

*** This bug has been marked as a duplicate of bug 1112582 ***
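For re-verification once the fix for bug 1112582 lands, a minimal sketch of the md5sum sweep used to find mismatched files. The function name and mount points are illustrative (not a gluster tool), and both volumes are assumed to be mounted locally:

```shell
# Illustrative helper: walk the master mount and report every file whose
# md5sum on the slave mount differs from the master (or is missing there).
compare_md5() {  # usage: compare_md5 MASTER_MOUNT SLAVE_MOUNT
    master=$1 slave=$2
    ( cd "$master" || exit 1
      find . -type f | while IFS= read -r f; do
          m=$(md5sum "$f" | cut -d' ' -f1)
          s=$(md5sum "$slave/$f" 2>/dev/null | cut -d' ' -f1)
          [ "$m" = "$s" ] || printf 'MISMATCH: %s master=%s slave=%s\n' "$f" "$m" "$s"
      done )
}

# compare_md5 /mnt/master /mnt/slave
```

An empty output would mean the truncates synced correctly this time.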