Bug 1114969 - Dist-geo-rep : history crawl was hung, while syncing truncates from master to slave.
Keywords:
Status: CLOSED DUPLICATE of bug 1112582
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: dht, consistency
Depends On:
Blocks:
Reported: 2014-07-01 11:11 UTC by Vijaykumar Koppad
Modified: 2015-05-13 18:15 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-11 17:47:37 UTC
Embargoed:



Description Vijaykumar Koppad 2014-07-01 11:11:21 UTC
Description of problem: The history crawl hung while syncing truncates from master to slave.

Details of one of the files whose md5sum mismatched:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
[root@targarean ~]# md5sum  /mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
588696944bb4b1c2e854f6f2785f3655  /mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
[root@targarean ~]# md5sum  /mnt/slave/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
000da1878965d368bace7356f4e228c0  /mnt/slave/./thread5/level08/level18/53b163a1%%FA6QRKVKKU

The md5sum of that file is clearly different on the slave.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# getfattr -n glusterfs.gfid.string /mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
getfattr: Removing leading '/' from absolute path names
# file: mnt/master/./thread5/level08/level18/53b163a1%%FA6QRKVKKU
glusterfs.gfid.string="ffe74d71-980c-4793-b71b-2327eed92751"


Grepping for that gfid in the working directory gives the following:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# grep -r "ffe74d71-980c-4793-b71b-2327eed92751" *
0457c276b8f3b3b0677547937419a2fe/.history/.processing/CHANGELOG.1404149968:D ffe74d71-980c-4793-b71b-2327eed92751
0457c276b8f3b3b0677547937419a2fe/.history/.processing/CHANGELOG.1404149968:M ffe74d71-980c-4793-b71b-2327eed92751 NULL
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404140148:M ffe74d71-980c-4793-b71b-2327eed92751 SETATTR
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404136120:M ffe74d71-980c-4793-b71b-2327eed92751 SETATTR
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404145820:E ffe74d71-980c-4793-b71b-2327eed92751 LINK 34cc4748-f133-405c-9f21-5eee34c2e100%2F53b1926c%25%25FXYXIBWMJ2
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404145820:M ffe74d71-980c-4793-b71b-2327eed92751 NULL
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404133846:E ffe74d71-980c-4793-b71b-2327eed92751 CREATE 33188 0 0 7068405a-1d62-42ba-ae74-ef46a0a63d36%2F53b163a1%25%25FA6QRKVKKU
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404133846:M ffe74d71-980c-4793-b71b-2327eed92751 NULL
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404133846:D ffe74d71-980c-4793-b71b-2327eed92751
0457c276b8f3b3b0677547937419a2fe/.history/.processed/CHANGELOG.1404138133:M ffe74d71-980c-4793-b71b-2327eed92751 SETATTR
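
The grep hits above mix entry (E), metadata (M) and data (D) changelog records for one gfid, spread across changelogs in .processed and .processing. The following Python sketch orders them by the numeric suffix of CHANGELOG.<timestamp> so the file's history reads top to bottom. It assumes only the record layout visible in this output (type letter, gfid, then type-specific fields, with entry names URL-encoded); the helper names are hypothetical and this is not how gsyncd itself parses changelogs.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote

# Record-type letters used in GlusterFS changelogs.
RECORD_KINDS = {"E": "entry", "M": "metadata", "D": "data"}


@dataclass
class ChangelogRecord:
    state: str       # ".processing" or ".processed", taken from the path
    timestamp: int   # numeric suffix of CHANGELOG.<timestamp>
    kind: str        # "entry", "metadata" or "data"
    gfid: str
    args: list = field(default_factory=list)  # remaining fields, URL-decoded


def parse_grep_line(line: str) -> ChangelogRecord:
    """Parse one '<dir>/<state>/CHANGELOG.<ts>:<type> <gfid> [args...]' grep hit."""
    path, _, record = line.partition(":")
    state = path.split("/")[-2]
    timestamp = int(path.rsplit(".", 1)[1])
    parts = record.split()
    kind = RECORD_KINDS.get(parts[0], parts[0])
    # Entry names are URL-encoded in this output (%2F is "/", %25 is "%").
    args = [unquote(p) for p in parts[2:]]
    return ChangelogRecord(state, timestamp, kind, parts[1], args)


def timeline(lines):
    """Return parsed records sorted oldest first; ties keep their input order."""
    records = [parse_grep_line(ln) for ln in lines if ln.strip()]
    return sorted(records, key=lambda r: r.timestamp)
```

Fed the ten grep lines above, the sorted timeline ends with the D and M NULL records from CHANGELOG.1404149968, the changelog still sitting in .processing; the only other D record is in CHANGELOG.1404133846, next to the CREATE.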


Even after almost 12 hours, the changelog containing that entry (CHANGELOG.1404149968) is still in .processing:

# gluster v geo master 10.70.43.111::slave status

MASTER NODE                 MASTER VOL    MASTER BRICK                 SLAVE                  STATUS             CHECKPOINT STATUS    CRAWL STATUS
------------------------------------------------------------------------------------------------------------------------------------------------------
targarean.blr.redhat.com    master        /bricks/brick1/master_b1     10.70.43.131::slave    Initializing...    N/A                  N/A
targarean.blr.redhat.com    master        /bricks/brick2/master_b5     10.70.43.131::slave    Initializing...    N/A                  N/A
targarean.blr.redhat.com    master        /bricks/brick3/master_b9     10.70.43.131::slave    Initializing...    N/A                  N/A
spiderman.blr.redhat.com    master        /bricks/brick1/master_b4     10.70.43.165::slave    Passive            N/A                  N/A
spiderman.blr.redhat.com    master        /bricks/brick2/master_b8     10.70.43.165::slave    Passive            N/A                  N/A
spiderman.blr.redhat.com    master        /bricks/brick3/master_b12    10.70.43.165::slave    Passive            N/A                  N/A
stark.blr.redhat.com        master        /bricks/brick1/master_b3     10.70.42.236::slave    Active             N/A                  Changelog Crawl
stark.blr.redhat.com        master        /bricks/brick2/master_b7     10.70.42.236::slave    Active             N/A                  Changelog Crawl
stark.blr.redhat.com        master        /bricks/brick3/master_b11    10.70.42.236::slave    Active             N/A                  Changelog Crawl
shaktiman.blr.redhat.com    master        /bricks/brick1/master_b2     10.70.43.111::slave    Passive            N/A                  N/A
shaktiman.blr.redhat.com    master        /bricks/brick2/master_b6     10.70.43.111::slave    Passive            N/A                  N/A
shaktiman.blr.redhat.com    master        /bricks/brick3/master_b10    10.70.43.111::slave    Passive            N/A                  N/A
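
To spot workers that never got past startup, as with the three targarean bricks above, the status table can be checked programmatically. The Python sketch below splits the table on runs of two or more spaces, which matches the layout above but is an assumption about the CLI's formatting, and flags any STATUS other than Active or Passive. The function names and that notion of "healthy" are hypothetical choices for illustration.

```python
import re


def parse_geo_status(table: str) -> list:
    """Turn the geo-replication status table (header first) into row dicts.

    Assumes columns are separated by two or more spaces, so single spaces
    inside a cell ("MASTER NODE", "Changelog Crawl") are preserved.
    """
    header = None
    rows = []
    for line in table.splitlines():
        if not line.strip() or line.lstrip().startswith("-"):
            continue  # blank line or the dashed rule under the header
        cells = re.split(r"\s{2,}", line.strip())
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def stuck_bricks(rows, healthy=("Active", "Passive")):
    """Return (node, brick, status) for every brick not in a healthy state."""
    return [
        (r["MASTER NODE"], r["MASTER BRICK"], r["STATUS"])
        for r in rows
        if r.get("STATUS") not in healthy
    ]
```

Passing only the table part of the output above (header line onward) would return the three targarean bricks with status "Initializing...".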


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


Version-Release number of selected component (if applicable): glusterfs-3.6.0.22-1.el6rhs


How reproducible: Didn't try to reproduce.


Steps to Reproduce:
1. Create and start a geo-rep session between master and slave.
2. Stop geo-rep and create data on the master using the command "refi -T 10 -n 10 --multi -d 10 -b 10 --random --max=10K --min=1K /mnt/master".
3. Start geo-rep and let it sync.
4. Stop geo-rep and truncate all the created files using the command "refi -T 10 -n 10 --multi -d 10 -b 10 --random --max=10K --min=1K --fop=truncate /mnt/master".
5. Start geo-rep and let it sync to the slave.


Actual results: md5sum mismatches between master and slave for many files.


Expected results: There should be no md5sum mismatch between master and slave.
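
The expected result can be checked by walking the master mount and comparing each file's md5 with the same relative path on the slave mount. This is a minimal Python sketch, assuming both volumes are mounted as in the report (/mnt/master and /mnt/slave) and geo-rep has finished syncing. It is a plain per-file md5 comparison, not the arequal-based t.compare-arequal.py script visible in the ps output of comment 1, and the function names are hypothetical.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex md5 digest of a file, read in chunks (the digest md5sum prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_mismatches(master_root: str, slave_root: str) -> list:
    """Relative paths of master files that differ from, or are missing on, the slave."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(master_root):
        for name in filenames:
            master_file = os.path.join(dirpath, name)
            rel = os.path.relpath(master_file, master_root)
            slave_file = os.path.join(slave_root, rel)
            if not os.path.isfile(slave_file) or md5_of(master_file) != md5_of(slave_file):
                mismatched.append(rel)
    return sorted(mismatched)
```

On the setup in this report, md5_mismatches("/mnt/master", "/mnt/slave") would be expected to include thread5/level08/level18/53b163a1%%FA6QRKVKKU. The walk only visits files present on the master, so files that exist only on the slave are not reported.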


Additional info:

Comment 1 Vijaykumar Koppad 2014-07-01 11:17:56 UTC
This looks like it could be an effect of Bug 1112582.

Here too, there were 4 zombie Python processes:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
ps ax | grep python
  772 ?        Sl     0:00 python t.compare-arequal.py 10.70.43.8::master 10.70.43.111::slave
  878 ?        Z      0:00 [python] <defunct>
  879 ?        Z      0:00 [python] <defunct>
 1084 ?        S     19:15 /usr/bin/python /usr/share/vdsm-reg/vdsm-reg-setup
 1583 ?        S      2:03 /usr/bin/python /usr/sbin/tuned -d -c /etc/tuned.conf
 1809 ?        S<l    0:00 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
 3718 pts/6    R+     0:00 grep python
26063 ?        Ssl    0:08 /usr/bin/python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9  --monitor -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave
26084 ?        Ssl    0:09 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9  -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --local-path /bricks/brick2/master_b5 --agent --rpc-fd 9,12,11,10
26085 ?        Ssl    0:09 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9  -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --local-path /bricks/brick1/master_b1 --agent --rpc-fd 7,16,15,8
26086 ?        Z      0:00 [python] <defunct>
26087 ?        Z      0:00 [python] <defunct>
26088 ?        Ssl    0:25 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9  -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --local-path /bricks/brick3/master_b9 --agent --rpc-fd 7,11,10,9
26089 ?        Sl     3:07 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick1/master_b1 --path=/bricks/brick2/master_b5 --path=/bricks/brick3/master_b9  -c /var/lib/glusterd/geo-replication/master_10.70.43.111_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=33ff53a6-6e34-4e9c-a4c5-e67795ac7e8f 10.70.43.111::slave -N -p  --slave-id d58756d6-dbc1-4e93-ac97-e4be70f8a931 --feedback-fd 12 --local-path /bricks/brick3/master_b9 --local-id .%2Fbricks%2Fbrick3%2Fmaster_b9 --rpc-fd 10,9,7,11 --resource-remote ssh://root.43.131:gluster://localhost:slave

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
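
The zombie count above can be pulled out of a saved "ps ax" listing mechanically. The Python sketch below assumes the default "ps ax" columns (PID TTY STAT TIME COMMAND) and treats any STAT beginning with Z as a zombie; the function name is hypothetical.

```python
def zombies(ps_output: str, name: str = "python") -> list:
    """PIDs of zombie processes (STAT starting with 'Z') whose command mentions name.

    Expects the default `ps ax` layout: PID TTY STAT TIME COMMAND.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # header, blank line or shell prompt
        pid, _tty, stat, _time, command = parts
        if stat.startswith("Z") and name in command:
            pids.append(int(pid))
    return pids
```

Applied to the listing above, this picks out PIDs 878, 879, 26086 and 26087. Because "ps ax" does not print the parent PID, finding which process failed to reap these children needs a listing such as "ps -eo pid,ppid,stat,args" instead.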

Comment 4 Aravinda VK 2015-03-11 17:47:37 UTC
As per comment #1, this is a duplicate of BZ 1112582. Closing this bug. Please reopen if the issue still exists.

*** This bug has been marked as a duplicate of bug 1112582 ***

