Description of problem:
Stopped geo-rep and replaced a node after reinstalling it, following the procedure "Replacing_a_Host_Machine_with_the_Same_IP_Address" in the doc link below:
http://documentation-devel.engineering.redhat.com/site/documentation/en-US/Red_Hat_Storage/3/html-single/Administration_Guide/index.html#Replacing_a_Host_Machine_with_the_Same_IP_Address

After adding the node back and starting geo-rep, some files were not synced.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.41-1.el6rhs.x86_64

How reproducible:
Tried once

Steps to Reproduce:
1. Create distribute-replicate (2x2) master and slave volumes, with four nodes in each cluster, and start geo-replication.
2. Ensure that all existing data is replicated.
3. Stop geo-replication.
4. Replace an active node in the master volume with a new node that has the same hostname, IP, and configuration, following the "Replacing_a_Host_Machine_with_the_Same_IP_Address" procedure in the RHS doc.
5. Run I/O on the master volume while the node being replaced is down.
6. After adding the new node to the master volume with the same hostname and IP, wait for self-heal to copy data to the newly added node.
7. Start geo-rep with the following commands:
   a. gluster system:: execute gsec_create
   b. gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force
   c. gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force

Actual results:
The number of files on the master and slave did not match; not all files were synced.

Expected results:
After "geo-rep start force", all files written while the active node was down should start syncing, and no files should be missed once self-heal is complete.
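The step-7 sequence can be collected into a small script. This is a hedged sketch, not the exact commands run in the report: the function name and the `GLUSTER` override are my additions, and the volume/host arguments are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the geo-rep session re-creation sequence from step 7.
# GLUSTER can be overridden (e.g. GLUSTER=echo) for a dry run; the
# function name recreate_georep is hypothetical, not a gluster command.
GLUSTER="${GLUSTER:-gluster}"

recreate_georep() {
    local master_vol=$1 slave_host=$2 slave_vol=$3
    # Regenerate the common pem keys on the master cluster.
    $GLUSTER system:: execute gsec_create
    # Re-create the session, force-pushing the pem keys to the slave.
    $GLUSTER volume geo-replication "$master_vol" "$slave_host::$slave_vol" create push-pem force
    # Force-start so the replaced node's bricks rejoin the session.
    $GLUSTER volume geo-replication "$master_vol" "$slave_host::$slave_vol" start force
}
```

A dry run such as `GLUSTER=echo recreate_georep MASTER_VOL SLAVE_HOST SLAVE_VOL` prints the three commands without needing a live cluster.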
Additional info:

Files missing (note: the mountpoint labels were swapped in the original comment; corrected here to match the counts and the arequal output):

On master mountpoint:
# find /mnt/master | wc -l
31011

On slave mountpoint:
# find /mnt/slave | wc -l
24975

# arequal-checksum -p /mnt/master

Entry counts
Regular files   : 30000
Directories     : 1011
Symbolic links  : 0
Other           : 0
Total           : 31011

Metadata checksums
Regular files   : 3e9
Directories     : 24e15a
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : ad313c59da41368f7bc1f35c1237a4e1
Directories     : 70560818100c3b63
Symbolic links  : 0
Other           : 0
Total           : a6a6c71dd87aa90d

[root@ccr changelogs]# arequal-checksum -p /mnt/slave

Entry counts
Regular files   : 23964
Directories     : 1011
Symbolic links  : 0
Other           : 0
Total           : 24975

Metadata checksums
Regular files   : 3e9
Directories     : 24e15a
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : c40857f4e79796a793377f8ddbbbeb73
Directories     : 5178010c7b174506
Symbolic links  : 0
Other           : 0
Total           : 6472975473b38d2

Self-heal was complete:
# find /bricks/master_brick2 -not -path '*/\.*' -type f | wc -l
15028
# find /bricks/master_brick3/ -not -path '*/\.*' -type f | wc -l
15028

The geo-rep logs show 326 files in the SKIPPED section, but on comparison more than 326 files are missing:

[2015-01-07 13:02:37.657751] W [master(/bricks/master_brick2):996:process] _GMaster: changelogs CHANGELOG.1420615598 CHANGELOG.1420615613 CHANGELOG.1420615628 could not be processed - moving on...
[2015-01-07 13:02:37.661405] W [master(/bricks/master_brick2):1000:process] _GMaster: SKIPPED GFID = 4242dc96-1b85-48fa-b30a-394ccc5242cd,309c25fe-c8fe-4148-9cc0-25af2888853d,606e3871-318f-4c77-a9cf-f2f7efffc3e2,976cd720-50a3-4170-bf75-278472f44533,a5428043-8c45-4c70-baf5-df95ea962fe2,78b7871d-4ca8-4313-8ce9-34501010cd26,82536f00-21a6-484b-a14d-3f727062657c,09f03143-3b6b-42e8-bbd4-3350359984f0.....
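A count mismatch like the one above can be narrowed from "N files missing" to a concrete list of paths with a path-list diff between the two mounts. This is a hedged sketch of my own, not part of the report; the function name is hypothetical and the mountpoints are the ones used in the report. It requires bash for process substitution.

```shell
#!/usr/bin/env bash
# List entries present under the first mount but absent from the second,
# i.e. candidates for files geo-rep never synced to the slave.
missing_on_slave() {
    local master_mnt=$1 slave_mnt=$2
    # Build relative path lists from each mount, sort them identically,
    # and keep only the lines unique to the master side.
    comm -23 \
        <(cd "$master_mnt" && find . | sort) \
        <(cd "$slave_mnt" && find . | sort)
}
```

For example, `missing_on_slave /mnt/master /mnt/slave | wc -l` should report the same gap as the two `find | wc -l` counts above (31011 - 24975), and the list itself can be checked against the SKIPPED GFIDs in the geo-rep log.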
Hi Aravinda,

Can you please review the edited doc text and sign off?
(In reply to Pavithra from comment #2)
> Hi Aravinda,
>
> Can you please review the edited doc text and sign off?

Doc text looks good to me.
Made a minor edit.
Verified with build: glusterfs-3.7.1-10.el6rhs.x86_64

The node was re-installed and came back with the same IP, following the steps mentioned in:
http://documentation-devel.engineering.redhat.com/site/documentation/en-US/Red_Hat_Storage/3/html-single/Administration_Guide/index.html#Replacing_a_Host_Machine_with_the_Same_Hostname

Once geo-rep was started, it performed a hybrid crawl and synced the data to the slave.

Master:
=======
[root@wingo ~]# find /mnt/6m | wc -l
7565
[root@wingo scripts]# arequal-checksum -p /mnt/6m

Entry counts
Regular files   : 4723
Directories     : 871
Symbolic links  : 1971
Other           : 0
Total           : 7565

Metadata checksums
Regular files   : 47a9e5
Directories     : 24d481
Symbolic links  : 5a815a
Other           : 3e9

Checksums
Regular files   : 2bc0ad4daf6f43f398647ccb254094b5
Directories     : 5f77734b7d784455
Symbolic links  : 7a33023b4e214744
Other           : 0
Total           : 96e0a0f6b976d457

Slave:
======
[root@wingo ~]# find /mnt/6s | wc -l
7565
[root@wingo scripts]# arequal-checksum -p /mnt/6s

Entry counts
Regular files   : 4723
Directories     : 871
Symbolic links  : 1971
Other           : 0
Total           : 7565

Metadata checksums
Regular files   : 47a9e5
Directories     : 24d481
Symbolic links  : 5a815a
Other           : 3e9

Checksums
Regular files   : 2bc0ad4daf6f43f398647ccb254094b5
Directories     : 5f77734b7d784455
Symbolic links  : 7a33023b4e214744
Other           : 0
Total           : 96e0a0f6b976d457

The node that was re-installed was ACTIVE before re-installation, and after re-installation it becomes ACTIVE again (without use-meta-volume). Data was synced to the replica brick using "heal full" before the geo-rep session was created.
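The "heal full before re-creating the session" step from the verification above can be sketched as a small script. This is a hedged sketch: the function name and the `GLUSTER` dry-run override are my additions, and the volume name is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of the pre-session healing step: trigger a full self-heal on
# the master volume so the replaced brick is repopulated, then inspect
# heal status before the geo-rep session is re-created.
# GLUSTER can be overridden (e.g. GLUSTER=echo) for a dry run.
GLUSTER="${GLUSTER:-gluster}"

heal_before_georep() {
    local master_vol=$1
    # Kick off a full self-heal across the volume.
    $GLUSTER volume heal "$master_vol" full
    # Show outstanding heal entries; re-create the geo-rep session only
    # once this reports no pending entries for any brick.
    $GLUSTER volume heal "$master_vol" info
}
```

A dry run such as `GLUSTER=echo heal_before_georep MASTER_VOL` prints the commands without a live cluster.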
Hi Aravinda,

Please review the doc text and sign off if this looks ok.
Changing the doc text flag to +
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html