Bug 1287107
Summary: | [georep][sharding] Unable to resume geo-rep session after previous errors | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Sahina Bose <sabose>
Component: | geo-replication | Assignee: | Aravinda VK <avishwan>
Status: | CLOSED WORKSFORME | QA Contact: |
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | mainline | CC: | avishwan, bugs, mselvaga, sabose
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-03-01 11:37:48 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1258386 | |
Attachments: | georep-master-log, georep-slave-log (attachments 1100927 and 1100928; see comments below) | |
Created attachment 1100928 [details]
georep-slave-log

The log seems to have only partial details. Is it possible to attach the logs from before the disk expansion? (Interested in failures related to disk space.)

I don't have the setup anymore. Closing this for now, as I have not run into it again. Will re-open if I hit it.
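For anyone who hits this again, a minimal sketch of how the disk-space failures could be pulled out of the geo-rep logs before the setup is torn down. The log paths are the default GlusterFS locations, and the error strings are assumptions about how ENOSPC failures typically appear; adjust both to the actual setup:

```sh
# On the master nodes: scan the geo-rep session logs (default path
# layout assumed) for sync failures caused by a full slave.
grep -iE "no space left|ENOSPC|rsync error" \
    /var/log/glusterfs/geo-replication/*/*.log

# On the slave node: the slave-side gsyncd logs usually carry the
# write errors behind the FAILURES count in "status detail".
grep -iE "no space left|ENOSPC" \
    /var/log/glusterfs/geo-replication-slaves/*.log
```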
Created attachment 1100927 [details]
georep-master-log

Description of problem:
A geo-replication session running on a sharded volume reported failures because the slave volume ran out of disk space. The geo-rep session was stopped, the slave volume's disk space was extended (using lvextend on the underlying brick mount point), and the session was resumed. However, geo-rep status detail still shows failures, and files do not appear to be syncing. Status detail and volume info are in the Additional info section below.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up a geo-replication session between master and slave (the slave volume has less capacity than the master).
2. Start geo-rep.
3. Create more data in the master volume than the slave has capacity for.
4. geo-rep status reports failures (seen as the failure count in status detail).
5. Stop the geo-rep session.
6. Increase the capacity of the slave volume (in my case, I extended the brick LV by adding an additional vdisk to the VM hosting the slave).
7. Start the geo-rep session again (a sketch of these stop/extend/resume commands follows the volume info below).

Actual results:

Expected results:

Additional info:

# gluster vol geo-replication data1 10.70.40.112::hc-slavevol status detail

MASTER NODE | MASTER VOL | MASTER BRICK | SLAVE USER | SLAVE | SLAVE NODE | STATUS | CRAWL STATUS | LAST_SYNCED | ENTRY | DATA | META | FAILURES | CHECKPOINT TIME | CHECKPOINT COMPLETED | CHECKPOINT COMPLETION TIME
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
rhsdev-docker1.lab.eng.blr.redhat.com | data1 | /rhgs/data1/b1 | root | 10.70.40.112::hc-slavevol | 10.70.40.112 | Passive | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A
rhsdev9.lab.eng.blr.redhat.com | data1 | /rhgs/data1/b1 | root | 10.70.40.112::hc-slavevol | 10.70.40.112 | Active | History Crawl | 2015-11-26 15:18:56 | 0 | 7226 | 0 | 107 | 2015-12-01 18:09:11 | No | N/A
rhsdev-docker2.lab.eng.blr.redhat.com | data1 | /rhgs/data1/b1 | root | 10.70.40.112::hc-slavevol | 10.70.40.112 | Passive | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A

Master volume:

Volume Name: data1
Type: Replicate
Volume ID: 55bd10b0-f05a-446b-a481-6590cc400263
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsdev9.lab.eng.blr.redhat.com:/rhgs/data1/b1
Brick2: rhsdev-docker2.lab.eng.blr.redhat.com:/rhgs/data1/b1
Brick3: rhsdev-docker1.lab.eng.blr.redhat.com:/rhgs/data1/b1
Options Reconfigured:
performance.readdir-ahead: on
performance.low-prio-threads: 32
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

Slave volume:

Volume Name: hc-slavevol
Type: Distribute
Volume ID: 56a3d4d9-51bc-4daf-9257-bd13e10511ae
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.40.112:/brick/hc1
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 512MB
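For reference, a minimal sketch of the stop/extend/resume sequence from steps 5-7 above, using the session names from this report. The VG/LV names and the size increment are hypothetical placeholders, and -r assumes the brick filesystem supports online growth:

```sh
# Stop the geo-rep session before touching slave storage
gluster volume geo-replication data1 10.70.40.112::hc-slavevol stop

# On the slave: grow the brick LV (names are hypothetical) after the
# new vdisk has been added to the VG; -r also grows the filesystem.
lvextend -r -L +100G /dev/rhgs_vg/brick_lv

# Resume the session and watch the FAILURES column
gluster volume geo-replication data1 10.70.40.112::hc-slavevol start
gluster volume geo-replication data1 10.70.40.112::hc-slavevol status detail
```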