Bug 1287107 - [georep][sharding] Unable to resume geo-rep session after previous errors
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Aravinda VK
Keywords: Triaged
Depends On:
Blocks: Gluster-HC-1
 
Reported: 2015-12-01 09:07 EST by Sahina Bose
Modified: 2016-03-01 06:37 EST (History)
4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-01 06:37:48 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
georep-master-log (7.12 KB, text/plain)
2015-12-01 09:07 EST, Sahina Bose
no flags Details
georep-slave-log (313.54 KB, text/plain)
2015-12-01 09:09 EST, Sahina Bose
no flags Details

Description Sahina Bose 2015-12-01 09:07:53 EST
Created attachment 1100927 [details]
georep-master-log

Description of problem:

A geo-replication session running on a sharded master volume started failing because the slave volume ran out of space.

The geo-rep session was stopped, the slave volume's disk space was extended (using lvextend on the LV backing the brick mount point), and the session was resumed.

However, geo-rep status detail still reports failures, and files do not appear to be syncing.

Status detail output and volume info are in the Additional info section.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set up a geo-replication session between master and slave (the slave volume has less capacity than the master)
2. Start geo-rep
3. Create more data in the master volume than the slave can hold
4. geo-rep status reports failures (visible as the failure count in status detail)
5. Stop the geo-rep session
6. Increase the capacity of the slave volume (in my case, by extending the brick LV after adding an additional vdisk to the VM hosting the slave)
7. Start the geo-rep session again
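The recovery portion of the steps above (5-7) can be sketched as shell commands. This is a hedged sketch, not the exact commands from the report: the VG/LV names, the extent size, and the XFS brick filesystem are assumptions; adjust them to the actual slave layout. The commands are echoed rather than executed so the sequence can be reviewed safely; drop the leading `echo` on a real cluster.

```shell
# Volume and slave names taken from the report; everything under
# "lvextend"/"xfs_growfs" is illustrative.
MASTERVOL=data1
SLAVE=10.70.40.112::hc-slavevol

# Step 5: stop the geo-rep session (run on a master node)
echo "gluster volume geo-replication $MASTERVOL $SLAVE stop"

# Step 6: grow the slave brick's LV and filesystem (run on the slave node;
# VG/LV names and size are assumptions)
echo "lvextend -L +50G /dev/vg_bricks/lv_hc1"
echo "xfs_growfs /brick/hc1   # assumes an XFS brick filesystem"

# Step 7: resume the session (run on a master node)
echo "gluster volume geo-replication $MASTERVOL $SLAVE start"
```

On a live deployment the failure count can then be re-checked with `gluster volume geo-replication data1 10.70.40.112::hc-slavevol status detail`, as shown under Additional info.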

Actual results:


Expected results:


Additional info:

# gluster vol geo-replication data1 10.70.40.112::hc-slavevol  status detail
 
MASTER NODE                              MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                        SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME        CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME   
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rhsdev-docker1.lab.eng.blr.redhat.com    data1         /rhgs/data1/b1    root          10.70.40.112::hc-slavevol    10.70.40.112    Passive    N/A              N/A                    N/A      N/A     N/A     N/A         N/A                    N/A                     N/A                          
rhsdev9.lab.eng.blr.redhat.com           data1         /rhgs/data1/b1    root          10.70.40.112::hc-slavevol    10.70.40.112    Active     History Crawl    2015-11-26 15:18:56    0        7226    0       107         2015-12-01 18:09:11    No                      N/A                          
rhsdev-docker2.lab.eng.blr.redhat.com    data1         /rhgs/data1/b1    root          10.70.40.112::hc-slavevol    10.70.40.112    Passive    N/A              N/A                    N/A      N/A     N/A     N/A         N/A                    N/A                     N/A

Master volume:
Volume Name: data1
Type: Replicate
Volume ID: 55bd10b0-f05a-446b-a481-6590cc400263
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsdev9.lab.eng.blr.redhat.com:/rhgs/data1/b1
Brick2: rhsdev-docker2.lab.eng.blr.redhat.com:/rhgs/data1/b1
Brick3: rhsdev-docker1.lab.eng.blr.redhat.com:/rhgs/data1/b1
Options Reconfigured:
performance.readdir-ahead: on
performance.low-prio-threads: 32
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

Slave volume:
Volume Name: hc-slavevol
Type: Distribute
Volume ID: 56a3d4d9-51bc-4daf-9257-bd13e10511ae
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.40.112:/brick/hc1
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 512MB
Comment 1 Sahina Bose 2015-12-01 09:09 EST
Created attachment 1100928 [details]
georep-slave-log
Comment 2 Aravinda VK 2015-12-02 00:38:19 EST
Looks like the log has only partial details. Is it possible to attach the logs from before the disk expansion? (Interested in the failures related to disk space.)
Comment 3 Sahina Bose 2016-03-01 06:37:48 EST
I don't have the setup anymore. Closing this for now, as I have not run into this again.
Will re-open if I hit it.
