Bug 987292
Summary: | [RFE] Dist-geo-rep : Passive replica brick gets changelogs in .processing and processes all of them when it becomes active. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vijaykumar Koppad <vkoppad> |
Component: | geo-replication | Assignee: | Venky Shankar <vshankar> |
Status: | CLOSED ERRATA | QA Contact: | Vijaykumar Koppad <vkoppad> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 2.1 | CC: | aavati, amarts, bbandari, csaba, grajaiya, kcleveng, pneedle, rhs-bugs, shaines, vagarwal, vkoppad, vshankar |
Target Milestone: | --- | Keywords: | FutureFeature, ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.4.0.44rhs-1 | Doc Type: | Enhancement |
Doc Text: |
Previously, when a passive replica brick took over active geo-replication syncing after the failure of its active peer, it processed the whole brick's data in order to sync, causing long delays and redundant data processing. With this enhancement, on failover the passive node performs only a minimal crawl and handles just the changes that were not already handled by its replica pair, removing the unnecessary delay.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-11-27 15:29:59 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1025305, 1183918 |
Description
Vijaykumar Koppad
2013-07-23 07:39:24 UTC
Targeting for 3.0.0 (Denali) release.

Tried in the build glusterfs-3.4.0.42rhs-1; it still does the first xsync crawl and processes XSYNC-CHANGELOG entries like this:

```
E f11ad115-c62a-42a9-bb35-1efccafe832c MKDIR 16877 0 0 00000000-0000-0000-0000-000000000001%2Flevel00
E 55a21897-e0a7-4e05-885d-f82b5ff13d3e LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25M8EVNF3HV7
D 55a21897-e0a7-4e05-885d-f82b5ff13d3e
E 799ea081-b2ab-429a-82ae-ed7e744ef3ad LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25SO3HWDMMLB
D 799ea081-b2ab-429a-82ae-ed7e744ef3ad
E 69989eb0-3866-4c6b-9cd6-d19cafd04c80 LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25RL5QANEICG
D 69989eb0-3866-4c6b-9cd6-d19cafd04c80
E d54d186a-2490-45c7-bdc4-9e4c7909b6b1 LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25K2G02N3V6J
D d54d186a-2490-45c7-bdc4-9e4c7909b6b1
E 22b9d751-bbbb-4a1a-91ea-31b31b3caf42 LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25XHX50P3ZKP
D 22b9d751-bbbb-4a1a-91ea-31b31b3caf42
E f09504ec-e2fe-4749-9c98-5b9408a64422 LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25OZMPHG2IEQ
D f09504ec-e2fe-4749-9c98-5b9408a64422
E 802e3f49-7c92-4f23-9973-3e49de59119e LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25I8H3VNIXKV
D 802e3f49-7c92-4f23-9973-3e49de59119e
E bd337044-b687-4e0d-beff-1e2c782ae2b8 LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25ZGDJ87029J
D bd337044-b687-4e0d-beff-1e2c782ae2b8
E eebcc345-3ec6-4dba-9637-9b651ffc9534 LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25I4E0T0MVNR
D eebcc345-3ec6-4dba-9637-9b651ffc9534
E 79b1fa35-9e44-4df7-a9d5-441ab6c8144e LINK f11ad115-c62a-42a9-bb35-1efccafe832c%2F527e1c29%25%25QDUYHOBGDU
D 79b1fa35-9e44-4df7-a9d5-441ab6c8144e
```

Why LINK entries, when no link files were created? The files created were just regular files.
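As an aside for readers of the pasted records: each xsync changelog line above appears to be either an entry record (`E <gfid> <op> ...`) or a data record (`D <gfid>`), with the parent gfid and basename joined by a percent-encoded `/`. The following is only an illustrative sketch inferred from these logs, not the actual gsyncd parser; `parse_record` is a hypothetical helper.

```python
from urllib.parse import unquote

def parse_record(fields):
    """Parse one xsync changelog record, given as a list of whitespace-split
    fields. Format inferred from the log excerpts in this bug:
      D <gfid>                                          -> data record
      E <gfid> MKDIR|MKNOD <mode> <uid> <gid> <par>%2F<name>  -> entry with mode
      E <gfid> LINK <par>%2F<name>                      -> entry without mode
    """
    if fields[0] == "D":
        return {"type": "data", "gfid": fields[1]}
    if fields[0] == "E":
        op = fields[2]
        rec = {"type": "entry", "gfid": fields[1], "op": op}
        if op in ("MKDIR", "MKNOD", "CREATE"):
            rec.update(mode=int(fields[3]), uid=int(fields[4]), gid=int(fields[5]))
            encoded = fields[6]
        else:  # e.g. LINK: no mode/uid/gid fields in the pasted records
            encoded = fields[3]
        # %2F is the parent-gfid/basename separator; %25 escapes literal '%'
        pargfid, _, bname = unquote(encoded).partition("/")
        rec.update(pargfid=pargfid, basename=bname)
        return rec
    raise ValueError("unknown record type: %s" % fields[0])
```

For example, the MKNOD record for `dbf07989-...` above decodes to parent gfid `4a5a4a8c-4470-4b67-b9b6-810f326ab603` and basename `5281d731%%8EXB9XKPWM`.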
VijayKumar, the fix was never about the passive node skipping the xsync crawl entirely; it is about not doing anything redundant. That means after a passive node becomes active, it does a minimal xsync crawl (i.e., it does not crawl the entire filesystem as before), and after that it processes changelogs only from the latest time. Please check whether the above is achieved; if yes, you can say the bug is fixed.

Amar, even I wouldn't have worried if it had just done the xsync crawl and then started with the changelogs. Instead, it generated an xsync changelog with entries like those pasted in comment 5 for every file in the brick (in my case ~5K files), and it actually took some time to process that xsync changelog.

I tried the build glusterfs-3.4.0.43rhs-1; it is still doing the first xsync crawl and gets entries in the xsync changelog like this:

```
E 4a5a4a8c-4470-4b67-b9b6-810f326ab603 MKDIR 16877 0 0 00000000-0000-0000-0000-000000000001%2Flevel00
E dbf07989-bc2a-4c62-808f-f0ed11b73135 MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%258EXB9XKPWM
D dbf07989-bc2a-4c62-808f-f0ed11b73135
E bb41540a-8b05-4b33-83fc-5c9ecc954e0b MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%251WX5E68VEU
D bb41540a-8b05-4b33-83fc-5c9ecc954e0b
E c371b687-6b9f-44e2-a8c5-cb96f6971977 MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%259JF7ZN2UV0
D c371b687-6b9f-44e2-a8c5-cb96f6971977
E e765f651-54c7-49da-8844-68fe5fcfa4e4 MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%25SIKCI7NJ4H
D e765f651-54c7-49da-8844-68fe5fcfa4e4
E 84c40447-0cdb-4bee-90bd-888e75191c1e MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%258J5K64B78S
D 84c40447-0cdb-4bee-90bd-888e75191c1e
E 45b5f16e-c612-4a51-a31b-683bc85c66ee MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%25CQM1AJ5GSZ
D 45b5f16e-c612-4a51-a31b-683bc85c66ee
E 1a71c789-0bf4-4b09-9f9c-9e54e470322c MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%254LSU0CBQBU
D 1a71c789-0bf4-4b09-9f9c-9e54e470322c
E 4b92eb5d-35ad-40c4-996b-106f8fc54d3a MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%257VSSM3TWG1
D 4b92eb5d-35ad-40c4-996b-106f8fc54d3a
E feb3ece0-78a4-4b70-b498-fad811eed0b4 MKNOD 33188 0 0 4a5a4a8c-4470-4b67-b9b6-810f326ab603%2F5281d731%25%25U7C30S161T
D feb3ece0-78a4-4b70-b498-fad811eed0b4
```

That said, the part that skips the changelogs in the .processing directory which are not supposed to be processed is working well. The only remaining problem is the first xsync crawl, as I mentioned in comment 9.

Verified on the build glusterfs-3.4.0.44rhs-1. Now it works fine. These are the logs from geo-rep:

```
[2013-11-13 20:32:10.497644] I [master(/bricks/brick4):1081:crawl] _GMaster: starting hybrid crawl
[2013-11-13 20:32:10.583214] I [master(/bricks/brick4):1092:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.76%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2811da3df46b8e22d3cb492d02d2f14f/xsync/XSYNC-CHANGELOG.1384354930
[2013-11-13 20:32:10.998159] I [master(/bricks/brick4):1087:crawl] _GMaster: finished hybrid crawl syncing
[2013-11-13 20:32:11.6500] I [master(/bricks/brick4):415:crawlwrap] _GMaster: primary master with volume id a4275bfe-0a35-47ed-b0a3-818f5d059b0b ...
[2013-11-13 20:32:11.117378] I [master(/bricks/brick4):426:crawlwrap] _GMaster: crawl interval: 3 seconds
[2013-11-13 20:32:11.226258] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351489...
[2013-11-13 20:32:11.226692] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351804...
[2013-11-13 20:32:11.226928] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351819...
[2013-11-13 20:32:11.227154] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351834...
[2013-11-13 20:32:11.227396] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351849...
[2013-11-13 20:32:11.227643] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351864...
[2013-11-13 20:32:11.227857] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351879...
[2013-11-13 20:32:11.228072] I [master(/bricks/brick4):1023:crawl] _GMaster: skipping already processed change: CHANGELOG.1384351894...
[2013-11-13 20:33:12.533623] I [master(/bricks/brick4):439:crawlwrap] _GMaster: 16 crawls, 1 turns
```

It does start a hybrid crawl, but it generates a zero-byte XSYNC-CHANGELOG, so there is almost zero XSYNC-CHANGELOG processing time.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
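For readers following the verification logs: the "skipping already processed change" lines suggest the worker compares each changelog's timestamp suffix (`CHANGELOG.<unix-ts>`) against a marker for the last synced time, so that changelogs already handled by the formerly active replica are not replayed. The sketch below illustrates that selection logic under those assumptions; `select_pending` and the `stime` parameter are hypothetical names, not gsyncd's actual API.

```python
def select_pending(changelogs, stime):
    """Return, in timestamp order, only the changelogs newer than the last
    synced time `stime`; older ones were already processed by the previously
    active replica and are skipped (as in the pasted geo-rep logs)."""
    pending = []
    for name in sorted(changelogs):
        # "CHANGELOG.1384351489" -> 1384351489
        ts = int(name.rsplit(".", 1)[1])
        if ts <= stime:
            # corresponds to the log line:
            # "skipping already processed change: CHANGELOG.<ts>..."
            continue
        pending.append(name)
    return pending
```

With the timestamps from the logs above, a node whose last synced time is 1384351894 would skip all eight listed changelogs and process only anything newer, which matches the observed minimal failover behavior.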