Bug 1024313
| Summary: | self-heal happening from sink to source in 3 way replica | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura |
| Component: | replicate | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED EOL | QA Contact: | spandura |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.1 | CC: | nsathyan, ravishankar, rhs-bugs, rmekala, sdharane, storage-qa-internal, vagarwal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-12-03 17:19:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Verified the fix on the build "glusterfs 3.4.0.52rhs built on Dec 19 2013 12:20:16" using the same steps as mentioned in the description of the bug. The bug is not yet fixed: the invalid matrix still appears in the INFO message. Moving the bug to "ASSIGNED" state.

Info message about heal in glustershd.log
=========================================
[2014-01-02 09:00:58.035489] I [afr-self-heal-common.c:2877:afr_log_self_heal_completion_status] 0-vol_rep-replicate-0: metadata self heal is successfully completed, foreground data self heal is successfully completed, data self heal from vol_rep-client-0 to sinks vol_rep-client-2, with 0 bytes on vol_rep-client-0, 0 bytes on vol_rep-client-1, 0 bytes on vol_rep-client-2, data - Pending matrix: [ [ 0 0 2 ] [ 0 0 1 ] [ 0 0 0 ] ] metadata self heal from source vol_rep-client-1 to vol_rep-client-0, vol_rep-client-2, metadata - Pending matrix: [ [ 0 0 4 ] [ 1 0 4 ] [ 0 0 0 ] ], on <gfid:817340e2-9b15-4c18-813d-33d952839f06>

Extended attributes of file on brick1 before self-heal:
=======================================================
root@rhs-client11 [Jan-02-2014- 9:00:25] >gattr /rhs/bricks/b1/test_file
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000000000000000000000000
trusted.afr.vol_rep-client-2=0x000000010000000200000000
trusted.gfid=0x817340e29b154c18813d33d952839f06

Extended attributes of file on brick2 before self-heal:
=======================================================
root@rhs-client12 [Jan-02-2014- 9:00:25] >gattr /rhs/bricks/b1-rep1/test_file
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1-rep1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000100000000
trusted.afr.vol_rep-client-1=0x000000000000000000000000
trusted.afr.vol_rep-client-2=0x000000010000000300000000
trusted.gfid=0x817340e29b154c18813d33d952839f06

Extended attributes of file on brick3 before self-heal:
=======================================================
root@rhs-client13 [Jan-02-2014- 9:00:25] >gattr /rhs/bricks/b1-rep2/test_file
getfattr: /rhs/bricks/b1-rep2/test_file: No such file or directory

Per tiger team call, removing it from corbett.

Tested with 3.1.2 (afrv2.0) and unable to reproduce the reported problem. Per Dev this was fixed as part of the AFR v2 implementation, so marking this bug as verified.

Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested a review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
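For anyone reading along, the trusted.afr values above can be decoded as follows. This is only a sketch, assuming the usual AFR changelog layout of three network-byte-order 32-bit counters (data, metadata, entry); it is not taken from the GlusterFS sources.

```python
# Illustrative decoder for an AFR changelog xattr value (assumed layout:
# three big-endian 32-bit counters: data, metadata, entry).
import struct

def decode_afr(hex_value: str) -> dict:
    raw = bytes.fromhex(hex_value[2:])          # strip the "0x" prefix
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Value of trusted.afr.vol_rep-client-2 on brick1 before self-heal:
print(decode_afr("0x000000010000000200000000"))
# -> {'data': 1, 'metadata': 2, 'entry': 0}
# i.e. brick1 records one pending data and two pending metadata
# operations against vol_rep-client-2.
```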
Description of problem:
=======================
In a 3-way replica setup, self-heal happens from a sink node to the source when nodes go offline and come back online.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.36rhs built on Oct 22 2013 10:56:18

How reproducible:
=================
Executed the test case only once.

Steps to Reproduce:
===================
1. Create a 1 x 3 replicate volume. Start the volume. Create a FUSE mount.
2. Kill all gluster processes on node3: { killall glusterfs glusterfsd glusterd }
3. From the mount point create a file: "touch test_file"
4. Kill all gluster processes on node1: { killall glusterfs glusterfsd glusterd }
5. From the mount point execute: "touch test_file"
6. Restart glusterd on all the nodes.

Actual results:
===============
1. Entry self-heal happened from node2. [ As expected ]

2. Metadata and data self-heal happened from node1.

[2013-10-29 09:14:25.720195] I [afr-self-heal-common.c:2840:afr_log_self_heal_completion_status] 0-vol_rep-replicate-0: metadata self heal is successfully completed, foreground data self heal is successfully completed, from vol_rep-client-0 with 0 0 0 sizes - Pending matrix: [ [ 0 0 2 ] [ 0 0 1 ] [ 0 0 0 ] ] on <gfid:b8309224-af45-440a-980e-aa588cbeeb8b>

3. In the glustershd.log file the matrix shows wrong data.

foreground data self heal is successfully completed, from vol_rep-client-0 with 0 0 0 sizes - Pending matrix: [ [ 0 0 2 ] [ 0 0 1 ] [ 0 0 0 ] ] on <gfid:b8309224-af45-440a-980e-aa588cbeeb8b>

Extended attributes of file on Brick1 before self-heal
======================================================
root@rhs-client11 [Oct-29-2013- 9:13:48] >getfattr -d -e hex -m . /rhs/bricks/b1/test_file
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000000000000000000000000
trusted.afr.vol_rep-client-2=0x000000010000000200000000
trusted.gfid=0xb8309224af45440a980eaa588cbeeb8b

Extended attributes of file on Brick2 before self-heal
======================================================
root@rhs-client12 [Oct-29-2013- 9:13:41] >getfattr -d -e hex -m . /rhs/bricks/b2/testfile
getfattr: /rhs/bricks/b2/testfile: No such file or directory

root@rhs-client12 [Oct-29-2013- 9:13:48] >getfattr -d -e hex -m . /rhs/bricks/b2/test_file
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b2/test_file
trusted.afr.vol_rep-client-0=0x000000000000000100000000
trusted.afr.vol_rep-client-1=0x000000000000000000000000
trusted.afr.vol_rep-client-2=0x000000010000000300000000
trusted.gfid=0xb8309224af45440a980eaa588cbeeb8b

Extended attributes of file on Brick3 before self-heal
======================================================
root@rhs-client13 [Oct-29-2013- 9:13:47] >getfattr -d -e hex -m . /rhs/bricks/b3/test_file
getfattr: /rhs/bricks/b3/test_file: No such file or directory
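To make the "matrix shows wrong data" observation easier to follow, here is a small reconstruction of the pending matrices straight from the xattr dumps above. This is a sketch, not GlusterFS code, and it assumes the usual trusted.afr layout of three network-byte-order 32-bit counters (data, metadata, entry).

```python
# Illustrative reconstruction of the pending matrices from the xattr dumps above.
import struct

def counters(hex_value: str) -> dict:
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(hex_value[2:]))
    return {"data": data, "metadata": metadata, "entry": entry}

# Row i = what brick i records as pending against client-0/1/2.
# Brick3 has no copy of test_file yet, so its row is treated as all zeroes.
rows = [
    ["0x000000000000000000000000",   # brick1 vs client-0
     "0x000000000000000000000000",   # brick1 vs client-1
     "0x000000010000000200000000"],  # brick1 vs client-2
    ["0x000000000000000100000000",   # brick2 vs client-0
     "0x000000000000000000000000",   # brick2 vs client-1
     "0x000000010000000300000000"],  # brick2 vs client-2
    ["0x000000000000000000000000"] * 3,  # brick3 (file absent)
]

for kind in ("data", "metadata"):
    matrix = [[counters(v)[kind] for v in row] for row in rows]
    print(kind, "pending matrix:", matrix)
# Prints:
# data pending matrix: [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
# metadata pending matrix: [[0, 0, 2], [1, 0, 3], [0, 0, 0]]
# Compare with the single "Pending matrix" printed in glustershd.log above.
```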
4. "gluster volume heal <volume_name> info healed" command output:

root@rhs-client11 [Oct-29-2013- 9:15:04] >gluster v heal vol_rep info healed
Gathering list of healed entries on volume vol_rep has been successful

Brick rhs-client11:/rhs/bricks/b1
Number of entries: 1
at                    path on brick
-----------------------------------
2013-10-29 09:14:25 /test_file

Brick rhs-client12:/rhs/bricks/b2
Number of entries: 1
at                    path on brick
-----------------------------------
2013-10-29 09:14:26 /

Brick rhs-client13:/rhs/bricks/b3
Number of entries: 0

Expected results:
=================
Metadata self-heal and data self-heal should have been from node2 to node1 and node3.

Additional info:
================
root@rhs-client11 [Oct-29-2013-11:17:19] >gluster v info

Volume Name: vol_rep
Type: Replicate
Volume ID: 8e0dcde2-c326-492d-99fd-2421e951ec3c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/b1
Brick2: rhs-client12:/rhs/bricks/b2
Brick3: rhs-client13:/rhs/bricks/b3

root@rhs-client11 [Oct-29-2013-11:18:21] >gluster v status
Status of volume: vol_rep
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/bricks/b1                       49155   Y       8036
Brick rhs-client12:/rhs/bricks/b2                       49155   Y       7695
Brick rhs-client13:/rhs/bricks/b3                       49155   Y       26693
NFS Server on localhost                                 2049    Y       8045
Self-heal Daemon on localhost                           N/A     Y       8049
NFS Server on rhs-client13                              2049    Y       26702
Self-heal Daemon on rhs-client13                        N/A     Y       26706
NFS Server on rhs-client12                              2049    Y       9005
Self-heal Daemon on rhs-client12                        N/A     Y       9012

There are no active volume tasks
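For reference, the expected heal direction follows from the pending matrix. Below is a minimal sketch (not the actual AFR source-selection code, and ignoring split-brain and self-blame cases) of how sources and sinks fall out of such a matrix; the example matrix is the metadata matrix reconstructed from the description's xattrs.

```python
# Sketch of source/sink selection from a pending matrix, where matrix[i][j]
# is what brick i records as pending on brick j. A brick blamed by any other
# brick is a sink; a brick blamed by nobody is a candidate source.

def sources_and_sinks(matrix):
    n = len(matrix)
    sinks = {j for j in range(n) for i in range(n) if i != j and matrix[i][j]}
    sources = set(range(n)) - sinks
    return sources, sinks

# Metadata pending matrix reconstructed from the brick xattrs in the description
metadata_matrix = [[0, 0, 2],
                   [1, 0, 3],
                   [0, 0, 0]]

src, snk = sources_and_sinks(metadata_matrix)
print("sources:", sorted(src), "sinks:", sorted(snk))
# -> sources: [1] sinks: [0, 2]
# Only client-1 (node2) is un-blamed, which is why the heal was expected to
# run from node2 to node1 and node3, not from node1 as observed.
```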