Bug 1537602 - Georeplication tests intermittently fail
Summary: Georeplication tests intermittently fail
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1539657
Reported: 2018-01-23 15:15 UTC by Nigel Babu
Modified: 2018-10-23 15:06 UTC
2 users

Fixed In Version: glusterfs-5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1539657
Environment:
Last Closed: 2018-06-20 17:58:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Nigel Babu 2018-01-23 15:15:05 UTC
The tests fail because of a possible configuration issue; Shyam and I narrowed it down to that. I am going to disable the tests to unblock all the other reviews.

Comment 1 Worker Ant 2018-01-23 15:18:46 UTC
REVIEW: https://review.gluster.org/19301 (tests: Disable geo-rep tests) posted (#1) for review on master by Nigel Babu

Comment 2 Shyamsundar 2018-01-23 15:24:54 UTC
The test fails in the following runs:

https://build.gluster.org/job/centos6-regression/8602/console
https://build.gluster.org/job/centos6-regression/8604/console
https://build.gluster.org/job/centos6-regression/8607/console
https://build.gluster.org/job/centos6-regression/8608/console
https://build.gluster.org/job/centos6-regression/8612/console

The failure almost always occurs when checking which nodes are in the "Active" and "Passive" states:

06:58:02 not ok 22 Got "1" instead of "2", LINENUM:83
06:58:02 FAILED COMMAND: 2 check_status_num_rows Passive
AND/OR
06:58:02 not ok 37 Got "1" instead of "2", LINENUM:102
06:58:02 FAILED COMMAND: 2 check_status_num_rows Passive

On slave25, after rerunning this test (from a fresh clone of the sources, etc.) up to the step at line 83, the command output looks as follows:


[root@slave25 ~]# gluster volume geo-replication master 127.0.0.1::slave status detail
 
MASTER NODE                  MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE               SLAVE NODE                   STATUS             CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME   
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
slave25.cloud.gluster.org    master        /d/backends/master1    root          127.0.0.1::slave    slave25.cloud.gluster.org    Active             Changelog Crawl    2018-01-23 14:17:38    7        0       0       0           N/A                N/A                     N/A                          
slave25.cloud.gluster.org    master        /d/backends/master2    root          127.0.0.1::slave    N/A                          Faulty             N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A                          
slave25.cloud.gluster.org    master        /d/backends/master3    root          127.0.0.1::slave    N/A                          Initializing...    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A                          
slave25.cloud.gluster.org    master        /d/backends/master4    root          127.0.0.1::slave    slave25.cloud.gluster.org    Active             Changelog Crawl    2018-01-23 14:17:38    9        0       0       0           N/A                N/A                     N/A   

This state never recovers, so it is not a timing issue per se.

Can someone from the geo-rep team look at the logs from those runs to determine what is going wrong and why the status is "Faulty" or "Initializing...", as that seems to be the start of the test failure?
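To make the failure mode concrete, here is a minimal Python sketch that counts worker rows per STATUS value in the snapshot above. The helper `status_counts` and the rule of splitting columns on runs of two or more spaces are assumptions for illustration only; this is not how `check_status_num_rows` is actually implemented. The snapshot comes from a manual rerun, so its counts need not match the "1" logged in the CI runs.

```python
import re
from collections import Counter

# Data rows from the `status detail` output above, truncated after the
# STATUS column (the remaining columns are not needed for this count).
STATUS_ROWS = """\
slave25.cloud.gluster.org    master        /d/backends/master1    root          127.0.0.1::slave    slave25.cloud.gluster.org    Active
slave25.cloud.gluster.org    master        /d/backends/master2    root          127.0.0.1::slave    N/A                          Faulty
slave25.cloud.gluster.org    master        /d/backends/master3    root          127.0.0.1::slave    N/A                          Initializing...
slave25.cloud.gluster.org    master        /d/backends/master4    root          127.0.0.1::slave    slave25.cloud.gluster.org    Active
"""

# 0-based index of STATUS when a row is split on runs of 2+ spaces (assumption).
STATUS_COL = 6


def status_counts(text: str) -> Counter:
    """Count worker rows per STATUS value (hypothetical helper, not the test's code)."""
    counts = Counter()
    for line in text.splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) > STATUS_COL:
            counts[cols[STATUS_COL]] += 1
    return counts


if __name__ == "__main__":
    counts = status_counts(STATUS_ROWS)
    print(dict(counts))
    # The test asserts 2 Passive rows; in this snapshot one brick is Faulty,
    # one is stuck Initializing..., and none is Passive.
    print("Passive rows:", counts["Passive"])
```

With one worker Faulty and another stuck in Initializing..., the expected two-Active/two-Passive split cannot occur no matter how long the check waits.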

Comment 3 Worker Ant 2018-01-24 02:11:52 UTC
COMMIT: https://review.gluster.org/19301 committed in master by "Nigel Babu" <nigelb@redhat.com> with a commit message- tests: Disable geo-rep tests

These tests are prone to issues at the moment that need further
debugging and fixing.

BUG: 1537602
Change-Id: Ic59ca620925c6f43948b8a751eaddb571b791969
Signed-off-by: Nigel Babu <nigelb@redhat.com>

Comment 4 Worker Ant 2018-06-11 07:11:56 UTC
REVIEW: https://review.gluster.org/20208 (tests: Increase timeout for geo-rep testcases) posted (#1) for review on master by Kotresh HR

Comment 5 Worker Ant 2018-06-11 07:13:40 UTC
REVIEW: https://review.gluster.org/20209 (tests: Enable geo-rep test cases) posted (#1) for review on master by Kotresh HR

Comment 6 Worker Ant 2018-06-18 03:56:41 UTC
COMMIT: https://review.gluster.org/20208 committed in master by "Amar Tumballi" <amarts@redhat.com> with a commit message- tests: Increase timeout for geo-rep testcases

Change-Id: I11715c23fddc1dd218a3ab88a55f1d6355dcfd43
Updates: bz#1537602
Signed-off-by: Kotresh HR <khiremat@redhat.com>

Comment 7 Worker Ant 2018-06-18 05:25:38 UTC
COMMIT: https://review.gluster.org/20209 committed in master by "Amar Tumballi" <amarts@redhat.com> with a commit message- tests: Enable geo-rep test cases

Fixes: bz#1537602
Change-Id: I12314262aaa80f8b7818170112529bf62ab93d3f
Signed-off-by: Kotresh HR <khiremat@redhat.com>

Comment 8 Shyamsundar 2018-06-20 17:58:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 9 Shyamsundar 2018-10-23 15:06:35 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

