Bug 1439708
| Summary: | [geo-rep]: Geo-replication goes to faulty after upgrade from 3.2.0 to 3.3.0 | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rochelle <rallan> |
| Component: | geo-replication | Assignee: | Kotresh HR <khiremat> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, csaba, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-22 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-21 04:37:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1417151 | | |
The following code in master.py was causing this issue. A diff between the 3.2.0 and 3.3.0 versions of master.py reveals these additional lines:
    if not data_stime or data_stime == URXTIME:
        raise NoStimeAvailable()
After commenting out these lines and restarting geo-replication, it works.
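The traceback in this report ends in a NameError on `data_stime`, which suggests the name was never assigned in the scope where the check runs. The following is a minimal sketch of that failure mode, not the actual master.py code and not the downstream patch. The `URXTIME` value, the `NoStimeAvailable` class body, and the `get_data_stime` callable are stand-ins assumed for illustration.

```python
# Assumed sentinel meaning "no stime recorded"; stand-in value for illustration.
URXTIME = (-1, 0)


class NoStimeAvailable(Exception):
    """Stand-in for the exception raised by the added lines."""


def crawl_broken():
    # 'data_stime' is never assigned in this function or at module level,
    # so Python raises NameError only when this line is actually executed.
    if not data_stime or data_stime == URXTIME:  # noqa: F821
        raise NoStimeAvailable()


def crawl_guarded(get_data_stime):
    # Hypothetical variant: assign the name before testing it.
    # 'get_data_stime' is an assumed callable that returns the stime tuple.
    data_stime = get_data_stime()
    if not data_stime or data_stime == URXTIME:
        raise NoStimeAvailable()
    return data_stime
```

Because the name is looked up only when the line runs, a check like this can pass import and startup and still fail on the first crawl, which would match a worker going Faulty right after the session is started.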
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/102726/

Verified with build: glusterfs-geo-replication-3.8.4-22.el6rhs.x86_64

After upgrading the Master/Slave cluster from 3.2.0 to the latest 3.3.0 version, I was able to start geo-replication. It goes into History Crawl and then becomes Changelog Crawl. It is working as expected. Moving the bug to verified state.

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol start
Starting geo-replication session between firstvol & 10.70.43.185::secvol has been successful
[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status

MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS             CRAWL STATUS    LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive            N/A             N/A
[root@localhost ~]#
[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status

MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    10.70.43.185    Active     History Crawl    2017-04-10 22:53:07
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    10.70.43.185    Active     History Crawl    2017-04-10 22:53:08
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A              N/A
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A              N/A
[root@localhost ~]#
[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status

MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    10.70.43.185    Active     Changelog Crawl    2017-04-10 22:53:07
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    10.70.43.185    Active     Changelog Crawl    2017-04-10 22:53:08
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A                N/A
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A                N/A
[root@localhost ~]#

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
Description of problem:
=======================
After upgrading the system from 3.2.0 to 3.3.0, the geo-replication status appears faulty with the following traceback:

[2017-04-06 11:32:00.796592] E [syncdutils(/rhs/brick1/b1):296:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 779, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1572, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 570, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1169, in crawl
    if not data_stime or data_stime == URXTIME:
NameError: global name 'data_stime' is not defined
[2017-04-06 11:32:00.800887] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.

[root@localhost ~]# gluster volume geo-replication status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                      SLAVE NODE     STATUS     CRAWL STATUS    LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.179    vol0          /rhs/brick1/b1    root          ssh://10.70.43.87::vol1    N/A            Faulty     N/A             N/A
10.70.43.179    vol0          /rhs/brick2/b3    root          ssh://10.70.43.87::vol1    N/A            Faulty     N/A             N/A
10.70.42.90     vol0          /rhs/brick1/b2    root          ssh://10.70.43.87::vol1    10.70.43.87    Passive    N/A             N/A
10.70.42.90     vol0          /rhs/brick2/b4    root          ssh://10.70.43.87::vol1    10.70.43.87    Passive    N/A             N/A
[root@localhost ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-21.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a geo-replication setup with 3.2.0 builds
2. Stop the geo-replication session to continue the upgrade
3. Follow the in-service upgrade path to upgrade to 3.3.0
4. Start the geo-replication session

Actual results:
===============
Geo-replication session becomes faulty

Expected results:
=================
All the workers should be either active or passive
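
The expected result above can be checked mechanically against captured status output. The sketch below is an illustration only: it parses text that has already been saved from the status command, and it assumes the column layout shown in this report (seven single-token columns up to STATUS, with the brick path in the third column). The function names are made up for this example.

```python
def worker_statuses(status_output):
    """Return (master_node, brick, status) for each worker row in the text."""
    rows = []
    for line in status_output.splitlines():
        fields = line.split()
        # Assumption: data rows have at least 7 columns and the third one is
        # an absolute brick path; this skips headers, rulers, and prompts.
        if len(fields) < 7 or not fields[2].startswith("/"):
            continue
        rows.append((fields[0], fields[2], fields[6]))
    return rows


def all_workers_healthy(status_output):
    """True only if at least one worker is listed and all are Active or Passive."""
    rows = worker_statuses(status_output)
    return bool(rows) and all(s in ("Active", "Passive") for _, _, s in rows)


# Rows copied from the faulty status output in this report.
SAMPLE = """\
MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED
-----------------------------------------------------------------------------------------------
10.70.43.179 vol0 /rhs/brick1/b1 root ssh://10.70.43.87::vol1 N/A Faulty N/A N/A
10.70.42.90 vol0 /rhs/brick1/b2 root ssh://10.70.43.87::vol1 10.70.43.87 Passive N/A N/A
"""

if __name__ == "__main__":
    for node, brick, status in worker_statuses(SAMPLE):
        print(node, brick, status)
    print("healthy:", all_workers_healthy(SAMPLE))
```

Treating an empty result as unhealthy is a deliberate choice here, so that unparseable output is not reported as a pass. A transient state such as Initializing... is also reported as not healthy, so a real check would likely need to retry for a while.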