Description of problem:
I set a checkpoint and the status reports "checkpoint completed" even before the checkpoint actually completes (that is tracked as a separate bug, 1025358). If a node reboots at this stage, then after it comes back up the status of that node goes to faulty, with the following Python backtrace in the log file:

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 233, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 546, in checkpt_service
    gconf.confdata.delete('checkpoint-completed')
AttributeError: 'GConf' object has no attribute 'confdata'

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.39rhs-1.el6rhs.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 dist-rep master and a 2x2 dist-rep slave.
2. Start creating data on the mountpoint. You can either untar the Linux kernel or use smallfiles_cli.py.
3. Set the checkpoint.
4. When the geo-rep status says "checkpoint completed", reboot a node.
5. Run geo-rep status again.

Actual results:
The status of the rebooted node goes faulty:

NODE                      MASTER    SLAVE            HEALTH                                                                               UPTIME
------------------------------------------------------------------------------------------------------------------------------------------------
harrier.blr.redhat.com    master    falcon::slave    faulty                                                                               N/A
typhoon.blr.redhat.com    master    falcon::slave    Stable | checkpoint as of 2013-11-06 14:08:00 is completed at 2013-11-06 14:08:18    19:43:55
spitfire.blr.redhat.com   master    falcon::slave    Stable | checkpoint as of 2013-11-06 14:08:00 is completed at 2013-11-06 14:08:17    19:44:00
mustang.blr.redhat.com    master    falcon::slave    Stable | checkpoint as of 2013-11-06 14:08:00 is completed at 2013-11-06 14:08:18    19:43:55

And the log file has the following backtrace:

[2013-11-06 14:15:39.476916] E [syncdutils(/rhs/bricks/brick2):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 233, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 546, in checkpt_service
    gconf.confdata.delete('checkpoint-completed')
AttributeError: 'GConf' object has no attribute 'confdata'
[2013-11-06 14:15:39.478607] I [syncdutils(/rhs/bricks/brick2):159:finalize] <top>: exiting.
[2013-11-06 14:15:39.482624] I [monitor(monitor):81:set_state] Monitor: new state: faulty

Expected results:
Status should not go faulty and there should be no Python exception.

Additional info:
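For context, here is a minimal, self-contained sketch of the failure mode (not the actual syncdaemon source; the GConf class below is a simplified stand-in). The checkpoint service looks up an attribute named 'confdata' on the GConf singleton, which does not exist, so the AttributeError escapes, the worker dies, and the monitor flips the brick to faulty:

    class GConf(object):
        # Simplified stand-in: the real object exposes the config store
        # as 'configinterface', but has no 'confdata' attribute.
        class _ConfigInterface(object):
            def __init__(self):
                self._store = {'checkpoint-completed': '1'}
            def delete(self, key):
                self._store.pop(key, None)
        def __init__(self):
            self.configinterface = self._ConfigInterface()

    gconf = GConf()

    try:
        # Mirrors the failing call in master.py's checkpt_service():
        gconf.confdata.delete('checkpoint-completed')
    except AttributeError as e:
        # In the real daemon this exception is not caught here; it is
        # logged and the worker exits, so the node shows as faulty.
        print("AttributeError, as seen in the geo-rep log: %s" % e)

    # The attribute that actually exists works fine:
    gconf.configinterface.delete('checkpoint-completed')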
Amar, is the following patch available in this build?

https://code.engineering.redhat.com/gerrit/#/c/15289/

In that patch, gconf.confdata.delete('checkpoint-completed') is changed to gconf.configinterface.delete('checkpoint-completed'), and that should fix the issue.
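In other words, the fix amounts to a one-line substitution in checkpt_service() (a sketch based on the description above; the exact diff is in the Gerrit change linked):

    # before (raises AttributeError and drives the worker faulty):
    gconf.confdata.delete('checkpoint-completed')

    # after (the config store is exposed as 'configinterface'):
    gconf.configinterface.delete('checkpoint-completed')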
Verified on the build glusterfs-3.4.0.43rhs-1.

MASTER NODE               MASTER VOL    MASTER BRICK      SLAVE                  STATUS     CHECKPOINT STATUS                                                           CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
redcloak.blr.redhat.com   master        /bricks/brick2    10.70.43.76::slave     Passive    N/A                                                                         N/A
redwood.blr.redhat.com    master        /bricks/brick4    10.70.42.151::slave    Passive    N/A                                                                         N/A
redlake.blr.redhat.com    master        /bricks/brick3    10.70.43.135::slave    Active     checkpoint as of 2013-11-13 18:14:30 is completed at 2013-11-13 18:22:25    Changelog Crawl
redcell.blr.redhat.com    master        /bricks/brick1    10.70.43.174::slave    Active     checkpoint as of 2013-11-13 18:14:30 is completed at 2013-11-13 18:22:25    Changelog Crawl
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html