Description of problem:
=========================
In a 6 x 2 distribute-replicate volume, starting glusterd on a node on which all gluster processes were offline brings all the brick processes online, but the self-heal daemon process is not started on that node.

Version-Release number of selected component (if applicable):
==============================================================
root@hicks [Jul-02-2013-18:24:15] >gluster --version
glusterfs 3.4.0.12rhs.beta1 built on Jun 28 2013 06:41:38

root@hicks [Jul-02-2013-18:31:57] >rpm -qa | grep glusterfs
glusterfs-geo-replication-3.4.0.12rhs.beta1-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta1-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta1-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta1-1.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-server-3.4.0.12rhs.beta1-1.el6rhs.x86_64

How reproducible:
================

Steps to Reproduce:
=====================
1. Create a 6 x 2 distribute-replicate volume with 4 storage nodes and 3 bricks on each storage node. node1 is the replica of node2; node3 is the replica of node4.
2. Start the volume. Create fuse and nfs mounts.
3. Run "killall glusterfs ; killall glusterfsd ; killall glusterd" on node2 and node3.
4. Start creating files from the mount points (around 5k files were created).
5. While file creation is in progress, execute "service glusterd start" on node2 (node3 remains offline).
6. Execute "gluster v status".

Actual results:
==================
root@hicks [Jul-02-2013-18:02:46] >service glusterd start
Starting glusterd:                                         [  OK  ]
root@hicks [Jul-02-2013-18:02:54] >service glusterd status
glusterd (pid 5536) is running...
root@hicks [Jul-02-2013-18:02:59] >
root@hicks [Jul-02-2013-18:03:00] >gluster v status
Status of volume: vol_dis_rep
Gluster process                                           Port    Online  Pid
------------------------------------------------------------------------------
Brick king:/rhs/brick1/brick0                             49152   Y       4412
Brick hicks:/rhs/brick1/brick1                            49152   Y       5657
Brick king:/rhs/brick1/brick2                             49153   Y       4423
Brick hicks:/rhs/brick1/brick3                            49153   Y       5670
Brick king:/rhs/brick1/brick4                             49154   Y       4434
Brick hicks:/rhs/brick1/brick5                            49154   Y       5666
Brick lizzie:/rhs/brick1/brick7                           49152   Y       4369
Brick lizzie:/rhs/brick1/brick9                           49153   Y       4380
Brick lizzie:/rhs/brick1/brick11                          49154   Y       4391
NFS Server on localhost                                   2049    Y       5678
Self-heal Daemon on localhost                             N/A     N       N/A
NFS Server on d738e7cb-9bff-4988-807e-10fc5f9f4a64        2049    Y       4446
Self-heal Daemon on d738e7cb-9bff-4988-807e-10fc5f9f4a64  N/A     Y       4452
NFS Server on ddda6b3a-d570-47fd-a07c-465c328610b4        2049    Y       4403
Self-heal Daemon on ddda6b3a-d570-47fd-a07c-465c328610b4  N/A     Y       4410

There are no active volume tasks

Expected results:
=================
The self-heal daemon process on node2 should be started when glusterd is started.
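For verification, a minimal sketch of how to check whether the self-heal daemon came up on node2 after glusterd was restarted. The glustershd pidfile path under /var/lib/glusterd is an assumption based on the default installation layout and is not taken from the output above:

# On node2 (hicks), after step 5:
service glusterd start

# The self-heal daemon runs as a glusterfs process with volfile-id
# gluster/glustershd, so its command line should match here; assumed
# default pidfile location, adjust if glusterd uses a non-default
# working directory.
ps -ef | grep '[g]lustershd'
cat /var/lib/glusterd/glustershd/run/glustershd.pid 2>/dev/null

# Cluster-wide view (matches the "gluster v status" output above):
gluster volume status vol_dis_rep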
Additional info:
===================
root@king [Jul-02-2013-14:53:57] >gluster peer status
Number of Peers: 3

Hostname: hicks
Uuid: e741f4ff-0b1e-4445-8434-e2913030d849
State: Peer in Cluster (Connected)

Hostname: luigi
Uuid: 72f6a0e2-6f03-4e7c-91e0-9ac5dae8a729
State: Peer in Cluster (Connected)

Hostname: lizzie
Uuid: ddda6b3a-d570-47fd-a07c-465c328610b4
State: Peer in Cluster (Connected)

root@king [Jul-02-2013-16:55:45] >gluster v info

Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: 414a0903-6b64-4cd1-9dbf-8eb0e20b6e3b
Status: Created
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: king:/rhs/brick1/brick0
Brick2: hicks:/rhs/brick1/brick1
Brick3: king:/rhs/brick1/brick2
Brick4: hicks:/rhs/brick1/brick3
Brick5: king:/rhs/brick1/brick4
Brick6: hicks:/rhs/brick1/brick5
Brick7: luigi:/rhs/brick1/brick6
Brick8: lizzie:/rhs/brick1/brick7
Brick9: luigi:/rhs/brick1/brick8
Brick10: lizzie:/rhs/brick1/brick9
Brick11: luigi:/rhs/brick1/brick10
Brick12: lizzie:/rhs/brick1/brick11
root@king [Jul-02-2013-16:55:47] >
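For completeness, a sketch of the commands used for steps 1 and 2 of the reproduction, reconstructed from the brick list above; the client mount points (/mnt/fuse, /mnt/nfs) are illustrative assumptions:

# Create the 6 x 2 distributed-replicate volume; replica pairs follow
# the brick order shown in "gluster v info" above.
gluster volume create vol_dis_rep replica 2 \
    king:/rhs/brick1/brick0   hicks:/rhs/brick1/brick1 \
    king:/rhs/brick1/brick2   hicks:/rhs/brick1/brick3 \
    king:/rhs/brick1/brick4   hicks:/rhs/brick1/brick5 \
    luigi:/rhs/brick1/brick6  lizzie:/rhs/brick1/brick7 \
    luigi:/rhs/brick1/brick8  lizzie:/rhs/brick1/brick9 \
    luigi:/rhs/brick1/brick10 lizzie:/rhs/brick1/brick11

gluster volume start vol_dis_rep

# Fuse and NFS mounts from a client (mount points are illustrative):
mount -t glusterfs king:/vol_dis_rep /mnt/fuse
mount -t nfs -o vers=3 king:/vol_dis_rep /mnt/nfs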
Created attachment 767724 [details] SOS Reports and Other Useful Information
*** Bug 980097 has been marked as a duplicate of this bug. ***
When are we getting a patch for this? All the proactive self-heal cases are blocked by this bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html