Description of problem:
=======================
An offlined brick process on server1 is automatically started again when another server (server3) in the cluster is powered off.

Version-Release number of selected component (if applicable):
=============================================================
[root@rhs-client11 ~]# rpm -qa | grep gluster
glusterfs-debuginfo-3.4.0.1rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.1rhs-1.el6rhs.x86_64
gluster-swift-container-1.4.8-4.el6.noarch
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
vdsm-gluster-4.10.2-4.0.qa5.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.4.0.1rhs-1.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.4.0.1rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.1rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.1rhs-1.el6rhs.x86_64
gluster-swift-object-1.4.8-4.el6.noarch
[root@rhs-client11 ~]#

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create a 2x2 distributed-replicate volume spanning 4 storage servers (server1 to server4).
2. Take the brick process on server1 offline by killing its pid.
3. Power off server3.
4. Observe that the server1 brick process is online again.

Actual results:
===============
The brick process that was killed on server1 is restarted and reported as online (with a new pid) as soon as server3 is powered off, even though it should have stayed offline. A scripted sketch of the reproduction follows; the step-by-step log snippets come after that.
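For convenience, the reproduction can be scripted roughly as follows. This is only a sketch against the setup described in this report (volume dist-rep, brick 10.70.36.35:/rhs/brick1/dr1 on server1, server3 = 10.70.36.37, commands run as root on server1); it is not part of any test harness, and the ssh/poweroff step is just one way of taking server3 down.

# Reproduction sketch (run as root on server1).
VOL=dist-rep
BRICK=10.70.36.35:/rhs/brick1/dr1      # brick hosted on server1
PEER=10.70.36.37                       # server3, the peer to power off

# 1. Find the pid of the local brick process and kill it.
BRICK_PID=$(gluster volume status "$VOL" "$BRICK" | awk '/^Brick /{print $NF}')
kill -9 "$BRICK_PID"

# 2. Confirm the brick is now reported with Online = N.
gluster volume status "$VOL"

# 3. Power off server3 (here via ssh; powering it off at the console works too).
ssh root@"$PEER" poweroff

# 4. Re-check: on the affected build the killed brick comes back with
#    Online = Y and a new pid.
gluster volume status "$VOL"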
Log snippet, step by step:

1. Volume type:

[root@rhs-client11 ~]# gluster v i dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 0ff921a8-a01e-4b20-b9aa-c2c612b87a46
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.35:/rhs/brick1/dr1
Brick2: 10.70.36.36:/rhs/brick1/dr2
Brick3: 10.70.36.37:/rhs/brick1/dr3
Brick4: 10.70.36.38:/rhs/brick1/dr4

2. Volume status and brick pids:

[root@rhs-client11 ~]# gluster volume status dist-rep
Status of volume: dist-rep
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.35:/rhs/brick1/dr1                        49164   Y       12895
Brick 10.70.36.36:/rhs/brick1/dr2                        49170   Y       7203
Brick 10.70.36.37:/rhs/brick1/dr3                        49168   Y       3349
Brick 10.70.36.38:/rhs/brick1/dr4                        49166   Y       1299
NFS Server on localhost                                  38467   Y       12688
Self-heal Daemon on localhost                            N/A     Y       12701
NFS Server on e815f97d-4b98-4501-a19f-c9f1d0a9cd2e       38467   Y       1309
Self-heal Daemon on e815f97d-4b98-4501-a19f-c9f1d0a9cd2e N/A     Y       1321
NFS Server on 91d06293-a9d4-467f-bc09-a0d929ebacac       38467   Y       7213
Self-heal Daemon on 91d06293-a9d4-467f-bc09-a0d929ebacac N/A     Y       7225
NFS Server on 53ded8aa-05eb-4d57-a4d4-3db78bbde921       38467   Y       3601
Self-heal Daemon on 53ded8aa-05eb-4d57-a4d4-3db78bbde921 N/A     Y       3596

There are no active volume tasks

3. Killed the brick process on 10.70.36.35 using "kill -9 <pid>":

[root@rhs-client11 ~]# kill -9 12895
[root@rhs-client11 ~]#

4. Volume status confirms that the brick is offline:

[root@rhs-client11 ~]# gluster volume status dist-rep
Status of volume: dist-rep
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.35:/rhs/brick1/dr1                        N/A     N       12895
Brick 10.70.36.36:/rhs/brick1/dr2                        49170   Y       7203
Brick 10.70.36.37:/rhs/brick1/dr3                        49168   Y       3349
Brick 10.70.36.38:/rhs/brick1/dr4                        49166   Y       1299
NFS Server on localhost                                  38467   Y       12688
Self-heal Daemon on localhost                            N/A     Y       12701
NFS Server on 53ded8aa-05eb-4d57-a4d4-3db78bbde921       38467   Y       3601
Self-heal Daemon on 53ded8aa-05eb-4d57-a4d4-3db78bbde921 N/A     Y       3596
NFS Server on 91d06293-a9d4-467f-bc09-a0d929ebacac       38467   Y       7213
Self-heal Daemon on 91d06293-a9d4-467f-bc09-a0d929ebacac N/A     Y       7225
NFS Server on e815f97d-4b98-4501-a19f-c9f1d0a9cd2e       38467   Y       1309
Self-heal Daemon on e815f97d-4b98-4501-a19f-c9f1d0a9cd2e N/A     Y       1321

There are no active volume tasks

5. Powered down server3 (10.70.36.37) using poweroff.

6. gluster volume status now reports the server1 brick as online again, with a new pid (13151 instead of the killed 12895):

[root@rhs-client11 ~]# gluster volume status dist-rep
Status of volume: dist-rep
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.35:/rhs/brick1/dr1                        49164   Y       13151
Brick 10.70.36.36:/rhs/brick1/dr2                        49170   Y       7203
Brick 10.70.36.38:/rhs/brick1/dr4                        49166   Y       1299
NFS Server on localhost                                  38467   Y       12688
Self-heal Daemon on localhost                            N/A     Y       12701
NFS Server on e815f97d-4b98-4501-a19f-c9f1d0a9cd2e       38467   Y       1309
Self-heal Daemon on e815f97d-4b98-4501-a19f-c9f1d0a9cd2e N/A     Y       1321
NFS Server on 91d06293-a9d4-467f-bc09-a0d929ebacac       38467   Y       7213
Self-heal Daemon on 91d06293-a9d4-467f-bc09-a0d929ebacac N/A     Y       7225

There are no active volume tasks
[root@rhs-client11 ~]#

Expected results:
=================
Powering off server3 should not restart the brick process that was deliberately taken offline on a different server (server1); that brick should remain offline.
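The regression is easy to spot without reading the whole status table; a minimal check, assuming the same volume and brick names as in the snippets above:

# Prints the Online flag and pid of the server1 brick. On the affected build
# the flag flips from N back to Y (and the pid changes, 12895 -> 13151 above)
# as soon as server3 is powered off.
gluster volume status dist-rep 10.70.36.35:/rhs/brick1/dr1 \
    | awk '/^Brick /{print "online=" $(NF-1) "  pid=" $NF}'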
Verified with the build glusterfs-3.4.0.4rhs-1.el6rhs.x86_64.

Powering off one server no longer brings offlined brick processes on another server in the cluster back online. Works as expected.

log snippet:
============
Status with the brick processes for b1, b3 and b5 on 10.70.36.35 killed (Online = N):

[root@rhs-client11 ~]# gluster volume status
Status of volume: vol-dis-rep
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.35:/rhs/brick1/b1                         N/A     N       5293
Brick 10.70.36.36:/rhs/brick1/b2                         49152   Y       5269
Brick 10.70.36.35:/rhs/brick1/b3                         N/A     N       5302
Brick 10.70.36.36:/rhs/brick1/b4                         49153   Y       5278
Brick 10.70.36.35:/rhs/brick1/b5                         N/A     N       5311
Brick 10.70.36.36:/rhs/brick1/b6                         49154   Y       5287
Brick 10.70.36.37:/rhs/brick1/b7                         49152   Y       5271
Brick 10.70.36.38:/rhs/brick1/b8                         49152   Y       5269
Brick 10.70.36.37:/rhs/brick1/b9                         49153   Y       5280
Brick 10.70.36.38:/rhs/brick1/b10                        49153   Y       5278
Brick 10.70.36.37:/rhs/brick1/b11                        49154   Y       5289
Brick 10.70.36.38:/rhs/brick1/b12                        49154   Y       5287
NFS Server on localhost                                  2049    Y       5323
Self-heal Daemon on localhost                            N/A     Y       5327
NFS Server on c6b5d4e9-3782-457c-8542-f32b0941ed05       2049    Y       5299
Self-heal Daemon on c6b5d4e9-3782-457c-8542-f32b0941ed05 N/A     Y       5303
NFS Server on f9cc4b9c-97e1-4f65-9657-3b050d45296e       2049    Y       5299
Self-heal Daemon on f9cc4b9c-97e1-4f65-9657-3b050d45296e N/A     Y       5303
NFS Server on 6962d204-37c8-436b-8ea6-a9698be40ec6       2049    Y       5301
Self-heal Daemon on 6962d204-37c8-436b-8ea6-a9698be40ec6 N/A     Y       5305

There are no active volume tasks

Status after one of the other peers (10.70.36.37, whose bricks, NFS server and self-heal daemon have dropped out of the listing) was powered off; the offlined bricks on 10.70.36.35 stay offline:

[root@rhs-client11 ~]# gluster volume status
Status of volume: vol-dis-rep
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.35:/rhs/brick1/b1                         N/A     N       5293
Brick 10.70.36.36:/rhs/brick1/b2                         49152   Y       5269
Brick 10.70.36.35:/rhs/brick1/b3                         N/A     N       5302
Brick 10.70.36.36:/rhs/brick1/b4                         49153   Y       5278
Brick 10.70.36.35:/rhs/brick1/b5                         N/A     N       5311
Brick 10.70.36.36:/rhs/brick1/b6                         49154   Y       5287
Brick 10.70.36.38:/rhs/brick1/b8                         49152   Y       5269
Brick 10.70.36.38:/rhs/brick1/b10                        49153   Y       5278
Brick 10.70.36.38:/rhs/brick1/b12                        49154   Y       5287
NFS Server on localhost                                  2049    Y       5323
Self-heal Daemon on localhost                            N/A     Y       5327
NFS Server on c6b5d4e9-3782-457c-8542-f32b0941ed05       2049    Y       5299
Self-heal Daemon on c6b5d4e9-3782-457c-8542-f32b0941ed05 N/A     Y       5303
NFS Server on f9cc4b9c-97e1-4f65-9657-3b050d45296e       2049    Y       5299
Self-heal Daemon on f9cc4b9c-97e1-4f65-9657-3b050d45296e N/A     Y       5303

There are no active volume tasks
[root@rhs-client11 ~]#
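For reference, the same state can be re-checked with a small loop; this is only a sketch assuming the vol-dis-rep layout above, with the brick processes for b1, b3 and b5 killed on 10.70.36.35:

# All three bricks are expected to keep reporting Online = N on the fixed
# build, even after another peer in the cluster has been powered off.
for b in b1 b3 b5; do
    gluster volume status vol-dis-rep 10.70.36.35:/rhs/brick1/$b \
        | awk -v b="$b" '/^Brick /{print b ": online=" $(NF-1)}'
done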
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html