Description of problem:
=======================
After adding new bricks to a distributed-replicate volume, continuous "Connection refused" error messages are seen in both the glustershd and FUSE logs.

Version-Release number of selected component (if applicable):
3.8.4-3.el7rhgs.x86_64

How reproducible:
=================
Tried only once

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Enable the required md-cache settings on the volume (please see the gluster volume info output for the enabled md-cache settings).
3) FUSE mount the volume on multiple clients.
4) Start running IO from all the clients.
5) Add a few bricks to the volume.
6) Initiate rebalance.

(A rough CLI sketch of these steps is included at the end of this comment.)

Continuous "Connection refused" error messages are seen in the glustershd and FUSE logs. Stopped the ongoing rebalance to see if that would stop the spamming, but it didn't help.

Actual results:
===============
glustershd and FUSE logs are spammed with continuous "Connection refused" error messages.

Expected results:
=================
As all the bricks are up and running, there should not be any "Connection refused" error messages.
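For reference, here is a rough CLI sketch of the reproduction steps above. The hostnames, volume name and brick paths are made up for illustration, and the md-cache settings shown are the commonly recommended set, assumed here since the actual gluster volume info output is not quoted in this comment:

# 1) create and start a 2x2 distributed-replicate volume (hosts/paths are hypothetical)
gluster volume create distrep replica 2 \
    server1:/bricks/brick1/b1 server2:/bricks/brick1/b1 \
    server1:/bricks/brick2/b2 server2:/bricks/brick2/b2
gluster volume start distrep

# 2) enable the md-cache related settings (assumed set; the actual settings are in the
#    volume info output, which is not quoted here)
gluster volume set distrep features.cache-invalidation on
gluster volume set distrep features.cache-invalidation-timeout 600
gluster volume set distrep performance.stat-prefetch on
gluster volume set distrep performance.cache-invalidation on
gluster volume set distrep performance.md-cache-timeout 600
gluster volume set distrep network.inode-lru-limit 50000

# 3) FUSE mount on each client and 4) start IO from all of them
mount -t glusterfs server1:/distrep /mnt/distrep

# 5) add a replica pair and 6) start rebalance
gluster volume add-brick distrep replica 2 \
    server1:/bricks/brick3/b3 server2:/bricks/brick3/b3
gluster volume rebalance distrep start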
Looking at the setup, especially the stale portmap entry, it does look like you have deleted/recreated volumes with the same name and brick path and killed the brick process using kill -9 instead of kill -15; please confirm.

Here are the portmap entry details for port 49159, which glusterd gives back to the mount, and port 49156, which is the actual port consumed by the brick process:

(gdb) p $4.ports[49159]
$7 = {
  type = GF_PMAP_PORT_BRICKSERVER,
  brickname = 0x7f04a814b400 "/bricks/brick3/b3",
  xprt = 0x7f04a8148380
}
(gdb) p $4.ports[49156]
$8 = {
  type = GF_PMAP_PORT_BRICKSERVER,
  brickname = 0x7f04a8000d00 "/bricks/brick3/b3",
  xprt = 0x7f04a817b820
}

From the above two entries it is clear that 49159 is a stale entry, and the reason glusterd gave this port back to the client is that the portmap search always goes top-down, i.e. starting from last_alloc and coming down to base. That change was introduced recently by BZ 1353426. At the time the patch for that BZ was merged, things were fine, as last_alloc was always fast-forwarded. However, the fix for BZ 1263090 that went in afterwards introduced a side effect to the former fix, since pmap_registry_alloc now always starts from base_port and tries to find any free port that was consumed earlier.

Now consider this case (a rough CLI sketch of these steps follows at the end of this comment):

1. Create four 1 x 1 volumes, so the brick ports for vol1 to vol4 would be 49152 to 49155.
2. Start all the volumes.
3. Delete vol1, vol2 & vol3.
4. kill -9 <brick pid of vol4>
5. Stop and delete vol4.
6. Create vol4 with the same volume name and brick path (use the force option) and start the volume; note that the port will now be 49152.
7. Try to mount; the mount will fail since glusterd will report back 49155 as the port for vol4.

We'd need to think about how to fix it. But this *can* only happen if a PMAP_SIGNOUT is not received when a brick process goes down, and then you'd need to delete and recreate a volume with the same name and brick path. With all this in mind, I am moving this out of 3.2.0. Feel free to think otherwise with proper justification :)
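As promised above, a rough CLI sketch of that scenario (hostname and brick paths are made up for illustration; the brick PID in step 4 would be taken from gluster volume status):

# 1) & 2) create and start four 1 x 1 volumes; their bricks get ports 49152-49155
for i in 1 2 3 4; do
    gluster volume create vol$i server1:/bricks/brick$i/b$i force
    gluster volume start vol$i
done

# 3) stop and delete vol1, vol2 and vol3, freeing ports 49152-49154
for i in 1 2 3; do
    gluster --mode=script volume stop vol$i
    gluster --mode=script volume delete vol$i
done

# 4) kill the vol4 brick with SIGKILL so no PMAP_SIGNOUT reaches glusterd
kill -9 <brick pid of vol4>        # pid from 'gluster volume status vol4'

# 5) stop and delete vol4 (its portmap entry for 49155 is left behind as stale)
gluster --mode=script volume stop vol4
gluster --mode=script volume delete vol4

# 6) recreate vol4 with the same name and brick path; the new brick now gets 49152
gluster volume create vol4 server1:/bricks/brick4/b4 force
gluster volume start vol4

# 7) the mount fails: glusterd's top-down portmap search returns the stale 49155
mount -t glusterfs server1:/vol4 /mnt/vol4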
Gaurav - this is a very interesting bug to work with. Can you please add it to your backlog?