Description of problem:

We have previously seen messages like these in glusterd: https://bugzilla.redhat.com/show_bug.cgi?id=1584581#c12

> Glusterd logs:
> [2018-05-27 08:08:02.530619] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.75.149.13:1020
> [2018-05-27 08:08:02.547670] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
> [2018-05-27 08:08:02.548126] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.75.149.13:1020
> [2018-05-27 08:08:02.548140] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
> [2018-05-27 08:08:02.548209] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.75.149.13:1020
>
> Note that the program number - 1298437 - corresponds to the GlusterFS FOP program.
> The question is why FOPs are being sent to glusterd at all; they should only go to bricks.

That report is a duplicate of bz 1583937. Bz 1583937 is caused by the client setting its "connected" flag to true even when the handshake with the brick is not yet complete. This means:

1. FOPs can be sent to glusterd, because the client's connection to a brick is a two-step process: first connect to glusterd to get the brick's port, then connect to the brick itself. This is the scenario seen in https://bugzilla.redhat.com/show_bug.cgi?id=1584581#c12.
2. FOPs can be sent to a brick whose stack is not yet initialized, causing crashes like bz 1503137.

A fix has been merged upstream: https://review.gluster.org/20101
We need to take this to the downstream rhgs-3.4.0 branch, as it is a long-standing issue. (A minimal illustrative sketch of the intended gating is included under Additional info below.)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
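For context, here is a minimal, self-contained C sketch of the gating described above. This is NOT the upstream patch at https://review.gluster.org/20101 and it does not use real gluster internals; all type and function names are hypothetical, purely to illustrate the idea that a client connection should only be treated as usable for FOPs once the post-connect handshake with the brick has completed.

/*
 * Minimal illustrative sketch (hypothetical names, not gluster code).
 *
 * Idea: a client connection must not be treated as "connected" (i.e.
 * eligible to carry FOPs) as soon as the TCP socket is up. At that
 * point the socket may still point at glusterd (the port-query step),
 * or at a brick whose stack is not yet initialized. FOPs should be
 * dispatched only after the handshake reply from the brick arrives.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    CONN_DISCONNECTED,
    CONN_SOCKET_UP,        /* TCP established, handshake still pending */
    CONN_HANDSHAKE_DONE    /* brick acknowledged the handshake         */
} conn_state_t;

typedef struct {
    conn_state_t state;
} client_conn_t;

/* Called when the TCP connection comes up: do NOT allow FOPs yet. */
static void on_socket_connect(client_conn_t *conn)
{
    conn->state = CONN_SOCKET_UP;
}

/* Called when the handshake reply arrives from the brick. */
static void on_handshake_reply(client_conn_t *conn, bool ok)
{
    conn->state = ok ? CONN_HANDSHAKE_DONE : CONN_DISCONNECTED;
}

/* FOPs may be sent only after the handshake has completed. */
static bool can_send_fop(const client_conn_t *conn)
{
    return conn->state == CONN_HANDSHAKE_DONE;
}

int main(void)
{
    client_conn_t conn = { CONN_DISCONNECTED };

    on_socket_connect(&conn);
    printf("after connect:   can_send_fop = %d\n", can_send_fop(&conn));

    on_handshake_reply(&conn, true);
    printf("after handshake: can_send_fop = %d\n", can_send_fop(&conn));

    return 0;
}

The point of the extra CONN_SOCKET_UP state is that "socket connected" and "ready for FOPs" are distinct conditions; conflating them is what allows FOPs to reach glusterd or an uninitialized brick stack.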
(In reply to Raghavendra G from comment #0)
> 2. FOPs can be sent to a brick whose stack is not yet initialized, causing
> crashes like bz 1503137.

Also bz 1520374 and bz 1583937.
As suggested by dev, I have followed the steps from bz 1583937.

After upgrading from RHGS-3.3.1 (RHEL-7.4) to RHGS-3.4 (RHEL-7.5), the bricks on the upgraded node went offline for most of the volumes.

sosreport copied to the location below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rajesh/1588408/
(In reply to Rajesh Madaka from comment #8)
> As suggested by dev, I have followed the steps from bz 1583937.

Did the bricks crash? Are the cores included in the sosreport?

> After upgrading from RHGS-3.3.1 (RHEL-7.4) to RHGS-3.4 (RHEL-7.5), the
> bricks on the upgraded node went offline for most of the volumes.
>
> sosreport copied to the location below:
> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rajesh/1588408/
No cores were generated. I don't think it's a brick crash; the bricks just didn't come online.
(In reply to Rajesh Madaka from comment #10)
> No cores were generated. I don't think it's a brick crash; the bricks just
> didn't come online.

Can you explain what you mean by "bricks didn't come online"? How were you observing the bricks - through gluster volume status, through a client connecting to the brick, by not seeing the brick process, etc.?
I was observing brick status through gluster volume status. It shows N/A for most of the bricks on the upgraded node.
Based on the discussion with QE, moving this BZ to ON_QA again.
Just to clarify why this BZ has been moved to ON_QA: the bricks not coming up has no relation to the fix this bug brings in.
Can you please provide steps to verify this bug?
(In reply to Rajesh Madaka from comment #15)
> Can you please provide steps to verify this bug?

I think the bug can be marked as verified if you don't see a brick crash. As we discussed on chat, clients are able to connect to the bricks and the mount is successful. Bricks not being shown as online in gluster v status might be a different bug.
I have verified this bug with the two scenarios below.

First scenario:
I followed the steps mentioned in bz 1583937. I didn't find any brick crashes or mount point disconnections, but the bricks went offline; I will be raising a different bug for that.
Gluster build version: glusterfs-fuse-3.12.2-16

Second scenario:
-> Created a 3-node cluster
-> Created a volume
-> Mounted the volume on a client
-> Rebooted one of the gluster nodes

Didn't find any brick crashes or mount disconnections. Moving this bug to the verified state.
Gluster build version: glusterfs-fuse-3.12.2-17
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607