Red Hat Bugzilla – Bug 1306656
[GSS] - Brick ports changed after configuring I/O and management encryption
Last modified: 2017-03-23 01:27:00 EDT
Could you please check this?
I am able to set up management SSL, IO encryption and start the volume without a need to force start it. Did you restart glusterd on all nodes after setting up management SSL? If you have not done that, you won't be able to start the volume.
Steps I followed:
1) Set up a 2 node gluster cluster with build - glusterfs-server-3.7.1-16.el7rhgs.x86_64
2) Create the necessary certificates (glusterfs.key, glusterfs.pem and glusterfs.ca) on all servers and clients
3) Enable management SSL ('touch /var/lib/glusterd/secure-access' on all nodes)
4) Restart glusterd
5) Create the volume and set the necessary SSL options (auth.allow, server.ssl on and client.ssl on)
6) Start the volume
GlusterD allocates a port for a brick if brickinfo->port is 0. Normally, brickinfo->port would be saved to and restored from the brick info file in the /var/lib/glusterd/vols/<volname>/bricks directory.
But since the ports appear to have changed on restart in this case, it's possible that the port information was either not stored or not restored.
I'll check the sos-reports to see what I can find.
Mukul, can you get the names of the actual volumes whose bricks changed ports? I cannot properly analyze logs without knowing what I'm looking for.
Also, I don't think this bug and #1304274 are related right now, but I need to investigate further.
As I mentioned earlier, glusterd should only assign new ports to a brick if the brickinfo->port is 0. This should only happen if brick restore didn't correctly restore the port from the stored info file. I'm still trying to figure out from the logs if/how this could happen.
I do see the assigned ports for a brick changing between 2 consecutive start logs for the brick. So this is real.
OK, so does this issue have any relation to management encryption?
Can you explain the issue in detail?
I don't believe this is related to management encryption.
Management encryption should only affect connections to/from GlusterD.
GlusterD assigns the port for a brick only when it starts the brick for the first time. When the brick is started for the first time, its port is 0. The brick start function in GlusterD searches for a new port and assigns it to the brick when the port is 0. Once a brick has been assigned a port, GlusterD persists this information in the brickinfo file at /var/lib/glusterd/vols/<volume>/bricks/<brickinfo-file>.
This port is passed to the brick as a command line argument. So there is no way management encryption could affect this.
Whenever GlusterD restarts, it will read this file and restore the port number for the brick, before starting the brick.
But the port was changed/re-assigned, and the only way a new port can be assigned is if the port number is 0 when the brick starts, so I think there could have been a failure to restore the port. I'm trying to verify whether this is even possible. I'll also try to find out whether another path could lead to the port becoming 0.
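To make the reasoning above concrete, here is a minimal Python sketch of the allocate-on-zero behaviour. It is a toy model, not GlusterD's code: the `PortMap` class, the `restore_port` and `start_brick` helpers, the `listen-port=` key and the 49152 base port are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

BASE_PORT = 49152  # ASSUMPTION: illustrative start of the brick port range


class PortMap:
    """Toy stand-in for a port registry: hands out the next unused port."""

    def __init__(self, base=BASE_PORT):
        self.next_free = base

    def allocate(self):
        port = self.next_free
        self.next_free += 1
        return port


def restore_port(info_file: Path) -> int:
    """Read the saved port back; 0 means 'never assigned' or 'not restored'."""
    if not info_file.exists():
        return 0
    for line in info_file.read_text().splitlines():
        key, _, value = line.partition("=")
        if key == "listen-port":  # ASSUMPTION: hypothetical key name
            return int(value)
    return 0


def start_brick(info_file: Path, portmap: PortMap) -> int:
    """Allocate a new port only when the restored port is 0, then persist it."""
    port = restore_port(info_file)
    if port == 0:
        port = portmap.allocate()
        info_file.write_text(f"listen-port={port}\n")
    return port
```

In this model a restart keeps the port as long as the info file is read back correctly. A brick only gets a different port if the restore step yields 0, which is the failure mode being investigated here.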
So I've found a sequence of actions that leads to brick ports getting reassigned. This is not dependent on management encryption in any way, but the sequence can be hit when enabling encryption.
The sequence is as follows (assume a volume with bricks across a cluster):
1. Stop the volume.
2. Stop glusterd on one node.
3a. Start the volume from some other node, or
3b. Do a volume set operation.
4. Start glusterd on the downed node again.
5. If 3b was done, start the volume now.
This should lead to the port for bricks on the node with the downed glusterd changing. This is an existing bug in glusterd, which was unknown till now.
This sequence could be hit during the process of enabling management encryption.
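The sequence can be walked through with a small Python simulation. The thread does not spell out the mechanism, so the sketch ASSUMES one: a glusterd that was down while the volume changed re-imports the volume definition from a peer when it comes back, and the imported copy carries port 0 for its local bricks. The `Node` class and `run_sequence` function are hypothetical names used only for this illustration.

```python
class Node:
    """Toy model of one glusterd instance managing a single brick."""

    def __init__(self, first_free_port=49152):
        self.version = 0        # volume config version this node has seen
        self.brick_port = 0     # 0 means "not assigned"
        self.next_free = first_free_port

    def start_brick(self):
        # A new port is allocated only when the current port is 0.
        if self.brick_port == 0:
            self.brick_port = self.next_free
            self.next_free += 1
        return self.brick_port

    def resync(self, cluster_version):
        # ASSUMPTION: a stale node re-imports the volume and loses its saved port.
        if self.version < cluster_version:
            self.version = cluster_version
            self.brick_port = 0


def run_sequence(change_while_down):
    """Return (port_before, port_after) for the brick on the restarted node."""
    a, b = Node(), Node()
    cluster_version = 0
    a.start_brick()
    before = b.start_brick()
    # Steps 1 and 2: stop the volume, then stop glusterd on node b.
    # Step 3: node a starts the volume or runs a volume set while b is down.
    if change_while_down:
        cluster_version += 1
        a.version = cluster_version
    # Step 4: glusterd on b comes back and compares versions with its peer.
    b.resync(cluster_version)
    # Step 5: the volume is (re)started, so b starts its brick.
    after = b.start_brick()
    return before, after
```

Under these assumptions, `run_sequence(True)` returns a different port after the restart, while `run_sequence(False)` returns the same port twice. That matches the observation that the port only moves when an operation happens on the volume while one glusterd is down.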
Thanks for the analysis. I had provided the suggested analysis to the customer.
>This should lead to the port for bricks on the node with the downed glusterd changing. This is an existing bug in glusterd, which was unknown till now.
Could you provide the bz ?
(In reply to Mukul Malhotra from comment #24)
> Hello Kaushal,
> Thanks for the analysis. I had provided the suggested analysis to the
> customer.
> >This should lead to the port for bricks on the node with the downed glusterd changing. This is an existing bug in glusterd, which was unknown till now.
> Could you provide the bz ?
There is no old BZ since the issue was unknown.
(In reply to Mukul Malhotra from comment #26)
> >From the steps that the customer provided in case#01573615, the above mentioned sequence was very likely hit. If you can confirm if this is indeed the case, it would be helpful. In any case, we'll start working on fixing this.
> Customer sequence matches the suggested steps when enabling Encryption.
> Also, customer wanted to know below details as,
> * When would this bug get fixed ?
This is a simple enough bug to fix. But the fix will definitely not be available in 3.1.2. We can get it in for 3.1.3 if we get the fix upstream before the downstream rebase.
Till then, to avoid this bug, the documentation for enabling management encryption could explicitly mention that no operation should be performed on the volumes before all GlusterDs are back up. This includes starting the stopped volume or setting options on the volume.
> * Is there a feature coming to manually assign and force ports ?
This is the first time I'm hearing a request for this feature. This would be an RFE, and it would need to be evaluated for feasibility of implementation.
Could you update the doc text?
Doc text looks good to me.
Upstream patch http://review.gluster.org/13578 is merged now
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see email@example.com with any questions
The brick port allocation logic changed in RHGS 3.2, and that change was verified in BZ-1263090. With the new logic, brick ports can change depending on the operations performed.
The behavior expected from this bug is therefore invalidated by the fix for BZ-1263090.
Moving to verified state based on BZ-1263090 verification details.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.