Bug 1737097
| Summary: | [OSP] openshift-installer creates multiple IP addresses on worker and master nodes that aren't allowed by OpenStack Security Groups | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ken Holden <kholden> |
| Component: | Installer | Assignee: | Tomas Sedovic <tsedovic> |
| Installer sub component: | openshift-installer | QA Contact: | David Sanz <dsanzmor> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | asimonel, eduen, juriarte |
| Version: | 4.2.0 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.3.0 | ||
| Hardware: | All | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-23 11:05:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Ken Holden
2019-08-02 15:27:38 UTC
@Tomas, I believe you were taking a look at ports and SGs. Can you take a look at this? Thank you.

It is possible there's a default security group rule in your OpenStack environment that prevents this. Can you share an example of the extra IP addresses that are assigned to the nodes? If they are in the form `10.10.0.5` or `6` or `7`, those should be assigned to the `api-port`, `dns-port` and `ingress-port` respectively and be managed by Keepalived running on the nodes. It is true that these IPs are not associated with the servers in the normal manner, but we set allowed address pairs on all ports to whitelist them. Are you talking about these addresses or some other ones? I'm not aware of any other extra IPs that we create that would need special handling.

I checked, and the only security group applied was the one created for the instance itself. I ensured PING and SSH were enabled in that security group, but I was unable to ping or SSH to .6 or .7. However, 10.10.0.5 worked fine, which makes sense as it's the IP Neutron assigned to the port. I didn't have to change anything when testing the previous installation method that used the service VM for the IPI install. Perhaps the allowed address pairs weren't set, or weren't applied correctly, when I tested the newer non-service-VM method of IPI install. Thanks!

The .5-.7 IP addresses are for the internal use of the cluster. A person deploying it is not expected to interact with them in any way. If you got to or near 100%, that means at least the .5 and .6 VIPs worked as expected. Otherwise you wouldn't even get past bootstrapping. The service VM did not need any of this VIP config. It is here to provide highly available access to the load balancer and DNS services we're now running on the master nodes. For what it's worth, I just ran a deployment on the latest checkout and it succeeded, so this isn't something that was introduced recently (other than the service VM removal).
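One way to test the theory that the allowed address pairs were not applied is to compare a node port's `allowed_address_pairs` field against the three VIPs. The following is a minimal sketch, not part of the installer or the OpenStack client: `missing_vips` is a hypothetical helper, and it assumes the field text contains each allowed IP literally. The exact layout of that field differs between client versions, which is why the helper only does a literal whole-word match.

```shell
#!/bin/sh
# Hypothetical helper: print every VIP that does NOT appear in a port's
# allowed address pairs.
#   $1      = text of the port's allowed_address_pairs field
#   $2...   = VIPs to look for
missing_vips() {
    pairs=$1
    shift
    for vip in "$@"; do
        # -F: literal match; -w: whole word, so 10.10.0.5 does not match 10.10.0.50
        if ! printf '%s\n' "$pairs" | grep -qFw -- "$vip"; then
            printf '%s\n' "$vip"
        fi
    done
}

# Example against a live cloud (the port name is a placeholder):
#   pairs=$(openstack port show <port> -f value -c allowed_address_pairs)
#   missing_vips "$pairs" 10.10.0.5 10.10.0.6 10.10.0.7
```

Empty output means all listed VIPs are present on that port. Any address printed is one that Neutron would not accept traffic for on that port.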
I wonder if this might actually have to do with a new port that's not open for some reason, unrelated to the IP addresses. But that will be tricky to figure out. At any rate, we will have to find which security group rules to add.

Please try to run the installation again and get it to fail (it should quit at most 30 minutes after the `Waiting up to 30m0s for the cluster at https://api.example.com:6443 to initialize` message). Then please provide the following:

1. Output of `openstack port show` (on a failing deployment, before you do any manual fixes) for the master and worker ports
2. Output of `openstack port show <cluster>-<id>-api-port` and `openstack port show <cluster>-<id>-ingress-port`
3. How many masters and workers are you deploying with?
4. Outputs of both master and worker security group rules: `openstack security group rule list <cluster>-<id>-master` and `openstack security group rule list <cluster>-<id>-worker`
5. The `.openshift_install.log` file in your `--dir=<directory>` directory ("rhte/.openshift_install.log" by the looks of it)
6. Are you deploying via the interactive prompt or writing an install-config.yaml? If the latter, your install-config with the sensitive information (pull secret) omitted
7. If you have access to the underlying OpenStack (I saw Director so you might), could you report the value of `allow_same_net_traffic` in `/etc/nova/nova.conf` (on a controller node)
8. Output of `oc get pod -A`. This will be quite long but it should help us figure out which pods are blocking the deployment success. You will need a kubeconfig for this. You can get this output by running the following:

    $ export KUBECONFIG=$GOPATH/src/github.com/openshift/installer/<dir>/auth/kubeconfig
    $ oc get pod -A > get-pod.txt

Thank you! I know this is a lot of info, but since I can't reproduce this, there's not much to go on at this point.
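The named-resource commands from items 2, 4 and 8 can be generated in one go so that nothing gets skipped. This is only a sketch: `diag_commands` is a hypothetical helper, and its argument is the `<cluster>-<id>` prefix from the list above, which you have to supply yourself. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the diagnostic commands for one cluster.
#   $1 = the "<cluster>-<id>" prefix used in the resource names
diag_commands() {
    infra=$1
    # Item 2: the API and ingress VIP ports
    for port in api-port ingress-port; do
        printf 'openstack port show %s-%s\n' "$infra" "$port"
    done
    # Item 4: master and worker security group rules
    for role in master worker; do
        printf 'openstack security group rule list %s-%s\n' "$infra" "$role"
    done
    # Item 8: pod status across all namespaces (needs KUBECONFIG exported)
    printf 'oc get pod -A\n'
}

# Example (the prefix is a placeholder); review the list, then pipe it to a shell:
#   diag_commands <cluster>-<id>
#   diag_commands <cluster>-<id> | sh -x > diagnostics.txt 2>&1
```

The master and worker node ports from item 1 are left out because their names are not given in the thread.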
Oh actually, also: according to the commit, you've applied an older version of the "remove service VM" pull request. It's possible something is wrong there. Please try this again from the master branch or a nightly build.

No "connection refused" errors between servers on the latest release.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.