Description of problem:

I've hit this on 3 different install attempts with OTB sshd tuning, so getting a bug open. I have logs from 2 of the attempts and will attach them to this bz.

While scaling a cluster up to add 100 nodes using the new version of Ansible (2.2.0.0-0.61.rc1.el7), ssh mux_client_request_session errors occur during the node certificate configuration. The error does not occur on all nodes. All nodes are identical from the same gold image.

2016-10-06 11:50:09,580 p=112268 u=root | fatal: [192.1.5.246]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: mux_client_request_session: session request failed: Session open refused by peer\r\nssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}

The errors all seem to happen in openshift_node_certificates and cause node registration failures down the line (see the end of the first log).

From the first log, the errors pop in these sections and nowhere else:

openshift_node_certificates : Create openshift_generated_configs_dir if it does not exist
openshift_node_certificates : Generate the node client config
openshift_node_certificates : Generate the node server certificate
openshift_node_certificates : Create a tarball of the node config directories
openshift_node_certificates : Unarchive the tarball on the node

From the second log (on a completely separate set of nodes - no overlap):

openshift_node_certificates : Create openshift_generated_configs_dir if it does not exist
openshift_node_certificates : Generate the node server certificate
openshift_node_certificates : Create a tarball of the node config directories

Version-Release number of selected component (if applicable):
3.3.0.34

How reproducible:
3 out of 3 attempts

Steps to Reproduce:
1. Install an HA cluster (3 etcd, 3 master, 1 master lb, 2 infra nodes, 3 test nodes). My install was on OpenStack.
2. Run the e2e Conformance tests to vet the cluster. Tests passed.
3. Run the openshift-ansible/byo/openshift-node/scaleup.yml playbook to add 100 new nodes to the cluster.

Actual results:
During the node certificate configuration, the ssh errors above occurred for some (not all) nodes. Node registration later failed for those systems.

Expected results:
Successful install.

Additional info:
Also saw Ansible sftp warnings I have not seen in the past:

[WARNING]: sftp transfer mechanism failed on [192.1.5.27]. Use ANSIBLE_DEBUG=1 to see detailed information
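For illustration, the scale-up was driven with an invocation along these lines; only the playbook path and the 100-node count come from the report above, while the inventory path and the forks value are assumptions:

ansible-playbook -i /etc/ansible/hosts openshift-ansible/byo/openshift-node/scaleup.yml --forks 100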
The workaround (solution?) seems to be increasing MaxSessions in /etc/ssh/sshd_config on each node we are installing on. I bumped it to 50 on mine (default is 10) and have had 2 successful scaleups of 100 and 200 nodes. Is there something different about the node cert phase of the install?
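For anyone wanting to script the workaround, here is a minimal sketch that applies it to the hosts being added before running the scaleup. The "new_nodes" group name and the handler are assumptions for illustration, not part of openshift-ansible; lineinfile and service are stock modules:

- hosts: new_nodes
  become: true
  tasks:
    - name: Raise sshd MaxSessions above the default of 10
      lineinfile:
        dest: /etc/ssh/sshd_config
        regexp: '^#?MaxSessions'
        line: 'MaxSessions 50'
      notify: restart sshd
  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted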
Andrew,

This is happening on tasks where we delegate_to a specific host, meaning that it's slamming openshift_ca_host with 100 connections/tasks at once. What do you think about making all plays that have delegate_to tasks serial: 10?
Since nodes are the only component we'll see more than 10 of (in most cases) and delegate_to is isolated to the node certificates role (as far as nodes go), we should try breaking node certificates out of the node configuration plays and running them at serial: 10. I think that will have the smallest impact on run time. We could also move node certificates back to with_items (all nodes) and apply the role to the first master host, but that moves logic back into the playbook, which would make it harder to maintain.
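Something like the following is the rough shape of the first option (the host group name here is an assumption; the role name matches the existing openshift_node_certificates role): a dedicated play that batches the cert work while the rest of node configuration keeps running at full forks.

- name: Generate node certificates in small batches
  hosts: oo_nodes_to_config
  serial: 10
  roles:
    - openshift_node_certificates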
With forks=100 and MaxSessions=50:

scale up a batch of 100 nodes = 30 minutes
batch of 200 nodes = 56 minutes
batch of 175 nodes = 47 minutes
Hi, I've been directed here after Tuesday's "OpenShift(3.4)-on-OpenStack(10) Scalability Testing" call; I'm on the Ansible Core Team. It sounds like you have this partly in hand. Is there anything specific you'd like to know from Ansible Core Engineering? I can be found as gundalow on freenode & GitHub.
Some initial thoughts after speaking to people:

1) It sounds like you are delegating a task many times to a single host.
1.1) What are you actually doing in that case?
1.2) Is the machine you are delegating to the one throwing the mux_client_request_session error?
1.3) If you increase MaxSessions, does it work (ignoring the increase in runtime)? From Comment 7 it sounds like this is working.
1.4) After increasing MaxSessions, do you hit other bottlenecks on that machine?
1.5) Can the task be rewritten so it doesn't always have to delegate to a single point? It feels like a change in architecture is needed.

2) Can you provide a link to the role that's being delegated to, so we can look?

3) From Comment 7, could there be an issue with the response of the IaaS server? Are there any logs to show the requests arriving and where the delay is?

In our experience, performance issues generally boil down to one machine getting overloaded, e.g. forks=200 installs all pulling from a single git server.
(In reply to Mike Fiedler from comment #7)
> With forks=100 and MaxSessions=50
>
> scale up a batch of 100 nodes = 30 minutes
> batch of 200 nodes = 56 minutes
> batch of 175 nodes = 47 minutes

From my understanding, every host (so up to 200?) generates its certs by running roles/openshift_node_certificates, which runs 8 tasks that delegate_to "openshift_ca_host".

I wonder if the slowdown is due to openshift_ca_host becoming overloaded. We know there are at least 10 simultaneous tasks running against it, since we hit the mux_client_request_session limit on that machine, so there are between 11 and 50 connections to it. Generating certificates requires entropy.

1) It would be interesting to watch the following on openshift_ca_host during the playbook runs with different fork levels:

while true; do paste <(date --rfc-3339=seconds) /proc/sys/kernel/random/entropy_avail <(cut -f1 -d ' ' /proc/loadavg); sleep 0.5; done

2) How are you "scaling up" and limiting batch size, with --limit?
We're going to reinstall this environment today or over the weekend. I can do #1.

For #2, I am not setting any sort of --limit parameter. Recommendations for this attempt?

sdodson/jdetober - is openshift_ca_host always the first master?
I was wondering what the following actually means in practice:

> With forks=100 and MaxSessions=50
>
> scale up a batch of 100 nodes = 30 minutes
> batch of 200 nodes = 56 minutes
> batch of 175 nodes = 47 minutes
re: comment 14. I was just trying to document that installs with forks > 20 could succeed if MaxSessions in /etc/ssh/sshd_config was bumped to a value greater than the default, whereas they failed without modifying that parameter.
Are you blocked by this or is your lowered forks value OK for now? We're thinking of closing this.
Andrew,

Does the certificate generation serialization mitigate this issue?
A few of the tasks are now "run_once", but there are 4 tasks which will still generate many connections to the first master. Every task in openshift_node_certificates would need to be serialized to mitigate this.
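To illustrate the difference, here is a sketch with hypothetical task bodies (the variable names come from the role as discussed above): a task marked run_once executes a single time on the delegate for the whole play, while a delegated task without it executes once per node and opens that many connections to the first master.

- name: Create openshift_generated_configs_dir if it does not exist
  file:
    path: "{{ openshift_generated_configs_dir }}"
    state: directory
  delegate_to: "{{ openshift_ca_host }}"
  run_once: true

- name: Generate the node server certificate (still runs once per node)
  command: /bin/true   # placeholder for the per-node certificate command
  delegate_to: "{{ openshift_ca_host }}"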
We have no immediate plans to support forks greater than 20