Description of problem:
After upgrading the etcd servers in an HA cluster from 2.3.7 to 3.0.12, migrating the data, and configuring the master-api servers to use storage-backend=etcd3, the API server does not finish starting. The logs are flooded with the following errors:
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken read tcp 192.1.11.211:33308->192.1.11.215:2379: read: connection reset by peer.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken read tcp 192.1.11.211:34574->192.1.11.214:2379: read: connection reset by peer.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken read tcp 192.1.11.211:48334->192.1.11.216:2379: read: connection reset by peer.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken read tcp 192.1.11.211:34570->192.1.11.214:2379: read: connection reset by peer.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.
Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: http2Client.notifyError got notified that the client transport was broken read tcp 192.1.11.211:48374->192.1.11.216:2379: read: connection reset by peer.
Version-Release number of selected component (if applicable): 3.4.0.11
How reproducible: Always
Steps to Reproduce:
Environment:
3.4.0.11 cluster installed with 3 masters, 3 etcd servers (etcd 2.3.7), and 300 nodes. Ran the conformance tests in the cluster and then created 1K projects with 4K pods. Everything was successful up to this point.
This issue has also been reproduced in a smaller cluster with 1 etcd, 1 master, and 3 nodes, so cluster size is not a factor.
0. Shut down all OCP masters
1. Shut down the etcd servers and update (yum swap) to install etcd 3.0.12
2. On each etcd: etcdctl migrate --data-dir=/var/lib/etcd
3. Verify the migration completes with no errors and restart all etcds
4. On each master, edit master-config.yaml and add the following to apiServerArguments:
     apiServerArguments:
       storage-backend:
       - "etcd3"
5. Restart the master services
Actual results:
The master-api services fail to initialize, and the logs are overrun with the messages above. Attached are the master-api journal logs and the etcd logs captured with --debug=true. The etcd logs show no errors.
Expected results:
Migrated cluster operates as before the etcd upgrade.
Additional info:
I ran tcpdump/wireshark, and every time the master sends something to an etcd server, the etcd server responds with a TCP reset (RST).
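To cross-check the packet capture against the journal, the transport errors can be tallied per etcd endpoint. The following is a minimal sketch, not part of the original report: the function name `summarize` and the regular expression are illustrative assumptions, and the sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter

# Captures the remote (etcd) address from lines such as
# "read tcp SRC->DST: read: connection reset by peer".
RESET_RE = re.compile(r"read tcp \S+->(\S+?): read: connection reset by peer")


def summarize(lines):
    """Count transport errors: resets per etcd endpoint, plus unexpected EOFs."""
    counts = Counter()
    for line in lines:
        if "http2Client.notifyError" not in line:
            continue  # not one of the transport errors from this report
        match = RESET_RE.search(line)
        if match:
            counts["reset " + match.group(1)] += 1
        elif "unexpected EOF" in line:
            counts["unexpected EOF"] += 1
        else:
            counts["other"] += 1
    return counts


# Sample lines taken from the journal excerpt in this report.
PREFIX = ("Oct 20 13:41:17 svt-m-1.localdomain openshift[44796]: transport: "
          "http2Client.notifyError got notified that the client transport was broken ")
sample = [
    PREFIX + "read tcp 192.1.11.211:33308->192.1.11.215:2379: read: connection reset by peer.",
    PREFIX + "unexpected EOF.",
    PREFIX + "read tcp 192.1.11.211:34574->192.1.11.214:2379: read: connection reset by peer.",
]

for key, count in sorted(summarize(sample).items()):
    print(f"{count:4d}  {key}")
```

Feeding the full master-api journal through `summarize` (for example, by iterating over the lines of a saved log file) should show whether every etcd endpoint is resetting connections or only some of them.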
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:0066