Bug 1367035
| Summary: | [DOCS] Document etcd cluster recovery after node failure + installing new etcd nodes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Josep 'Pep' Turro Mauri <pep> |
| Component: | Documentation | Assignee: | brice <bfallonf> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | Vikram Goyal <vigoyal> |
| Priority: | high | ||
| Version: | 3.2.0 | CC: | aos-bugs, erich, gpei, jeder, jialiu, jokerman, knakayam, mmccomas, nschuetz, pdwyer, pep, rhowe, tstclair |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | --- | Flags: | erich: needinfo- |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | aos-scalability-34 | ||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-01-24 05:00:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1350875 | ||
| Bug Blocks: | |||
|
Description
Josep 'Pep' Turro Mauri
2016-08-15 10:56:32 UTC
*** Bug 1377487 has been marked as a duplicate of this bug. ***

Steps to add a new etcd member using the CA configured during the OpenShift install: https://access.redhat.com/articles/2650151

Pep, Eric, I've included a section on recovering etcd hosts: https://github.com/openshift/openshift-docs/pull/3047
But there's so much going on in the above comments that I'm not actually sure I'm going down the right path. Can I get your thoughts? Also, there's a link to a KBase article about adding etcd hosts. How does that tie into this BZ? Should I be including it in the docs? Here perhaps: https://docs.openshift.com/enterprise/3.2/install_config/adding_hosts_to_existing_cluster.html
Thanks!

Ryan, Eric, Thanks for the comments. I was replicating Pep's suggestion above, but if the article Ryan pasted above is more accurate, then that might be a better source. Can I ask for an ack that it's all there? Then I'll pass this to QE for a test. Thanks!

Pep, Hmm, I can't see any comments that you made. What did you comment about? Really, all I'm after is confirmation that you think it fulfills this BZ. There's a lot being asked for here, and I want to know I'm aiming in the correct direction. Then I can get it tested and checked by devel, etc. I don't expect you to test it out and tell me what I'm doing wrong, just to confirm that this is what you're after. If I'm headed in the wrong direction, it'd be great to get that under control before I involve anyone else. Thanks.

Ryan has given the thumbs up in the PR. I'll put this onto QA.

Johnny, The "Adding New etcd Hosts" section has been added with this BZ: https://github.com/bfallonf/openshift-docs/blob/a0da5f21c0db6ba6a4a363198f8c15ff0844b7bf/admin_guide/backup_restore.adoc#backup-restore-adding-etcd-hosts
Do you have the capabilities to test the procedure? Please let me know if there is anything wrong. Thanks much, all.

I'd like to have two changes.
1) yum install etcd -> yum install etcd iptables-services
Reason: iptables-services is not installed by default, so it is better to guide the customer to install it.
2) Create the ${PREFIX} directory before using it.
Add 'mkdir ${PREFIX}' before the subtitle "Create the server.csr and server.crt certificates:"
Reason: Without this directory, openssl fails with "No such file or directory".
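For illustration, the two requested changes could be sketched as the shell helpers below. This is only a sketch under assumptions: the function names are hypothetical, `-y` and `mkdir -p` are choices made here (the comment above asks for a plain `mkdir ${PREFIX}`), and the value of `PREFIX` is whatever the documented procedure sets.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the documented procedure.

install_etcd_packages() {
    # iptables-services is not installed by default, so request it together with etcd.
    # Needs root; -y (assume yes) is an assumption added for unattended runs.
    yum install -y etcd iptables-services
}

prepare_cert_dir() {
    # Create the certificate directory before any openssl step writes into it;
    # without it, openssl fails with "No such file or directory".
    # -p also creates missing parents and does not fail if the directory exists.
    prefix="$1"
    mkdir -p "$prefix"
}
```

A caller would run `prepare_cert_dir "${PREFIX}"` just before the step that creates server.csr and server.crt.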
Thanks, Anping Li

I've made the changes you suggest, and I'll move this to peer review.

Commit pushed to master at https://github.com/openshift/openshift-docs
https://github.com/openshift/openshift-docs/commit/7d657a14ef5f5682d972b6b3576318ad644e5747
Merge pull request #3047 from bfallonf/etcd-1367035
Bug 1367035 added info on restoring etcd hosts

Hi, I got information from an engineer that when we add new etcd nodes using the backup process and the data is around 700 MB, we should stop the etcd services on the other hosts (please refer to [1]). Should we add this instruction to the docs and the article?
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1398083#c12

Eric, Kenjiro, I can see Eric's recommendation making it into the docs, with stopping the other etcd hosts as the fallback if that doesn't work for the reader. Something like: "If the etcd backup is larger than 700 MB, prune the resource (link to pruning docs). If the backup is still larger than 700 MB, stop the other hosts before performing the steps in this topic." My one question would be: where in the section should this go? At what point should the reader stop the etcd hosts?

Link to released docs: https://docs.openshift.com/container-platform/3.3/admin_guide/backup_restore.html#backup-restore-adding-etcd-hosts
However, I've sent an email to the engineering lists about the conversation above, so I'll add to the docs when I hear something. Actually, I'll put this back to MODIFIED in the meantime.

Submitted another PR for the extra info: https://github.com/openshift/openshift-docs/pull/3385

PR has merged. |
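
The 700 MB guidance proposed in this thread could be turned into a small pre-check, roughly as sketched below. The threshold comes from the comments above; the function names and output strings are hypothetical, and `du -sm` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: function names and messages are illustrative, not from the released docs.

backup_size_mb() {
    # du -sm prints the total size of the directory in whole megabytes (GNU du),
    # followed by a tab and the path; keep only the number.
    du -sm "$1" | cut -f1
}

backup_advice() {
    # Apply the proposed rule: above 700 MB, prune first, and if the backup is
    # still too large, stop the other etcd hosts before adding the new member.
    size_mb="$1"
    limit_mb=700
    if [ "$size_mb" -gt "$limit_mb" ]; then
        echo "prune-then-stop-other-hosts"
    else
        echo "proceed"
    fi
}
```

For example, `backup_advice "$(backup_size_mb /path/to/etcd-backup)"` would report which branch applies, where the backup path is a placeholder.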