Received the following feedback from Didi:
Chapter 6 Prerequisites: I'd drop the last one (about db user name).
Perhaps add text that explains the restore scenario. Mention that this
procedure does not back up or restore other VMs.
Procedure 6.1 step 1: You might explain why we have this step. What happens
if we later try to restore an engine that had running VMs on all of its
hosts? I know we stumbled upon issues regarding this; it is perhaps better to
discuss this and/or link. In particular, what happens if there is only one
host? If, following the issues we had and the writing of this procedure, we
decide that we do not support that, we should state this clearly.
Procedure 5.1 step 4: I'd call it something like 'backing up the backup
files', or 'archiving ...' or whatever. Obviously, copying them using scp is
just one way to do that; so is any other means reasonable for the
particular site (NFS share, other 'small' tools (rsync, ...), third-party
backup software, whatever). The actual title and text there are fine, but
should make it clear that they are given as an example. When changing the
text accordingly, drop 'This step is not mandatory'. The step is mandatory;
only its details are not. Also adapt Procedure 6.3 step 2 (about restore)
accordingly.
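To illustrate the point that the copy method is only an example, here is a
minimal sketch of one possible way to archive the backup files: copy them to
an archive directory and verify each copy by checksum. The function names,
the file name and the directories are hypothetical and not taken from the
guide; an scp or rsync transfer, or site backup software, would serve equally
well.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_backup_files(files, archive_dir):
    """Copy each backup file into archive_dir and verify the copy.

    archive_dir stands in for whatever the site uses (for example a
    mounted NFS share); this is an assumption made for illustration.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for source in map(Path, files):
        target = archive_dir / source.name
        shutil.copy2(source, target)  # copies content and metadata
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"checksum mismatch after copying {source}")
        archived.append(target)
    return archived


if __name__ == "__main__":
    # Demonstration with throwaway files; real paths are site-specific.
    work = Path(tempfile.mkdtemp())
    backup = work / "engine-backup.tar.gz"  # hypothetical file name
    backup.write_bytes(b"example backup content")
    print(archive_backup_files([backup], work / "archive"))
```

Whatever method is used, the same files are needed again on the restore side
(Procedure 6.3 step 2), so it helps to verify them at archive time.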
Section 6.2 introductory paragraph:
Perhaps provide some overview explaining why we do this (failover host, etc.)?
Also, in several cases you seem to mix up the hostname in DNS and
hosted_engine_N, which is the name of the host inside the engine (what
users see in 'Hosts' in the web admin). I agree that the text in the actual
script about this is perhaps not clear either; patches are welcome :-)
Also, you should mention that a new _empty_ storage is required.
Procedures 6.2 and 6.3: these are two parts of a single run of
'hosted-engine --deploy', and seem to be split into two procedures mostly
for "editorial" reasons. I'd mention this somewhere, and also note it at the
beginning of 6.3 (explaining that you are in the middle of the deploy).
Procedure 6.3 step 1: usually not needed, see my comment above for
Procedure 4.2 step 15. Also adapt accordingly step 3.
In case you want to keep this text for reference, what's here is more
complete than there. You also have a similar procedure in . Since this
is an "advanced" topic, irrelevant for most setups, it might make sense to
put it in a single article and link there.
Procedure 6.4 step 6: I don't like this. Did we discuss solutions that would
not involve these errors/timeouts? Do we have an open bug?
Section 6.4 - removing non-operational hosts. This is really ugly! Can't we
provide a decent way to do this? I now skimmed through  to refresh my
memory, and   , and I don't believe that's what we are telling