Bug 1674932 - [Docs][SHE] Self-Hosted Engine guide doesn't have a process for booting the self-hosted engine VM into rescue mode
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Documentation
Version: 4.2.0
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.4
Target Release: 4.3.1
Assignee: Steve Goodman
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-11 17:22 UTC by Robert McSwain
Modified: 2019-06-02 17:02 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-02 14:45:57 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Attachments
Screenshot.png (1.55 KB, image/png)
2019-05-29 13:39 UTC, Nikolai Sednev

Description Robert McSwain 2019-02-11 17:22:28 UTC
Description of problem:
The Self-Hosted Engine Guide does not describe a process for booting the hosted-engine VM into rescue mode for emergency recovery and data collection (sosreport/Log-Collector).

Version-Release number of selected component (if applicable):
4.2 and prior

How reproducible:
N/A

Steps to Reproduce:
N/A

Actual results:
The guide at https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html-single/self-hosted_engine_guide/ does not describe a process for booting from an alternate ISO, or for collecting diagnostic data from within the VM when the VM will not start.

Expected results:
The Self-Hosted Engine Guide includes a process for booting the hosted-engine VM into rescue mode.

Comment 1 Steve Goodman 2019-02-14 14:31:07 UTC
Ido or Simone, how do you boot into rescue mode?

Comment 2 Simone Tiraboschi 2019-02-14 14:39:52 UTC
We have:

hosted-engine
...
        --vm-start-paused
            start VM on this host with qemu paused

Then you can open a VNC or SPICE console in advance and, when ready, resume the VM and enter rescue mode from the graphical console.

There is no 'hosted-engine --resume' command, so the resume command currently has to be sent with vdsm-client or virsh.

Comment 3 Steve Goodman 2019-02-14 15:30:25 UTC
Let me see if I understand. This is my best detailed guess based on comment 2:

1. For starters, we are assuming that the VM that hosts the engine is powered down. In an HA setup, the engine VM may be on any of a number of hosts, so we need to be certain that the VM itself is powered down and has not just migrated to a different host.
2. Open a VNC or SPICE console (why not ssh?) on a host on which the VM that hosts the engine resides. How do you decide which host to log in to (taking into account that the VM may have migrated among hosts)?
3. Log in as root.
4. Start the engine VM on this host with qemu paused by running the command `hosted-engine --vm-start-paused`.
5. Enter rescue mode. How?
6. Do one or both of the following rescue tasks:
   - Execute an emergency recovery. How?
   - Collect whatever data you need (sosreport/Log-Collector).
7. Then what?

Please review.

Comment 5 Simone Tiraboschi 2019-04-29 11:59:22 UTC
(In reply to Steve Goodman from comment #3)
> Please review.

1. Connect to one of the hosted-engine hosts over ssh
2. Set hosted-engine global maintenance mode with 'hosted-engine --set-maintenance --mode=global'
3. Check whether a copy of the hosted-engine VM is already running with 'hosted-engine --vm-status'. If so, connect to that host over ssh and shut the VM down with 'hosted-engine --vm-shutdown', or with 'hosted-engine --vm-poweroff' if the first command fails
4. Connect to one of the hosted-engine hosts via ssh
5. Start the engine VM in pause mode with 'hosted-engine --vm-start-paused'
6. Set a temporary VNC password with 'hosted-engine --add-console-password'
7. Connect to the engine VM via VNC, following the instructions printed by the previous command
8. Once ready, resume the engine VM with '/usr/bin/virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume HostedEngine'
9. Follow https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/sec-terminal_menu_editing_during_boot to boot the engine VM in rescue mode
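The host-side part of the procedure above (steps 2, 3, 5, 6 and 8) can be sketched as a few shell functions. This is a minimal sketch, not a supported tool: it assumes you run it as root on a hosted-engine host, that you still read the 'hosted-engine --vm-status' output yourself to pick the right host in step 3, and that you do the VNC connection (step 7) and the GRUB editing (step 9) by hand. The DRY_RUN variable and all function names are hypothetical; only the hosted-engine and virsh commands come from the procedure. One deliberate change: the virsh connection URI is quoted, because an unquoted '?' is a glob character in the shell.

```shell
#!/bin/bash
# Minimal sketch of the host-side steps from comment 5, run as root.
# DRY_RUN and the function names are hypothetical; the hosted-engine and
# virsh commands are the ones given in the procedure.

# Print a command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 2: global maintenance, so the HA agents leave the engine VM alone.
enter_global_maintenance() {
    run hosted-engine --set-maintenance --mode=global
}

# Step 3: run on the host that 'hosted-engine --vm-status' reports as
# running the VM. "Fails" is assumed to mean a non-zero exit status.
# Check --vm-status again afterwards: the VM must be down before step 5.
stop_engine_vm() {
    run hosted-engine --vm-shutdown || run hosted-engine --vm-poweroff
}

# Steps 5-6: start the VM with qemu paused, then set a temporary VNC
# password. Comment 18 suggests running these back to back can race;
# if the printed VNC port is -1, rerun --add-console-password.
start_paused_with_console() {
    run hosted-engine --vm-start-paused
    run hosted-engine --add-console-password
}

# Step 8: resume (unpause) the VM. This runs on the host, not in the VM.
resume_engine_vm() {
    run /usr/bin/virsh \
        -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' \
        resume HostedEngine
}
```

For example, `DRY_RUN=1 resume_engine_vm` only prints the virsh command, which is a safe way to review each step before running it for real.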

Comment 6 Steve Goodman 2019-05-01 12:46:46 UTC
(In reply to Simone Tiraboschi from comment #5)

> 1. Connect to one of the hosted-engine hosts over ssh
> 2. Set hosted-engine global maintenance mode with 'hosted-engine
> --set-maintenance --mode=global'
> 3. Check if there is already a running copy of the hosted-engine VM with
> 'hosted-engine --vm-status', if so connect to that host over ssh and shut it
> down with 'hosted-engine --vm-shutdown' or eventually 'hosted-engine
> --vm-poweroff' if the first fails

What is the difference between --vm-shutdown and --vm-poweroff?

> 4. Connect to one of the hosted-engine hosts via ssh
> 5. Start the engine VM in pause mode with 'hosted-engine --vm-start-paused'

> 6. Set a temporary VNC password with 'hosted-engine --add-console-password'
> 7. Connect to the engine VM via VNC as for the instructions of the previous
> command

> 8. Once ready, resume the engine VM with '/usr/bin/virsh -c
> qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume
> HostedEngine'

Do you run the "resume HostedEngine" command from the engine VM or the host? (It doesn't sound logical that you can run a command from within a paused VM.)

Does "resume" mean "unpause"?

> 9. Follow
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/
> html/system_administrators_guide/sec-terminal_menu_editing_during_boot to
> boot the engine VM in rescue mode

Simone, can you give me a screen log that shows the input and output of this procedure, at least?

This procedure shows how to boot into rescue mode. What do you do after that? How do you collect diagnostic data?

Comment 10 Steve Goodman 2019-05-28 15:27:00 UTC
Scott, can you please finish the peer review for this? Don't merge it until Nikolai approves.

Comment 12 Rolfe Dlugy-Hegwer 2019-05-29 12:33:08 UTC
Verified and merged.

Comment 13 Nikolai Sednev 2019-05-29 13:16:38 UTC
"The command outputs the necessary information you need to log in to the Manager virtual machine with VNC."

alma03 ~]# hosted-engine --add-console-password
Enter password: 
You can now connect the hosted-engine VM with VNC at 10.35.92.3:-1

This is strange connection information for a VNC login.
What does ":-1" stand for?
A port should not have a negative value.
I was unable to connect to the engine VM over VNC while it was in paused mode.

I used these components:
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.8-1.el7ev.noarch
ovirt-ansible-hosted-engine-setup-1.0.19-1.el7ev.noarch

I couldn't proceed with steps 7-9 because I was stuck at step 6.

Comment 14 Nikolai Sednev 2019-05-29 13:39:10 UTC
alma03 ~]#  vdsm-client VM getInfo vmID=$vmid
vdsm-client: Command VM.getInfo with args {'vmID': ''} failed:
(code=-32603, message=Internal JSON-RPC error: {'reason': 'list index out of range'})

While in global maintenance, I had to manually source the hosted-engine configuration file:
. /etc/ovirt-hosted-engine/hosted-engine.conf
Then I checked that this worked by running the command again:
[root@alma03 ~]#  vdsm-client VM getInfo vmID=$vmid
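The first vdsm-client call above failed with an empty ID in its arguments ({'vmID': ''}), which is what you get when $vmid is not set in the current shell. A minimal guard, offered as a hedged sketch: the function name and error text are hypothetical, and it only checks that an ID was passed, not that it is the correct engine VM ID.

```shell
#!/bin/bash
# Hypothetical guard around the command from comment 14: refuse to call
# vdsm-client with an empty VM ID (the failed call above sent vmID '').
get_engine_vm_info() {
    local id="${1:-}"
    if [ -z "$id" ]; then
        echo "error: VM ID is empty; set it in this shell first" >&2
        return 1
    fi
    vdsm-client VM getInfo vmID="$id"
}
```

Called as `get_engine_vm_info "$vmid"`, it stops with a clear message instead of sending an empty ID to VDSM.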

Running step 6 again now gave me correct connection information:
alma03 ~]# hosted-engine --add-console-password
Enter password: 
You can now connect the hosted-engine VM with VNC at 10.35.92.3:5902

When I tried to connect over VNC, I got a screen saying "Guest has not initialized the display (yet).".

I think step 8 is not possible: the VM is started in paused mode, so nothing can be done from the VNC console.

Please see the attachment.

Comment 15 Nikolai Sednev 2019-05-29 13:39:55 UTC
Created attachment 1574800
Screenshot.png

Comment 17 Nikolai Sednev 2019-05-29 13:54:51 UTC
OK, so step 7 is just to open the VNC console, and that's it. Step 8 has to be executed on the HA host that is hosting the paused engine VM, and that worked fine:
alma03 ~]#  /usr/bin/virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume HostedEngine
Domain HostedEngine resumed

Then the VNC console became responsive and I could boot the engine VM into rescue mode.

I think that we should fix step 8 in the documentation, which is incorrect.
The engine VM is resumed from the HA host, not from within the engine VM.
"On the Manager virtual machine, unpause the engine VM:" should be changed to "On the host running the Manager virtual machine, unpause the engine VM:".
Step 9 is fine.
I would also add a step 10 asking the customer to switch the environment from global maintenance back to none once they are done in rescue mode.
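The proposed step 10 could look like the following sketch, run on any hosted-engine host. It assumes the work in rescue mode is finished and the engine VM has been rebooted normally, since leaving global maintenance hands the VM back to the HA agents. 'hosted-engine --set-maintenance --mode=none' is the counterpart of the global mode set in step 2; the function name and the DRY_RUN switch are hypothetical.

```shell
#!/bin/bash
# Hypothetical step 10: leave global maintenance after rescue-mode work.
# DRY_RUN is an illustrative switch that prints commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

exit_global_maintenance() {
    # Hand control of the engine VM back to the HA agents.
    run hosted-engine --set-maintenance --mode=none
    # Show the resulting state so the change can be confirmed.
    run hosted-engine --vm-status
}
```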

Comment 18 Simone Tiraboschi 2019-05-29 14:10:08 UTC
(In reply to Nikolai Sednev from comment #13)
> "The command outputs the necessary information you need to log in to the
> Manager virtual machine with VNC."
> 
> alma03 ~]# hosted-engine --add-console-password
> Enter password: 

Maybe there is a kind of race condition between

hosted-engine --vm-start-paused
hosted-engine --add-console-password

if they are executed too quickly.
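Comment 13 shows the too-early run printing a VNC port of -1, while the rerun in comment 14 printed a usable port. The helper below is a minimal sketch for telling the two apart. It assumes the output line keeps the "with VNC at <address>:<port>" format shown in those comments, and the function name is hypothetical. It does not fix the race; it only tells you when to wait and rerun 'hosted-engine --add-console-password'.

```shell
#!/bin/bash
# Hypothetical helper: extract the VNC port from the line printed by
# 'hosted-engine --add-console-password' and reject a negative port,
# which is what the too-early run in comment 13 produced.
vnc_port_from_output() {
    local line port
    line=$(grep 'with VNC at ' <<< "$1") || return 1
    port=${line##*:}              # text after the last colon
    case "$port" in
        ''|*[!0-9]*) return 1 ;;  # empty, negative, or not a number
    esac
    echo "$port"
}
```

Passed the line from comment 13, the helper returns a failure status; passed the line from comment 14, it prints 5902.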

Comment 19 Nikolai Sednev 2019-05-29 14:23:40 UTC
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1715080 to track the negative VNC port issue from comment 13.

Comment 25 Steve Goodman 2019-06-02 14:33:42 UTC
Nikolai, I'm publishing this because it's correct and the info is needed on the portal, but if you can answer my question in comment 23, I'll add that to the text and republish.

Comment 26 Steve Goodman 2019-06-02 14:45:57 UTC
Published:

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/administration_guide/booting_a_self-hosted_engine_in_rescue_mode_she_admin

Nikolai, if I understand correctly, the answer to my question in comment 23 is to just proceed to step 5, correct?

Assuming so, I'm closing this bug. Feel free to reopen if you feel it's necessary.

