Bug 1695523 - [RFE] Allow the user to do manual tuning of the engine vm before running engine-setup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Documentation
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.5
Target Release: 4.4.5
Assignee: Steve Goodman
QA Contact: Wei Wang
URL:
Whiteboard:
Duplicates: 1835631 1867198
Depends On: 1795672
Blocks:
Reported: 2019-04-03 09:34 UTC by Yedidyah Bar David
Modified: 2021-10-14 07:36 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-25 10:33:29 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:
aturgema: needinfo-



Links
System: Github
ID: oVirt ovirt-ansible-hosted-engine-setup pull 232
Private: 0
Priority: None
Status: closed
Summary: Add documentation for pausing the hosted-engine deployment
Last Updated: 2021-01-27 21:56:37 UTC

Description Yedidyah Bar David 2019-04-03 09:34:20 UTC
Description of problem:

See bug 1660595, especially bug 1660595 comment 32 and 34.

We suggested there an example for how to cause 'hosted-engine --deploy' to wait until the user finishes customizing the engine vm. We discussed this in private today and agreed that it makes sense, as asked for in comment 34, to make this right in the tool, not requiring a custom playbook, and set via a command line option.

We should probably add this option to both restore and setup.

The lock filename should probably be something like /var/lock/ovirt-hosted-engine-setup/DELETE-TO-CONTINUE-deploy.lock , perhaps even more detailed (e.g. include a timestamp, a next-stage - in case we decide to have more than one, etc.).

I think it's best if deploy creates the file right before the wait loop on its removal. This way a tool can know it can start customizing the vm without analyzing the log/output.

It might make sense to allow getting the filename from the user - can be useful for automation - start deploy, wait until the file is created, customize, remove the file.
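
The handshake proposed above can be sketched in shell as follows. This only illustrates the idea (deploy creates the file and waits for its removal, automation waits for the file, customizes, then removes it); it is not how the tool implements it. The function names, the polling interval, and the way the lock path is passed are all assumptions made for the example.

```shell
#!/bin/bash
# Sketch of the proposed lock-file handshake. All names here are hypothetical.

# Deploy side: create the lock file, then block until someone removes it.
# $1 = lock file path, $2 = polling interval in seconds (default 1)
pause_until_lock_removed() {
    local lock="$1" interval="${2:-1}"
    mkdir -p "$(dirname "$lock")"
    : > "$lock"    # creating the file signals that customization may start
    while [ -e "$lock" ]; do
        sleep "$interval"
    done
}

# Automation side: wait for the lock file to appear, run the customization
# command, then remove the file so that the deploy side continues.
# $1 = lock file path, remaining arguments = command to run
customize_when_paused() {
    local lock="$1"
    shift
    until [ -e "$lock" ]; do
        sleep 1
    done
    "$@"
    rm -f -- "$lock"
}
```

Because the deploy side creates the file itself, the automation side never has to parse logs or output to learn that the pause has started.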

Comment 1 Nikolai Sednev 2020-03-03 12:37:09 UTC
Please provide documentation on how this functionality should be used.

Comment 2 Yedidyah Bar David 2020-03-04 07:05:14 UTC
(In reply to Nikolai Sednev from comment #1)
> Please provide documentation on how this functionality should be used.

1. The functionality is implemented in the ansible roles.

Documentation for that is in:

https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/README.md#make-changes-in-the-engine-vm-during-the-deployment

Which is also available on an installed system as:

/usr/share/doc/ovirt-ansible-hosted-engine-setup/README.md

This is the only documentation for this feature currently AFAIK.

2. For use from the CLI otopi frontend 'hosted-engine --deploy', you should add to your answer file:

OVEHOSTED_CORE/pauseonRestore=bool:True

Or pass directly on the command line:

hosted-engine --deploy --otopi-environment='OVEHOSTED_CORE/pauseonRestore=bool:True'

For the deploy flow, this is the only way. For the restore flow, the tool prompts you, asking about that. See bug 1686575 (and 4.3 clone bug 1712667).

3. For cockpit this isn't implemented yet, see bug 1780881.

Also, please note that this is actually only a doc bug, as the functionality was already implemented for above bugs (and verified, probably only for restore and not deploy). The linked patch is what adds the text to the README.
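
As a rough sketch of the answer-file route in item 2: the helper below writes a minimal answer file containing only the key quoted above. The function name and the file location are hypothetical, and the [environment:default] section header and the --config-append option reflect my understanding of how otopi answer files are normally written and passed, so please check them against your installed version.

```shell
#!/bin/bash
# Write a minimal answer file that enables the pause.
# $1 = path of the answer file to create (hypothetical location)
write_pause_answer_file() {
    local answers="$1"
    cat > "$answers" << '__EOF__'
[environment:default]
OVEHOSTED_CORE/pauseonRestore=bool:True
__EOF__
}

# Usage on the deployment host (not run here):
#   write_pause_answer_file /root/he-pause-answers.conf
#   hosted-engine --deploy --config-append=/root/he-pause-answers.conf
```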

Comment 3 Nikolai Sednev 2020-03-04 08:02:20 UTC
(In reply to Yedidyah Bar David from comment #2)
> (In reply to Nikolai Sednev from comment #1)
> > Please provide documentation on how this functionality should be used.
> 
> 1. The functionality is implemented in the ansible roles.
> 
> Documentation for that is in:
> 
> https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/
> README.md#make-changes-in-the-engine-vm-during-the-deployment
> 
> Which is also available on an installed system as:
> 
> /usr/share/doc/ovirt-ansible-hosted-engine-setup/README.md
> 
> This is the only documentation for this feature currently AFAIK.
> 
> 2. For use from the CLI otopi frontend 'hosted-engine --deploy', you should
> add to your answer file:
> 
> OVEHOSTED_CORE/pauseonRestore=bool:True
> 
> Or pass directly on the command line:
> 
> hosted-engine --deploy
> --otopi-environment='OVEHOSTED_CORE/pauseonRestore=bool:True'
> 
> For the deploy flow, this is the only way. For the restore flow, the tool
> prompts you, asking about that. See bug 1686575 (and 4.3 clone bug 1712667).
> 
> 3. For cockpit this isn't implemented yet, see bug 1780881.
> 
> Also, please note that this is actually only a doc bug, as the functionality
> was already implemented for above bugs (and verified, probably only for
> restore and not deploy). The linked patch is what adds the text to the
> README.

If it's a doc bug, then let's change it to such.

Comment 4 Sandro Bonazzola 2020-03-04 13:11:45 UTC
Marking as a doc bug and moving back to NEW, since the documentation has not been written yet.

Comment 5 Sandro Bonazzola 2020-05-18 14:46:53 UTC
Moved to 4.4.1, since this is not marked as a blocker for 4.4.0 and we are preparing to GA.

Comment 6 Yedidyah Bar David 2020-12-06 15:04:46 UTC
The behavior related to this bug changed recently. For details, see bug 1893385 - in particular, bug 1893385 comment 27.

Comment 8 Yedidyah Bar David 2021-01-14 14:39:50 UTC
Copying the current content from the README:

Make changes in the engine VM during the deployment

In some cases, a user may want to make adjustments to the engine VM during the deployment process. There are 2 ways to do that:

Automatic:

Write ansible playbooks that will run on the engine VM before or after the engine VM installation.

You can add the playbooks to the following locations:

    hooks/enginevm_before_engine_setup: These will be run before running engine-setup on the engine machine.

    hooks/enginevm_after_engine_setup: These will be run after running engine-setup on the engine machine.

    hooks/after_add_host: These will be run after adding the host to the engine, but before checking if it is up. You can place playbooks here to customize the host, such as configuring required networks, and then activate it, so that the deployment will find it as "Up" and continue successfully. See examples/required_networks_fix.yml for an example.

These playbooks will be consumed automatically by the role when you execute it.
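
Before starting a deployment it can help to check which hooks are actually in place. The following is a small sketch, not part of the role: the function name is hypothetical, the default role directory is the installed path quoted elsewhere in this bug, and it assumes that hook files end in .yml like the examples here.

```shell
#!/bin/bash
# Default role location on an installed system (path as quoted in this bug).
HE_ROLE_DIR="${HE_ROLE_DIR:-/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup}"

# Print "<stage>/<file>" for every .yml file found in the three hook directories.
# $1 = role directory (optional, defaults to HE_ROLE_DIR)
list_he_hooks() {
    local role_dir="${1:-$HE_ROLE_DIR}" stage f
    for stage in enginevm_before_engine_setup enginevm_after_engine_setup after_add_host; do
        for f in "$role_dir/hooks/$stage"/*.yml; do
            # An unmatched glob stays literal, so skip anything that does not exist.
            [ -e "$f" ] && echo "$stage/$(basename "$f")"
        done
    done
    return 0
}
```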

Manual:

To make manual adjustments, you can set the variable he_pause_host to true. This will pause the deployment after the engine has been set up and create a lock-file in /tmp, with a name that ends with _he_setup_lock, on the machine the role was executed on. The deployment will continue after the lock-file is deleted, or after 24 hours (if the lock-file hasn't been removed).

Before deleting the lock-file to proceed with the deployment, make sure that the host is in 'up' state at the engine's URL.

Both the lock-file path and the engine's URL will be presented during the role execution.
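
A minimal sketch of locating and removing the lock-file from the shell is shown below. The function names are hypothetical, and the code relies only on the naming described above (a regular file directly under /tmp whose name ends with _he_setup_lock). Prefer the exact path printed by the role if you have it.

```shell
#!/bin/bash
# Print lock files in the given directory (default /tmp) whose names end
# with _he_setup_lock.
find_he_lock() {
    find "${1:-/tmp}" -maxdepth 1 -type f -name '*_he_setup_lock'
}

# Remove the lock file(s) so that the paused deployment continues.
# Only do this after confirming that the host is 'up' in the engine.
release_he_lock() {
    local lock
    find_he_lock "$1" | while IFS= read -r lock; do
        rm -f -- "$lock"
        echo "removed $lock"
    done
}
```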

On Failure

If "Add Host" failed and left the host in status "non_operational", by default the deployment will be paused, similarly to "Manual" above, so that the user can try to fix the host to get it to "up" state, before removing the lock file and continuing. If you want the process to fail instead of pausing, set he_pause_after_failed_add_host to false.

Comment 9 Yedidyah Bar David 2021-01-14 14:41:22 UTC
Also, an example I recently used in [1], to make systemd run httpd under strace:

# The delimiter is quoted so that the shell leaves $OPTIONS for systemd to expand.
cat << '__EOF__' > /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/hooks/enginevm_before_engine_setup/httpd-strace.yml
---
# Hook tasks: these run on the engine VM before engine-setup.
- name: Create systemd httpd service drop-in directory
  file:
    path: /etc/systemd/system/httpd.service.d
    state: directory
    mode: '0755'
- name: Install strace
  package:
    name: strace
    state: present
- name: Configure systemd to run httpd under strace
  copy:
    dest: /etc/systemd/system/httpd.service.d/strace.conf
    content: |
      [Service]
      # The empty ExecStart= clears the packaged command before replacing it.
      ExecStart=
      ExecStart=/usr/bin/strace -f -tt -s 4096 -A -o /var/log/ovirt-engine/httpd-strace.log -D /usr/sbin/httpd $OPTIONS -DFOREGROUND
- name: Reload systemd
  command: systemctl daemon-reload
__EOF__

[1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948

Comment 10 Eli Marcus 2021-01-27 22:00:27 UTC
Copying the content of https://bugzilla.redhat.com/show_bug.cgi?id=1893385#c27

Yedidyah Bar David 2020-11-26 15:19:38 UTC

QE:

To reproduce/verify:

1. Deploy 4.3 hosted-engine
2. Change the "Default" Cluster to have more than one network, and have the other network also "required".
If you do not have a host with more than one nic, you can create a dummy one with e.g. 'ip link add dummy_1 type dummy'.
3. Take a backup
4. Try to upgrade to 4.4 using this backup, twice:
4.1. As-is, just following the docs. Accept the default 'No' to 'pause?'. Instead of failing (which is what happened in previous versions), it should:
- Tell you that adding the host failed
- Provide some details. Specifically, the output should include something about the required networks. If it does not, that's a bug, please report it and attach logs.
- Include a link to the web admin ui. The URL will include the host, not the engine, on port :6900. This is temporary, and works only during the deployment.
- Output a lock file you should remove once finished.
So: Connect to the web admin, fix the issue (you might need to add a dummy nic as above), activate the host, then remove the lock file. It should continue successfully.
4.2 Alternatively, supply a hook, see [1][2] for details:
- Copy the file from /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/examples/required_networks_fix.yml to /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/hooks/after_add_host/
- Edit it replacing "myhost", "eth0" and "net1" as applicable
- Then try to upgrade/restore. It should succeed without asking anything until the storage prompt.
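
The copy-and-edit steps in 4.2 could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, it does a blind textual replacement of the three placeholders named above (so the values must not contain "/" or other characters special to sed), and the resulting file should still be reviewed by hand before deploying.

```shell
#!/bin/bash
# Copy examples/required_networks_fix.yml into hooks/after_add_host/ and
# replace the placeholders "myhost", "eth0" and "net1".
# $1 = host name, $2 = nic, $3 = network name, $4 = role directory (optional)
install_required_networks_fix() {
    local host="$1" nic="$2" net="$3"
    local role_dir="${4:-/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup}"
    local src="$role_dir/examples/required_networks_fix.yml"
    local dest="$role_dir/hooks/after_add_host/required_networks_fix.yml"
    mkdir -p "$role_dir/hooks/after_add_host"
    sed -e "s/myhost/$host/g" -e "s/eth0/$nic/g" -e "s/net1/$net/g" \
        "$src" > "$dest"
}

# Usage (values are examples only):
#   install_required_networks_fix host1.example.com ens3 storage
```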

If you intend to reuse the same host for testing more than one flow without a complete reinstall, please note that 'ovirt-hosted-engine-cleanup' does not completely clean the networks data, so subsequent attempts might be affected. I pushed a patch for this [3], but haven't opened a bug yet. So either also run 'vdsm-tool clear-nets --exclude-net ovirtmgmt' after the cleanup, or reinstall the OS.
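
For repeated test runs, the two cleanup commands mentioned above can be wrapped as in the sketch below. The wrapper, the function names, and the DRY_RUN variable are hypothetical additions for illustration; only the two commands themselves come from this comment. Note that the cleanup tool may ask for confirmation interactively.

```shell
#!/bin/bash
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clean up a host between test flows, including the leftover network data.
reset_host_for_retest() {
    run ovirt-hosted-engine-cleanup
    run vdsm-tool clear-nets --exclude-net ovirtmgmt
}

# Usage: preview first, then run for real.
#   DRY_RUN=1 reset_host_for_retest
#   reset_host_for_retest
```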

Doc: Only doc for now is [1][2]. We probably want a doc bug as well.

[1] https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/hosted_engine_setup/README.md#make-changes-in-the-engine-vm-during-the-deployment
[2] https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/hosted_engine_setup/examples/required_networks_fix.yml
[3] https://gerrit.ovirt.org/112336

Comment 11 Steve Goodman 2021-04-06 15:06:46 UTC
*** Bug 1835631 has been marked as a duplicate of this bug. ***

Comment 28 Steve Goodman 2021-04-22 09:56:37 UTC
Based on comment 24, I edited the relevant sentence(s) as follows:

> At this point you can log in from the deployment host to the {engine-name} virtual machine to customize it.

Comment 33 Steve Goodman 2021-10-14 07:36:13 UTC
*** Bug 1867198 has been marked as a duplicate of this bug. ***

