Bug 2107659 - hosted-engine --deploy --config-append=/tmp/engine-answers.conf --generate-answer=/tmp/restore-engine-answers.conf blows up with a permission error against a temporary directory named after a GUID after running for a long time
Keywords:
Status: CLOSED DUPLICATE of bug 2089332
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.5.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Assignee: Asaf Rachmani
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-15 15:31 UTC by Greg Scott
Modified: 2022-08-11 00:17 UTC
CC: 2 users

Fixed In Version: ovirt-ansible-collection-2.1.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-28 07:16:28 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Issue Tracker RHV-47705 (last updated 2022-07-15 15:32:11 UTC)

Description Greg Scott 2022-07-15 15:31:32 UTC
Description of problem:
When upgrading from 4.3.11, hosted-engine --deploy --restore blows up with a permission error against a temporary directory named after a GUID after running for a long time.

Version-Release number of selected component (if applicable):
4.5.0

How reproducible:
At will

Steps to Reproduce:
1. Run engine-backup on a 4.3.11 RHVM. Store the backup somewhere convenient.
2. Set up a new Fibre Channel LUN to hold the new hosted engine.
3. Install VDSM on a RHEL 8.6 system to turn it into a hypervisor.
4. Run hosted-engine --deploy --restore with the backup created above.

Actual results:
It runs for more than half an hour and then fails with a permission error, several steps after setting up the new Fibre Channel storage domain.

Expected results:
It should run to completion and make a new hosted engine.

Additional info:
I'll attach the output and log. Here is the critical part.

.
.
.
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Restart fapolicyd service]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Copy configuration archive to storage]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["dd", "bs=20480", "count=1", "oflag=direct", "if=/var/tmp/localvm0k3z_k73/2cdb117c-93d4-4a1e-b5da-4f95e230bd4b", "of=/rhev/data-center/mnt/blockSD/f8e2740b-d342-44d3-ac3b-deb626798402/images/e08590f5-bebb-4b3b-b3e4-b3fb3bf144eb/2cdb117c-93d4-4a1e-b5da-4f95e230bd4b"], "delta": "0:00:00.002387", "end": "2022-07-15 15:01:49.610596", "msg": "non-zero return code", "rc": 1, "start": "2022-07-15 15:01:49.608209", "stderr": "dd: failed to open '/var/tmp/localvm0k3z_k73/2cdb117c-93d4-4a1e-b5da-4f95e230bd4b': Permission denied", "stderr_lines": ["dd: failed to open '/var/tmp/localvm0k3z_k73/2cdb117c-93d4-4a1e-b5da-4f95e230bd4b': Permission denied"], "stdout": "", "stdout_lines": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
.
.
.

Comment 5 Greg Scott 2022-07-15 16:51:46 UTC
We just tried it without the --restore. It failed the same way.

I'll change this BZ title.

The command:
hosted-engine --deploy --config-append=/tmp/engine-answers.conf --generate-answer=/tmp/restore-engine-answers.conf

And the failure with a little bit of context around it:
.
.
.
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Restart fapolicyd service]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Copy configuration archive to storage]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["dd", "bs=20480", "count=1", "oflag=direct", "if=/var/tmp/localvm289fhjs6/919f7105-b327-4392-9258-c6bf42d2b637", "of=/rhev/data-center/mnt/blockSD/1b4cd36c-0bb0-49f6-8079-8372b9799969/images/7456ecca-c17a-4d54-9bdd-6bdc5ada5e8a/919f7105-b327-4392-9258-c6bf42d2b637"], "delta": "0:00:00.002318", "end": "2022-07-15 16:44:09.347352", "msg": "non-zero return code", "rc": 1, "start": "2022-07-15 16:44:09.345034", "stderr": "dd: failed to open '/var/tmp/localvm289fhjs6/919f7105-b327-4392-9258-c6bf42d2b637': Permission denied", "stderr_lines": ["dd: failed to open '/var/tmp/localvm289fhjs6/919f7105-b327-4392-9258-c6bf42d2b637': Permission denied"], "stdout": "", "stdout_lines": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Force facts gathering]
.
.
.

Comment 7 Greg Scott 2022-07-15 17:00:56 UTC
Digging deeper - it looks like the Ansible process switches to user vdsm, but root owns the file(s) in question, and that leads to the permission problem. The umask is 0027 - but wait - umask bits work in reverse: they remove permissions rather than grant them. I'll bet that's our problem....
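
The umask arithmetic behind that hunch can be sketched as follows (an illustration of how subtractive umask bits produce the failing mode, not code from the deploy itself):

```python
import stat

# umask bits are subtractive: they REMOVE permissions from the default.
# A default file mode of 0o666 masked by umask 0o027 yields 0o640.
umask = 0o027
mode = 0o666 & ~umask
print(oct(mode))                  # 0o640 -> rw-r-----

# User vdsm is neither the owner (root) nor in root's group, so the
# "other" read bit decides access; with mode 640 it is cleared, which
# is exactly why dd reports Permission denied.
print(bool(mode & stat.S_IROTH))  # False
```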

Comment 8 Greg Scott 2022-07-15 21:21:57 UTC
That was the problem. User root owned the file and its permission bits were 640, so user vdsm could not read it. The customer changed the umask to a more permissive value, retried the operation, and it ran to completion. The root cause was an overly strict umask.
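
That diagnosis can be reproduced with a quick shell experiment (run in a scratch directory; the filenames are made up for illustration):

```shell
cd "$(mktemp -d)"

# With the strict umask, new files come out mode 640: readable by the
# owner and group but closed to "other" users such as vdsm.
umask 0027
touch engine-archive.img
stat -c '%a' engine-archive.img     # prints 640

# With a more permissive umask the file is world-readable, so the
# deploy's dd step can open it.
umask 0022
touch engine-archive2.img
stat -c '%a' engine-archive2.img    # prints 644
```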

Comment 9 Asaf Rachmani 2022-07-18 09:01:45 UTC
Seems like a duplicate of bug 2089332, which was fixed in ovirt-ansible-collection-2.0.4-1.
Can you please check the ovirt-ansible-collection version?

Comment 10 Greg Scott 2022-07-18 17:06:36 UTC
Looks like it's an older version. 
ovirt-ansible-collection.noarch                        2.0.3-1.el8ev 

It's an offline installation from local repositories. But wouldn't a reposync for the SP1 RHV repositories grab the latest ovirt-ansible-collection?

Comment 11 Greg Scott 2022-07-18 18:07:20 UTC
Ah - I just checked the 4.4 SP1 Package Manifest on the download site at https://access.redhat.com/downloads/content/415/ver=4.4/rhel---8/4.4/x86_64/product-software

Looks like SP1 shipped with ovirt-ansible-collection-2.0.3-1.el8ev.noarch.

I'll bet ovirt-ansible-collection-2.0.4-1.el8ev.noarch with the bugfix ships with batch 1, coming in a few days.

- Greg

Comment 12 Martin Perina 2022-07-22 11:59:01 UTC
(In reply to Greg Scott from comment #11)
> Ah - I just checked the 4.4 SP1 Package Manifest on the download site at
> https://access.redhat.com/downloads/content/415/ver=4.4/rhel---8/4.4/x86_64/
> product-software
> 
> Looks like SP1 shipped with ovirt-ansible-collection-2.0.3-1.el8ev.noarch.
> 
> I'll bet ovirt-ansible-collection-2.0.4-1.el8ev.noarch with the bugfix ships
> with batch 1, coming in a few days.
> 
> - Greg

You are right Greg:

RHV 4.4 SP1 contains ovirt-ansible-collection-2.0.3: https://errata.devel.redhat.com/advisory/84835

RHV 4.4 SP1 Batch 1 contains ovirt-ansible-collection-2.1.0: https://errata.devel.redhat.com/advisory/96101


So could you please retest with the latest RHV 4.4 SP1 Batch 1 packages?

Comment 13 Greg Scott 2022-07-25 17:25:21 UTC
> So could you please retest with latest RHV 4.4 SP1 Batch 1 packages?

I'll ask the customer, but I'm not sure it will be possible. They modified their umask and moved forward. Let me see what we can do.

Comment 14 Asaf Rachmani 2022-07-28 07:16:28 UTC
I'm closing it as a duplicate of bug 2089332, please reopen if this reappears.

*** This bug has been marked as a duplicate of bug 2089332 ***

