Bug 1956487 - Failed to deploy HE with "Failed to connect to the host via ssh: ssh: connect to host xxx.com port 22: No route to host".
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: ovirt-host-deploy-ansible
Version: ---
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.6-1
: ---
Assignee: Ales Musil
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1933191
Reported: 2021-05-03 19:00 UTC by Nikolai Sednev
Modified: 2021-06-02 13:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-12 08:45:48 UTC
oVirt Team: Network
Embargoed:
sbonazzo: ovirt-4.4+
sbonazzo: blocker?


Attachments
sosreport from alma03 (15.56 MB, application/x-xz)
2021-05-03 19:00 UTC, Nikolai Sednev
ovirt-hosted-engine-setup-20210503203906-jbvsaw.log (482.28 KB, text/plain)
2021-05-03 19:01 UTC, Nikolai Sednev
ovirt-hosted-engine-setup-20210504195419-7cmg5h.log (450.12 KB, text/plain)
2021-05-04 17:45 UTC, Nikolai Sednev
sosreport from alma04 (15.90 MB, application/x-xz)
2021-05-04 17:47 UTC, Nikolai Sednev

Description Nikolai Sednev 2021-05-03 19:00:14 UTC
Created attachment 1779084 [details]
sosreport from alma03

Description of problem:
Failed to deploy HE with "Failed to connect to the host via ssh: ssh: connect to host xxx.com port 22: No route to host".
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Host is not up, please check logs, perhaps also on the engine machine"}
This bug looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1868409, but device-mapper multipath (multipathd) is running just fine:
alma03 ~]# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-05-03 21:03:54 IDT; 43min ago
 Main PID: 17381 (multipathd)
   Status: "up"
    Tasks: 7
   Memory: 13.1M
   CGroup: /system.slice/multipathd.service
           └─17381 /sbin/multipathd -d -s

May 03 21:03:54 alma03.qa.lab.tlv.redhat.com systemd[1]: Starting Device-Mapper Multipath Device Controller...
May 03 21:03:54 alma03.qa.lab.tlv.redhat.com multipathd[17381]: --------start up--------
May 03 21:03:54 alma03.qa.lab.tlv.redhat.com multipathd[17381]: read /etc/multipath.conf
May 03 21:03:54 alma03.qa.lab.tlv.redhat.com multipathd[17381]: path checkers start up
May 03 21:03:54 alma03.qa.lab.tlv.redhat.com systemd[1]: Started Device-Mapper Multipath Device Controller.
May 03 21:03:54 alma03.qa.lab.tlv.redhat.com multipathd[17381]: reconfigure (operator)

Version-Release number of selected component (if applicable):
rhvm-appliance-4.4-20210402.1.el8ev.x86_64
vdsm-4.40.60.6-1.el8ev.x86_64
libvirt-lock-sanlock-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
sanlock-3.8.3-1.el8.x86_64
qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
ovirt-hosted-engine-ha-2.4.6-1.el8ev.noarch
ovirt-hosted-engine-setup-2.5.0-2.el8ev.noarch
Linux 4.18.0-304.el8.x86_64 #1 SMP Tue Apr 6 05:19:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)


How reproducible:
100%

Steps to Reproduce:
1. Try deploying HE with the latest bits over any type of storage (you won't even get to choose one; it will crash before that).

Actual results:
Deployment fails with these errors:
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Host is not up, please check logs, perhaps also on the engine machine"}
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host nsednev-he-1.qa.lab.tlv.redhat.com port 22: No route to host", "skip_reason": "Host localhost is unreachable", "unreachable": true}


Expected results:
Deployment should succeed.

Additional info:
Sosreport from the host with all relevant logs attached.

Comment 1 Nikolai Sednev 2021-05-03 19:01:26 UTC
Created attachment 1779085 [details]
ovirt-hosted-engine-setup-20210503203906-jbvsaw.log

Comment 2 Asaf Rachmani 2021-05-04 06:03:05 UTC
From engine.log:
2021-05-03 21:04:14,653+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [43e4cca0] Host installation failed for host '38df5b98-323c-4df5-be81-2bc9e23f3643', 'alma03.qa.lab.tlv.redhat.com': Task Start and enable services failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20210503210239-alma03.qa.lab.tlv.redhat.com-43e4cca0.log
2021-05-03 21:04:14,657+03 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [43e4cca0] START, SetVdsStatusVDSCommand(HostName = alma03.qa.lab.tlv.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='38df5b98-323c-4df5-be81-2bc9e23f3643', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 35a092b3
2021-05-03 21:04:14,669+03 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [43e4cca0] FINISH, SetVdsStatusVDSCommand, return: , log id: 35a092b3
2021-05-03 21:04:14,702+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [43e4cca0] EVENT_ID: VDS_INSTALL_FAILED(505), Host alma03.qa.lab.tlv.redhat.com installation failed. Task Start and enable services failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20210503210239-alma03.qa.lab.tlv.redhat.com-43e4cca0.log.
2021-05-03 21:04:14,719+03 INFO  [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [43e4cca0] Lock freed to object 'EngineLock:{exclusiveLocks='[38df5b98-323c-4df5-be81-2bc9e23f3643=VDS]', sharedLocks=''}'


From ovirt-host-deploy-ansible log file:
2021-05-03 21:04:14 IDT - failed: [alma03.qa.lab.tlv.redhat.com] (item=vdsmd.service) => {"ansible_loop_var": "item", "changed": false, "item": "vdsmd.service", "msg": "Unable to start service vdsmd.service: A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.\n"}
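
When a host-deploy log is long, it can help to pull out just the failed item and its message. The following is a minimal sketch, assuming failure lines have the same shape as the one quoted above (a `failed: [host] (item=...) => {json}` record on a single line); the function name `parse_failed_item` is hypothetical and not part of any oVirt tooling.

```python
import json
import re

# Pattern for a per-item failure line as it appears in the log excerpt above.
# Assumption: the JSON payload follows "=> " and runs to the end of the line.
FAILED_ITEM = re.compile(
    r"failed: \[(?P<host>[^\]]+)\] \(item=(?P<item>[^)]+)\) => (?P<payload>\{.*\})\s*$"
)


def parse_failed_item(line):
    """Return (host, item, msg) for a failed-item line, or None if it does not match."""
    match = FAILED_ITEM.search(line)
    if match is None:
        return None
    payload = json.loads(match.group("payload"))
    return match.group("host"), match.group("item"), payload.get("msg", "").strip()


# Sample line copied from the host-deploy log excerpt in this comment.
sample = (
    '2021-05-03 21:04:14 IDT - failed: [alma03.qa.lab.tlv.redhat.com] '
    '(item=vdsmd.service) => {"ansible_loop_var": "item", "changed": false, '
    '"item": "vdsmd.service", "msg": "Unable to start service vdsmd.service: '
    'A dependency job for vdsmd.service failed. See \'journalctl -xe\' for details.\\n"}'
)

print(parse_failed_item(sample))
```

This only surfaces which unit failed; the reason the dependency job failed still has to be read from the journal on the host, as the message itself suggests.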

Comment 6 Martin Perina 2021-05-04 11:23:47 UTC
There are many errors in the logs showing that the new installation was not performed on a freshly installed RHEL host, but on one with prior installations. Is this reproducible on a clean, newly installed host?

Comment 7 Nikolai Sednev 2021-05-04 12:35:04 UTC
(In reply to Martin Perina from comment #6)
> There are many errors in the logs showing that the new installation was not
> performed on a freshly installed RHEL host, but on one with prior
> installations. Is this reproducible on a clean, newly installed host?

It was a clean environment, as usual; I never rerun a deployment without reprovisioning the host.

Comment 8 Nikolai Sednev 2021-05-04 14:06:43 UTC
On the engine VM, I see that it fails to obtain an IP address from the external DHCP server and keeps the local address assigned by the libvirt DHCP server:
nsednev-he-1 ~]# ip a show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:16:3e:7b:b8:53 brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.167/24 brd 192.168.222.255 scope global dynamic noprefixroute eth0
       valid_lft 3318sec preferred_lft 3318sec
    inet6 fe80::216:3eff:fe7b:b853/64 scope link 
       valid_lft forever preferred_lft forever

The deployment stays for a long time at "[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Wait for the host to be up]" and then fails.
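
The symptom above is that eth0 still carries 192.168.222.167/24, the address handed out by the libvirt DHCP server. A minimal sketch of how to tell that case apart from an externally assigned address, assuming the libvirt-managed subnet is 192.168.222.0/24 (inferred only from the `ip a show` output above) and using a hypothetical helper name:

```python
import ipaddress

# Assumption: the libvirt-managed network used during bootstrap is
# 192.168.222.0/24, as suggested by the `ip a show` output above.
LIBVIRT_BOOTSTRAP_NET = ipaddress.ip_network("192.168.222.0/24")


def is_bootstrap_address(cidr):
    """Return True if the interface address lies in the libvirt bootstrap subnet."""
    iface = ipaddress.ip_interface(cidr)
    return iface.ip in LIBVIRT_BOOTSTRAP_NET


# Address taken from the eth0 entry in the output above.
print(is_bootstrap_address("192.168.222.167/24"))
```

If the subnet differs in another environment, only the constant needs to change.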

Comment 10 Nikolai Sednev 2021-05-04 15:30:51 UTC
Just updated to openvswitch2.11-2.11.3-87.el8fdp.x86_64.rpm, provided by CI yesterday evening; retesting now...

Comment 11 Nikolai Sednev 2021-05-04 17:43:31 UTC
OK, so the appliance and the host are now both running openvswitch2.11-2.11.3-87.el8fdp.x86_64. The engine was running ovirt-engine-setup-4.4.6.6-0.10.el8ev.noarch, yet the deployment still failed.


[ ERROR ] fatal: [localhost -> nsednev-he-1.qa.lab.tlv.redhat.com]: FAILED! => {"attempts": 30, "changed": false, "connection": "close", "content": "<html><head><title>Error</title></head><body>404 - Not Found</body></html>", "content_encoding": "identity", "content_length": "74", "content_type": "text/html; charset=UTF-8", "date": "Tue, 04 May 2021 17:36:44 GMT", "elapsed": 0, "msg": "Status code was 404 and not [200]: HTTP Error 404: Not Found", "redirected": false, "server": "Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g mod_auth_gssapi/1.6.1 mod_wsgi/4.6.4 Python/3.6", "status": 404, "url": "http://localhost/ovirt-engine/services/health"}

[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "There was a failure deploying the engine on the local engine VM. The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20210504195419-7cmg5h.log

This is a different error, but the deployment still fails.

Sosreport from host alma04 is attached.
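
The failing task above polls the engine health URL and expects HTTP 200, giving up after 30 attempts. A minimal sketch of the same kind of check, for reproducing it by hand on the engine VM: the URL and attempt count come from the error output above, while the 10-second delay, the timeout, and both function names are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

# URL taken from the error output above; the retry count mirrors "attempts": 30.
HEALTH_URL = "http://localhost/ovirt-engine/services/health"


def http_status(url):
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for non-2xx answers, e.g. 404 while engine.ear is not deployed.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_for_health(fetch=http_status, url=HEALTH_URL, attempts=30, delay=10.0):
    """Poll until fetch(url) returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # assumed pause between attempts
    return False
```

Calling `wait_for_health()` on the engine VM would return `False` for as long as Apache answers 404, which matches the situation in this comment. The `fetch` parameter exists only so the loop can be exercised without a running engine.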

Comment 12 Nikolai Sednev 2021-05-04 17:45:06 UTC
Created attachment 1779461 [details]
ovirt-hosted-engine-setup-20210504195419-7cmg5h.log

Comment 13 Nikolai Sednev 2021-05-04 17:47:13 UTC
Created attachment 1779462 [details]
sosreport from alma04

Comment 14 Ales Musil 2021-05-05 05:40:10 UTC
There are some errors in server.log, which is probably the reason why engine refused to start:

2021-05-04 20:30:18,845+03 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1) MSC000001: Failed to start service jboss.deployment.subunit."engine.ear"."webadmin.war".STRUCTURE: org.jboss.msc.service.StartException in service jboss.deployment.subunit."engine.ear"."webadmin.war".STRUCTURE: WFLYSRV0153: Failed to process phase STRUCTURE of subdeployment "webadmin.war" of deployment "engine.ear"
	at org.jboss.as.server.20.Final-redhat-00001//org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:183)
	at org.jboss.msc.11.Final-redhat-00001//org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1739)
	at org.jboss.msc.11.Final-redhat-00001//org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1701)
	at org.jboss.msc.11.Final-redhat-00001//org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1559)
	at org.jboss.threads.3.Final-redhat-00001//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
	at org.jboss.threads.3.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
	at org.jboss.threads.3.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
	at org.jboss.threads.3.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1363)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYUT0048: Failed to process WEB-INF/lib: "/var/lib/ovirt-engine/jboss_runtime/deployments/engine.ear/webadmin.war/WEB-INF/lib/gwt-servlet.jar"
	at org.wildfly.extension.undertow.7.GA-redhat-00002//org.wildfly.extension.undertow.deployment.WarStructureDeploymentProcessor.deploy(WarStructureDeploymentProcessor.java:128)
	at org.jboss.as.server.20.Final-redhat-00001//org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:176)
	... 8 more
Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYUT0048: Failed to process WEB-INF/lib: "/var/lib/ovirt-engine/jboss_runtime/deployments/engine.ear/webadmin.war/WEB-INF/lib/gwt-servlet.jar"
	at org.wildfly.extension.undertow.7.GA-redhat-00002//org.wildfly.extension.undertow.deployment.WarStructureDeploymentProcessor.createResourceRoots(WarStructureDeploymentProcessor.java:230)
	at org.wildfly.extension.undertow.7.GA-redhat-00002//org.wildfly.extension.undertow.deployment.WarStructureDeploymentProcessor.deploy(WarStructureDeploymentProcessor.java:123)
	... 9 more
Caused by: java.io.FileNotFoundException: /var/lib/ovirt-engine/jboss_runtime/tmp/vfs/deployment/deploymentfb728a23c2cf9b9a/gwt-servlet.jar-eae5603963fb3286/gwt-servlet.jar (Operation not permitted)
	at java.base/java.io.RandomAccessFile.open0(Native Method)
	at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:345)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
	at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1285)
	at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1251)
	at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:732)
	at java.base/java.util.zip.ZipFile$CleanableResource.get(ZipFile.java:849)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:247)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:177)
	at java.base/java.util.jar.JarFile.<init>(JarFile.java:348)
	at java.base/java.util.jar.JarFile.<init>(JarFile.java:319)
	at java.base/java.util.jar.JarFile.<init>(JarFile.java:285)
	at org.jboss.vfs.15.Final-redhat-00001//org.jboss.vfs.spi.JavaZipFileSystem.<init>(JavaZipFileSystem.java:90)
	at org.jboss.vfs.15.Final-redhat-00001//org.jboss.vfs.spi.JavaZipFileSystem.<init>(JavaZipFileSystem.java:77)
	at org.jboss.vfs.15.Final-redhat-00001//org.jboss.vfs.VFS.mountZip(VFS.java:386)
	at org.jboss.vfs.15.Final-redhat-00001//org.jboss.vfs.VFS.mountZip(VFS.java:410)
	at org.wildfly.extension.undertow.7.GA-redhat-00002//org.wildfly.extension.undertow.deployment.WarStructureDeploymentProcessor.createResourceRoots(WarStructureDeploymentProcessor.java:222)
	... 10 more

2021-05-04 20:30:20,644+03 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "engine.ear")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.deployment.subunit.\"engine.ear\".\"webadmin.war\".STRUCTURE" => "WFLYSRV0153: Failed to process phase STRUCTURE of subdeployment \"webadmin.war\" of deployment \"engine.ear\"
    Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYUT0048: Failed to process WEB-INF/lib: \"/var/lib/ovirt-engine/jboss_runtime/deployments/engine.ear/webadmin.war/WEB-INF/lib/gwt-servlet.jar\"
    Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYUT0048: Failed to process WEB-INF/lib: \"/var/lib/ovirt-engine/jboss_runtime/deployments/engine.ear/webadmin.war/WEB-INF/lib/gwt-servlet.jar\"
    Caused by: java.io.FileNotFoundException: /var/lib/ovirt-engine/jboss_runtime/tmp/vfs/deployment/deploymentfb728a23c2cf9b9a/gwt-servlet.jar-eae5603963fb3286/gwt-servlet.jar (Operation not permitted)"}}
2021-05-04 20:30:20,648+03 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "restapi.war")]) - failure description: {
    "WFLYCTL0412: Required services that are not installed:" => ["jboss.naming.context.java.global.engine.bll.\"Backend!org.ovirt.engine.core.common.interfaces.BackendLocal\""],
    "WFLYCTL0180: Services with missing/unavailable dependencies" => [
        "jboss.naming.context.java.module.restapi.restapi.env.\"org.ovirt.engine.core.utils.servlet.CORSSupportFilter\".backend is missing [jboss.naming.context.java.global.engine.bll.\"Backend!org.ovirt.engine.core.common.interfaces.BackendLocal\"]",
        "jboss.naming.context.java.module.restapi.restapi.env.\"org.ovirt.engine.api.restapi.invocation.VersionFilter\".backend is missing [jboss.naming.context.java.global.engine.bll.\"Backend!org.ovirt.engine.core.common.interfaces.BackendLocal\"]",
        "jboss.naming.context.java.module.restapi.restapi.env.\"org.ovirt.engine.api.restapi.invocation.CurrentFilter\".backend is missing [jboss.naming.context.java.global.engine.bll.\"Backend!org.ovirt.engine.core.common.interfaces.BackendLocal\"]"
    ]
}


WFLYCTL0186:   Services which failed to start:      service jboss.deployment.subunit."engine.ear"."webadmin.war".STRUCTURE: WFLYSRV0153: Failed to process phase STRUCTURE of subdeployment "webadmin.war" of deployment "engine.ear"
WFLYCTL0448: 42 additional services are down due to their dependencies being missing or failed
2021-05-04 20:30:20,821+03 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
2021-05-04 20:30:20,829+03 ERROR [org.jboss.as] (Controller Boot Thread) WFLYSRV0026: JBoss EAP 7.3.7.GA (WildFly Core 10.1.20.Final-redhat-00001) started (with errors) in 14877ms - Started 651 of 965 services (52 services failed or missing dependencies, 378 services are lazy, passive or on-demand)
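
In the trace above, the innermost "Caused by:" entry (the FileNotFoundException with "Operation not permitted" on gwt-servlet.jar) is the most specific one. A minimal sketch of extracting that chain from a server.log excerpt, assuming each cause starts a line with the literal prefix `Caused by: `; the helper names are hypothetical and the sample is shortened from the trace in this comment.

```python
from typing import List, Optional

PREFIX = "Caused by: "


def cause_chain(trace: str) -> List[str]:
    """Return the text of every 'Caused by:' line, outermost first."""
    chain = []
    for line in trace.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            chain.append(stripped[len(PREFIX):])
    return chain


def root_cause(trace: str) -> Optional[str]:
    """Return the innermost cause, or None if the trace has no 'Caused by:' line."""
    chain = cause_chain(trace)
    return chain[-1] if chain else None


# Shortened excerpt of the stack trace quoted above (stack frames omitted).
SAMPLE = '''
Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYUT0048: Failed to process WEB-INF/lib: "/var/lib/ovirt-engine/jboss_runtime/deployments/engine.ear/webadmin.war/WEB-INF/lib/gwt-servlet.jar"
	... 8 more
Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYUT0048: Failed to process WEB-INF/lib: "/var/lib/ovirt-engine/jboss_runtime/deployments/engine.ear/webadmin.war/WEB-INF/lib/gwt-servlet.jar"
	... 9 more
Caused by: java.io.FileNotFoundException: /var/lib/ovirt-engine/jboss_runtime/tmp/vfs/deployment/deploymentfb728a23c2cf9b9a/gwt-servlet.jar-eae5603963fb3286/gwt-servlet.jar (Operation not permitted)
	... 10 more
'''

print(root_cause(SAMPLE))
```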

Comment 15 Martin Perina 2021-05-06 06:27:15 UTC
(In reply to Ales Musil from comment #14)
> There are some errors in server.log, which is probably the reason why engine
> refused to start:
> 
> [...]

The above looks like a mess in the installed packages. Is this reproducible?

Comment 16 Nikolai Sednev 2021-05-06 08:14:54 UTC
Yes, it is reproducible, happens all the time while using repos from bob.

Comment 17 Martin Perina 2021-05-06 12:20:28 UTC
Moving back to ON_QA. I can't reproduce this; please check that you really have the correct repos and, if you are performing an upgrade, that you are following the correct upgrade steps exactly. Thanks

Comment 18 Martin Perina 2021-05-06 14:30:16 UTC
OK, moving back to ASSIGNED. I can see the issue on hosted engine, but so far I have no idea why; standalone works fine.

Comment 19 Michal Skrivanek 2021-05-06 15:02:05 UTC
is it intentionally using FIPS?

Comment 20 Lukas Svaty 2021-05-07 08:38:23 UTC
We talked about this yesterday. Because QE has to inject internal RC repos mid-deployment, I believe we hit a race between Ansible checking "some stuff" and QE altering it. As a result, verification in the ansible-playbook failed and subsequent tasks were not executed.

Among the problems this caused: the appliance VM received only a local IP (192*), packages were not updated, JBoss crashed, and so on, due to the FQDN not being resolvable... cumulative issues.
Nikolai is trying to reproduce this with better timing for injecting the repos.

A long-term solution is needed for QE to inject repos correctly, without doing so in parallel with HE setup:
- have an appliance with internal repos
- pause deployment before engine-setup
- use hooks for HE-setup CLI deployment to inject the repos

or something similar.

TLDR: QE is still investigating and trying to reproduce.

Comment 21 Martin Perina 2021-05-07 09:28:27 UTC
(In reply to Michal Skrivanek from comment #19)
> is it intentionally using FIPS?

Just verified that standalone engine from RHV 4.4.6-7 installs correctly on FIPS enabled host and successfully adds a host with FIPS enabled.

So most probably the issue is somewhere around stopping hosted engine installation, injecting repos and updating engine from those repos ...

Comment 23 Nikolai Sednev 2021-05-09 21:34:26 UTC
By following https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/installing_red_hat_virtualization_as_a_self-hosted_engine_using_the_command_line/installing_the_red_hat_virtualization_manager_she_cli_deploy and using "hosted-engine --deploy --ansible-extra-vars=he_pause_host=true" to pause the deployment so that the old appliance (rhvm-appliance-4.4-20210402.1.el8ev.x86_64) could be updated, I successfully deployed HE over iSCSI storage. Once the deployment paused, I fetched the repos, updated the engine to the latest bits, and rebooted it. After the update finished, I resumed the deployment, and it completed without any issues.
Moving to verified.
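
For automating the verified flow, the deploy command quoted above can be assembled programmatically. This is a minimal sketch: only the command and the `he_pause_host=true` extra variable come from this comment; the helper name `deploy_command` is hypothetical, and the sketch only builds the argument list without running anything.

```python
import shlex


def deploy_command(pause_host=True):
    """Build the argv for a hosted-engine deployment, optionally pausing the host step."""
    argv = ["hosted-engine", "--deploy"]
    if pause_host:
        # Extra variable quoted in this comment; pauses the deployment so the
        # engine VM can be updated before setup continues.
        argv.append("--ansible-extra-vars=he_pause_host=true")
    return argv


print(shlex.join(deploy_command()))
```

The list form can be passed to `subprocess.run` directly on a host that has `hosted-engine` installed; the repo injection and engine update in between remain manual steps, as described above.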

Comment 24 Nikolai Sednev 2021-05-09 21:37:18 UTC
Components on host:
rhvm-appliance-4.4-20210402.1.el8ev.x86_64
ovirt-hosted-engine-ha-2.4.6-1.el8ev.noarch
ovirt-hosted-engine-setup-2.5.0-2.el8ev.noarch
openvswitch2.11-2.11.3-87.el8fdp.x86_64
Linux 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)

Components on engine:
ovirt-engine-setup-4.4.6.7-0.1.el8ev.noarch
openvswitch2.11-2.11.3-87.el8fdp.x86_64
Linux 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)

