Bug 1908773
Summary: | HE deployment fails on CentOS Stream when verifying that the engine service is up | ||||||
---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-ansible-collection | Reporter: | Yedidyah Bar David <didi> | ||||
Component: | hosted-engine-setup | Assignee: | Asaf Rachmani <arachman> | ||||
Status: | CLOSED WORKSFORME | QA Contact: | meital avital <mavital> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 1.2.0 | CC: | bugs, jamacku, relrod | ||||
Target Milestone: | ovirt-4.4.5 | Flags: | pm-rhel:
ovirt-4.4+
|
||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2020-12-21 12:41:29 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1908275 | ||||||
Bug Blocks: | |||||||
Attachments: |
|
Description
Yedidyah Bar David
2020-12-17 15:09:50 UTC
Rick are you aware of any change in ansible systemd that touched this part and may require some change in our code? jamacku are you aware of any change in systemd-239-43.el8 that can lead to some change in state reporting? I'd like to clarify, more-or-less accurately, the flow: 1. The mentioned ansible script connects to the engine machine, and runs there 'engine-setup'. 2. engine-setup configures stuff, and near the end also starts the engine service. 3. This succeeds. You can see that in [1] (the log of engine-setup): 2020-12-17 14:57:22,342+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:170 starting service ovirt-engine 2020-12-17 14:57:22,342+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), executable='None', cwd='None', env=None 2020-12-17 14:57:22,509+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), rc=0 2020-12-17 14:57:22,509+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stdout: 2020-12-17 14:57:22,509+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stderr: Meaning, 'systemctl start ovirt-engine.service' exited with rc=0 and empty stdout/stderr. You can see the systemd conf file in [2]. This "service" actually runs a python script, that runs the real (wildfly/java) service. It's of Type=notify. The code that does the notification is in python, and it runs before starting the java process, but after some other processing. It's in [3][4] (plus some other library code, less relevant I think). I have no idea if the bug related to notifification, just mentioning. It does work in CentOS 8.3. 4. Once engine-setup finishes, the ansible script runs (see link [3] from comment 0): - name: Make sure `ovirt-engine` service is running service: name: ovirt-engine state: started At this point, not sure what ansible does (didn't check its code), but: It's quite likely that the notification still didn't reach systemd, and so it might tell ansible that the service failed, or is down, or whatever. In reality, it does start eventually. You can't see this in jenkins, I think because the CI job kills everything rather quickly (can be patched, if needed), but on my local try it did [5]. [1] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14481/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/ovirt-hosted-engine-setup/engine-logs-2020-12-17T13%3A58%3A35Z/ovirt-engine/setup/ovirt-engine-setup-20201217145323-za8yqf.log [2] https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.systemd.in [3] https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.py.in [4] https://github.com/oVirt/ovirt-engine/blob/master/packaging/pythonlib/ovirt_engine/service.py [5] Full logs attached. Relevant parts: engine-setup log: 2020-12-17 14:53:11,283+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:170 starting service ovirt-engine 2020-12-17 14:53:11,284+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), executable='None', cwd='None', env=None 2020-12-17 14:53:11,583+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), rc=0 2020-12-17 14:53:11,584+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stdout: 2020-12-17 14:53:11,584+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stderr: /var/log/messages: Dec 17 13:53:11 lago-he-basic-suite-master-engine systemd[1]: Starting oVirt Engine... Dec 17 13:53:11 lago-he-basic-suite-master-engine systemd[1]: Started oVirt Engine. ... Dec 17 13:53:14 lago-he-basic-suite-master-engine journal[26755]: 2020-12-17 14:53:14,427+0100 ovirt-engine: INFO _detectJBossVersion:186 Detecting JBoss version. Running: /usr/lib/jvm/jre-1 1-openjdk-11.0.9.11-3.el8.x86_64/bin/java ['ovirt-engine-version', '--add-modules', 'java.se', '-server', '-XX:+TieredCompilation', '-Xms1024M', '-Xmx1024M', '-Xss1M', '-Djava.awt.headless=t rue', '-Dsun.rmi.dgc.client.gcInterval=3600000', '-Dsun.rmi.dgc.server.gcInterval=3600000', '-Djsse.enableSNIExtension=false', '-Dresteasy.preferJacksonOverJsonB=true', '-Djackson.deserializ ation.whitelist.packages=org,com,java,javax', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:HeapDumpPath=/var/log/ovirt-engine/dump', '-Djava.util.logging.manager=org.jboss.logmanager', '-Dlogging .configuration=file:///var/lib/ovirt-engine/jboss_runtime/config/ovirt-engine-logging.properties', '-Dorg.jboss.resolver.warning=true', '-Djboss.modules.system.pkgs=org.jboss.byteman', '-Djb oss.server.default.config=ovirt-engine', '-Djboss.home.dir=/usr/share/ovirt-engine-wildfly', '-Djboss.server.base.dir=/usr/share/ovirt-engine', '-Djboss.server.data.dir=/var/lib/ovirt-engine ', '-Djboss.server.log.dir=/var/log/ovirt-engine', '-Djboss.server.config.dir=/var/lib/ovirt-engine/jboss_runtime/config', '-Djboss.server.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp', '-Djboss.controller.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp', '-jar', '/usr/share/ovirt-engine-wildfly/jboss-modules.jar', '-mp', '/usr/share/ovirt-engine-wildfly-overlay/modules:/u sr/share/ovirt-engine/modules/common:/usr/share/ovirt-engine-extension-aaa-jdbc/modules:/usr/share/ovirt-engine-extension-aaa-ldap/modules:/usr/share/ovirt-engine-extension-aaa-misc/modules: /usr/share/ovirt-engine-extension-logger-log4j/modules:/usr/share/ovirt-engine-wildfly/modules', '-jaxpmodule', 'javax.xml.jaxp-provider', 'org.jboss.as.standalone', '-v'] Dec 17 13:53:21 lago-he-basic-suite-master-engine journal[26755]: 2020-12-17 14:53:21,740+0100 ovirt-engine: INFO _detectJBossVersion:206 Return code: 0, | stdout: '['WildFly Full 19.1.0.Final (WildFly Core 11.1.1.Final)'], | stderr: '['WARNING: -jaxpmodule is deprecated and may be removed in a future release']' Above is the last line from the python code, before it starts java. We have some debugging code that makes it emit a bit more, can be activated if needed. The java code writes two (relevant) log files, first of which is server.log, starts with: 2020-12-17 14:53:46,557+01 INFO [org.jboss.as.server.deployment.scanner] (MSC service thread 1-4) WFLYDS0013: Started FileSystemDeploymentService for directory /var/lib/ovirt-engine/jboss_runtime/deployments As you can see (notably in the syslog lines from the python code, which include both syslog time and python-logged time), things above are a bit messy around timezones - still not sure why, but I do not think this is related to the bug - and is probably an issue in how we run the various VMs in ovirt-system-tests etc. Created attachment 1740064 [details]
ovirt-system-tests logs
(In reply to Sandro Bonazzola from comment #2) > jamacku are you aware of any change in systemd-239-43.el8 that can lead to > some change in state reporting? Running this manually on the engine shows: # systemctl show ovirt-engine | grep -i active Failed to parse bus message: Invalid argument So I suspect current is a duplicate of one of the other bugs mentioning this error, e.g. bug 1908275. I confirm that the workaround in ansible [1], mentioned in bug 1908275 comment 1, fixes the problem locally. [1] https://github.com/ansible/ansible/pull/72337 I confirm that current bug is solved by upgrading to ansible-2.9.16-2. Closing. If we have a Stream ansible bug, we can change the reason to duplicate of it. The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days |