Description of problem: $subject. Reproduced on CI [1][2] and locally (using OST):

2020-12-17 14:57:30,137+0100 INFO ansible task start {'status': 'OK', 'ansible_type': 'task', 'ansible_playbook': '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_task': 'ovirt.ovirt.engine_setup : Make sure `ovirt-engine` service is running'}
2020-12-17 14:57:30,137+0100 DEBUG ansible on_any args TASK: ovirt.ovirt.engine_setup : Make sure `ovirt-engine` service is running kwargs is_conditional:False
2020-12-17 14:57:30,138+0100 DEBUG ansible on_any args localhost TASK: ovirt.ovirt.engine_setup : Make sure `ovirt-engine` service is running kwargs
2020-12-17 14:57:33,468+0100 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<class 'list'>" value: "[]"
2020-12-17 14:57:33,469+0100 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<class 'list'>" value: "[]"
2020-12-17 14:57:33,469+0100 DEBUG var changed: host "localhost" var "play_hosts" type "<class 'list'>" value: "[]"
2020-12-17 14:57:33,469+0100 ERROR ansible failed {
    "ansible_host": "localhost",
    "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
    "ansible_result": {
        "_ansible_delegated_vars": {
            "ansible_host": "lago-he-basic-suite-master-engine.lago.local",
            "ansible_port": null,
            "ansible_user": "root"
        },
        "_ansible_no_log": false,
        "changed": false,
        "invocation": {
            "module_args": {
                "daemon_reexec": false,
                "daemon_reload": false,
                "enabled": null,
                "force": null,
                "masked": null,
                "name": "ovirt-engine",
                "no_block": false,
                "scope": null,
                "state": "started",
                "user": null
            }
        },
        "msg": "Service is in unknown state",
        "status": {}
    },
    "ansible_task": "Make sure `ovirt-engine` service is running",
    "ansible_type": "task",
    "status": "FAILED",
    "task_duration": 4
}

When I checked a few minutes later, on the local run, the service was up and well. Still not fully certain about the reason for the breakage.
The engine service does take some time between being started and actually being up. See also bug 1075013. The cause might be a change in systemd regarding what it does, and when, in this situation, or in the ansible code that handles it [3].

[1] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14481/
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14481/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20201217144447-0yncwg.log
[3] https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/engine_setup/tasks/engine_setup.yml#L89

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Rick, are you aware of any change in ansible's systemd module that touched this part and may require some change in our code?
jamacku, are you aware of any change in systemd-239-43.el8 that could lead to some change in state reporting?
I'd like to clarify, more-or-less accurately, the flow:

1. The mentioned ansible script connects to the engine machine and runs 'engine-setup' there.

2. engine-setup configures stuff, and near the end also starts the engine service.

3. This succeeds. You can see that in [1] (the log of engine-setup):

2020-12-17 14:57:22,342+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:170 starting service ovirt-engine
2020-12-17 14:57:22,342+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), executable='None', cwd='None', env=None
2020-12-17 14:57:22,509+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), rc=0
2020-12-17 14:57:22,509+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stdout:
2020-12-17 14:57:22,509+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stderr:

Meaning, 'systemctl start ovirt-engine.service' exited with rc=0 and empty stdout/stderr.

You can see the systemd conf file in [2]. This "service" actually runs a python script, which in turn runs the real (wildfly/java) service. It's of Type=notify. The code that does the notification is in python, and it runs before starting the java process, but after some other processing. It's in [3][4] (plus some other library code, less relevant I think). I have no idea if the bug is related to the notification; just mentioning it. It does work in CentOS 8.3.

4.
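For context on the Type=notify mechanics mentioned above: a notify-type service tells systemd it is ready by sending a "READY=1" datagram to the unix socket named in the NOTIFY_SOCKET environment variable. The sketch below is a generic, simplified illustration of that protocol (it is not the actual oVirt code in [3][4]):

```python
import os
import socket

def sd_notify(message: str = "READY=1") -> bool:
    """Send a readiness notification to systemd, as a Type=notify
    service does.  Returns False when not running under systemd
    (NOTIFY_SOCKET unset).  Generic sketch, not oVirt's actual code."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' denotes an abstract-namespace socket address.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(message.encode())
    return True
```

Until this datagram arrives, systemd keeps the unit in the "activating" state, which is why querying the state immediately after 'systemctl start' can give surprising answers.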
Once engine-setup finishes, the ansible script runs (see link [3] from comment 0):

- name: Make sure `ovirt-engine` service is running
  service:
    name: ovirt-engine
    state: started

At this point, I'm not sure what ansible does (I didn't check its code), but it's quite likely that the notification still hadn't reached systemd, so systemd might tell ansible that the service failed, or is down, or whatever. In reality, the service does start eventually. You can't see this in jenkins, I think because the CI job kills everything rather quickly (this can be patched, if needed), but on my local try it did start [5].

[1] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14481/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/ovirt-hosted-engine-setup/engine-logs-2020-12-17T13%3A58%3A35Z/ovirt-engine/setup/ovirt-engine-setup-20201217145323-za8yqf.log
[2] https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.systemd.in
[3] https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.py.in
[4] https://github.com/oVirt/ovirt-engine/blob/master/packaging/pythonlib/ovirt_engine/service.py
[5] Full logs attached.
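If the notification simply lags behind the task, one way to paper over the race is to poll the unit state with retries instead of checking once. The helper below is a hypothetical sketch of that idea (the state checker is injected so it can stand in for `systemctl show -p ActiveState`; this is not code from the actual role):

```python
import time

def wait_for_state(get_state, want="active", retries=30, delay=1.0,
                   sleep=time.sleep):
    """Poll get_state() until it returns `want`, retrying up to
    `retries` times with `delay` seconds between attempts.
    get_state is any callable returning a systemd-style ActiveState
    string ('activating', 'active', 'failed', ...).  Hypothetical
    helper, not part of the actual engine_setup role."""
    for _ in range(retries):
        if get_state() == want:
            return True
        sleep(delay)
    return False
```

In ansible terms this corresponds roughly to wrapping the state check in a `retries`/`until` loop, which the role could do if the race turns out to be the cause.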
Relevant parts:

engine-setup log:

2020-12-17 14:53:11,283+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:170 starting service ovirt-engine
2020-12-17 14:53:11,284+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), executable='None', cwd='None', env=None
2020-12-17 14:53:11,583+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service'), rc=0
2020-12-17 14:53:11,584+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stdout:
2020-12-17 14:53:11,584+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-engine.service') stderr:

/var/log/messages:

Dec 17 13:53:11 lago-he-basic-suite-master-engine systemd[1]: Starting oVirt Engine...
Dec 17 13:53:11 lago-he-basic-suite-master-engine systemd[1]: Started oVirt Engine.
...
Dec 17 13:53:14 lago-he-basic-suite-master-engine journal[26755]: 2020-12-17 14:53:14,427+0100 ovirt-engine: INFO _detectJBossVersion:186 Detecting JBoss version.
Running: /usr/lib/jvm/jre-11-openjdk-11.0.9.11-3.el8.x86_64/bin/java ['ovirt-engine-version', '--add-modules', 'java.se', '-server', '-XX:+TieredCompilation', '-Xms1024M', '-Xmx1024M', '-Xss1M', '-Djava.awt.headless=true', '-Dsun.rmi.dgc.client.gcInterval=3600000', '-Dsun.rmi.dgc.server.gcInterval=3600000', '-Djsse.enableSNIExtension=false', '-Dresteasy.preferJacksonOverJsonB=true', '-Djackson.deserialization.whitelist.packages=org,com,java,javax', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:HeapDumpPath=/var/log/ovirt-engine/dump', '-Djava.util.logging.manager=org.jboss.logmanager', '-Dlogging.configuration=file:///var/lib/ovirt-engine/jboss_runtime/config/ovirt-engine-logging.properties', '-Dorg.jboss.resolver.warning=true', '-Djboss.modules.system.pkgs=org.jboss.byteman', '-Djboss.server.default.config=ovirt-engine', '-Djboss.home.dir=/usr/share/ovirt-engine-wildfly', '-Djboss.server.base.dir=/usr/share/ovirt-engine', '-Djboss.server.data.dir=/var/lib/ovirt-engine', '-Djboss.server.log.dir=/var/log/ovirt-engine', '-Djboss.server.config.dir=/var/lib/ovirt-engine/jboss_runtime/config', '-Djboss.server.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp', '-Djboss.controller.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp', '-jar', '/usr/share/ovirt-engine-wildfly/jboss-modules.jar', '-mp', '/usr/share/ovirt-engine-wildfly-overlay/modules:/usr/share/ovirt-engine/modules/common:/usr/share/ovirt-engine-extension-aaa-jdbc/modules:/usr/share/ovirt-engine-extension-aaa-ldap/modules:/usr/share/ovirt-engine-extension-aaa-misc/modules:/usr/share/ovirt-engine-extension-logger-log4j/modules:/usr/share/ovirt-engine-wildfly/modules', '-jaxpmodule', 'javax.xml.jaxp-provider', 'org.jboss.as.standalone', '-v']

Dec 17 13:53:21 lago-he-basic-suite-master-engine journal[26755]: 2020-12-17 14:53:21,740+0100 ovirt-engine: INFO _detectJBossVersion:206 Return code: 0, | stdout: '['WildFly Full 19.1.0.Final (WildFly Core 11.1.1.Final)'], | stderr: '['WARNING: -jaxpmodule is deprecated and may be removed in a future release']'

Above is the last line from the python code before it starts java. We have some debugging code that makes it emit a bit more; it can be activated if needed.

The java code writes two (relevant) log files, the first of which is server.log, starting with:

2020-12-17 14:53:46,557+01 INFO [org.jboss.as.server.deployment.scanner] (MSC service thread 1-4) WFLYDS0013: Started FileSystemDeploymentService for directory /var/lib/ovirt-engine/jboss_runtime/deployments

As you can see (notably in the syslog lines from the python code, which include both the syslog time and the python-logged time), things above are a bit messy around timezones. I'm still not sure why, but I do not think this is related to the bug; it's probably an issue in how we run the various VMs in ovirt-system-tests etc.
Created attachment 1740064 [details] ovirt-system-tests logs
(In reply to Sandro Bonazzola from comment #2)
> jamacku are you aware of any change in systemd-239-43.el8 that can lead to
> some change in state reporting?

Running this manually on the engine shows:

# systemctl show ovirt-engine | grep -i active
Failed to parse bus message: Invalid argument

So I suspect the current bug is a duplicate of one of the other bugs mentioning this error, e.g. bug 1908275.
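For illustration of what a consumer of `systemctl show` has to cope with here: the output is normally KEY=VALUE lines, but as seen above it can also contain stray error text. The sketch below is a simplified, hypothetical parser that keeps only well-formed properties (it is not ansible's actual parsing code):

```python
def parse_systemctl_show(text: str) -> dict:
    """Parse `systemctl show <unit>` output (KEY=VALUE lines) into a
    dict, skipping anything that is not KEY=VALUE, such as the
    'Failed to parse bus message: Invalid argument' error seen above.
    Simplified sketch, not ansible's actual code."""
    props = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            props[key] = value
    return props
```

With output like the above, a state check based on such parsing would find no usable ActiveState at all, which matches the "Service is in unknown state" failure from comment 0.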
I confirm that the workaround in ansible [1], mentioned in bug 1908275 comment 1, fixes the problem locally. [1] https://github.com/ansible/ansible/pull/72337
I confirm that the current bug is solved by upgrading to ansible-2.9.16-2. Closing. If we have a Stream ansible bug, we can change the resolution to a duplicate of it.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days