Bug 1831824
Summary: | restraintd crashes when running /distribution/virt-install task | ||
---|---|---|---|
Product: | [Retired] Restraint | Reporter: | Jan Tluka <jtluka> |
Component: | general | Assignee: | Daniel Rodríguez <danrodri> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 0.2.0 | CC: | asavkov, bpeck, breilly, cbeer, cbouchar, olichtne |
Target Milestone: | 0.2.1 | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 12:43:48 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jan Tluka
2020-05-05 17:24:34 UTC
The characteristics of this bug are very similar to BZ1832918: the trap message is the same and both cases involve a reboot. However, this case is not readily reproducible; in fact, it is reportedly rarely reproducible.

TL;DR: the issue, restraintd crashing, should be fixed in the next release, 0.2.1, with:

https://github.com/beaker-project/restraint/pull/46/
https://github.com/beaker-project/restraint/pull/54/

But we still don't know whether something else is affecting the task. Therefore we are keeping this bug open until we can confirm that the issue is not present in the next release.

As with other issues, a workaround here is to use restraint-0.1.45 or the latest build from https://beaker-project.org/nightlies/harness/

--

As mentioned elsewhere, the message

[ 1248.986557] traps: restraintd[2872] trap int3 ip:5fcdf1 sp:7ffcd7f46100 error:0 in restraintd[400000+6da000]

is produced by a call to "g_error ()" in restraint. Per GLib's docs,

https://developer.gnome.org/glib/stable/glib-Message-Logging.html#g-error

"Error messages are always fatal, resulting in a call to G_BREAKPOINT() to terminate the application."

Calls to "g_error ()" were introduced in 0.2.0 with the changes to remove the libssh library, and they are breaking error handling. We reverted all of them in:

https://github.com/beaker-project/restraint/pull/46
https://github.com/beaker-project/restraint/pull/47
https://github.com/beaker-project/restraint/pull/54

Of these, two will kill restraintd if reached. The one in server_io_callback,

https://github.com/beaker-project/restraint/blob/f0fb969cf0052b44e0ebaa0b1473ead1fc76d06c/src/server.c#L165

and the one in task_io_callback,

https://github.com/beaker-project/restraint/blob/f0fb969cf0052b44e0ebaa0b1473ead1fc76d06c/src/task.c#L222

Checking the failed recipes, we can see that restraintd is terminated while tasks are running, so one of these two calls is being reached.
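To illustrate why a fatal error call shows up as a trap rather than a normal exit, here is a minimal sketch that does not use GLib at all. It assumes a POSIX system and uses raise(SIGTRAP) as a stand-in for the breakpoint instruction; the function names are hypothetical and are not part of restraint.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a fatal error call. On x86 a breakpoint
 * instruction (int3) is delivered to the process as SIGTRAP, so raising
 * SIGTRAP directly ends the process the same way without needing GLib. */
static void fatal_error_stand_in(void)
{
    sigset_t set;

    /* Make sure SIGTRAP is neither ignored nor blocked in this process. */
    signal(SIGTRAP, SIG_DFL);
    sigemptyset(&set);
    sigaddset(&set, SIGTRAP);
    sigprocmask(SIG_UNBLOCK, &set, NULL);

    raise(SIGTRAP);
    _exit(0); /* not reached when SIGTRAP has its default action */
}

/* Fork a child that hits the fatal path and report how it ended.
 * Returns the terminating signal number, 0 if the child exited
 * normally, or -1 if fork() or waitpid() failed. */
int signal_that_killed_child(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        fatal_error_stand_in();

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* Without WUNTRACED the child can only have exited or been killed. */
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_that_killed_child() from a parent process should return SIGTRAP, which is how a supervisor such as systemd would come to report the daemon as killed by a trap instead of as having exited with an error code.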
The only similarity with BZ1832918 is that there is a call to "g_error ()", but in a completely different part of the code. This was confirmed in the system logs of https://beaker.engineering.redhat.com/recipes/8238510#task110187225:

restraintd[2952]: use_pty:FALSE /usr/share/restraint/plugins/run_task_plugins /usr/share/restraint/plugins/run_plugins
restraintd[2952]: Invalid file descriptor.
restraintd[2952]: IO error: Bad file descriptor
kernel: traps: restraintd[2952] trap int3 ip:5fcdf1 sp:7ffe7f837eb0 error:0 in restraintd[400000+6da000]
systemd[1]: restraintd.service: main process exited, code=killed, status=5/TRAP
systemd[1]: Unit restraintd.service entered failed state.
systemd[1]: restraintd.service failed.

We are hitting the "G_IO_STATUS_ERROR" case here, and it must be because the stdout of the process running the task is closed.

With these "g_error ()" calls removed, restraintd is not going to be terminated, and the error that we are hitting here should be handled. Either it is acceptable and the task succeeded, or the task failed and this should be logged as usual. If the task failed, either there is something wrong in the task/system, or there is some other bug in restraint that we need to fix.

For now, I'm still not sure about the root cause, so running these tests with Restraint 0.1.45 may help determine whether the problem is in the task/system. This can be done by setting "restraint-rhts-0.1.45" for harness in ksmeta. Using restraint from https://beaker-project.org/nightlies/harness/ may help too.
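The difference between aborting on an I/O error and handling it can be sketched in plain POSIX C. This is not restraint's code and does not use GIOChannel; the enum, the function names, and the helper that produces a closed descriptor are all hypothetical. The sketch only assumes that reading from a closed descriptor fails with EBADF, whose message on glibc is "Bad file descriptor".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result type, loosely mirroring the idea of an I/O status. */
typedef enum { IO_DATA, IO_EOF, IO_ERROR } io_result;

/* Read once from fd. On failure, store errno in *err and return IO_ERROR
 * so the caller can log it and decide what to do, instead of aborting
 * the whole process. */
io_result read_task_output(int fd, char *buf, size_t len,
                           ssize_t *nread, int *err)
{
    ssize_t n;

    assert(buf != NULL && nread != NULL && err != NULL);
    *err = 0;

    /* Retry if the read was interrupted by a signal. */
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    *nread = n > 0 ? n : 0;
    if (n > 0)
        return IO_DATA;
    if (n == 0)
        return IO_EOF;

    *err = errno;
    return IO_ERROR;
}

/* Hypothetical helper: return a descriptor number that has already been
 * closed, to simulate the task's stdout going away. Returns -1 if the
 * pipe could not be created. */
int make_closed_fd(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);
    close(fds[1]);
    return fds[0];
}
```

With this shape, a caller that gets IO_ERROR with err set to EBADF can log the failure and mark the task accordingly while the daemon keeps running, which is the behavior described above once the fatal calls are removed.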