Description of problem: Issue was seen while running ktst_msg on KVM Guest (x86_64 2 vcpus). During testing task ktst_msg hit LWD due to: Error uploading: /var/log/messages Bad Request [soup_http_error_quark, 400] Looking at the job logs we see the ktst-fcntl subtask completed successfully, but ktst_msg hit LWD having issue uploading the logs. Version-Release number of selected component (if applicable): distro: RHEL-ALT-7.4-20170803.n.0 Server x86_64 kernel-alt: 4.11.0-22.el7a restraint: 0.1.30-1.el7_2 task: /kernel/misc/ktst_msg 1.2-2 How reproducible: I was not able to reproduce. Steps to Reproduce: 1. Install target system listed in comment#1 with RHEL-ALT-7.4-20170803.n.0 2. Install kernel-alt 4.11.0-22.el7a 3. run /kernel/misc/ktst_msg Actual results: https://beaker.engineering.redhat.com/recipes/4133840#task59428745 http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2017/08/19897/1989782/4133840/59428745/harness.log ---<-snip->--- Uploading dmesg.log ....done <no matches> Uploading avc.log ..done ..use_pty:FALSE /usr/share/restraint/plugins/run_task_plugins /usr/share/restraint/plugins/run_plugins <no matches> Uploading avc.log ..done ...failed Error uploading: /var/log/messages Bad Request [soup_http_error_quark, 400] ---<-snip->--- Expected results: successful uploading of task logs Additional info:
Looking on the Beaker LC side I think the only case where it would return 400 for a result log PUT request is if there is some mismatch between the Content-Range and Content-Length headers: 941 if content_range: 942 # a few sanity checks 943 if req.content_length != (content_range.stop - content_range.start): 944 raise BadRequest('Content length does not match range length') 945 if content_range.length and content_range.length < content_range.stop: 946 raise BadRequest('Total length is smaller than range end')
One bug I noticed in upload.c is that it only stats the file once when it first opens it, and assumes that that is the total size and it never changes. But if the file is still growing while it is being uploaded, it will end up uploading a chunk which is past the stated length, which is one of the two clauses above which would cause Beaker to give a 400 response. So that seems like a likely culprit.
I have a reproducer. Run /distribution/command with: set +x ; while [ "$(stat -c %s /var/log/messages)" -lt 1024000 ] ; do logger "this line is to make /var/log/messages reasonably large" ; done ; rhts-submit-log -l /var/log/messages & sleep 0.5 ; logger "now make it grow" ; wait %1 https://beaker-devel.app.eng.bos.redhat.com/recipes/25283#task163269 taskout.log shows the error: Uploading messages .......failed Error uploading: /var/log/messages Bad Request [soup_http_error_quark, 400] + code=144 + '[' 144 -ne 0 ']' + echo '- fail: unexpected error code 144.' And in the uploaded messages file, you can see that the last chunk was missed (the file ends in a bunch of zeroes). The trick for the reproducer is that /var/log/messages needs to be quite large to start with, so that it doesn't all just fit immediately into the first chunk and succeed.
This is the bare minimal fix: https://gerrit.beaker-project.org/#/c/restraint/+/6263 although I still need to try that patch against the reproducer. Unfortunately there are no tests covering the rstrnt-submit-log command... I can see a few other iffy things about it. The static offset and buffer are not great, since the upload.h module otherwise suggests that you could do two uploads concurrently but that wouldn't work with a global offset. And it *really* doesn't need to make a separate HTTP request for each chunk it reads from the file I think. So there are some potential improvements which could be made. In fact this command, being quite isolated, would be a good candidate for rewriting in Rust... But I would only tackle larger refactoring if we could cover it with some functional tests first.
The above fix is not ideal but it does improve the situation. Therefore, consider including it in Restraint 0.1.37 and filing a separate RFE for further improving the upload command (and adding tests, etc).
Restraint 0.1.40 has been released.