Description of problem: Version-Release number of selected component (if applicable): 25.4 (plus bug 1586049 hotfix) How reproducible: probably easy Steps to Reproduce: 1. have your task report a score of 2147483649 Actual results: bkr.labcontroller.main ERROR Error handling request POST //recipes/5372689/tasks/75380049/results/ Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/bkr/labcontroller/main.py", line 147, in _log_failed_requests return func(environ, start_response) File "/usr/lib/python2.6/site-packages/werkzeug/wrappers.py", line 250, in <lambda> return _patch_wrapper(f, lambda *a: f(*a[:-2]+(cls(a[-2]),))(*a[-2:])) File "/usr/lib/python2.6/site-packages/bkr/labcontroller/main.py", line 132, in __call__ return getattr(obj, attr)(req, **args) File "/usr/lib/python2.6/site-packages/bkr/labcontroller/proxy.py", line 862, in post_result req.form.get('message')) [...] Fault: <Fault 1: '<class \'sqlalchemy.exc.DataError\'>:(DataError) (1264, "Out of range value for column \'score\' at row 1") \'INSERT INTO recipe_task_result (recipe_task_id, path, result, score, log, start_time) VALUES (%s, %s, %s, %s, %s, %s)\' (75380049L, \'/kernel/ +memory/shmget\', \'Pass\', 2147483649.0, None, datetime.datetime(2018, 7, 11, 12, 37, 8, 202706))'> Expected results: The result should be accepted -- need to check what the previous behaviour with MySQL 5.1 non-strict mode was (most likely, score was mapped to 0) Additional info: Very similar to bug 1586049
We should also consider patching this (for 26.0) to accept arbitrary large integers for the score. Maybe store as varchar if Mariadb does not have a good arbitrary integer type.
All, pbunyan hopes its ok to pile on here in this BZ... I am seeing this issue recently when running testing for kernel-debug using distribution/kernelinstall (https://beaker.engineering.redhat.com/tasks/566). The task is successfully installing and we boot the debug kernel, but restraintd is having issue uploading the logs. I was able to reproduce this issue 2/3 kernel-debug installs: https://beaker.engineering.redhat.com/jobs/2664633 ======================================== Troubleshooting on the hosts I see this: ======================================== [root@dell-per930-01 ~]# service restraintd status Redirecting to /bin/systemctl status restraintd.service ● restraintd.service - The restraint harness. Loaded: loaded (/usr/lib/systemd/system/restraintd.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2018-08-03 11:46:59 EDT; 59min ago Process: 4645 ExecStartPre=/usr/bin/check_beaker (code=exited, status=0/SUCCESS) Main PID: 4655 (restraintd) CGroup: /system.slice/restraintd.service ├─ 4655 /usr/bin/restraintd ├─ 4694 /bin/bash -l /usr/share/restraint/plugins/task_run.d/10_bash_login /usr/share/restrain... ├─ 4771 make run ├─ 4924 /bin/sh ./runtest.sh └─13067 rstrnt-report-result /distribution/kernelinstall/Sysinfo PASS 613000000 Aug 03 12:37:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:38:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:39:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:40:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:41:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:42:22 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: Internal Server Error: Unabl.... Aug 03 12:42:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:43:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:44:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:45:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Hint: Some lines were ellipsized, use -l to show in full. [root@dell-per930-01 ~]# uname -a Linux dell-per930-01.khw.lab.eng.bos.redhat.com 4.14.0-101.el7a.x86_64.debug #1 SMP Tue Jul 31 21:13:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux *******This line gets our attention******* Aug 03 12:42:22 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: Internal Server Error: Unabl.... [root@dell-per930-01 ~]# pidof restraintd 4655 [root@dell-per930-01 ~]# strace -t -v -s 65535 -p 4655 -e trace=network strace: Process 4655 attached 13:55:17 socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 12 13:55:17 connect(12, {sa_family=AF_INET6, sin6_port=htons(8000), inet_pton(AF_INET6, "2620:52:0:132a:5054:ff:fe38:5099", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress) 13:55:17 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 13:55:17 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0 13:55:17 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 13:55:17 sendto(12, "POST //recipes/5472952/tasks/76744989/results/ HTTP/1.1\r\nHost: localhost:8081\r\nContent-Type: application/x-www-form-urlencoded\r\nAccept-Encoding: gzip, deflate\r\nConnection: Keep-Alive\r\nContent-Length: 74\r\n\r\n", 206, MSG_NOSIGNAL, NULL, 0) = 206 13:55:17 sendto(12, "result=PASS&path=%2Fdistribution%2Fkernelinstall%2FSysinfo&score=613000000", 74, MSG_NOSIGNAL, NULL, 0) = 74 13:55:17 recvfrom(12, 0xec51d0, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) 13:55:17 recvfrom(12, "HTTP/1.1 500 Internal Server Error\r\nContent-Type: text/plain\r\nConnection: close\r\nContent-Length: 21\r\nDate: Fri, 03 Aug 2018 17:55:17 GMT\r\n\r\nInternal Server Error", 8192, 0, NULL, NULL) = 161 ---<-snip->--- *******These lines gets our attention******* 13:55:17 sendto(12, "result=PASS&path=%2Fdistribution%2Fkernelinstall%2FSysinfo&score=613000000", 74, MSG_NOSIGNAL, NULL, 0) = 74 13:55:17 recvfrom(12, "HTTP/1.1 500 Internal Server Error\r\nContent-Type: So its seems this score of "score=613000000" is the culprit here. Best, -pbunyan
Dan, My testing was run with Beaker 25.5. Is this on the radar to be fixed still? Best, -pbunyan
The /distribution/kernelinstall task was padding its score with 6 trailing 0s for some reason lost to history (it's in the first git commit on this task): result_count=$(/usr/bin/printf "%03d%03d%03d\n" $dresult_count 0 0) ... report_result $TEST/Sysinfo PASS $result_count dresult_count is just the number of lines in the dmesg logs containing certain strings, so it's not terribly complicated. The debug kernel, though, is more verbose so this caught a lot of messages and pushed the score over Beaker's theshold (I think). I removed the padding. Hopefully this fixes it. result_count=$(/usr/bin/printf "%03d\n" $dresult_count)
Dear Paul, yes we still consider to fix this to be backwards compatible. MySQL simply accepted whatever and coerced it. If you need a fix urgently, I think adjusting the score reporting is a viable approach and even the "cleaner" option.
The recipe_task_result.score column is declared as DECIMAL(10,0) which means MySQL should accept integers up to 10 digits long. The behaviour on MySQL 5.1 when you try to insert a number that is too large is that it silently caps it to the maximum representable value (9999999999). The behaviour on Mariadb (strict mode) is you get the exception mentioned in comment 0. DataError: Out of range value for column 'score' However there is something else odd going on here. Both values mentioned in this bug (2147483649 or 613000000) are less than 10 digits long. I checked our production database and the type of the column is inexplicably DECIMAL(10,2) which means it will only accept values up to 8 digits long with 2 decimal places (but I think we already coerce to int before that). This problem seems to have existed since... forever. The earliest db dump we have is from 2012 and it shows that the type was wrong then. https://git.beaker-project.org/cgit/beaker/tree/IntegrationTests/src/bkr/inttest/server/database-dumps/redhat-production-20120216.sql So it means that previously, scores greater than 8 digits were being silently capped at 99999999. Now they are triggered the DataError. Even though many years ago I wrote some exhaustive tests to find schema inconsistencies in our prod schema like this one, and I went through and fixed all the problems it found... it seems the schema checking logic does not notice this difference in type (DECIMAL(10,2) vs DECIMAL(10,0)). So we need a few things here: * fix the schema tests to catch this inconsistency in our prod schema * add a migration to fix up the column type on Beaker 26.0 * add some logic similar to bug 1586049, to map scores longer than 8 digits to 99999999 on Beaker 25.6 * tweak the logic to map scores longer than 10 digits to 9999999999 on Beaker 26.0, assuming the column type is fixed to the expected 10 digits
This is the short-term fix for Beaker 25.x: https://gerrit.beaker-project.org/#/c/beaker/+/6254 I also have a patch for the develop branch, to fix it properly by changing the column types, but it's easier if I don't post that one until the above one is merged into develop.
Will consider this bz fixed in 25.6 by the above patch, but the "real" fix is here for 26.0: https://gerrit.beaker-project.org/#/c/beaker/+/6256
I created a task that would report the score "12345678900" and it reported ( "9" * 8 ) as expected. https://beaker-devel.app.eng.bos.redhat.com/recipes/25388#task163474
This has been released with Beaker 25.6. Release Notes: https://beaker-project.org/docs/whats-new/release-25.html#beaker-25-6