Bug 1600281
Summary: | reporting a very large score value causes 500 error: (DataError) (1264, "Out of range value for column \'score\' at row 1") | |||
---|---|---|---|---|
Product: | [Retired] Beaker | Reporter: | Dan Callaghan <dcallagh> | |
Component: | general | Assignee: | Dan Callaghan <dcallagh> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Jacob McKenzie <jmckenzi> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 25 | CC: | aquini, dcallagh, jbastian, jmckenzi, mtyson, pbunyan, rjoost | |
Target Milestone: | 25.6 | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1620334 (view as bug list) | Environment: | ||
Last Closed: | 2018-09-03 23:31:25 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1620334 |
Description
Dan Callaghan
2018-07-11 22:52:46 UTC
We should also consider patching this (for 26.0) to accept arbitrary large integers for the score. Maybe store as varchar if Mariadb does not have a good arbitrary integer type. All, pbunyan hopes its ok to pile on here in this BZ... I am seeing this issue recently when running testing for kernel-debug using distribution/kernelinstall (https://beaker.engineering.redhat.com/tasks/566). The task is successfully installing and we boot the debug kernel, but restraintd is having issue uploading the logs. I was able to reproduce this issue 2/3 kernel-debug installs: https://beaker.engineering.redhat.com/jobs/2664633 ======================================== Troubleshooting on the hosts I see this: ======================================== [root@dell-per930-01 ~]# service restraintd status Redirecting to /bin/systemctl status restraintd.service ● restraintd.service - The restraint harness. Loaded: loaded (/usr/lib/systemd/system/restraintd.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2018-08-03 11:46:59 EDT; 59min ago Process: 4645 ExecStartPre=/usr/bin/check_beaker (code=exited, status=0/SUCCESS) Main PID: 4655 (restraintd) CGroup: /system.slice/restraintd.service ├─ 4655 /usr/bin/restraintd ├─ 4694 /bin/bash -l /usr/share/restraint/plugins/task_run.d/10_bash_login /usr/share/restrain... ├─ 4771 make run ├─ 4924 /bin/sh ./runtest.sh └─13067 rstrnt-report-result /distribution/kernelinstall/Sysinfo PASS 613000000 Aug 03 12:37:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:38:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:39:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:40:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:41:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:42:22 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: Internal Server Error: Unabl.... Aug 03 12:42:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:43:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:44:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Aug 03 12:45:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8 Hint: Some lines were ellipsized, use -l to show in full. [root@dell-per930-01 ~]# uname -a Linux dell-per930-01.khw.lab.eng.bos.redhat.com 4.14.0-101.el7a.x86_64.debug #1 SMP Tue Jul 31 21:13:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux *******This line gets our attention******* Aug 03 12:42:22 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: Internal Server Error: Unabl.... [root@dell-per930-01 ~]# pidof restraintd 4655 [root@dell-per930-01 ~]# strace -t -v -s 65535 -p 4655 -e trace=network strace: Process 4655 attached 13:55:17 socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 12 13:55:17 connect(12, {sa_family=AF_INET6, sin6_port=htons(8000), inet_pton(AF_INET6, "2620:52:0:132a:5054:ff:fe38:5099", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress) 13:55:17 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 13:55:17 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0 13:55:17 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 13:55:17 sendto(12, "POST //recipes/5472952/tasks/76744989/results/ HTTP/1.1\r\nHost: localhost:8081\r\nContent-Type: application/x-www-form-urlencoded\r\nAccept-Encoding: gzip, deflate\r\nConnection: Keep-Alive\r\nContent-Length: 74\r\n\r\n", 206, MSG_NOSIGNAL, NULL, 0) = 206 13:55:17 sendto(12, "result=PASS&path=%2Fdistribution%2Fkernelinstall%2FSysinfo&score=613000000", 74, MSG_NOSIGNAL, NULL, 0) = 74 13:55:17 recvfrom(12, 0xec51d0, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) 13:55:17 recvfrom(12, "HTTP/1.1 500 Internal Server Error\r\nContent-Type: text/plain\r\nConnection: close\r\nContent-Length: 21\r\nDate: Fri, 03 Aug 2018 17:55:17 GMT\r\n\r\nInternal Server Error", 8192, 0, NULL, NULL) = 161 ---<-snip->--- *******These lines gets our attention******* 13:55:17 sendto(12, "result=PASS&path=%2Fdistribution%2Fkernelinstall%2FSysinfo&score=613000000", 74, MSG_NOSIGNAL, NULL, 0) = 74 13:55:17 recvfrom(12, "HTTP/1.1 500 Internal Server Error\r\nContent-Type: So its seems this score of "score=613000000" is the culprit here. Best, -pbunyan Dan, My testing was run with Beaker 25.5. Is this on the radar to be fixed still? Best, -pbunyan The /distribution/kernelinstall task was padding its score with 6 trailing 0s for some reason lost to history (it's in the first git commit on this task): result_count=$(/usr/bin/printf "%03d%03d%03d\n" $dresult_count 0 0) ... report_result $TEST/Sysinfo PASS $result_count dresult_count is just the number of lines in the dmesg logs containing certain strings, so it's not terribly complicated. The debug kernel, though, is more verbose so this caught a lot of messages and pushed the score over Beaker's theshold (I think). I removed the padding. Hopefully this fixes it. result_count=$(/usr/bin/printf "%03d\n" $dresult_count) Dear Paul, yes we still consider to fix this to be backwards compatible. MySQL simply accepted whatever and coerced it. If you need a fix urgently, I think adjusting the score reporting is a viable approach and even the "cleaner" option. The recipe_task_result.score column is declared as DECIMAL(10,0) which means MySQL should accept integers up to 10 digits long. The behaviour on MySQL 5.1 when you try to insert a number that is too large is that it silently caps it to the maximum representable value (9999999999). The behaviour on Mariadb (strict mode) is you get the exception mentioned in comment 0. DataError: Out of range value for column 'score' However there is something else odd going on here. Both values mentioned in this bug (2147483649 or 613000000) are less than 10 digits long. I checked our production database and the type of the column is inexplicably DECIMAL(10,2) which means it will only accept values up to 8 digits long with 2 decimal places (but I think we already coerce to int before that). This problem seems to have existed since... forever. The earliest db dump we have is from 2012 and it shows that the type was wrong then. https://git.beaker-project.org/cgit/beaker/tree/IntegrationTests/src/bkr/inttest/server/database-dumps/redhat-production-20120216.sql So it means that previously, scores greater than 8 digits were being silently capped at 99999999. Now they are triggered the DataError. Even though many years ago I wrote some exhaustive tests to find schema inconsistencies in our prod schema like this one, and I went through and fixed all the problems it found... it seems the schema checking logic does not notice this difference in type (DECIMAL(10,2) vs DECIMAL(10,0)). So we need a few things here: * fix the schema tests to catch this inconsistency in our prod schema * add a migration to fix up the column type on Beaker 26.0 * add some logic similar to bug 1586049, to map scores longer than 8 digits to 99999999 on Beaker 25.6 * tweak the logic to map scores longer than 10 digits to 9999999999 on Beaker 26.0, assuming the column type is fixed to the expected 10 digits This is the short-term fix for Beaker 25.x: https://gerrit.beaker-project.org/#/c/beaker/+/6254 I also have a patch for the develop branch, to fix it properly by changing the column types, but it's easier if I don't post that one until the above one is merged into develop. Will consider this bz fixed in 25.6 by the above patch, but the "real" fix is here for 26.0: https://gerrit.beaker-project.org/#/c/beaker/+/6256 I created a task that would report the score "12345678900" and it reported ( "9" * 8 ) as expected. https://beaker-devel.app.eng.bos.redhat.com/recipes/25388#task163474 This has been released with Beaker 25.6. Release Notes: https://beaker-project.org/docs/whats-new/release-25.html#beaker-25-6 |