Bug 1600281 - reporting a very large score value causes 500 error: (DataError) (1264, "Out of range value for column \'score\' at row 1")
Summary: reporting a very large score value causes 500 error: (DataError) (1264, "Out ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: general
Version: 25
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: 25.6
Assignee: Dan Callaghan
QA Contact: Jacob McKenzie
URL:
Whiteboard:
Depends On:
Blocks: 1620334
TreeView+ depends on / blocked
 
Reported: 2018-07-11 22:52 UTC by Dan Callaghan
Modified: 2018-09-03 23:31 UTC (History)
7 users (show)

Fixed In Version:
Clone Of:
: 1620334 (view as bug list)
Environment:
Last Closed: 2018-09-03 23:31:25 UTC
Embargoed:


Attachments (Terms of Use)

Description Dan Callaghan 2018-07-11 22:52:46 UTC
Description of problem:


Version-Release number of selected component (if applicable):
25.4 (plus bug 1586049 hotfix)

How reproducible:
probably easy

Steps to Reproduce:
1. have your task report a score of 2147483649

Actual results:
bkr.labcontroller.main ERROR Error handling request POST //recipes/5372689/tasks/75380049/results/
 Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/main.py", line 147, in _log_failed_requests
     return func(environ, start_response)
   File "/usr/lib/python2.6/site-packages/werkzeug/wrappers.py", line 250, in <lambda>
     return _patch_wrapper(f, lambda *a: f(*a[:-2]+(cls(a[-2]),))(*a[-2:]))
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/main.py", line 132, in __call__
     return getattr(obj, attr)(req, **args)     
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/proxy.py", line 862, in post_result
     req.form.get('message'))
[...]
 Fault: <Fault 1: '<class \'sqlalchemy.exc.DataError\'>:(DataError) (1264, "Out of range value for column \'score\' at row 1") \'INSERT INTO recipe_task_result (recipe_task_id, path, result, score, log, start_time) VALUES (%s, %s, %s, %s, %s, %s)\' (75380049L, \'/kernel/ +memory/shmget\', \'Pass\', 2147483649.0, None, datetime.datetime(2018, 7, 11, 12, 37, 8, 202706))'>


Expected results:
The result should be accepted -- need to check what the previous behaviour with MySQL 5.1 non-strict mode was (most likely, score was mapped to 0)

Additional info:
Very similar to bug 1586049

Comment 1 Dan Callaghan 2018-07-11 22:53:39 UTC
We should also consider patching this (for 26.0) to accept arbitrary large integers for the score. Maybe store as varchar if Mariadb does not have a good arbitrary integer type.

Comment 2 PaulB 2018-08-03 18:05:27 UTC
All,
pbunyan hopes its ok to pile on here in this BZ...

I am seeing this issue recently when running testing for kernel-debug using
distribution/kernelinstall (https://beaker.engineering.redhat.com/tasks/566).

The task is successfully installing and we boot the debug kernel, but restraintd
is having issue uploading the logs. 

I was able to reproduce this issue 2/3 kernel-debug installs:
 https://beaker.engineering.redhat.com/jobs/2664633

========================================
Troubleshooting on the hosts I see this:
========================================
[root@dell-per930-01 ~]# service restraintd status
Redirecting to /bin/systemctl status restraintd.service
● restraintd.service - The restraint harness.
   Loaded: loaded (/usr/lib/systemd/system/restraintd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-08-03 11:46:59 EDT; 59min ago
  Process: 4645 ExecStartPre=/usr/bin/check_beaker (code=exited, status=0/SUCCESS)
 Main PID: 4655 (restraintd)
   CGroup: /system.slice/restraintd.service
           ├─ 4655 /usr/bin/restraintd
           ├─ 4694 /bin/bash -l /usr/share/restraint/plugins/task_run.d/10_bash_login /usr/share/restrain...
           ├─ 4771 make run
           ├─ 4924 /bin/sh ./runtest.sh
           └─13067 rstrnt-report-result /distribution/kernelinstall/Sysinfo PASS 613000000

Aug 03 12:37:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:38:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:39:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:40:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:41:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:42:22 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: Internal Server Error: Unabl....
Aug 03 12:42:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:43:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:44:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Aug 03 12:45:59 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: *** Current Time: Fri Aug 03...8
Hint: Some lines were ellipsized, use -l to show in full.
[root@dell-per930-01 ~]# uname -a
Linux dell-per930-01.khw.lab.eng.bos.redhat.com 4.14.0-101.el7a.x86_64.debug #1 SMP Tue Jul 31 21:13:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


*******This line gets our attention*******
Aug 03 12:42:22 dell-per930-01.khw.lab.eng.bos.redhat.com restraintd[4655]: Internal Server Error: Unabl....


[root@dell-per930-01 ~]# pidof restraintd
4655
[root@dell-per930-01 ~]# strace -t -v -s 65535 -p 4655 -e trace=network
strace: Process 4655 attached
13:55:17 socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 12
13:55:17 connect(12, {sa_family=AF_INET6, sin6_port=htons(8000), inet_pton(AF_INET6, "2620:52:0:132a:5054:ff:fe38:5099", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
13:55:17 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
13:55:17 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0
13:55:17 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
13:55:17 sendto(12, "POST //recipes/5472952/tasks/76744989/results/ HTTP/1.1\r\nHost: localhost:8081\r\nContent-Type: application/x-www-form-urlencoded\r\nAccept-Encoding: gzip, deflate\r\nConnection: Keep-Alive\r\nContent-Length: 74\r\n\r\n", 206, MSG_NOSIGNAL, NULL, 0) = 206
13:55:17 sendto(12, "result=PASS&path=%2Fdistribution%2Fkernelinstall%2FSysinfo&score=613000000", 74, MSG_NOSIGNAL, NULL, 0) = 74
13:55:17 recvfrom(12, 0xec51d0, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
13:55:17 recvfrom(12, "HTTP/1.1 500 Internal Server Error\r\nContent-Type: text/plain\r\nConnection: close\r\nContent-Length: 21\r\nDate: Fri, 03 Aug 2018 17:55:17 GMT\r\n\r\nInternal Server Error", 8192, 0, NULL, NULL) = 161
---<-snip->---

*******These lines gets our attention*******
13:55:17 sendto(12, "result=PASS&path=%2Fdistribution%2Fkernelinstall%2FSysinfo&score=613000000", 74, MSG_NOSIGNAL, NULL, 0) = 74

13:55:17 recvfrom(12, "HTTP/1.1 500 Internal Server Error\r\nContent-Type:




So its seems this score of "score=613000000" is the culprit here.

Best,
-pbunyan

Comment 3 PaulB 2018-08-03 18:07:45 UTC
Dan,
My testing was run with Beaker 25.5.

Is this on the radar to be fixed still?

Best,
-pbunyan

Comment 4 Jeff Bastian 2018-08-03 18:21:10 UTC
The /distribution/kernelinstall task was padding its score with 6 trailing 0s for some reason lost to history (it's in the first git commit on this task):

    result_count=$(/usr/bin/printf "%03d%03d%03d\n" $dresult_count 0 0)
    ...
    report_result $TEST/Sysinfo PASS $result_count

dresult_count is just the number of lines in the dmesg logs containing certain strings, so it's not terribly complicated.  The debug kernel, though, is more verbose so this caught a lot of messages and pushed the score over Beaker's theshold (I think).

I removed the padding.  Hopefully this fixes it.

    result_count=$(/usr/bin/printf "%03d\n" $dresult_count)

Comment 5 Roman Joost 2018-08-06 04:57:25 UTC
Dear Paul,

yes we still consider to fix this to be backwards compatible. MySQL simply accepted whatever and coerced it. If you need a fix urgently, I think adjusting the score reporting is a viable approach and even the "cleaner" option.

Comment 6 Dan Callaghan 2018-08-08 05:38:10 UTC
The recipe_task_result.score column is declared as DECIMAL(10,0) which means MySQL should accept integers up to 10 digits long.

The behaviour on MySQL 5.1 when you try to insert a number that is too large is that it silently caps it to the maximum representable value (9999999999).

The behaviour on Mariadb (strict mode) is you get the exception mentioned in comment 0. DataError: Out of range value for column 'score'

However there is something else odd going on here. Both values mentioned in this bug (2147483649 or 613000000) are less than 10 digits long. I checked our production database and the type of the column is inexplicably DECIMAL(10,2) which means it will only accept values up to 8 digits long with 2 decimal places (but I think we already coerce to int before that).

This problem seems to have existed since... forever. The earliest db dump we have is from 2012 and it shows that the type was wrong then.

https://git.beaker-project.org/cgit/beaker/tree/IntegrationTests/src/bkr/inttest/server/database-dumps/redhat-production-20120216.sql

So it means that previously, scores greater than 8 digits were being silently capped at 99999999. Now they are triggered the DataError.

Even though many years ago I wrote some exhaustive tests to find schema inconsistencies in our prod schema like this one, and I went through and fixed all the problems it found... it seems the schema checking logic does not notice this difference in type (DECIMAL(10,2) vs DECIMAL(10,0)).

So we need a few things here:

* fix the schema tests to catch this inconsistency in our prod schema

* add a migration to fix up the column type on Beaker 26.0

* add some logic similar to bug 1586049, to map scores longer than 8 digits to 99999999 on Beaker 25.6

* tweak the logic to map scores longer than 10 digits to 9999999999 on Beaker 26.0, assuming the column type is fixed to the expected 10 digits

Comment 7 Dan Callaghan 2018-08-08 07:04:30 UTC
This is the short-term fix for Beaker 25.x:

https://gerrit.beaker-project.org/#/c/beaker/+/6254

I also have a patch for the develop branch, to fix it properly by changing the column types, but it's easier if I don't post that one until the above one is merged into develop.

Comment 8 Dan Callaghan 2018-08-09 07:47:41 UTC
Will consider this bz fixed in 25.6 by the above patch, but the "real" fix is here for 26.0:

https://gerrit.beaker-project.org/#/c/beaker/+/6256

Comment 10 Jacob McKenzie 2018-08-23 03:56:18 UTC
I created a task that would report the score "12345678900" and it reported ( "9" * 8 ) as expected.

https://beaker-devel.app.eng.bos.redhat.com/recipes/25388#task163474

Comment 11 Roman Joost 2018-09-03 23:31:25 UTC
This has been released with Beaker 25.6.

Release Notes: https://beaker-project.org/docs/whats-new/release-25.html#beaker-25-6


Note You need to log in before you can comment on or make changes to this bug.