Created attachment 990455 [details]
fix regular expression for parsing dd output

Description of problem:
Similar to #948232, vdsm cannot handle high read speeds on attached storage domains if they cause dd to report an "Infinity" speed (e.g. on NFS due to memory caching). This error occurs with oVirt 3.5.0 on an EL6 setup.

Example VDSM log on the host:

241 Thread-17::DEBUG::2015-02-11 10:01:44,089::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /bin/dd if=/rhev/data-center/mnt/nfs.sec.t-labs.tu-berlin.de:_srv_nfs_x4200/bf6a9e33-f659-4354-bc63-f38e96118133/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
242 Thread-17::DEBUG::2015-02-11 10:01:44,272::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n517 bytes (517 B) copied, 0 s, Infinity B/s\n'; <rc> = 0
243 Thread-67249::DEBUG::2015-02-11 10:01:44,273::BindingXMLRPC::318::vds::(wrapper) client [130.149.230.18]
244 Thread-17::ERROR::2015-02-11 10:01:44,275::misc::221::Storage.Misc::(readspeed) Unable to parse dd output: '517 bytes (517 B) copied, 0 s, Infinity B/s'
245 Thread-17::ERROR::2015-02-11 10:01:44,277::domainMonitor::256::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain bf6a9e33-f659-4354-bc63-f38e96118133 monitoring information
246 Traceback (most recent call last):
247   File "/usr/share/vdsm/storage/domainMonitor.py", line 234, in _monitorDomain
248     self.nextStatus.readDelay = self.domain.getReadDelay()
249   File "/usr/share/vdsm/storage/fileSD.py", line 261, in getReadDelay
250     stats = misc.readspeed(self.metafile, 4096)
251   File "/usr/share/vdsm/storage/misc.py", line 222, in readspeed
252     raise se.MiscFileReadException(path)
253 MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/nfs.sec.t-labs.tu-berlin.de:_srv_nfs_x4200/bf6a9e33-f659-4354-bc63-f38e96118133/dom_md/metadata',)
254 Thread-67249::DEBUG::2015-02-11 10:01:44,277::task::595::Storage.TaskManager.Task::(_updateState) Task=`5695699f-2fa2-4a0e-9891-048ff2fcf70b`::moving from state init -> state preparing
255 Thread-17::DEBUG::2015-02-11 10:01:44,279::domainMonitor::264::Storage.DomainMonitorThread::(_monitorDomain) Domain bf6a9e33-f659-4354-bc63-f38e96118133 changed its status to Invalid

Example log on ovirt-engine:

1704 2015-02-11 10:01:30,678 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (org.ovirt.thread.pool-8-thread-4) domain bf6a9e33-f659-4354-bc63-f38e96118133:nfs in problem. vds: virt2.sec.t-labs.tu-berlin.de

How reproducible:
So far I have no idea how to reproduce this. It only occurs on one specific host with the supplied NFS storage. The error stems from the dd output '517 bytes (517 B) copied, 0 s, Infinity B/s' being unparseable.

Additional info:
I patched vdsm/storage/misc.py in two places to counter this problem. Patch 1 adjusts the regular expression. Patch 2 returns a very small value instead of 0 when the Infinity case occurs, to ensure that a returned 0 does not cause problems somewhere down the road.
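For illustration only, here is a minimal sketch of a more tolerant parser for dd's summary line. This is not the attached patch; the regex and helper name are assumptions:

```python
import re

# Accepts both the normal "0.000999813 s, 354 kB/s" tail and the
# "0 s, Infinity B/s" tail that currently breaks readspeed().
_DD_SUMMARY = re.compile(
    r"(?P<bytes>\d+) bytes? \([\d.]+ \w*B\) copied, "
    r"(?P<seconds>[\d.eE+-]+) s, "
    r"(?:[\d.]+|Infinity) \w*B/s")

def parse_dd_summary(err):
    """Return (bytes_copied, seconds) from dd's stderr output."""
    for line in err.splitlines():
        m = _DD_SUMMARY.search(line)
        if m:
            return int(m.group("bytes")), float(m.group("seconds"))
    raise ValueError("Unable to parse dd output: %r" % err)
```

With the failing line above, such a parser would return (517, 0.0), and the caller can then decide how to treat a zero interval (for example, clamping it to a small positive value, as patch 2 does).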
Created attachment 990456 [details] patch ensuring a value >0 is returned for runtime of throughput check
Bumping severity, as this may cause a domain to become invalid and lead to a migration storm while the storage is in perfect condition.
(In reply to Tobias Fiebig from comment #0) Tobias, did you get one error in the log, or many such errors in a consistent way?
What do you mean by many such errors? The issue consistently appeared over an extended period of time, repeating every 1-2 minutes. Does this answer your question, or do you need a more specific timeline of how often this happened?
(In reply to Tobias Fiebig from comment #4)
> What do you mean by many such errors? The issue consistently appeared over
> an extended period of time, repeating every 1-2 minutes. Does this answer your
> question, or do you need a more specific timeline of how often this happened?

We need more info to understand this issue:
- vdsm logs showing multiple failures over an extended period of time
- /var/log/messages from the same time frame
- the output of rpm -qa on the affected host
- Do you have ntp configured on this machine? How is it configured? Please also attach ntp-related logs.
Ondrej, can you help us understand why dd is returning zero time in this case? Looking at the upstream source, it seems that dd cannot use a monotonic time source, so it is using system time, and the system time is changing backwards during the read operation.
Created attachment 993940 [details] rpm -qa from affected host
Created attachment 993944 [details] current timing behavior ntp
Created attachment 993945 [details] ntp kernel pll jan 2015
Created attachment 993946 [details] ntp jitter jan 2015
(In reply to Tobias Fiebig from comment #10) > Created attachment 993946 [details] > ntp jitter jan 2015 What are the units in these ntp graphs - seconds? milliseconds? microseconds?
The unit is time in seconds; the m's on the y-axis indicate the millisecond range. Sorry for the confusion, I just took what fell out of Munin.
Created attachment 993948 [details] /var/log/messages jan 18 to jan 25
Created attachment 993963 [details] vdsm log jan 18-jan 25, IPs redacted, compressed
OK, the requested details should now be added. Please poke me if this is insufficient.
Adding back needinfo for Ondrej, see comment 6.
In the source code (src/dd.c:773 in F22 coreutils-8.23), you can see that if the "now" time is the same as or lower than "start_time", it prints "Infinity B/s" and sets delta to 0.

Is this what you are asking for, or do you need something else?
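Expressed in Python pseudocode rather than the actual C (names and the nanosecond constant are illustrative only), the behaviour described above amounts to:

```python
XTIME_PRECISION = 10 ** 9  # gethrxtime() resolution: nanoseconds

def summary_suffix(start_time, now, total_bytes):
    # When the clock did not advance (or moved backwards), dd reports a zero
    # elapsed time and an infinite rate - the exact string vdsm fails to parse.
    if now <= start_time:
        return "0 s, Infinity B/s"
    delta_s = (now - start_time) / float(XTIME_PRECISION)
    return "%g s, %g B/s" % (delta_s, total_bytes / delta_s)
```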
(In reply to Ondrej Vasik from comment #17)
> In the source code (src/dd.c:773 in F22 coreutils-8.23), you can see that if
> the "now" time is the same as or lower than "start_time", it prints "Infinity B/s"
> and sets delta to 0.
>
> Is this what you are asking for, or do you need something else?

It is clear that we get Infinity when the elapsed time is zero or negative, but it is unlikely that a read operation over the network (NFS in this case) will finish before it was started, or take less than a nanosecond (clock_gettime) or a microsecond (gettimeofday).

I suspected that this was caused by dd using system time (because a monotonic time source was not available), and the system time changing backwards during a read operation. However, we have thousands of such logs, and it is unlikely that this happens so frequently.

We can easily fix vdsm to parse dd output when it returns zero seconds and infinity bytes per second, but we are worried that there is a bigger issue behind this. There is also the issue of handling zero time in the upper layers of vdsm/ovirt, where it can break code expecting positive values for read delay.

We need help understanding why we get thousands of read operations taking zero time.

This is the dd command we use:

/bin/dd iflag=direct if=/rhev/data-center/mnt/_dev_mapper_vg__virt2-lv__iso/06d8c1c4-3d0f-4f57-af00-b4a1efbf2b32/dom_md/metadata bs=4096 count=1

The input (if=) is an NFS mount in this case.

This is an example of unexpected output:

0+1 records in
0+1 records out
354 bytes (354 B) copied, 0 s, Infinity B/s

And this is expected output:

0+1 records in
0+1 records out
354 bytes (354 B) copied, 0.000999814 s, 354 kB/s

In the attached logs, we have two threads running the dd command above for two NFS mounts, every 10 seconds: Thread-80 and Thread-16. For Thread-80, we have 7969 runs, 2508 returning zero/Infinity. For Thread-16, we have 7961 runs, 1171 returning zero/Infinity.
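For reference, a sketch of how such per-thread counts can be pulled out of vdsm.log; the file name and the exact regex are assumptions based on the log excerpts in the next comment:

```python
import collections
import re

SUCCESS = re.compile(
    r"^(Thread-\d+)::DEBUG::.*\(getReadDelay\) SUCCESS: <err> = '([^']*)'")

runs = collections.Counter()      # total getReadDelay runs per thread
infinity = collections.Counter()  # runs reporting "0 s, Infinity B/s"

with open("vdsm.log") as log:
    for line in log:
        m = SUCCESS.match(line)
        if not m:
            continue
        thread, err = m.groups()
        runs[thread] += 1
        if "Infinity" in err:
            infinity[thread] += 1

for thread in sorted(runs):
    print("%s: %d runs, %d zero/Infinity" % (thread, runs[thread], infinity[thread]))
```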
See attachments thread-80-results.log and thread-16-results.log.

Here are typical runs from thread-80-results.log:

Thread-80::DEBUG::2015-01-18 23:35:44,612::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.000999811 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:35:54,749::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099981 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:36:04,792::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099981 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:36:14,836::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099981 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:36:24,879::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0 s, Infinity B/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:36:34,953::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.000999811 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:36:44,996::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0 s, Infinity B/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:36:55,086::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099981 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:37:05,129::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0 s, Infinity B/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:37:15,176::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099981 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:37:25,220::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099981 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:37:35,309::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0 s, Infinity B/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:37:45,377::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.000999811 s, 354 kB/s\n'; <rc> = 0
Thread-80::DEBUG::2015-01-18 23:37:55,437::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n354 bytes (354 B) copied, 0.00099
Created attachment 994316 [details] dd results from thread-16
Created attachment 994317 [details] dd results from thread-80
Honestly, broken system time is a rather good explanation. I do not really see how there could be another underlying issue, then.

Zeros in the upper layers could be handled as in the second attachment, though.
(In reply to Tobias Fiebig from comment #21)
> Honestly, broken system time is a rather good explanation. I do not really
> see how there could be another underlying issue, then.

System time should not behave like that. Most code assumes that system time is going forward 99.9999% of the time. If we have a system issue here, we should open a bug for the component responsible for it.

> Zeros in the upper layers could be handled as in the second attachment, though.

Sure, but we are not very happy with this approach. I would like to pass the zero value as is and let the upper layer handle it if possible.
dd uses the gethrxtime() gnulib function. It tries to use monotonic clocks where available, but if not, there is a fallback to a simple gettime(), which may jump backwards (this is even noted in the comments) - it uses the system functions mentioned in comment #18. Using strace would probably reveal that the system clock really went backwards.

I don't think I will be able to identify the culprit - it will likely be somewhere in the kernel system time module. Adding P. Brady - the upstream maintainer of coreutils - to see if he has some other ideas about what might be the culprit.
Well, a flaky time source on the system could also be an option. Looking at those ntp graphs, I think that this may be a reasonable explanation, with ntp constantly correcting (sometimes backwards).

Considering the upper-layer thing: if you want, I can disable that part of my patch on that host and see how this influences the system.
Created attachment 994398 [details]
System time debugging tool

This is a simple tool that logs both monotonic time and system time every second. Running it for an hour or so on the host experiencing this issue will reveal whether the problem is the system time going backwards.
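For reference, a rough Python 3 sketch of what such a logger could do; the attached tool itself may be implemented differently, and the output format here is an assumption:

```python
import time

prev_real = None
while True:
    mono = time.clock_gettime(time.CLOCK_MONOTONIC)
    real = time.clock_gettime(time.CLOCK_REALTIME)
    # One line per second: monotonic timestamp, then system (wall-clock) timestamp.
    print("%.9f %.9f" % (mono, real), flush=True)
    if prev_real is not None and real < prev_real:
        print("WARNING: system time went backwards by %.9f s" % (prev_real - real),
              flush=True)
    prev_real = real
    time.sleep(1)
```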
(In reply to Tobias Fiebig from comment #24)
> Well, a flaky time source on the system could also be an option. Looking at
> those ntp graphs, I think that this may be a reasonable explanation, with ntp
> constantly correcting (sometimes backwards).

It will be interesting to try the tool in attachment 994398 [details]. Please run it on the affected system for an hour or so and share its output.

> Considering the upper-layer thing: if you want, I can disable that part of my
> patch on that host and see how this influences the system.

That can be useful.
Using strace on RHEL 6.6, we see that dd is using CLOCK_MONOTONIC:

$ strace dd if=/dev/zero of=/dev/null bs=4096 count=1
...
write(2, "4096 bytes (4.1 kB) copied", 264096 bytes (4.1 kB) copied) = 26
clock_gettime(CLOCK_MONOTONIC, {1859546, 419736614}) = 0
write(2, ", 0.000364096 s, 11.2 MB/s\n", 27, 0.000364096 s, 11.2 MB/s
) = 27
...

Tobias, can you try this on your host and attach the output?
Created attachment 994645 [details] timelog from affected machine over ~2h time log of the affected machine
Created attachment 994646 [details]
time_log analyzer

A small .py that parses all lines and reports an error if time went backwards.
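A minimal sketch of such an analyzer, assuming each line of the time log holds a monotonic timestamp followed by a system timestamp; the attached script and the actual log format may differ:

```python
import sys

prev_mono = prev_real = None
with open(sys.argv[1]) as log:
    for lineno, line in enumerate(log, 1):
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mono, real = float(fields[0]), float(fields[1])
        if prev_mono is not None and mono < prev_mono:
            print("line %d: monotonic time went backwards" % lineno)
        if prev_real is not None and real < prev_real:
            print("line %d: system time went backwards by %.9f s"
                  % (lineno, prev_real - real))
        prev_mono, prev_real = mono, real
```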
Created attachment 994647 [details] strace-log for dd
dd behaves as expected on the affected system: write(2, "4096 bytes (4.1 kB) copied", 264096 bytes (4.1 kB) copied) = 26 clock_gettime(CLOCK_MONOTONIC, {1634838, 381835004}) = 0 write(2, ", 0.00199961 s, 2.0 MB/s\n", 25, 0.00199961 s, 2.0 MB/s ) = 25
(In reply to Tobias Fiebig from comment #31)
> dd behaves as expected on the affected system

Checking the time log shows no issue; monotonic time and real time seem to advance consistently.

While the time log was collected, did you see any "Infinity" in getReadDelay in the vdsm log?
Created attachment 994683 [details] Tool for measuring the practical resolution of clock_gettime()
Another possibility: while clock_gettime returns its value in nanoseconds, its practical resolution is lower. For example, on this machine (8 cores):

$ cat /proc/cpuinfo | grep 'model name' | head -n 1
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

I get a practical resolution of about 1150 nanoseconds:

1911247.946452809 (0.000001149)
1911247.946453938 (0.000001129)
1911247.946455084 (0.000001146)
1911247.946456229 (0.000001145)
1911247.946457374 (0.000001145)
1911247.946458514 (0.000001140)
1911247.946459662 (0.000001148)
1911247.946460866 (0.000001204)

Please build and run the attached timediff tool on your host to see what the practical resolution is there. If your storage is very fast, or there is very successful caching involved, and the operation takes less than the practical resolution of the monotonic time source, we will get zero time.
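As an illustration only, a rough Python equivalent of that measurement loop is shown below. Note that interpreter call overhead will dominate the numbers in Python, so the attached timediff tool (which needs to be built) is the right thing to run on the host; the output format here just mimics the sample above:

```python
import time

prev = time.clock_gettime(time.CLOCK_MONOTONIC)
for _ in range(10):
    now = time.clock_gettime(time.CLOCK_MONOTONIC)
    while now <= prev:  # spin until the reported clock value actually advances
        now = time.clock_gettime(time.CLOCK_MONOTONIC)
    # Print the timestamp and the smallest observed increment.
    print("%.9f (%.9f)" % (now, now - prev))
    prev = now
```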
Created attachment 994701 [details] timediff log
tfiebig@smith Downloads % cat ./timediff_log|awk '{print $2}'|sort|uniq -c
177 (0.000000001)
175 (0.000999812)
9462 (0.000999813)
183 (0.000999814)
1 (0.001999626)
1 (0.010997944)
1 (0.011997756)

Based on this, I guess your assumption is correct.
(In reply to Tobias Fiebig from comment #36)
> tfiebig@smith Downloads % cat ./timediff_log|awk '{print $2}'|sort|uniq -c
> 177 (0.000000001)
> 175 (0.000999812)
> 9462 (0.000999813)
> 183 (0.000999814)
> 1 (0.001999626)
> 1 (0.010997944)
> 1 (0.011997756)
>
> Based on this, I guess your assumption is correct.

So this is not very fast storage, but erratic clock_gettime(). We did not measure how many times we get the same value from clock_gettime; that could be useful to detect a delay caused by scheduling.

So I think we should continue in two directions:
1. coreutils developers can consider a more useful return value when start_time == now.
2. You can open a kernel bug to understand whether this behavior is expected. Please cc me if you open such a bug.

On the vdsm side, I think the attached patch (http://gerrit.ovirt.org/38082) is correct.

Thanks for your help, Tobias!
This bug reproduced on oVirt 3.6 master; are we sure the fix has been merged?
(In reply to Ori Gofen from comment #38)
> This bug reproduced on oVirt 3.6 master; are we sure the fix has been
> merged?

Yes, the fix is merged. Can you please describe the steps you used to reproduce?
(In reply to Ori Gofen from comment #38)
> This bug reproduced on oVirt 3.6 master; are we sure the fix has been
> merged?

What commit hash?
Just a point on the coreutils "Infinity" output. That's a valid floating point value as can be seen with `sleep Infinity`. I'm not sure what coreutils should do differently here.
(In reply to Allon Mureinik from comment #40)
> (In reply to Ori Gofen from comment #38)
> > This bug reproduced on oVirt 3.6 master; are we sure the fix has been
> > merged?
> What commit hash?

4427cee24bf1d9b02b2e423d3873cd30ac10e0b7
Ori, can you paste the exception you get in the log?
Created attachment 1032780 [details]
logs

Steps taken to verify:
1. Created a fake dd with a static output: "517 bytes (517 B) copied, 0 s, Infinity B/s"
2. Viewed the vdsm log
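For the record, such a fake dd can be as small as the sketch below. This is an assumption about how the verification could be done, not the actual script used; real dd prints its summary on stderr and exits 0, so the fake does the same:

```python
#!/usr/bin/env python
# Hypothetical stand-in for dd, used only to force the "Infinity" case.
import sys

sys.stderr.write("0+1 records in\n"
                 "0+1 records out\n"
                 "517 bytes (517 B) copied, 0 s, Infinity B/s\n")
sys.exit(0)
```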
Created attachment 1044333 [details]
logs

Reproduced again on:
vdsm-cli-4.17.0-1054.git562e711.el7.noarch
vdsm-jsonrpc-4.17.0-1054.git562e711.el7.noarch
vdsm-python-4.17.0-1054.git562e711.el7.noarch
vdsm-4.17.0-1054.git562e711.el7.noarch
vdsm-infra-4.17.0-1054.git562e711.el7.noarch
vdsm-yajsonrpc-4.17.0-1054.git562e711.el7.noarch
vdsm-xmlrpc-4.17.0-1054.git562e711.el7.noarch

Nir, this should probably be moved to Assigned
(In reply to Ori Gofen from comment #45)
> Created attachment 1044333 [details]
> logs
>
> Reproduced again on

Please describe how you reproduced the issue: what action you took, what the expected behavior was, and what the actual behavior is.
Verified on 3.6.0.3; getReadDelay handles 'Infinity'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html