Bug 690675

Summary: [REG][5.4] Wake up of futex system call is long-delayed on VMware guest.
Product: Red Hat Enterprise Linux 5 Reporter: Moritoshi Oshiro <moshiro>
Component: kernel Assignee: Chris Lalancette <clalance>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 5.8 CC: akataria, dhecht, drjones, garrett, jsavanyo
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-04-06 06:25:16 UTC Type: ---

Comment 4 Andrew Jones 2011-03-31 08:17:35 UTC
OK, at first I misunderstood this bug. It's not a request to integrate some patches into z-stream, but a statement that the previously integrated patches don't completely fix the issue. I believe the patches in question are

33a83f1 [x86] vmware: disable softlock processing on tsc systems
6ea73e0 [x86] vmware lazy timer emulation

The relevant new information from the customer is copy+pasted below.

We found that the problem is _not_ completely fixed in either this 5.4 erratum
or 5.6. The following table shows how long after the specified timeout the
futex wake-up occurred, in milliseconds. The delay is seen only on VMware
guests, never on native systems.

        specified   |         |         |               |
      timeout value | RHEL5.3 | RHEL5.4 | RHEL5.4 + fix | RHEL5.6
   --------------------------------------------------------------
      60000 ( 1 min)|     26  |   1738  |       11      |     10
     120000 ( 2 min)|     47  |   3458  |       19      |     19
    1800000 (30 min)|    698  |  51782  |      276      |    275
    3600000 ( 1 hr) |   1396  | 103285  |      550      |    550
   21600000 ( 6 hr) |   8362  | 619190  |     3289      |   3289
   --------------------------------------------------------------
                                                           (msec)

I'll assign this to Chris for now, as he originally did this work.
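
For anyone reading the table above, it may help to normalize each delay by its timeout. The sketch below is only arithmetic on the figures already quoted in this comment; the names `relative_delay_ppm` and `ppm_table` are mine, not from the customer. On these numbers the ratio stays roughly constant per release (about 390-430 ppm on RHEL5.3, about 2.9% on RHEL5.4, about 150-180 ppm with the fix and on RHEL5.6), which would be consistent with a rate error in timekeeping rather than a fixed wake-up latency, though that is an interpretation and not something the customer stated.

```python
# Delay figures (msec) copied from the table in comment 4.
TIMEOUTS_MS = [60000, 120000, 1800000, 3600000, 21600000]
DELAYS_MS = {
    "RHEL5.3": [26, 47, 698, 1396, 8362],
    "RHEL5.4": [1738, 3458, 51782, 103285, 619190],
    "RHEL5.4 + fix": [11, 19, 276, 550, 3289],
    "RHEL5.6": [10, 19, 275, 550, 3289],
}


def relative_delay_ppm(delay_ms, timeout_ms):
    """Return the wake-up delay as parts per million of the requested timeout."""
    return delay_ms / timeout_ms * 1_000_000


def ppm_table():
    """Map each release to its list of relative delays (ppm), one per timeout."""
    return {
        release: [relative_delay_ppm(d, t) for d, t in zip(delays, TIMEOUTS_MS)]
        for release, delays in DELAYS_MS.items()
    }


if __name__ == "__main__":
    for release, ppms in ppm_table().items():
        formatted = ", ".join(f"{p:8.0f}" for p in ppms)
        print(f"{release:14s} {formatted}  (ppm)")
```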

Comment 7 Andrew Jones 2011-04-01 08:56:50 UTC
Hi Alok,

Can you take a look at comment 4 of this bug?

Thanks,
Drew

Comment 8 Alok Kataria 2011-04-04 18:38:03 UTC
Hi Andrew, 

The two patches that you list in comment 4 don't fix this delayed wakeup problem, but the ones which fixed PR 538022 do.

Are we sure that the customer is running with the patches for that PR?
Can you please share the test program that the customer used, so that we can try to reproduce it in-house?
Which ESX version has the customer seen this problem on?
Is this problem specific to 32-bit kernels? Have they tried 64-bit kernels and not seen any delays?
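
Until the customer's test program is shared, a stand-in along the following lines could be used to look for the same symptom. This is a minimal sketch and not the customer's program: it assumes Python 3.3+ (for `time.monotonic`), and it assumes that a timed `threading.Event.wait` ends up in a futex-based timed wait on Linux, which depends on the interpreter and libc in use. The function name `measure_wakeup_delay_ms` is hypothetical.

```python
import threading
import time


def measure_wakeup_delay_ms(timeout_s):
    """Block for timeout_s on an Event that is never set.

    Returns how much later than requested the wait returned, in msec.
    Assumption: on Linux the timed wait is backed by a futex-based primitive.
    """
    never_set = threading.Event()
    start = time.monotonic()
    never_set.wait(timeout_s)  # returns False once the timeout expires
    elapsed = time.monotonic() - start
    return (elapsed - timeout_s) * 1000.0


if __name__ == "__main__":
    # Short timeouts keep the demo quick; the table in comment 4 used
    # timeouts from 1 minute up to 6 hours.
    for timeout_s in (0.1, 0.5, 1.0):
        delay = measure_wakeup_delay_ms(timeout_s)
        print(f"timeout {timeout_s:5.1f} s -> woke {delay:8.3f} msec late")
```

Note that this measures the delay against the guest's own monotonic clock. If the guest clock and the timer that expires the wait drift together, the delay may only be visible against an external time reference.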

Comment 9 Andrew Jones 2011-04-05 07:46:49 UTC
(In reply to comment #8)
> The two patches that you list in comment 4 don't fix this delayed wakeup
> problem, but the ones which fixed PR 538022 do.
> 
> Are we sure that the customer is running with the patches for that PR?

The patch isn't in 5.3, but it is in 5.4.z and 5.6.

Moritoshi,

Please help Alok with his other questions in comment 8.

Comment 12 Andrew Jones 2011-04-06 06:25:16 UTC
Hi Alok,

Moritoshi got new information from the customer: these delays are now showing up on some native installs as well, so the problem no longer looks VMware-specific. They've closed this case and are investigating the full set of conditions in order to report a new one. I'm sorry for dragging you in and for any trouble.

Drew

Closing as INSUFFICIENT_DATA: there is a real problem, but the customer hasn't yet worked out enough of the details for us to investigate it.