Bug 753796

Summary: xenconsoled is unprotected against time skew
Product: Red Hat Enterprise Linux 5 Reporter: Laszlo Ersek <lersek>
Component: xenAssignee: Laszlo Ersek <lersek>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 5.8CC: drjones, imammedo, leiwang, mrezanin, pbonzini, qguan, qwan, wshi, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: xen-3.0.3-139.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 818049 (view as bug list) Environment:
Last Closed: 2013-01-08 04:07:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 807971, 818049    
Attachments:
Description Flags
xenconsoled: replace gettimeofday with clock_gettime none

Description Laszlo Ersek 2011-11-14 14:18:44 UTC
From: http://lists.xensource.com/archives/html/xen-devel/2011-11/msg00787.html

On 11/14/11 11:15, Laszlo Ersek wrote:

>> On Fri, Nov 11, 2011 at 05:25:52PM +0000, Niall Fleming wrote:
> 
>>> Booted the physical machine, date tells me it's 17:15:45. Hwclock 
>>> agrees.
>>> Booted a VM (using xl as xm seems to be labouring under the
>>> impression that blockdev is missing - it isn't)
>>> The login prompt displays the time as 17:17, which is the expected
>>> behaviour.
>>> Changed the time in dom0 - (date +%T -s 12:00:00), synced to hwclock.
>>> Check that date and hwclock match - they do.
>>> Destroy the VM and recreate.
>>> The login prompt displays the time as 17:22. Unexpected!
> 
> I shut the VM down (2.6.32-131.0.15.el6) before the systime/hwclock 
> change in dom0 (2.6.18-286.el5xen), and restarted it afterwards. The 
> guest entered an infinite loop during this boot :(

In my test, I moved the clock from 11:28 to 11:05 (ie. backwards) too -- that may have an effect on some boot calibration loop in the guest. The xen-devel thread seems to imply that the hypervisor was not handling dom0 clock changes for a long time.

Comment 1 Laszlo Ersek 2011-11-14 14:21:25 UTC
(In reply to comment #0)
> The xen-devel
> thread seems to imply that the hypervisor was not handling dom0 clock changes
> for a long time.

Disregard that, sorry. I misunderstood <http://lists.xensource.com/archives/html/xen-devel/2011-11/msg00848.html>. "3.1" in this case is the kernel's version number, not the hypervisor's. So it should be the responsibility of the dom0 kernel to notify the hv.

Comment 3 Laszlo Ersek 2011-11-14 14:33:36 UTC
XENPF_settime seems to be supported by our hypervisor version, but our kernel version never calls it. I'm not sure we want to add a new feature ("new" in the RHEL-5 coordinate system).

Comment 4 Laszlo Ersek 2011-11-14 14:39:16 UTC
The RHEL-5 kernel doesn't have the timer scaffolding on which the above pvops commit rests. IOW I'm not sure *where* in the dom0 kernel we should issue the hypercall.

Comment 5 Paolo Bonzini 2011-11-14 15:16:45 UTC
It may not be the closest match, but I would just add an #ifdef CONFIG_XEN in do_settimeofday and see if it works...

Comment 6 Laszlo Ersek 2011-11-14 17:20:47 UTC
(In reply to comment #5)
> It may not be the closest match, but I would just add an #ifdef CONFIG_XEN in
> do_settimeofday and see if it works...

Thanks for the hint!

do_settimeofday() in "arch/i386/kernel/time-xen.c" does call DOM0_SETTIME in dom0 (if the "independent_wallclock" kernel parameter is not set). DOM0_SETTIME is #defined as XENPF_settime in "include/xen/interface/dom0_ops.h". I missed it when I grepped for XENPF_settime.

We already have linux-2.6.18-xen.hg changeset 832:c0f2f398aa3c (commit 17192205 / bug 494879). A possibly relevant changesets for "time-xen.c", newer than 832, seems to be <http://xenbits.xensource.com/linux-2.6.18-xen.hg/rev/1021>.

The last hunk causes the generic warp_clock() to call, in the end, monotonic_reset(), added by the preceding hunks.

I should also take a look at the guest's busy loop.

Comment 9 Laszlo Ersek 2011-11-16 11:53:08 UTC
It seems that the busy-loop in the guest is temporary. I think it's related to the amount that I set the clock back with.

Comment 10 Laszlo Ersek 2011-11-16 14:05:55 UTC
The "hwclock --systohc" host step doesn't appear to be necessary to cause the guest to spin.

Comment 11 Laszlo Ersek 2011-11-18 16:49:44 UTC
Final messages before spin:

(early) DMI not present or invalid.
(early) e820 update range: 0000000000000000 - 0000000000001000 (early) (usable)(early)  ==> (early) (reserved)(early) 
(early) e820 remove range: 00000000000a0000 - 0000000000100000 (early) (usable)(early) 

This is the stack during the spin:

xen_start_kernel()
-> x86_64_start_reservations()
  -> start_kernel()
    -> setup_arch()
      -> prefill_possible_map()
        -> printk(); message is "SMP: Allowing %d CPUs, %d hotplug CPUs\n"
          -> vprintk()
            -> release_console_sem()
              -> call_console_drivers()
                -> _call_console_drivers()
                  -> __call_console_drivers()
                    -> xenboot_write_console()
                      -> raw_console_write()
                        -> HYPERVISOR_console_io(CONSOLEIO_write, ...
                      -> write_console()
                        -> HYPERVISOR_sched_op(SCHEDOP_yield, NULL);

write_console() goes like this:

        /*
         * Make sure the whole buffer is emitted, polling if
         * necessary.  We don't ever want to rely on the hvc daemon
         * because the most interesting console output is when the
         * kernel is crippled.
         */
        while (len) {
                int sent = __write_console(data, len);
                
                data += sent;
                len -= sent;

                if (unlikely(len))
                        HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
        }

Comment 12 Laszlo Ersek 2011-11-25 15:21:37 UTC
It reproduces on a RHEL-5 (-299), PV domU as well. I tried once with 4 VCPUs, and twice with 1 VCPU. Stacks:

4 VCPU case:

PID: 1      TASK: ffff8800012467a0  CPU: 3   COMMAND: "swapper"
 #0 [ffff880002501a80] schedule at ffffffff80261d89
 #1 [ffff880002501a88] thread_return at ffffffff80261e5c
 #2 [ffff880002501b08] __wake_up_common at ffffffff8028828c
 #3 [ffff880002501b58] _spin_lock_irq at ffffffff8026393d
 #4 [ffff880002501b68] wait_for_completion at ffffffff80261ffb
 #5 [ffff880002501bb8] call_usermodehelper_keys at ffffffff8029a2a6
 #6 [ffff880002501cb8] kcons_write at ffffffff803bc220
 #7 [ffff880002501ce8] __call_console_drivers at ffffffff8028dfc1
 #8 [ffff880002501d08] release_console_sem at ffffffff80217b39
 #9 [ffff880002501d38] vprintk at ffffffff8028e7c1
#10 [ffff880002501db8] printk at ffffffff8028e886
#11 [ffff880002501ea8] init_net_drop_monitor at ffffffff8077c553
#12 [ffff880002501ec8] init at ffffffff8075e7ed
#13 [ffff880002501f48] kernel_thread at ffffffff8025fb2c

1 VCPU cases:

PID: 227    TASK: ffff88003f1fb080  CPU: 0   COMMAND: "insmod"
 #0 [ffff88003f5a7ce0] xencons_ring_send at ffffffff803bc92c
 #1 [ffff88003f5a7d30] __xencons_tx_flush at ffffffff803bbfc8
 #2 [ffff88003f5a7d50] kcons_write at ffffffff803bc220
 #3 [ffff88003f5a7d80] __call_console_drivers at ffffffff8028dfc1
 #4 [ffff88003f5a7da0] release_console_sem at ffffffff80217b39
 #5 [ffff88003f5a7dd0] vprintk at ffffffff8028e7c1
 #6 [ffff88003f5a7e50] printk at ffffffff8028e886
 #7 [ffff88003f5a7f50] sys_init_module at ffffffff802a43ee
 #8 [ffff88003f5a7f80] system_call at ffffffff8025f106
    RIP: 00000000004105fa  RSP: 00007fff210bc288  RFLAGS: 00000202
    RAX: 00000000000000af  RBX: ffffffff8025f106  RCX: ffffffff8025f080
    RDX: 0000000002c015c0  RSI: 0000000000010168  RDI: 0000000002c015e0
    RBP: 0000000000010168   R8: 0000000000020010   R9: 0000000000000003
    R10: 000000000040f180  R11: 0000000000000202  R12: 00007fff210bcf9d
    R13: 0000000000000003  R14: 0000000002c015e0  R15: ffff88003f231780
    ORIG_RAX: 00000000000000af  CS: e033  SS: e02b

PID: 9      TASK: ffff88003fb427a0  CPU: 0   COMMAND: "xenwatch"
 #0 [ffff88003fb55b88] xencons_ring_send at ffffffff803bc92c
 #1 [ffff88003fb55bd8] __xencons_tx_flush at ffffffff803bbfc8
 #2 [ffff88003fb55bf8] kcons_write at ffffffff803bc220
 #3 [ffff88003fb55c28] __call_console_drivers at ffffffff8028dfc1
 #4 [ffff88003fb55c48] release_console_sem at ffffffff80217b39
 #5 [ffff88003fb55c78] vprintk at ffffffff8028e7c1
 #6 [ffff88003fb55cf8] printk at ffffffff8028e886
 #7 [ffff88003fb55de8] xlvbd_add at ffffffff8807630b
 #8 [ffff88003fb55e48] backend_changed at ffffffff88075efa
 #9 [ffff88003fb55e98] xenwatch_handle_callback at ffffffff803be418
#10 [ffff88003fb55ea8] xenwatch_thread at ffffffff803bf10d
#11 [ffff88003fb55ee8] kthread at ffffffff802338c3
#12 [ffff88003fb55f48] kernel_thread at ffffffff8025fb2c

The common part in all three cases:

kcons_write            at ffffffff803bc220 drivers/xen/console/console.c:159
__call_console_drivers at ffffffff8028dfc1 kernel/printk.c:389
release_console_sem    at ffffffff80217b39 kernel/printk.c:906
vprintk                at ffffffff8028e7c1 kernel/printk.c:430
                                           kernel/printk.c:906
printk                 at ffffffff8028e886 kernel/printk.c:561

Common between the 1 VCPU cases:

__xencons_ring_send at ffffffff803bc92c include/asm/mach-xen/asm/hypercall.h:264
                                        include/xen/evtchn.h:112
                                        drivers/xen/console/xencons_ring.c:61
                                        drivers/xen/console/xencons_ring.c:81
__xencons_tx_flush  at ffffffff803bbfc8 drivers/xen/console/console.c:394

Comment 13 Laszlo Ersek 2011-12-01 13:42:09 UTC
Writing to the console in a PV domU actually kicks the "xenconsoled" daemon process in dom0. Changing the wall clock time in dom0 could definitely affect a process in dom0. It appears that flow control from the domU to xenconsoled is implemented with busy waits. Assuming that xenconsoled hangs temporarily, that could explain the domU spinning.

Comment 14 Laszlo Ersek 2011-12-01 13:54:20 UTC
handle_io() in tools/console/daemon/io.c employs rate limiting based on gettimeofday().

          /* Check if over allowance, and refuse to process events
           * from this guest for next time period
           */

  /* If any domain has been rate limited, we need to work
     out what timeout to supply to select */
  if (next_timeout) {
          long long duration = (next_timeout - now);
          tv.tv_sec = duration / 1000;
          tv.tv_usec = (duration - (tv.tv_sec * 1000)) * 1000;
  } 

  ret = select(max_fd + 1, &readfds, &writefds, 0, next_timeout ? &tv : NULL);

I'll reproduce the spinning and look into the xenconsoled process. It should be blocked in the select() call, after time moves backwards and the domain appears rate limited.

Comment 16 Laszlo Ersek 2011-12-01 14:13:12 UTC
Confirmed, I straced xenconsoled:

select(7, [6], [], NULL, NULL)          = 1 (in [6])
select(7, [6], [], NULL, NULL)          = 1 (in [6])
select(7, [6], [], NULL, NULL)          = 1 (in [6])
select(7, [6], [], NULL, NULL)          = 1 (in [6])
select(10, [6 8 9], [], NULL, NULL)     = 2 (in [6 8])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [6])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [6])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [6])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 1 (out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 1 (out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 1 (out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 1 (out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 1 (out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 8 9], [9], NULL, NULL)    = 2 (in [8], out [9])
select(10, [6 8 9], [], NULL, NULL)     = 1 (in [8])
select(10, [6 9], [], NULL, {723, 639000}

The last arg is "timeout".

Comment 17 Laszlo Ersek 2011-12-01 15:22:59 UTC
Created attachment 539243 [details]
xenconsoled: replace gettimeofday with clock_gettime

backport of 18168:6cd615adaac7 (comment 15)

Comment 19 RHEL Program Management 2012-04-02 10:18:31 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 21 Miroslav Rezanina 2012-04-16 13:17:49 UTC
Fix built into xen-3.0.3-139.el5

Comment 24 Laszlo Ersek 2012-09-17 12:03:56 UTC
Steps to reproduce:
- (kernel-xen version is not important, either in dom0 or domU; xen userspace
  version in dom0 is important)
- boot dom0,
- set the dom0 system time back 20 minutes (ie. into the past) using the "date"
  utility
- start a new PV guest

Comment 25 Laszlo Ersek 2012-09-17 12:04:42 UTC
- guest spins with unpatched xen package, progresses normally otherwise

Comment 26 Wei Shi 2012-09-18 10:48:23 UTC
(In reply to comment #25)
> - guest spins with unpatched xen package, progresses normally otherwise

Thanks, follow your steps, i reproduced with xen-3.0.3-132.el5 on dom0, rhel5.9 pv domU

Comment 27 Wei Shi 2012-09-19 02:15:21 UTC
(In reply to comment #24)
> Steps to reproduce:
> - (kernel-xen version is not important, either in dom0 or domU; xen userspace
>   version in dom0 is important)
> - boot dom0,
> - set the dom0 system time back 20 minutes (ie. into the past) using the
> "date"
>   utility
> - start a new PV guest

Verified on xen-3.0.3-141.el5

Note: if you try to verify this problem by update xen rpm package, a restart of xend service is not enough, i verified by reboot dom0

Comment 29 errata-xmlrpc 2013-01-08 04:07:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0119.html