Bug 566350

Summary: Xext/sync.c IDLETIME counter sometimes fails to fire when wrapped around
Product: [Fedora] Fedora Reporter: Tim Taiwanese Liim <tim.liim>
Component: xorg-x11-serverAssignee: Adam Jackson <ajax>
Status: CLOSED NEXTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 12CC: james, xgl-maint
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-08-20 16:11:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tim Taiwanese Liim 2010-02-18 03:59:13 UTC
Description of problem:

    gnome-power-manager ("g-p-m") and gnome-screensaver ("g-ss") (via
    gnome-session, ("g-s")) rely on IDLETIME counter in Xext/sync.c
    for idle notification.  Sometimes (after idle counter wraps
    around) IDLETIME fails to send idle notification to g-s, in which
    case g-ss will no longer lock the screen, a security flaw.

Version-Release number of selected component (if applicable):
    F11: xorg-x11-server-Xorg-1.6.4-0.1.fc11.i586
    F12: xorg-x11-server-Xorg-1.7.4-6.fc12.i686

How reproducible:
    Elusive on F12; less elusive on F11.  I have the gdb output in case 
    you need it.

    F11 + i386: a few times a day.
    F12 + i386: once a day or so.
    F12 + x64:  once in a few days. (I don't know why x64 makes any difference.)

Steps to Reproduce:
    1. change idle timeout to 1 minute, so it's easier to reproduce the issue.
       gnome-screensaver-preferences
            "Regard the computer as idle after"      1 minute
       gnome-power-preferences
            "Put display to sleep when inactive for" 1 minute
    2. wait 1 minute, till screen starts to fade (assume your screen supports
       gamma fading; my physical monitors all do, but my qemu vm does not.)
    3. hit a key to stop the fading.
    4. repeat step2,3 until screen no longer fades.  This may take 
              10-20 tries (on F11+i386), or 
              >100  tries (on F12+i386), or
              >?? (I lost count) tries (F12+x64)
  
Actual results:
    After many tries, screen no longer locks.

Expected results:
    Screen should always lock after idle timeout.

Additional info:

 1. In the above test case (1 minute timeout), g-p-m registers two
    alarms with IDLETIME, one 30 sec positive transition, another 30
    sec negative transition (30 sec in F11; 10 sec in F12).  g-s
    register two alarms: 60 sec pos and neg transition.  After many
    cycles of fading/ interruption, g-p-m doubles its pos trans alarm
    to 60 sec, then 120 sec (in F11; in F12 it's 10->20->40->80 sec).
    When g-p-m's alarm exceeds 60 sec (those of g-s), after counter
    wraparound, IDLETIME no longer sends idle notice to g-s, thus the
    no lock issue.

 2. IDLETIME uses two thresholds: pIdleTimeValueGreater (">=") and
    pIdleTimeValueLess ("<=").  

    Here is a good cycle:
                  idle     >=     <=   alarm_id trans client  wait_value
        00:56:52  60007      0  30006     c00299    +   g-s      30000
        --------------------------------------------------
        00:56:58      0  30000      0    280062e    -   g-pm     30000
        00:57:29  30005  30000      0    2800003    +   g-pm     30000
        00:57:30  30147  60000  30004          0    x   x        30000
        00:58:10  60006      0  30004     c0029b    +   g-s      30000
        --------------------------------------------------
        00:58:15      0  30000      0    280062f    -   g-pm     30000
        00:58:47  30003  30000      0    2800003    +   g-pm     30000
        00:58:47  30154  60000  30002          0    x   x        30000
        00:59:17  60003      0  60000     c0029d    +   g-s      30000
    When the IDLETIME counter wraps around, the value of "<=" is nil'ed
    out, and ">=" becomes 30 sec (correct).

    Here is a transition into bad cycle:
                  idle     >=     <=   alarm_id trans client  wait_value
        00:59:59      0  60000      0    2800631    -   g-pm     60000
        01:01:09  60005      0  60004    2800003    +   g-pm     60000
        --------------------------------------------------
        01:01:16      0  60000      0     c0029e    -   g-s     120000
        01:02:29  60004  60000      0     c0029d    +   g-s     120000
        01:03:31  60007      0  60000     c0029f    +   g-s     120000
        --------------------------------------------------
        01:03:38      1      0  60000     c002a0    -   g-s     120000
        01:03:38     51      0  60000          0    x   x       120000
        01:03:38    452      0  60000          0    x   x       120000
        01:03:38     20      0  60000          0    x   x       120000
        01:03:39    450      0  60000          0    x   x       120000
        01:03:39      1      0  60000          0    x   x       120000
        ...
    When IDLETIME wraps around, both ">=" and "<=" are *not* updated (bad),
    so no more idle notice to g-s.

 3. In SyncComputeBracketValues(), we have
        else if (pTrigger->test_type == XSyncNegativeTransition &&
                   ct != XSyncCounterNeverIncreases)
        {
       /*3*/if (XSyncValueGreaterThan(pCounter->value, pTrigger->test_value) &&
                XSyncValueGreaterThan(pTrigger->test_value, psci->bracket_less))
            {
                psci->bracket_less = pTrigger->test_value;
                pnewltval = &psci->bracket_less;
            }
        }
        else if (pTrigger->test_type == XSyncPositiveTransition &&
                  ct != XSyncCounterNeverDecreases)
        {
       /*4*/if (XSyncValueLessThan(pCounter->value, pTrigger->test_value) &&
                XSyncValueLessThan(pTrigger->test_value, psci->bracket_greater))
            {
                psci->bracket_greater = pTrigger->test_value;
                pnewgtval = &psci->bracket_greater;
            }
        }

    In good case, when counter wraps around, we have g-p-m sending alarm change
    requests, which results in SyncComputeBracketValues(startOver=***1***)
       /*3*/if (XSyncValueGreaterThan(pCounter->value, pTrigger->test_value) &&
                                                0 > 30000                (FALSE)
                XSyncValueGreaterThan(pTrigger->test_value, psci->bracket_less))
                                                30000 > (XSyncMinValue)  (TRUE)
       /*4*/if (XSyncValueLessThan(pCounter->value, pTrigger->test_value) &&
                                                0 < 60000                (TRUE)
                XSyncValueLessThan(pTrigger->test_value, psci->bracket_greater))
                                                60000 < (XSyncMaxValue)  (TRUE)
    so IdleTimeBracketValues() is invoked to update
    pIdleTimeValueGreater and pIdleTimeValueLess.

    In bad case, when counter wraps around, g-p-m no longer sends
    alarm change request, so we have SyncComputeBracketValues(startOver=***0***)
       /*3*/if (XSyncValueGreaterThan(pCounter->value, pTrigger->test_value) &&
                                                    0 > 30000           (FALSE)
                XSyncValueGreaterThan(pTrigger->test_value, psci->bracket_less))
                                                    30000 > 60000       (FALSE)
       /*4*/if (XSyncValueLessThan(pCounter->value, pTrigger->test_value) &&
                                                    0 < 60000           (TRUE)
                XSyncValueLessThan(pTrigger->test_value, psci->bracket_greater))
                                                    60000 < 60000       (FALSE)
    so IdleTimeBracketValues() is *not* invoked, so the value of ">="
    "<=" are stuck, and no more idle notice sent.

    In short, in the good cases the wraparound bug in IDLETIME counter is 
    covered up by the alarm change request from g-p-m.


 4. I tried a partial fix: when counter wraps around, 
    SyncComputeBracketValues() should start over.  That is, 
        SyncChangeCounter(pCounter, newval)
        {
            int startOver = 0;
            if (XSyncValueLessThan(newval, oldval)) { 
                /* counter wraps around */
                startOver = 1;
            }
            SyncComputeBracketValues(pCounter, startOver);
        }
    This works well in the cases I tried; it is not the whole
    solution, though, because now I start to notice this pattern:

                  idle     >=     <=   alarm_id trans client  wait_value
        19:01:51      0  20000      0     c00014    -   g-s      20000
        19:02:16  20000  20000      0    2600021    +   g-pm     20000
        19:02:17  20989  20000      0          0    x   x        20000
        19:02:18  21827  20000      0          0    x   x        20000
        19:02:19  22658  20000      0          0    x   x        20000
        ...
    That is, we are stuck for a while with ">=" less than IDLETIME, thus
    SyncChangeCounter keeps on firing, but (*ptl->pTrigger->CheckTrigger)
    finds no trigger to fire.  Maybe we can do
        SyncChangeCounter(pCounter, newval)
        {
            int startOver = 1;
            for (ptl = ...) {
                if ((*ptl->pTrigger->CheckTrigger)(ptl->pTrigger, oldval)) {
                    (*ptl->pTrigger->TriggerFired)(ptl->pTrigger);
                    startOver = 0;
                }
            }

            if (XSyncValueLessThan(newval, oldval)) { startOver = 1; }
            SyncComputeBracketValues(pCounter, startOver);
        }

    But you are the expert on this matter, surely you have a better
    way to address this issue properly.  Mine is just a proposal; I
    don't know what else it may break.

 5. If you are interested, I have a few gdb output to illustrate the
    issue.

Comment 1 Tim Taiwanese Liim 2010-02-18 04:05:24 UTC
Pointer bug from g-ss:
bug566351 after interrupting the fading a few times, gnome-screensaver 
          no longer locks the screen

Comment 2 Tim Taiwanese Liim 2010-02-18 04:16:06 UTC
Further info Comment #0, Additional Info #3.  In some other
good cases:
  - when the idle counter wraps around (from 60000 to 0)
        counter new     bracket_greater         bracket_less    pIdleTimeValueGreater   pIdleTimeValueLess
        0               60000                   60000           null                    60000
        - the positive transition trigger from gpm (30sec) makes
           /*4*/if (XSyncValueLessThan(pCounter->value, pTrigger->test_value) &&
                                    if              0 < 30000 &&
                    XSyncValueLessThan(pTrigger->test_value, psci->bracket_greater))
                                                30000 < 60000 True ==> >bracket_greater = 30000
          so we have
        counter new     bracket_greater         bracket_less    pIdleTimeValueGreater   pIdleTimeValueLess
        0               30000                   60000           30000                   null

But this point remains valid: 
    In short, in the good cases the wraparound bug in IDLETIME counter is 
    covered up by alarms from g-p-m.

Comment 3 Tim Taiwanese Liim 2010-02-26 02:25:27 UTC
I found a more consistent way to reproduce this issue.  The bug is
in Xorg::Xext/sync.c, which is exposed by the behavior of g-p-m.

Steps to Reproduce:
    1. change idle timeout to 1 minute, so it's easier to reproduce the issue. 
       gnome-screensaver-preferences 
            "Regard the computer as idle after"      1 minute 
       gnome-power-preferences 
            "Put display to sleep when inactive for" 1 minute 
 
    2. kill gnome-power-manager, and start  a new one by 
        p='idle for|idle dim|idle dim|active for' 
        b=/usr/bin/gnome-power-manager 
        $b --verbose | egrep "$p" 

    3. In F11, hit a key, idle for 35 sec, hit a key, idle for
       65 sec, hit a key, idle for 65 sec, hit a key.  Wait 10
       minutes; g-ss will not kick in, thus issue reproduced.

       The delay is actually (in F11)
           for i in 30 60 60; do
               d=$i+5
               idle $d sec
               hit a key

       In F12, use
           for i in 10 20 40 60; do
               d=$i+5
               idle $d sec
               hit a key

       g-p-m should output something like (in F12)
             - resetting idle dim time to 10s
             - increasing idle dim time to 20s
             - increasing idle dim time to 40s
             - increasing idle dim time to 80s


       Here is some bash-ish mouse script (use mouse copy and paste to
       your terminal window) to help you get the right timing:
            t() {
                n=$1
                while [ $n -gt 0 ]; do
                    printf "$(date) $n   \r"
                    let n=$n-1
                    sleep 1
                done
                printf "\n"
            }
            t1() {
                for d in $*; do
                    let d1=$d+5
                    t $d1
                done
            }
            # F11
            t1 30 60 60
            # F12
            t1 10 20 40 60

The reason for those funny numbers (10 20 40 60 in F12):
    - g-p-m schedules two alarms with Xext/sync.c:IDLETIME counter:
      1. for DIM    event (idle for 
                           30, 60, 120, 240, ... sec in F11; 
                           10, 20,  40,  80, ... sec in F12)
      2. for NORMAL event (when user is active, kbd or mouse).

    - initially g-p-m uses 10 sec (in F12) for DIM alarm; if the user
      undims screen immediately (within 10 sec) after dimming, the
      DIM time is doubled.  When DIM time (120 sec in F11, 80 sec in 
      F12) is more than 60 sec (g-ss idle time), bug566350 in
      Xorg::Xext/sync.c kicks in, thus no more idle events are sent
      to g-ss or g-p-m.

    - in F11 we have 30 60 60; the first two (30 60) are for doubling
      DIM time; the last one (60) is for g-ss timeout.

    - in F12 we have 10 20 40 60; the first three (10 20 40) are for 
      doubling DIM time; the last one (60) is for g-ss timeout.


Source code for doubling dim timeout in g-p-m.
Note: the design and implementation of this code segment is sound; I
put it here only to show the doubling behavior.

    gpm_backlight_notify_system_idle_changed (gpm-backlight.c) 
        egg_debug ("we have just been idle for %lfs", elapsed);  
        /* The user immediatly undimmed the screen!  
         * We should double the timeout to avoid this happening again */  
        if (elapsed < 10) {  
                /* double the event time */  
                backlight->priv->idle_dim_timeout *= 2.0;  
                egg_debug ("increasing idle dim time to %is",
                            backlight->priv->idle_dim_timeout);  
                gpm_idle_set_timeout_dim (backlight->priv->idle,
                            backlight->priv->idle_dim_timeout);  
        }  
        /* We reset the dimming after 2 minutes of idle,  
         * as the user will have changed tasks */  
        if (elapsed > 2*60) {  
                /* reset back to our default dimming */  
                backlight->priv->idle_dim_timeout =  
                        gconf_client_get_int (backlight->priv->conf,  
                                   GPM_CONF_BACKLIGHT_IDLE_DIM_TIME, NULL);  
                egg_debug ("resetting idle dim time to %is", 
                                backlight->priv->idle_dim_timeout);  
                gpm_idle_set_timeout_dim (backlight->priv->idle, 
                                backlight->priv->idle_dim_timeout);  
        }

Comment 4 Tim Taiwanese Liim 2010-08-04 04:17:16 UTC
Adam,
Any news?  Thanks.  Do you need further info?

Comment 5 Tim Taiwanese Liim 2010-08-13 04:37:31 UTC
Adam,
Has this bug been fixed already in F13?  If so, can we close this bug?

While debugging
    bug612620 Fade-out to screensaver cannot be interrupted with 
              xorg-x11-server-Xorg-1.8.2-1.fc13  
I tested 
    xorg-x11-server-Xorg-1.8.2-3.fc13.i686
with the test case in Comment #3.  xorg 1.8.2-3 (with fix#1,#2 in
bug612620 Comment #27) actually performed properly, no missing alarms
as it did in F11, F12.  So please close this bug if you agree.

Comment 6 Tim Taiwanese Liim 2010-08-20 16:11:38 UTC
I'll close this bug due to lack of response from Adam (speaking of
role reversal, usually it's the developer who is eager to close bugs;
in this case, the originator sees no reason to keep bug open for no
good reason).  From what I experimented, I think this bug has been
fixed in F13.  Please reopen if you disagree.