Bug 732878 (CVE-2011-3209)

Summary: CVE-2011-3209 kernel: panic occurs when clock_gettime() is called
Product: [Other] Security Response
Reporter: Eugene Teo (Security Response) <eteo>
Component: vulnerability
Assignee: Red Hat Product Security <security-response-team>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: unspecified
CC: anton, arozansk, bhu, davej, dhoward, fhrbata, jkacur, kernel-mgr, kmcmartin, lgoncalv, lwang, mfuruta, moshiro, nmurray, plougher, pmatouse, prarit, rt-maint, security-response-team, sforsber, tcallawa, tkeisukee, vgoyal, williams
Target Milestone: ---
Keywords: Security
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: public=20080501,reported=20110824,source=researcher,impact=moderate,cvss2=4.9/AV:L/AC:L/Au:N/C:N/I:N/A:C,rhel-5/kernel=affected,rhel-6/kernel=notaffected,mrg-2/realtime-kernel=notaffected,rhel-4/kernel=notaffected,rhel-5.3.z/kernel=affected,rhel-5.6.z/kernel=affected,fedora-all/kernel=affected
Doc Type: Bug Fix
Last Closed: 2012-05-04 04:07:55 EDT
Bug Depends On: 732614, 732879, 739786, 739787, 748684    
Bug Blocks: 732875    

Description Eugene Teo (Security Response) 2011-08-23 22:59:06 EDT
Description of Problem:
The call trace is as follows:

crash> bt
PID: 16963  TASK: f7415aa0  CPU: 0   COMMAND: "1-2.run-test"
 #0 [eb1c4e20] crash_kexec at c04434bd
 #1 [eb1c4e64] die at c04064d3
 #2 [eb1c4e94] do_divide_error at c0406ac5
 #3 [eb1c4f44] error_code (via divide_error) at c0405abb
    EAX: 5e3c58c2  EBX: 3b9aca00  ECX: fffffe4c  EDX: fffffe4c  EBP: eb1c4000 
    DS:  007b      ESI: eb1c4fac  ES:  007b      EDI: eb1c4fac
    CS:  0060      EIP: c04374cd  ERR: ffffffff  EFLAGS: 00210246 
 #4 [eb1c4f78] sample_to_timespec at c04374cd
 #5 [eb1c4f8c] posix_cpu_clock_get at c0438744
 #6 [eb1c4fa8] sys_clock_gettime at c04367f3
 #7 [eb1c4fb8] system_call at c0404f44
    EAX: ffffffda  EBX: fffffff2  ECX: bfe85f78  EDX: 00967ff4 
    DS:  007b      ESI: fffffff2  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bfe85f3c  EBP: bfe85f58
    CS:  0073      EIP: 00963e75  ERR: 00000109  EFLAGS: 00200246 

Here is [customer's] analysis of the problem.

Processing of the clock_gettime system call reached a Divide Error Fault as follows:

1) The clock_gettime system call is called with 0xfffffff2, which is the clock
   ID of the init process (process ID 1).
   The clock ID is obtained from clock_getcpuclockid(1, &clock_id).
2) posix_cpu_clock_get() sets cpu_time_count->sched to
   task_struct->sched_time of PID#1 and calls sample_to_timespec().
3) sample_to_timespec() divides cpu_time_count->sched by NSEC_PER_SEC using
4) The result of the division becomes bigger than 0xffffffff.
5) Divide Error Fault occurs.
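
The clock ID 0xfffffff2 in step 1 is consistent with how per-process CPU clock IDs are encoded. The following is a minimal user-space sketch, assuming the kernel's MAKE_PROCESS_CPUCLOCK encoding (inverted PID shifted left by three bits, clock type in the low bits) and a CPUCLOCK_SCHED value of 2. The helper name make_process_cpuclock is hypothetical and is not taken from this report.

```c
#include <stdint.h>

/* Clock type stored in the low bits of a CPU clock ID.
 * Assumption: value 2, as in the kernel's posix-timers header. */
#define CPUCLOCK_SCHED 2u

/* Hypothetical helper mirroring the assumed MAKE_PROCESS_CPUCLOCK(pid, clock)
 * encoding: invert the PID, shift it left by three bits, and OR in the
 * clock type. The result is truncated to 32 bits, like a 32-bit clockid_t. */
static uint32_t make_process_cpuclock(uint32_t pid, uint32_t clock)
{
    return (~pid << 3) | clock;
}
```

Under these assumptions, make_process_cpuclock(1, CPUCLOCK_SCHED) yields 0xfffffff2, the value seen in EBX and ESI at system_call in the trace above.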

The Divide Error Fault occurs because task_struct->sched_time of PID#1 is
huge. When sys_clock_gettime() was called, task_struct->sched_time of
PID#1 was 0xfffffe4c5e3c58c2.
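
The register dump at sample_to_timespec agrees with this value: EDX:EAX holds 0xfffffe4c:0x5e3c58c2 and EBX holds 0x3b9aca00, which is 1000000000 (NSEC_PER_SEC). On 32-bit x86, dividing a 64-bit dividend by a 32-bit divisor raises a divide error when the quotient does not fit in 32 bits, and that is the case whenever the upper half of the dividend is greater than or equal to the divisor. The sketch below checks that condition in user space. The helper name div64_32_overflows is hypothetical.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/* Hypothetical helper: returns 1 if a 64-by-32-bit divide would produce a
 * quotient wider than 32 bits (or divide by zero), 0 otherwise.
 * The quotient overflows exactly when the high 32 bits of the dividend
 * are not smaller than the divisor. */
static int div64_32_overflows(uint64_t dividend, uint32_t divisor)
{
    if (divisor == 0)
        return 1;
    return (uint32_t)(dividend >> 32) >= divisor;
}
```

For the reported sched_time, the upper half 0xfffffe4c is far larger than NSEC_PER_SEC, so the check returns 1. A value such as 5000000000 (five seconds in nanoseconds) has an upper half of 1 and divides without overflow.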

The task_struct->sched_time is increased by update_cpu_clock() while handling
local timer interrupts as follows. 

static inline void
update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
{
        p->sched_time += now - max(p->timestamp, rq->timestamp_last_tick);
}
The 'now' argument is obtained from the TSC. If it is close to zero, the
unsigned subtraction wraps around and p->sched_time becomes very large. This
can happen during system boot, when the TSC is initialized to zero as follows.

  init() "init/main.c"
  -> smp_prepare_cpus(max_cpus)
     -> synchronize_tsc_bp()
        -> write_tsc()
           => TSC is initialized to 0
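
The wraparound can be illustrated in user space with unsigned 64-bit arithmetic. This is a minimal sketch that mirrors the expression in update_cpu_clock(). The function names are hypothetical, and the sample timestamp in the note below was chosen only so that the result matches the reported sched_time; the actual timestamps at the time of the crash are not given in this report.

```c
#include <stdint.h>

/* Plain maximum of two unsigned 64-bit values. */
static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Hypothetical user-space mirror of:
 *   p->sched_time += now - max(p->timestamp, rq->timestamp_last_tick);
 * All values are unsigned, so if 'now' is smaller than the larger
 * timestamp the difference wraps modulo 2^64 instead of going negative. */
static uint64_t account_sched_time(uint64_t sched_time, uint64_t now,
                                   uint64_t timestamp, uint64_t last_tick)
{
    return sched_time + (now - max_u64(timestamp, last_tick));
}
```

With a normal, increasing clock, account_sched_time(10, 200, 150, 100) adds 50 and returns 60. With now reset to 0 and an illustrative earlier timestamp of 0x1B3A1C3A73E (about 1.87e12 ns), account_sched_time(0, 0, 0x1B3A1C3A73E, 0) wraps to 0xfffffe4c5e3c58c2.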

So the summary of the problem is as follows:

1) The TSC is initialized to zero during system boot.
2) update_cpu_clock() is called just after 1), and task_struct->sched_time of
   PID#1 becomes very large.
3) sys_clock_gettime() is called for clock ID of PID#1.
4) Divide Error Fault occurs.
Comment 2 Eugene Teo (Security Response) 2011-08-23 23:04:40 EDT

Red Hat would like to thank Yasuaki Ishimatsu for reporting this issue.
Comment 5 Eugene Teo (Security Response) 2011-08-23 23:13:47 EDT

This issue did not affect the Linux kernels as shipped with Red Hat Enterprise Linux 4, 6, and Red Hat Enterprise MRG, as they either do not have the sample_to_timespec() function, or have already backported upstream commit f8bd2258, which addresses this issue. It was addressed in Red Hat Enterprise Linux 5 via https://rhn.redhat.com/errata/RHSA-2011-1386.html.
Comment 13 Eugene Teo (Security Response) 2011-09-01 22:03:52 EDT
Upstream commit:
Comment 19 errata-xmlrpc 2011-10-20 13:29:15 EDT
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2011:1386 https://rhn.redhat.com/errata/RHSA-2011-1386.html
Comment 20 Eugene Teo (Security Response) 2011-10-24 23:55:16 EDT
Created kernel tracking bugs for this issue

Affects: fedora-all [bug 748684]
Comment 21 errata-xmlrpc 2011-11-01 13:14:08 EDT
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5.6.Z - Server Only

Via RHSA-2011:1419 https://rhn.redhat.com/errata/RHSA-2011-1419.html
Comment 22 errata-xmlrpc 2011-11-01 13:14:52 EDT
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5 Long Life

Via RHSA-2011:1418 https://rhn.redhat.com/errata/RHSA-2011-1418.html