Bug 732878 (CVE-2011-3209)

Summary: CVE-2011-3209 kernel: panic occurs when clock_gettime() is called
Product: [Other] Security Response
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Security
Reporter: Eugene Teo (Security Response) <eteo>
Assignee: Red Hat Product Security <security-response-team>
CC: anton, arozansk, bhu, davej, dhoward, fhrbata, jkacur, kernel-mgr, kmcmartin, lgoncalv, lwang, mfuruta, moshiro, nmurray, plougher, pmatouse, prarit, rt-maint, security-response-team, sforsber, tcallawa, tkeisukee, vgoyal, williams
Doc Type: Bug Fix
Last Closed: 2012-05-04 08:07:55 UTC
Bug Depends On: 732614, 732879, 739786, 739787, 748684    
Bug Blocks: 732875    

Description Eugene Teo (Security Response) 2011-08-24 02:59:06 UTC
Description of Problem:
The call trace is as follows:

crash> bt
PID: 16963  TASK: f7415aa0  CPU: 0   COMMAND: "1-2.run-test"
 #0 [eb1c4e20] crash_kexec at c04434bd
 #1 [eb1c4e64] die at c04064d3
 #2 [eb1c4e94] do_divide_error at c0406ac5
 #3 [eb1c4f44] error_code (via divide_error) at c0405abb
    EAX: 5e3c58c2  EBX: 3b9aca00  ECX: fffffe4c  EDX: fffffe4c  EBP: eb1c4000 
    DS:  007b      ESI: eb1c4fac  ES:  007b      EDI: eb1c4fac
    CS:  0060      EIP: c04374cd  ERR: ffffffff  EFLAGS: 00210246 
 #4 [eb1c4f78] sample_to_timespec at c04374cd
 #5 [eb1c4f8c] posix_cpu_clock_get at c0438744
 #6 [eb1c4fa8] sys_clock_gettime at c04367f3
 #7 [eb1c4fb8] system_call at c0404f44
    EAX: ffffffda  EBX: fffffff2  ECX: bfe85f78  EDX: 00967ff4 
    DS:  007b      ESI: fffffff2  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bfe85f3c  EBP: bfe85f58
    CS:  0073      EIP: 00963e75  ERR: 00000109  EFLAGS: 00200246 

Here is the [customer's] analysis of the problem.

Processing the clock_gettime system call triggered a Divide Error Fault, as
described below:

1) The clock_gettime system call is called with 0xfffffff2, which is the
   CPU-time clock ID of the init process (PID 1).
   The clock ID is obtained from clock_getcpuclockid(1, &clock_id).
2) posix_cpu_clock_get() sets cpu_time_count->sched to
   task_struct->sched_time of PID#1 and calls sample_to_timespec().
3) sample_to_timespec() divides cpu_time_count->sched by NSEC_PER_SEC using
   div_long_long_rem().
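4) The quotient of the division is larger than 0xffffffff, i.e. it does not
   fit in 32 bits.
5) A Divide Error Fault occurs, because the 32-bit x86 divide instruction
   faults when its quotient does not fit in 32 bits.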
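Step 1 relies on how Linux encodes a process CPU-time clock ID. The sketch below assumes the layout of the kernel's MAKE_PROCESS_CPUCLOCK macro (the bitwise complement of the PID shifted left by 3, with the low bits selecting the clock type and CPUCLOCK_SCHED being 2); the helper and constant names are hypothetical. It is only meant to show where the 0xfffffff2 value for PID 1 comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the clock-type selectors, assumed to match the
 * kernel's CPUCLOCK_PROF / CPUCLOCK_VIRT / CPUCLOCK_SCHED values. */
#define SKETCH_CPUCLOCK_PROF  0u
#define SKETCH_CPUCLOCK_VIRT  1u
#define SKETCH_CPUCLOCK_SCHED 2u

/* Build a process CPU-time clock ID: complement the PID, shift it left by
 * 3 bits and OR in the clock type. Unsigned arithmetic keeps the shift
 * well defined in C; the result is the 32-bit clockid_t bit pattern. */
static uint32_t sketch_process_cpuclock(uint32_t pid, uint32_t clock_type)
{
    assert(clock_type <= SKETCH_CPUCLOCK_SCHED);
    return (~pid << 3) | clock_type;
}
```

For example, sketch_process_cpuclock(1, SKETCH_CPUCLOCK_SCHED) yields 0xfffffff2, the clock ID passed to clock_gettime in step 1.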

The Divide Error Fault occurs because task_struct->sched_time of PID#1 is
huge. When sys_clock_gettime() was called, task_struct->sched_time of PID#1
was 0xfffffe4c5e3c58c2.
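The register dump agrees with this: EDX:EAX holds fffffe4c:5e3c58c2 and EBX holds 0x3b9aca00, which is NSEC_PER_SEC (1,000,000,000). The sketch below models the overflow condition of a 64-by-32-bit x86 divide, assuming div_long_long_rem() reduces to such a divide on this 32-bit kernel (consistent with the trace, but not shown in the report). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds per second; 0x3b9aca00 is the EBX value in the trace. */
#define NSEC_PER_SEC 1000000000u

/* Hypothetical model of a 32-bit x86 divl: it divides the 64-bit value
 * EDX:EAX by a 32-bit divisor and needs the quotient to fit in 32 bits.
 * That holds exactly when the high word (EDX) is below the divisor;
 * otherwise the CPU raises a divide error instead of returning a result. */
static int divl_would_fault(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0); /* a zero divisor also faults; not modelled here */
    uint32_t high = (uint32_t)(dividend >> 32);
    return high >= divisor;
}
```

With the reported sched_time, the high word 0xfffffe4c is far above 0x3b9aca00, so divl_would_fault() returns 1, while an ordinary value such as 5 s plus 7 ns does not fault.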

task_struct->sched_time is increased by update_cpu_clock() while local timer
interrupts are handled, as follows:

---
static inline void
update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
{
        p->sched_time += now - max(p->timestamp, rq->timestamp_last_tick);
}
---

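The 'now' argument is derived from the TSC. If it is nearly zero, it is
smaller than max(p->timestamp, rq->timestamp_last_tick), so the unsigned long
long subtraction wraps around and p->sched_time becomes very large. This can
happen during system boot, when the TSC is reset to zero as follows.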

---
  init() "init/main.c"
  -> smp_prepare_cpus(max_cpus)
     -> synchronize_tsc_bp()
        -> write_tsc()
           => TSC is initialized to 0
---
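The wraparound in update_cpu_clock() comes from unsigned arithmetic. The user-space sketch below copies only the arithmetic of the update_cpu_clock() excerpt; fake_task, fake_rq and both functions are hypothetical stand-ins, not kernel types. The clamped variant shows one way the subtraction could be guarded; it is an illustration only, not the upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the fields used by
 * update_cpu_clock(); all values are unsigned long long. */
struct fake_task {
    unsigned long long sched_time;
    unsigned long long timestamp;
};

struct fake_rq {
    unsigned long long timestamp_last_tick;
};

static unsigned long long max_ull(unsigned long long a, unsigned long long b)
{
    return a > b ? a : b;
}

/* Same expression as the kernel excerpt. If `now` is smaller than the
 * later of the two timestamps (e.g. right after the TSC is reset to 0),
 * the unsigned subtraction wraps around to a value close to 2^64. */
static void fake_update_cpu_clock(struct fake_task *p, const struct fake_rq *rq,
                                  unsigned long long now)
{
    assert(p != NULL && rq != NULL);
    p->sched_time += now - max_ull(p->timestamp, rq->timestamp_last_tick);
}

/* Hypothetical guard, for illustration only: skip the update when the
 * clock appears to have gone backwards. */
static void fake_update_cpu_clock_clamped(struct fake_task *p,
                                          const struct fake_rq *rq,
                                          unsigned long long now)
{
    unsigned long long start = max_ull(p->timestamp, rq->timestamp_last_tick);
    if (now > start)
        p->sched_time += now - start;
}
```

Called with now = 10 while the task's timestamp is 1000, fake_update_cpu_clock() leaves sched_time at 2^64 - 990 (0xfffffffffffffc22), whereas the clamped variant leaves it unchanged.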

So the summary of the problem is as follows:

1) TSC is initialized to zero during system booting.
2) update_cpu_clock() runs just after step 1, and task_struct->sched_time of
   PID#1 becomes very large.
3) sys_clock_gettime() is called with the clock ID of PID#1.
4) Divide Error Fault occurs.
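Step 4 faults only because the quotient must fit in 32 bits. For comparison, the sketch below splits the same nanosecond value with ordinary C uint64_t division, which is defined for every 64-bit dividend and has no quotient-overflow case. The struct and function names are hypothetical; this is not the code of sample_to_timespec() or of upstream commit f8bd2258.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Hypothetical seconds/nanoseconds pair with a 64-bit seconds field. */
struct sketch_ts {
    uint64_t tv_sec;
    uint32_t tv_nsec;
};

/* Split a nanosecond count using full 64-bit division and remainder.
 * Unlike a 64/32 hardware divide, uint64_t division in C is defined for
 * every dividend, so there is no overflow case to fault on. */
static void sketch_ns_to_ts(uint64_t ns, struct sketch_ts *ts)
{
    assert(ts != NULL);
    ts->tv_sec = ns / SKETCH_NSEC_PER_SEC;
    ts->tv_nsec = (uint32_t)(ns % SKETCH_NSEC_PER_SEC);
}
```

Given 0xfffffe4c5e3c58c2, it returns a seconds value above UINT32_MAX instead of faulting, which also makes clear how far the wrapped sched_time is from any plausible CPU time.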

Comment 2 Eugene Teo (Security Response) 2011-08-24 03:04:40 UTC
Acknowledgements:

Red Hat would like to thank Yasuaki Ishimatsu for reporting this issue.

Comment 5 Eugene Teo (Security Response) 2011-08-24 03:13:47 UTC
Statement:

This issue did not affect the Linux kernels as shipped with Red Hat Enterprise Linux 4, 6, and Red Hat Enterprise MRG, as they either do not have the sample_to_timespec() function, or have already backported upstream commit f8bd2258, which addresses this issue. It was addressed in Red Hat Enterprise Linux 5 via https://rhn.redhat.com/errata/RHSA-2011-1386.html.

Comment 13 Eugene Teo (Security Response) 2011-09-02 02:03:52 UTC
Upstream commit:
https://github.com/torvalds/linux/commit/f8bd2258e2d520dff28c855658bd24bdafb5102d

Comment 19 errata-xmlrpc 2011-10-20 17:29:15 UTC
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5

Via RHSA-2011:1386 https://rhn.redhat.com/errata/RHSA-2011-1386.html

Comment 20 Eugene Teo (Security Response) 2011-10-25 03:55:16 UTC
Created kernel tracking bugs for this issue

Affects: fedora-all [bug 748684]

Comment 21 errata-xmlrpc 2011-11-01 17:14:08 UTC
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5.6.Z - Server Only

Via RHSA-2011:1419 https://rhn.redhat.com/errata/RHSA-2011-1419.html

Comment 22 errata-xmlrpc 2011-11-01 17:14:52 UTC
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5 Long Life

Via RHSA-2011:1418 https://rhn.redhat.com/errata/RHSA-2011-1418.html