Bug 636411 (CVE-2010-3699)

Summary: CVE-2010-3699 kernel: guest->host denial of service from invalid xenbus transitions
Product: [Other] Security Response Reporter: Eugene Teo (Security Response) <eteo>
Component: vulnerabilityAssignee: Red Hat Product Security <security-response-team>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: unspecifiedCC: dhoward, drjones, pbonzini, plyons, pmatouse, rkhan, security-response-team
Target Milestone: ---Keywords: Security
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-19 09:14:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 635999, 636412    
Bug Blocks:    

Description Eugene Teo (Security Response) 2010-09-22 07:42:55 UTC
Description of problem:
Tearing down incorrectly a tap:aio:... device causes the backend to hang and
prevents further management operations in the host (including listing and
creating domains).

Version-Release number of selected component (if applicable):
Present since forever.

How reproducible:
100%

Actual results:
The backend of the domain's xvdb device will never be torn down, and the domain
will remain in "xm list" as a zombie until xend is restarted.  Even then, most
"xm" commands will not work anymore in the host.

The following messages will appear in /var/log/messages:

kernel: INFO: task xenwatch:21 blocked for more than 120 seconds.
kernel: xenwatch      D ffff8801de591100     0    21     19    22       (L-TLB)
kernel:  ffff8801de5abdc0  0000000000000246  0000000000000009  ffff8801de591100 
kernel:  0000000000000009  ffff8801de591100  ffff8800a8476040  0000000000000924 
kernel:  ffff8801de5912e8  ffffffffffffffff 
kernel: Call Trace:
kernel:  [<ffffffff802893ce>] enqueue_task+0x41/0x56
kernel:  [<ffffffff8029d110>] keventd_create_kthread+0x0/0xc4
kernel:  [<ffffffff88755901>] :blktap:tap_blkif_free+0x72/0x97
kernel:  [<ffffffff8029d328>] autoremove_wake_function+0x0/0x2e
kernel:  [<ffffffff887555e2>] :blktap:tap_frontend_changed+0x1d5/0x231
kernel:  [<ffffffff803ba494>] xenbus_read_driver_state+0x26/0x3b

Expected results:
The backend can tolerate guests with erroneous backend behavior.

Additional info:
The root cause of the failure is that in step 4a we skip the "Closing" phase of
the xenbus protocol, where the kernel thread is released:

        case XenbusStateClosing:
                if (be->blkif->xenblkd) {
                        kthread_stop(be->blkif->xenblkd);
                        be->blkif->xenblkd = NULL;
                }
                tap_blkif_free(be->blkif);

This code is never executed.  Then, at step 4d, another thread is started.

At step 5, the frontend goes to the Closing state, and the code above _is_
executed.  The second xenblkd thread _is_ stopped when the Closing state is
reached, but the leaked one keeps a reference to be->blkif and thus
tap_blkif_free hangs.  The whole xenwatch process then cannot run anymore.

Comment 2 Eugene Teo (Security Response) 2010-11-25 05:36:43 UTC
Upstream commit:
http://xenbits.xen.org/linux-2.6.18-xen.hg?rev/59f097ef181b

Comment 4 errata-xmlrpc 2011-01-04 16:52:12 UTC
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5

Via RHSA-2011:0004 https://rhn.redhat.com/errata/RHSA-2011-0004.html