Bug 636411 (CVE-2010-3699)

Summary:	CVE-2010-3699 kernel: guest->host denial of service from invalid xenbus transitions
Product:	[Other] Security Response	Reporter:	Eugene Teo (Security Response) <eteo>
Component:	vulnerability	Assignee:	Red Hat Product Security <security-response-team>
Status:	CLOSED ERRATA	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	CC:	dhoward, drjones, pbonzini, plyons, pmatouse, rkhan, security-response-team
Target Milestone:	---	Keywords:	Security
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-10-19 09:14:20 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	635999, 636412
Bug Blocks:

Description Eugene Teo (Security Response) 2010-09-22 07:42:55 UTC

Description of problem:
Tearing down incorrectly a tap:aio:... device causes the backend to hang and
prevents further management operations in the host (including listing and
creating domains).

Version-Release number of selected component (if applicable):
Present since forever.

How reproducible:
100%

Actual results:
The backend of the domain's xvdb device will never be torn down, and the domain
will remain in "xm list" as a zombie until xend is restarted.  Even then, most
"xm" commands will not work anymore in the host.

The following messages will appear in /var/log/messages:

kernel: INFO: task xenwatch:21 blocked for more than 120 seconds.
kernel: xenwatch      D ffff8801de591100     0    21     19    22       (L-TLB)
kernel:  ffff8801de5abdc0  0000000000000246  0000000000000009  ffff8801de591100 
kernel:  0000000000000009  ffff8801de591100  ffff8800a8476040  0000000000000924 
kernel:  ffff8801de5912e8  ffffffffffffffff 
kernel: Call Trace:
kernel:  [<ffffffff802893ce>] enqueue_task+0x41/0x56
kernel:  [<ffffffff8029d110>] keventd_create_kthread+0x0/0xc4
kernel:  [<ffffffff88755901>] :blktap:tap_blkif_free+0x72/0x97
kernel:  [<ffffffff8029d328>] autoremove_wake_function+0x0/0x2e
kernel:  [<ffffffff887555e2>] :blktap:tap_frontend_changed+0x1d5/0x231
kernel:  [<ffffffff803ba494>] xenbus_read_driver_state+0x26/0x3b

Expected results:
The backend can tolerate guests with erroneous backend behavior.

Additional info:
The root cause of the failure is that in step 4a we skip the "Closing" phase of
the xenbus protocol, where the kernel thread is released:

        case XenbusStateClosing:
                if (be->blkif->xenblkd) {
                        kthread_stop(be->blkif->xenblkd);
                        be->blkif->xenblkd = NULL;
                }
                tap_blkif_free(be->blkif);

This code is never executed.  Then, at step 4d, another thread is started.

At step 5, the frontend goes to the Closing state, and the code above _is_
executed.  The second xenblkd thread _is_ stopped when the Closing state is
reached, but the leaked one keeps a reference to be->blkif and thus
tap_blkif_free hangs.  The whole xenwatch process then cannot run anymore.

Comment 2 Eugene Teo (Security Response) 2010-11-25 05:36:43 UTC

Upstream commit:
http://xenbits.xen.org/linux-2.6.18-xen.hg?rev/59f097ef181b

Comment 4 errata-xmlrpc 2011-01-04 16:52:12 UTC

This issue has been addressed in following products:

  Red Hat Enterprise Linux 5

Via RHSA-2011:0004 https://rhn.redhat.com/errata/RHSA-2011-0004.html