Bug 1103021

Summary: Guest kernel crashes when entering S3/S4 state with an 82599ES VF assigned
Product: Red Hat Enterprise Linux 6
Reporter: mazhang <mazhang>
Component: qemu-kvm
Assignee: Alex Williamson <alex.williamson>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.6
CC: acathrow, alex.williamson, bsarathy, chayang, juzhang, michen, mkenneth, qzhang, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-12 13:34:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 912287
Attachments: vmcore-dmesg.txt (flags: none)

Description mazhang 2014-05-30 04:39:11 UTC
Description of problem:
When a guest with an assigned VF is suspended, the guest kernel crashes.

Version-Release number of selected component (if applicable):

Host:
qemu-kvm-rhev-debuginfo-0.12.1.2-2.427.el6.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.427.el6.x86_64
qemu-img-rhev-0.12.1.2-2.427.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.427.el6.x86_64
kernel-2.6.32-469.el6.x86_64

Guest:
kernel-2.6.32-471.el6.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Generate the VF, unbind it from its host driver, and bind it to pci-stub.
[root@intel-e5530-8-2 ~]# ls /sys/bus/pci/drivers/pci-stub/
0000:05:10.0  bind  new_id  remove_id  uevent  unbind

2. Boot the VM with the VF assigned.
/usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 4G \
-smp 32,sockets=16,cores=2,threads=1,maxcpus=160 \
-enable-kvm \
-name rhel6.4 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:6667,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-monitor unix:/tmp/guest-sock,server,nowait \
-drive file=/home/rhel6.5-64-backup.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,scsi=on,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,addr=0x3 \
-vga qxl \
-spice port=5900,disable-ticketing \
-device pci-assign,host=05:10.0,id=hostnet_VF1 \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \

3. Suspend the guest.
(in guest) # pm-suspend
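
Step 1 is only summarized above, so here is a minimal sketch of one common way to hand a VF to pci-stub through sysfs. It is not necessarily the exact procedure the reporter used. The function name and the optional fourth argument (an alternate sysfs root for dry runs) are hypothetical. The vendor ID 8086 (Intel) and device ID 10ed are taken from the lspci output in this report.

```shell
# Sketch: rebind one PCI function to pci-stub so that pci-assign can claim it.
# Usage: bind_to_pci_stub <domain:bus:dev.fn> <vendor-id> <device-id> [sysfs-root]
# The sysfs-root argument is a hypothetical hook for dry runs; it defaults to /sys.
bind_to_pci_stub() {
    bdf=$1 vendor=$2 device=$3 sysfs=${4:-/sys}
    stub="$sysfs/bus/pci/drivers/pci-stub"
    dev="$sysfs/bus/pci/devices/$bdf"

    # Register the vendor/device ID pair with pci-stub.
    echo "$vendor $device" > "$stub/new_id"

    # Detach the function from whatever driver currently owns it, if any.
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi

    # Bind explicitly unless pci-stub already picked the device up via new_id.
    if [ ! -e "$stub/$bdf" ]; then
        echo "$bdf" > "$stub/bind"
    fi
}
```

On the host in this report the call would be `bind_to_pci_stub 0000:05:10.0 8086 10ed`, run as root after the VFs exist. How the VFs were created is not stated here; loading ixgbe with a `max_vfs` module parameter is one possibility for this driver generation, but that is an assumption.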

Actual results:
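The guest kernel crashes with a page fault in ixgbevf_resume, reached through the PCI resume path (see the backtrace below).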

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-471.el6.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Fri May 30 19:27:10 2014
      UPTIME: 00:02:29
LOAD AVERAGE: 0.65, 0.82, 0.36
       TASKS: 632
    NODENAME: localhost.localdomain
     RELEASE: 2.6.32-471.el6.x86_64
     VERSION: #1 SMP Wed May 28 19:22:04 EDT 2014
     MACHINE: x86_64  (2393 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0011 [#1] SMP " (check log for details)
         PID: 2808
     COMMAND: "pm-suspend"
        TASK: ffff880118673500  [THREAD_INFO: ffff8801173e4000]
         CPU: 26
       STATE: TASK_RUNNING (PANIC)
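
As a reading aid for the PANIC line above, here is a small sketch that decodes the hexadecimal value printed after "Oops:". It assumes the standard x86 page-fault error-code bit layout (bit 0 present/protection, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The function name is hypothetical.

```shell
# Sketch: decode an x86 page-fault error code as printed in "Oops: NNNN".
# Usage: decode_pf_error <hex-code-without-0x>
decode_pf_error() {
    code=$(( 0x$1 ))

    # Bit 0: 0 = page not present, 1 = protection violation on a present page.
    if [ $(( code & 0x01 )) -ne 0 ]; then out="protection-violation"; else out="page-not-present"; fi
    # Bit 1: 0 = read access, 1 = write access.
    if [ $(( code & 0x02 )) -ne 0 ]; then out="$out write"; else out="$out read"; fi
    # Bit 2: 0 = fault raised in kernel mode, 1 = in user mode.
    if [ $(( code & 0x04 )) -ne 0 ]; then out="$out user-mode"; else out="$out kernel-mode"; fi
    # Bit 3: a reserved bit was set in a page-table entry.
    if [ $(( code & 0x08 )) -ne 0 ]; then out="$out reserved-bit-set"; fi
    # Bit 4: the fault was caused by an instruction fetch.
    if [ $(( code & 0x10 )) -ne 0 ]; then out="$out instruction-fetch"; fi

    echo "$out"
}

# decode_pf_error 0011
# prints: protection-violation read kernel-mode instruction-fetch
```

Under that layout, 0011 would indicate a kernel-mode instruction fetch from a page that is present but not executable. This is an interpretation of the printed code only, not a root-cause analysis.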

crash> bt
PID: 2808   TASK: ffff880118673500  CPU: 26  COMMAND: "pm-suspend"
 #0 [ffff8801173e58b0] machine_kexec at ffffffff8103a0eb
 #1 [ffff8801173e5910] crash_kexec at ffffffff810c7512
 #2 [ffff8801173e59e0] oops_end at ffffffff8152a9f0
 #3 [ffff8801173e5a10] no_context at ffffffff8104b36b
 #4 [ffff8801173e5a60] __bad_area_nosemaphore at ffffffff8104b5f5
 #5 [ffff8801173e5ab0] bad_area_nosemaphore at ffffffff8104b6c3
 #6 [ffff8801173e5ac0] __do_page_fault at ffffffff8104be1f
 #7 [ffff8801173e5be0] do_page_fault at ffffffff8152c93e
 #8 [ffff8801173e5c10] page_fault at ffffffff81529cf5
 #9 [ffff8801173e5cf0] ixgbevf_resume at ffffffffa01134c9 [ixgbevf]
#10 [ffff8801173e5d30] pci_legacy_resume at ffffffff812ac7b2
#11 [ffff8801173e5d50] pci_pm_resume at ffffffff812ac9c8
#12 [ffff8801173e5d80] pm_op at ffffffff8136edc0
#13 [ffff8801173e5da0] dpm_resume_end at ffffffff8136fb37
#14 [ffff8801173e5e00] suspend_devices_and_enter at ffffffff810c060e
#15 [ffff8801173e5e20] enter_state at ffffffff810c07b7
#16 [ffff8801173e5e40] state_store at ffffffff810bfe1a
#17 [ffff8801173e5e90] kobj_attr_store at ffffffff8128a817
#18 [ffff8801173e5ea0] sysfs_write_file at ffffffff812076a5
#19 [ffff8801173e5ef0] vfs_write at ffffffff8118ba68
#20 [ffff8801173e5f30] sys_write at ffffffff8118c431
#21 [ffff8801173e5f80] system_call_fastpath at ffffffff8100b072
    RIP: 00000035b7adb7a0  RSP: 00007fffa18d2450  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 000000000000006d
    RDX: 0000000000000003  RSI: 00007f3f73822000  RDI: 0000000000000001
    RBP: 00007f3f73822000   R8: 00000000ffffffff   R9: 00000000015a5df0
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000003
    R13: 00000035b7d8e780  R14: 0000000000000003  R15: 00000035b7d8e780
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
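
The backtrace shows pm-suspend reaching the kernel through sys_write, sysfs_write_file, state_store and enter_state, that is, through a write to the sysfs power-state attribute. A minimal sketch of the same trigger without pm-utils follows. It assumes the usual keywords "mem" for S3 and "disk" for S4. The function names and the optional state-file argument (for dry runs) are hypothetical.

```shell
# Sketch: map an ACPI sleep state to the keyword written to /sys/power/state.
sleep_keyword() {
    case "$1" in
        s3|S3) echo mem ;;
        s4|S4) echo disk ;;
        *) echo "unsupported state: $1" >&2; return 1 ;;
    esac
}

# Usage: enter_sleep_state <S3|S4> [state-file]
# The state-file argument is a hypothetical hook for dry runs;
# it defaults to /sys/power/state.
enter_sleep_state() {
    state_file=${2:-/sys/power/state}
    keyword=$(sleep_keyword "$1") || return 1

    # Refuse to write a keyword the kernel does not advertise.
    if ! grep -qw "$keyword" "$state_file"; then
        echo "$keyword is not listed in $state_file" >&2
        return 1
    fi

    echo "$keyword" > "$state_file"
}
```

Running `enter_sleep_state S3` as root inside the guest should exercise the same kernel path as step 3, and `enter_sleep_state S4` the hibernate path. The pre-check also shows whether the guest advertises the states that the PIIX4_PM.disable_s3=0 and disable_s4=0 options are meant to expose.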


Expected results:
The guest suspends and resumes without crashing.

Additional info:

[root@intel-e5530-8-2 ~]# lspci -vvv -s 05:00.0
05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
	Subsystem: Intel Corporation Ethernet Server Adapter X520-2
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 35
	Region 0: Memory at df300000 (64-bit, non-prefetchable) [size=512K]
	Region 2: I/O ports at ecc0 [size=32]
	Region 4: Memory at df2f8000 (64-bit, non-prefetchable) [size=16K]
	Expansion ROM at df200000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00002000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #3, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <1us, L1 <8us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [e0] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-c3-d0-3c
	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 64, Total VFs: 64, Number of VFs: 2, Function Dependency Link: 00
		VF offset: 128, stride: 2, Device ID: 10ed
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 00000000df400000 (64-bit, non-prefetchable)
		Region 3: Memory at 00000000df500000 (64-bit, non-prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe
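
The SR-IOV capability above reports "VF offset: 128, stride: 2" for the PF at 05:00.0. As a cross-check against the VF address used in the steps (05:10.0), here is a small sketch of the routing-ID arithmetic: VF n (counting from 0) sits at the PF routing ID plus the offset plus n times the stride. The function name is hypothetical, and the result uses the bus:dev.fn notation that lspci prints.

```shell
# Sketch: compute the address of VF <index> (0-based) from the PF address
# and the SR-IOV "VF offset" and "stride" values reported by lspci.
# Usage: vf_bdf <bus:dev.fn of PF, hex> <offset> <stride> <index>
vf_bdf() {
    pf=$1 offset=$2 stride=$3 index=$4
    bus=${pf%%:*}       # e.g. "05"
    devfn=${pf#*:}      # e.g. "00.0"
    dev=${devfn%.*}     # e.g. "00"
    fn=${devfn#*.}      # e.g. "0"

    # Routing ID = bus (8 bits) | device (5 bits) | function (3 bits).
    rid=$(( (0x$bus << 8) | (0x$dev << 3) | $fn ))
    rid=$(( rid + offset + index * stride ))

    printf '%02x:%02x.%x\n' \
        $(( (rid >> 8) & 0xff )) $(( (rid >> 3) & 0x1f )) $(( rid & 0x7 ))
}

# vf_bdf 05:00.0 128 2 0    prints 05:10.0
# vf_bdf 05:00.0 128 2 1    prints 05:10.2
```

The first result matches the 0000:05:10.0 device listed under pci-stub in step 1.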

Comment 1 mazhang 2014-05-30 04:48:29 UTC
Created attachment 900605
vmcore-dmesg.txt

Comment 3 mazhang 2014-05-30 05:04:31 UTC
kernel-2.6.32-431.20.2.el6.x86_64 also hits this problem.

Comment 4 mazhang 2014-05-30 07:40:41 UTC
S4 also hits this problem.

Comment 5 Ademar Reis 2014-06-12 13:33:18 UTC
S3/S4 support is a tech preview in RHEL6. It will be promoted to fully supported
at some point, but only in RHEL7.

Therefore we're closing all S3/S4 related bugs in RHEL6. New bugs will be
considered only if they're regressions or break some important use-case or
certification.

RHEL7 is being more extensively tested and effort from QE is underway in
certifying that this particular bug is not present there.

For the RHEL7 tracker, please visit Bug 923626.