Bug 619265 - regression caused by fix to typo in pm-utils
Summary: regression caused by fix to typo in pm-utils
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: pm-utils   
Version: 6.0
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Jaroslav Škarvada
QA Contact: desktop-bugs@redhat.com
Keywords: Regression
Depends On:
Blocks: 613509
Reported: 2010-07-29 05:58 UTC by Dave Airlie
Modified: 2010-08-05 06:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2010-08-05 06:06:00 UTC


Description Dave Airlie 2010-07-29 05:58:03 UTC
The fix for bug 613509 actually meant we started avoiding VT switching on KMS drivers. However, because this had been broken for at least two Fedora releases, that particular code path was never actually tested in the field.

I noticed a regression during testing on an HP laptop: on resume I wouldn't always get the X server back; sometimes the hardware would end up pointing at the console instead.

This happens because, on resume, the kernel resumes into X, but pm-utils then echoes to a file in /sys/class/graphics/fb0 to set the unsuspended state. Touching this file causes fbcon to set a mode and clobber the X mode.
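For illustration only, here is a minimal sketch of the kind of write described above. It is not the pm-utils hook itself, which is a shell script. The function name `resume_fbcon` and the `graphics_root` parameter are hypothetical, and the assumption that writing "0" to a framebuffer's `state` file marks it as running is based on the kernel's fbdev sysfs interface rather than on anything stated in this report.

```python
from pathlib import Path


def resume_fbcon(graphics_root="/sys/class/graphics"):
    """Write "0" to every <device>/state file under graphics_root.

    Assumption: "0" means "running" (unsuspended) for the fbdev state
    attribute. Returns the list of paths that were written.
    """
    written = []
    for state in sorted(Path(graphics_root).glob("*/state")):
        if state.is_file():
            # This write is the step that, per the report, can make
            # fbcon set a mode on top of the mode X has programmed.
            state.write_text("0")
            written.append(str(state))
    return written
```

Taking the root directory as a parameter means the sketch can be tried against a scratch directory instead of the real sysfs tree.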

I think at this point in EL6 we should just remove the quirk addition and try to fix this upstream first.

Comment 1 Dave Airlie 2010-07-29 05:59:22 UTC
For pm, this is a definite blocker.

Comment 2 Jaroslav Škarvada 2010-07-29 08:32:37 UTC
Could you please be more specific about where the regression is in relation to the fix for bug 613509? The implementation of 613509 differs from the rawhide version, but the behaviour should be the same as in rawhide.

What is the goal of this bug report? Removing or changing the fix for bug 613509? Removing --quirk-no-chvt from KMS? Or something different? I think it is a bit late in RHEL-6.0 for complex changes or for fixing things upstream.

Comment 3 Dave Airlie 2010-07-29 11:45:43 UTC
I'll restate: before you fixed the problem, the chvt avoidance code never ran because it was broken, which means the code was never tested in production. That code path seems to be broken in the kernel, since it was never enabled and therefore never tested.

So we should explicitly disable --quirk-no-chvt for RHEL6.0 on purpose instead of by typo. In rawhide I'd leave it alone and we can try to fix the kernel so it is not broken; if this fix went into a stable Fedora release, it should also be backed out there. Maybe for 6.1 we can backport the kernel fix and re-enable this feature.

So yes, for EL6.0 my suggestion is to remove --quirk-no-chvt.

Comment 4 Dave Airlie 2010-07-29 12:01:44 UTC
Actually, it looks like it was the bugfix for #590541 that fixed the call to chvt.

Not sure why we haven't noticed this before, but let's just be safe for EL6.0 and remove the VT switch and file a bug for 6.1 to investigate further.

Comment 5 Jaroslav Škarvada 2010-08-02 10:23:10 UTC
Thanks for the info. I checked with my Intel GPU (KMS forced on):

# cat /sys/module/i915/parameters/modeset

On my hardware the video resumes correctly with --quirk-no-chvt. Without --quirk-no-chvt it sometimes doesn't resume and I have to run chvt manually. So I got the opposite of your observation (but I had to force KMS on, so it may not have worked correctly).
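As a rough sketch, the check from the `cat` command above could be scripted as follows. The helper name `i915_modeset` and its `param_path` argument are hypothetical. Reading a value of 1 as "KMS forced on" is an assumption about the i915 module parameter, not something this report states.

```python
from pathlib import Path


def i915_modeset(param_path="/sys/module/i915/parameters/modeset"):
    """Return the i915 modeset parameter as an int.

    Returns None if the file is missing, unreadable, or not an integer
    (for example when the i915 module is not loaded).
    """
    try:
        return int(Path(param_path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Assumption: 1 corresponds to KMS being forced on.
    print("i915 modeset:", i915_modeset())
```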

Please note that upstream pm-utils has --quirk-no-chvt for KMS. I worry that after dropping it I will start receiving bug reports about non-working video resume (there were some during the period when the chvt avoidance code was broken in Fedora).

It seems there is video hardware that needs --quirk-no-chvt and also video hardware that needs it to be absent. I think correctly working KMS shouldn't break on a manual chvt call, and correctly working KMS shouldn't need chvt either. But maybe I am wrong. At this point I don't know which option causes fewer problems. Could anybody clarify this?

Comment 7 Dave Airlie 2010-08-05 06:06:00 UTC
Okay, I've spent some time on this after clearing off my other problems, and I tracked down the actual cause of the issue.

On the laptop where I noticed the issue, I get an oops on resume, and this causes the write to the fb0 state file to call into a code path that isn't normally called. If no oops happens, things operate as normal. I'm not sure we really need pm-utils writing into the fb0 files on KMS, but we should fix that upstream.

I'll close this bug and open a new 6.1 bug against the kernel for me to investigate later.
