987003 – clearer udevd messages than "worker [y] terminated by signal 9 (Killed)"

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance

Reported:	2013-07-22 15:01 UTC by udo
Modified:	2016-08-04 04:58 UTC
CC List:	9 users
Doc Type:	Enhancement
Last Closed:	2016-08-04 04:49:06 UTC
Type:	Bug

Attachments
the screen that shows what failed at the end, in rather bad photograph (1.65 MB, image/jpeg) 2013-08-06 18:05 UTC, udo
worked killed picture (1.65 MB, image/jpeg) 2013-08-06 18:07 UTC, udo

Description udo 2013-07-22 15:01:13 UTC

Description of problem:
kernel.org 3.10.x built in F19 does not successfully boot.
I built 3.10.1 from kernel.org as I usually build my kernels.
This kernel behaves differently from 3.9.x.
Is this a kernel issue or a Fedora issue?

I see:

Found device /dev/mapper/luks-058(etc) (my crypto that I typed the password for)
Started Cryptography Setup for luks-058(etc)
Reached target Encrypted Volumes.
Starting dracut pre-mount hook...
Reached target System Initialization.
Reached target Basic System.
Started dracut pre-mount hook. (here it sits for 20 seconds or more)
systemd-udevd[x]: worker [y] terminated by signal 9 (Killed)
systemd-udevd[x]: worker [y+1] terminated by signal 9 (Killed)

After this, a screen listing all the detected USB hubs etc. appears,
and after a while:

Timed out waiting for device dev-myvg-rootlv.device.
Dependency failed for /sysroot
Dependency failed for Initrd Root File System
Dependency failed for Reload Configuration from the Real Root

it tries to start an emergency shell but fails.


I did rebuild 3.9.9 after upgrading to F19 and that kernel works. (still!)
3.9.10 too.
What could be wrong?

Is it this issue? See http://us.generation-nt.com/bug-3-10...211868412.html

Version-Release number of selected component (if applicable):
systemd-204-9.fc19.x86_64

How reproducible:
Get kernel.org source, build, install, boot.

Actual results:
See attachments

Expected results:
Clean boot.

Additional info:
3.9.x boots OK. Config did not change much from 3.9.10 to 3.10.x.

Comment 1 Michal Schmidt 2013-07-22 15:17:30 UTC

(In reply to udo from comment #0)
I am using a Fedora-built kernel-3.10.0-1.fc20 on one of my systems and it works.
So I don't know. Might be some .config difference again. Try to find out which config option it is and let us know, if you will.

> Is it this issue? See http://us.generation-nt.com/bug-3-10...211868412.html

Error 404

We do not investigate issues arising from using custom kernels.

Comment 2 udo 2013-07-22 16:07:57 UTC

The kernel works fine.
It is systemd-udevd and similar parts that output weird messages without making clear what actually goes wrong.

Found device /dev/mapper/luks-058(etc) (my crypto that I typed the password for)
Started Cryptography Setup for luks-058(etc)
Reached target Encrypted Volumes.
Starting dracut pre-mount hook...
Reached target System Initialization.
Reached target Basic System.

This makes me believe the disk decrypted OK.

systemd-udevd[x]: worker [y] terminated by signal 9 (Killed)
systemd-udevd[x]: worker [y+1] terminated by signal 9 (Killed)

This is quite vague: which worker, handling what, was killed by what, and for what reason?

How can the system continue booting (slowly), loading USB modules, radeon graphics modules, etc.? Those are on the encrypted LVM volumes.
The kernel command line has the 'rd.lvm.vg=myvg' parameter.
dracut.conf was unchanged since Nov 4 2012.
It is then systemd that panics:

Timed out waiting for device dev-myvg-rootlv.device.
Dependency failed for /sysroot
Dependency failed for Initrd Root File System
Dependency failed for Reload Configuration from the Real Root

And the emergency feature doesn't work. I therefore cannot gather further information.

So what is the progress that systemd brings us in this situation versus init and SysV scripts? Please enlighten us.

Comment 3 Michal Schmidt 2013-07-22 16:34:29 UTC

(In reply to Michal Schmidt from comment #1)
> (In reply to udo from comment #0)
> > Is it this issue? See http://us.generation-nt.com/bug-3-10...211868412.html
> 
> Error 404

It appears you meant this link:
http://us.generation-nt.com/answer/bug-3-10-01-modprobe-snd-hangs-help-211868412.html

The symptoms sure look similar. The discussion there suggests it's a kernel bug in the snd_seq_oss module. There's even a patch attached. You can test if it's the same problem either by trying to apply the patch, or by disabling the module in your kernel config.

> it tries to start an emergency shell but fails.

How exactly does it fail?

Comment 4 udo 2013-07-22 16:56:01 UTC

a dependency fails for emergency mode.

I cannot see any crash or kernel dump like the one in the article, but I *will* try the patch.

So please consider making the worker messages more clear about what is causing this, also make us see any kernel burps, please.

Comment 5 Michal Schmidt 2013-07-22 17:05:30 UTC

(In reply to udo from comment #4)
> a dependency fails for emergency mode.

Does it say nothing more than that? Could you take a picture of it and attach it?

> So please consider making the worker messages more clear about what is
> causing this

Well, this at least should be doable.
Reopening to have someone with udev knowledge (i.e. not me) consider making some improvements to the "worker [y] terminated by signal 9 (Killed)" messages.

> also make us see any kernel burps, please.

You should see kernel messages if you do not have "quiet" on the command line.
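
To confirm from a running system whether "quiet" is actually in effect, one can inspect the kernel command line, which Linux exposes as /proc/cmdline. The following is only a minimal sketch: the helper name cmdline_has_token is hypothetical, and it assumes parameters are separated by plain whitespace (quoted values containing spaces are not handled).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not part of any
 * existing tool): return true if `token` appears as a whole
 * whitespace-separated word in the kernel command line `cmdline`. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
    assert(cmdline != NULL && token != NULL);

    size_t tlen = strlen(token);
    if (tlen == 0)
        return false;

    const char *p = cmdline;
    while (*p != '\0') {
        p += strspn(p, " \t\n");              /* skip separators */
        size_t wlen = strcspn(p, " \t\n");    /* length of next word */
        if (wlen == tlen && strncmp(p, token, tlen) == 0)
            return true;
        p += wlen;
    }
    return false;
}
```

A caller would read the single line from /proc/cmdline (for example with fopen() and fgets()) and pass it to cmdline_has_token(line, "quiet"). Matching whole words avoids false positives on parameters that merely contain the substring.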

Comment 6 udo 2013-08-06 18:00:25 UTC

BTW: 3.10.5 does boot OK now....

Comment 7 udo 2013-08-06 18:05:30 UTC

Created attachment 783445
the screen that shows what failed at the end, in rather bad photograph

Comment 8 udo 2013-08-06 18:07:58 UTC

Created attachment 783446
worked killed picture

Comment 9 udo 2013-08-06 18:09:50 UTC

No 'quiet' in my grub.conf(s).

Comment 10 Fedora End Of Life 2015-01-09 22:13:05 UTC

This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 19 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Fedora End Of Life 2015-05-29 09:12:36 UTC

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 20 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 udo 2015-05-29 12:37:31 UTC

No deep insights?
Patches?
Questions?

Comment 13 Fedora End Of Life 2015-06-30 00:40:19 UTC

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 14 udo 2015-06-30 02:29:35 UTC

(In reply to Michal Schmidt from comment #5)

> Reopening to have someone with udev knowledge (i.e. not me) consider making
> some improvements to the "worker [y] terminated by signal 9 (Killed)"
> messages.


?

Comment 15 Zbigniew Jędrzejewski-Szmek 2016-08-04 04:49:06 UTC

I looked the code over, and we do in fact log what failed:

   log_warning("worker ["PID_FMT"] terminated by signal %i (%s)", pid, WTERMSIG(status), strsignal(WTERMSIG(status)));
   ...
   log_error("worker ["PID_FMT"] failed while handling '%s'", pid, worker->event->devpath);

This should be good enough. Please reopen if this does not provide the expected information.
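
For readers wondering where the signal number in such a message comes from, here is a minimal sketch (not the systemd source) of how a parent process can decode the status reported by waitpid() with the standard WIFSIGNALED/WTERMSIG macros and strsignal(). The function names describe_worker_exit and demo_killed_worker, the message layout, and the device path are hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not systemd code): turn a waitpid() status into a
 * one-line description that also names the device being handled. */
static int describe_worker_exit(pid_t pid, int status, const char *devpath,
                                char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (devpath == NULL)
        devpath = "(unknown)";

    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);   /* signal that ended the child */
        return snprintf(buf, len,
                        "worker [%ld] terminated by signal %d (%s) while handling '%s'",
                        (long)pid, sig, strsignal(sig), devpath);
    }
    if (WIFEXITED(status))
        return snprintf(buf, len,
                        "worker [%ld] exited with status %d while handling '%s'",
                        (long)pid, WEXITSTATUS(status), devpath);

    return snprintf(buf, len, "worker [%ld] ended in an unexpected state",
                    (long)pid);
}

/* Demo: fork a child that sends SIGKILL to itself, reap it, and describe
 * the result. The devpath below is a made-up placeholder. */
static int demo_killed_worker(char *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        kill(getpid(), SIGKILL);
        _exit(0);                     /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return describe_worker_exit(pid, status, "/devices/hypothetical/example",
                                buf, len);
}
```

Calling demo_killed_worker() with a buffer of a few hundred bytes fills it with a line resembling the message from this report; the text in parentheses comes from strsignal() and can differ between C libraries. Note that the wait status tells the parent which signal ended the child, but not which process sent it, so a message built this way cannot by itself say who killed the worker.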

Comment 16 udo 2016-08-04 04:58:36 UTC

If the code shows a non-programmer what the worker was doing, why it failed, etc., then it is OK.