Bug 45104

Summary:	Silent death in slocate
Product:	[Retired] Red Hat Linux	Reporter:	Michal Jaegermann <michal>
Component:	kernel	Assignee:	Phil Copeland <copeland>
Status:	CLOSED WORKSFORME	QA Contact:	Brock Organ <borgan>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.3
Target Milestone:	---
Target Release:	---
Hardware:	alpha
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2001-06-20 20:58:54 UTC	Type:	---

Description Michal Jaegermann 2001-06-20 03:36:37 UTC

Seawolf-gold RC2 for Alpha silently dies while running
/etc/cron.daily/slocate.cron.  The machine just stops responding
to keyboard or mouse input.  It is pingable from a remote host, but
an attempt to 'ssh' in stops after the following lines from 'ssh -v':

debug1: read PEM private key done: type RSA
debug1: identity file /root/.ssh/identity type 0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_dsa type -1

and no further response.  Trying the SysRQ key (I have it enabled) brings
up a table of tasks with a tail which looks like this:

.....
slocate.cron S fffffc0000830f84 0 1538  826 1540 (NOTLB)
updatedb     R current task     8 1540 1538      (NOTLB)
bash         S fffffc0000828518 0 1541  819      (NOTLB)

but after that it stops responding to all SysRQ sequences.
Unfortunately there is no trace in the logs.

  Michal
  michal

Comment 1 Bill Nottingham 2001-06-20 20:53:40 UTC

If it takes down the entire system, that's more of a kernel problem.

Comment 2 Phil Copeland 2001-06-24 18:42:56 UTC

Again, from a full install of the Gold CDs:

[root@dhcpd141 alpha]# sh /etc/cron.daily/slocate.cron
[root@dhcpd141 alpha]#
[root@dhcpd141 alpha]# locate / | wc -l
 228451

Seems to have created a database. Works for me.
It may be that you're running out of space on /tmp, or out of memory or swap.
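
One way to rule out the /tmp theory before the next cron run is to note how much space is free there. This is only a sketch: the helper name free_kb is made up for illustration, and it assumes a POSIX 'df' that accepts -P and -k and prints the available space in the fourth column of the second line.

```shell
# Hypothetical helper: print the free space, in kB, of the filesystem
# holding the given directory (defaults to /tmp).
# Assumes POSIX 'df -Pk' output: one header line, then one data line
# whose fourth field is the available space.
free_kb() {
    df -Pk "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}
```

Running 'free_kb /tmp' just before and after slocate.cron would show whether the database build comes anywhere near filling the filesystem.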

Comment 3 Michal Jaegermann 2001-06-24 20:04:47 UTC

Yes, "works-for-me" sometimes too.  And sometimes it does not.
That is the whole point!!!

Are these "RESOLVED WORKSFORME" resolutions some form of joke?  If so,
it is not a very good one.

Comment 4 Phil Copeland 2001-06-27 13:35:33 UTC

Err Michal, your original bug report doesn't mention that this is an
intermittent fault, so your last entry isn't cutting any ice here, as it
does indeed work for me.
I've since tried 8 times to get this to fail and it doesn't.

Phil
=--=

Comment 5 Michal Jaegermann 2001-06-27 17:04:35 UTC

> Err Michal, your original bug report doesn't mention that this is
> an intermittent fault

Indeed, you are right.  Apologies.  And I do not have a ready recipe
to reproduce.

As for the "too little swap" case: the box in question has 256 MB
of memory and at least 512 MB of swap (it may happen to have more).

This may or may not be related, but I am starting to suspect that there
are some VM bugs which make _too much_ swap into a killer.
A co-worker found a way to reliably crash a machine with 512 MB of
memory and 1 GB of swap.  When you run lmbench, it starts by
checking how much memory it can allocate.  In the configuration
above lmbench stops in this check around 580 MB into it.  If, after
killing lmbench, you then try 'swapoff -a', the machine
reliably dies.  Reducing the amount of swap to half (just run 'mkswap'
and specify a size) makes both 'lmbench' and 'swapoff -a' work.
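
To compare configurations when trying this, it may help to record the swap-to-memory ratio of each box first. The following is a rough sketch, not anything taken from lmbench: the function name swap_ratio is invented here, and it assumes a meminfo-style file with "MemTotal:" and "SwapTotal:" lines giving values in kB, which is how /proc/meminfo usually looks but may differ between kernels.

```shell
# Hypothetical helper: report memory, swap, and the swap/memory ratio
# from a meminfo-style file (defaults to /proc/meminfo).
# Assumes "MemTotal:" and "SwapTotal:" lines with values in kB.
swap_ratio() {
    awk '
        $1 == "MemTotal:"  { mem = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        END {
            # Without a usable MemTotal there is nothing to divide by.
            if (mem <= 0) { print "no MemTotal found"; exit 1 }
            printf "mem_kb=%d swap_kb=%d ratio=%.2f\n", mem, swap, swap / mem
        }
    ' "${1:-/proc/meminfo}"
}
```

On the machine described above (512 MB of memory, 1 GB of swap) this would report a ratio of about 2; after halving the swap it would be about 1.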

Can you reproduce something like that?  The architecture _may_ not be
relevant; he had trouble with both 2.2.19 and 2.4.x kernels.

These are really hard to put a finger on, but I have seen other
troubles which are possibly related.  The case described in the original
report _may_ be an instance of it, but I have failed to reproduce it ever
since.  Another example may be my report about update troubles
with RC1 for 7.1-Alpha.  Reducing the amount of swap helped me get
further, but this may be coincidental and I never finished that update.
Since updates with RC2 and Seawolf were working reliably for everybody,
including me, that bug was closed.