Bug 167830

Summary: mgetty locks up when starting a dial in session:- probably kernel related
Product: [Fedora] Fedora Reporter: Duncan Mercer <djlm>
Component: mgettyAssignee: Martin Nagy <mnagy>
Status: CLOSED WONTFIX QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 5CC: hripps, mbarabas, triage, ts_sb_1
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard: bzcl34nup
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-06 15:31:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Log file snippet showing the lockup
none
Strace log when the permissions of the serial port are more relaxed
none
strace logs of the login:- one worked, one failed none

Description Duncan Mercer 2005-09-08 17:13:08 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6

Description of problem:
Mgetty locks up when handling a dial in session. The system handles the
dial in correctly, provides the login prompt and then locks up when it starts the program to handle the login session.

At the lockup point, the 'ps' command reports that mgetty has passed operation
to the login program, but this never starts. This is always true even if we call a "login" prog that does nothing other than exit, this locks up also.

I only ever see this when using a native serial port to handle the modem. If I switch to a non-native, ie a serial port on PCI card I never see the problem.

I started experiencing a similar problem after upgrading machines from RH8 to RH9. And the flaw is still here in FC4, the frequency of the problem has varied under the different kernels but never gone away.

I only see the problem on newer "faster" machines > 1500MHz, I've never seen it on old 600Mhz machines. I've seen the problem on a number of faster machines
handling a number of different styles of modem.

I've tried compiling and tracing through Mgetty a number of times but I always loose the trace when the code jumps into the kernel and never returns. I'm rather out of my depth trying to debug kernel problems so I've never got any further.

Thereafter the login prog that was called must be killed manually otherwise it will remain inactive and blocking the serial port ad infintium. It's also possible to kill the session by droping the Carrier detect line.

I'm assuming that the lock up relates to a race condition in the kernel, but not
knowing much about the kernel I've not got much further.

Version-Release number of selected component (if applicable):
1.1.33-3

How reproducible:
Always

Steps to Reproduce:
1.Setup megetty to handle incoming calls on the native serial port
2.Dial into the system across the PSTN system
3.Mgetty answers the call provides a Username prompt
4.Enter a username at the prompt
  

Actual Results:  No further response, ie you never get the password prompt for a standard login.
calls to 'ps' show that the login prog has replaced mgetty but never starts.

Expected Results:  I would expect a password prompt to be returned for the standard login. Alternatively when I've used an alternative login handler, i.e. one that does nothing but terminate it's self. I'd expect this to finish and mgetty to be able to respawn.

Additional info:

* I'm seeing this only the faster 2 of my 3 machines
* All three machines worked fine with mgetty using RH8 and earlier
* I've seen the effect in RH9, FC3 & 4

Comment 1 Jason Vas Dias 2005-09-08 17:40:42 UTC
I have tested PPP login, serial console /bin/login, voice and FAX reception
with mgetty on FC3 and Rawhide and have not been able to duplicate this problem.

You say:
> calls to 'ps' show that the login prog has replaced mgetty but never starts
If the mgetty spawned process is named 'login' and its executable is 
/bin/login, then login has started; it may be hung for some reason.

It would be most helpful if you could append your mgetty configuration and
log files from a machine where this problem was reproduced to this bug report:
  # tar -cpf - /var/log/mgetty.log.* /etc/mgetty+sendfax | gzip >
/tmp/mgetty.logs.tar.gz

Please either upload this file to bugzilla or send it to jvdias -
I'll ensure its contents remain confidential if this is of concern to you.

It would also be most helpful if you could gather an strace log:

o in /etc/mgetty+sendfax/login.config, set the 'login_program' used by mgetty
  for dial-in users to be a executable script such as: 
'
#!/bin/bash
exec /usr/bin/strace -v -t -f \
 -o /tmp/mgetty.login.strace.`/bin/date +'%s.%N'`.log \
 /bin/login $* 
'
This would tell us exactly where the login process is hanging.
 
Do you have SELinux enabled on the machines where the problem occurs ?
If so: - what is the output of 'audit2allow < /var/log/audit/audit.log' ?
       - does the problem still occur with selinux=permissive ? 


Comment 2 Duncan Mercer 2005-09-14 09:53:37 UTC
Created attachment 118792 [details]
Log file snippet showing the lockup

The attached logfile snippet shows the last attempted login that locked up.

I was calling a login script that called strace as you suggested. I've attached
the output from that along with my mgetty config under a seperate email.

The log file shows the main "oddity" that I see in the logs after a lockup,
that is a whole series of spurious chars reported while waiting for line to
clear. I never see this when I have a login that works.

Comment 3 Duncan Mercer 2005-09-14 10:19:51 UTC
Hi Jason

Thanks for your swift response to my bug report...

Answers to your questions.
1. SELinux currently set to disabled
2. I've attached a file Mgetty.Log.Example, this shows the log output I get
when I have the system lockup.
3. sent seperately an strace output as requested.
4. sent seperately a zipped log/config file tar as requested.

5. You said
>If the mgetty spawned process is named 'login' and its executable is
>/bin/login, then login has started; it may be hung for some reason.

I've tried using other "empty" scripts and compiled 'c' progs to handle the
login all which show the same behaviour.

E.G.     'c prog'       

int main(void){
 return 0;
}

I see the problem with my own login scripts and with the default
/bin/login

I would be most greatful if you could shed some light on this problem.


Comment 4 Jason Vas Dias 2005-09-14 18:49:15 UTC
Hi Duncan -

You said:
> 3. sent seperately an strace output as requested.
> 4. sent seperately a zipped log/config file tar as requested.

Sorry, but I still have not received them - did you send them to 
'jvdias' ?  Please try resending them.

These would be most useful in trying to reproduce this problem,
which I have so far been unable to do.

Thanks,
Jason.


Comment 5 Duncan Mercer 2005-09-15 10:49:58 UTC
Jason I've now sent two additional copies of my mail
with the log attachments to jvdias.

I haven't had any of these three mails returned to me
so hopefully you will get at least one of them.
Fingers crossed. 

If none of these work then let me know and I can edit
the relevant files to remove any "sensitive" information
and post them here.

Duncan

Comment 6 Jason Vas Dias 2005-09-15 18:32:49 UTC
Hi Duncan - 

The mgetty.login.strace.log you sent shows that login does not have correct
permissions for the /dev/ttyS0 device:

4964  10:59:53 lstat64("/dev/ttyS0", \
                       { st_dev=makedev(0, 13), st_ino=1258, \
                         st_mode=S_IFCHR|0600, st_nlink=1, \
                         st_uid=10, st_gid=14, \
                         st_blksize=4096, st_blocks=0, \ 
                         st_rdev=makedev(4, 64), \
                         st_atime=2005/09/14-10:59:53, \ 
                         st_mtime=2005/09/14-10:59:53, \
                         st_ctime=2005/09/14-10:59:47}) = 0

So the /dev/ttyS0 file is owned by uid 10 (uucp on my system), group uucp,
and has mode "0600", which is read+write by owner only.

mgetty runs as root, as shown by the uid of the mgetty.log.ttyS0 file in
your mgetty.logs.tar.gz file:
-rw-r--r-- root/root     63125 2005-09-14 05:42:23 var/log/mgetty.log.ttyS0


mgetty reads the user name, "iridium" in your case, and will setuid to
"iridium" in your case, as shown in the /var/log/mgetty.log.ttyS0 file:
09/13 02:50:58 yS0   login: user id: iridium (uid 513, gid 513)

login is execve'd with userid (uid 513, gid 513), and the strace shows it
is not allowed to change the mode of /dev/ttyS0, which only permits the 
(uucp,uucp) userid to read/write the device:
4964  10:59:53 chown32("/dev/ttyS0", 0, 0) = -1 EPERM (Operation not permitted)
4964  10:59:53 chmod("/dev/ttyS0", 0620) = -1 EPERM (Operation not permitted)

login then engages in the most unhelpful behaviour of infinitely trying to 
read the device, which will fail because it does not have permission to 
read it.

To fix: 
Uncomment these lines in /etc/mgetty+sendfax/mgetty.config:
"
port-owner uucp
port-group uucp
port-mode 0664
"
And do:
# chmod 0664 /dev/ttyS0

Or, if the "iridium" user is the ONLY user which logs in with mgetty over
/dev/ttyS0, then change the ownership of /dev/ttyS0 to iridium:iridium.

Ensure the "iridium" user is a member of the "uucp" if /dev/ttyS0 is owned by
uucp:uucp.

Ensure mgetty is run as root, eg. by including this line in /etc/inittab:
"
S0:5:respawn:/sbin/mgetty ttyS0
".

Then you should have no further problems - if you do, please let me know.

Comment 7 Duncan Mercer 2005-09-16 11:29:11 UTC
Jason

I've tried as you suggested and made sure that the file permissions etc were set
as you suggested and I still see the same problem.

When I look at the strace file for the login trace it see that, as you showed
above   that the permissions on /dev/ttyS0 are 600 when login starts despite
mgetty having set them previously.

If I examine the permissions on the /dev/ttyS0 device when mgetty is started but
BEFORE I start to dial into the system then the permissions are set as dictated
by the mgetty.config file (eg 0664 or 0666). However as soon as mgetty picks up
an incomming call the permissions on the device (as reported by 'll') change to
0600. This happens as soon as the login prompt is displayed, I.E. before mgetty
has spawned the login process and terminated it's self.

A couple of other things that suggest that the problem is more fundamental are
things I've seen before when attempting to get to the bottom of this problem.

1. The lockup is not consistent. Under FC2 the first dial in session after a
period of inactivity of ~ 1Hr would be handled correctly, thereafter the login
attempt would hang. Under FC4 I'm seeing the lockup 90% of the time.

2. In an attempt to get to the bottom of the problem under FC2 I attempted to
use strace to follow /bin/mgetty (not /bin/login) during a failed login. I found
that 100% of the time I used strace then the login would succeed, if however I
did not use strace it would fail. This was one of the things that lead me to
conclude that the problem related to a kernel race condition.

I've tried running strace against mgetty during the login  number of times under
FC4 but haven't yet establised a pattern as to when it works and when it doesn't.

I've put a tar file on my webserver with a number of different strace outputs in.

I've include 
1. trace of the /bin/login prog when 
mgetty.config was set to set group and owner to uucp and modem 0664
(iridium being in the uucp group)

2. trace of /bin/login prog when I've changed /dev/ttyS0 permissions using chmod
etc to be "0666" owner "iridium:iridium", and mgetty.config to be reflect this
with owner & group iridium and mode 0666.

3. A trace of /sbin/mgetty when the login process worked normally

4. A trace of /sbin/mgetty when the login locked up.

Hope the above makes sense.

Regards 

Duncan


Comment 8 Duncan Mercer 2005-11-04 13:21:35 UTC
Jason.

I've been looking at my mgetty problem again and I've found the following...

1. The fact that the permissions of the serial port are set to 0600 when passing
control to "login" is due to the explict call made by mgetty. See mgetty.c
line 1013.....	

/* set permissions to "rw-------" for login */
(void) chmod(devname, 0600);

2. If I comment out this line for testing purposes, the process still locks up,
but the serial port has more relaxed permissions when login starts.

3. If I strace the login attempt when this line is commented out. I see the
following in the log generated (this mirrors the scrap that you posted before)

21225 13:08:32 lstat64("/dev/ttyS0", {st_dev=makedev(0, 13), \
st_ino=872, st_mode=S_IFCHR|0664, st_nlink=1, st_uid=10, st_gid=14,
st_blksize=4096, st_blocks=0, st_rdev=makedev(4, 64),
st_atime=2005/11/04-13:08:32, st_mtime=2005/11/04-13:08:32,
st_ctime=2005/11/04-13:07:59}) = 0
21225 13:08:32 setpgid(0, 0)            = 0
21225 13:08:32 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {c_iflags=0x106,
c_oflags=0x1805, c_cflags=0x800004be, c_lflags=0x3b, c_line=0,
c_cc="\x03\x1c\x7f\x15\x04\x00\x01\x00\x11\x13\x1a\x00\x12\x0f\x17\x16\x00\x00\x00"})
= 0
21225 13:08:32 chown32("/dev/ttyS0", 0, 0) = -1 EPERM (Operation not permitted)
21225 13:08:32 chmod("/dev/ttyS0", 0620) = -1 EPERM (Operation not permitted)
21

Comment 9 Duncan Mercer 2005-11-04 13:28:37 UTC
Created attachment 120726 [details]
Strace log when the permissions of the serial port are more relaxed

Comment 10 Duncan Mercer 2005-11-04 14:02:21 UTC
Please disregard comments 8 & 9, I've realised that the problem is a bit more 
involved than suggested in these latest two updates.

Duncan

Comment 11 Jason Vas Dias 2005-11-04 17:44:30 UTC
Yes, the problem is a bit more involved.

Here are some updated mgetty i386 RPMS to test:
   http://people.redhat.com/~jvdias/mgetty/FC-4/

This version of mgetty does NOT do the 
   /* set permissions to "rw-------" for login */
   (void) chmod(devname, 0600);
But does:
   /* Let users control the tty device access mode with with 'port-mode'!  
    */
   if( ! c_isset(port_mode) )
     (void) chmod(devname, 0600);

ie. only if you have not explicitly set port-mode in mgetty.config:
"
port-mode 0664
"
will the port mode be set to 0600 .

It also flushes the terminal buffers before handing over the terminal to
the login program, and sets the terminal as the new process' control terminal,
which it was not doing before - the login program was expected to set the 
terminal as the control terminal.

Please try this new mgetty version out and let me know of any issues with it.

Now that these login bugs are fixed:
  Bug #168434 â login's timeout can fail - needs to call siginterrupt(SIGALRM,1)
  Bug #168436 â login will attempt to run if it has no read/write access 
                to its terminal
which I found and raised during investigation of this bug, things are slightly
better with login, but login will still fail if it is not root and attempts to
to 'initgroups' for the new logged in user.

I understand that the process you're actually trying to get working
is not login, but a perl script that reads the terminal data .

I'd recommend that you do not use the real login program at all to test 
this bug .
 
Firstly, there are problems using strace at all with the login program, 
because when strace is used, and fd0 (stdin) is a terminal, strace takes
over as the process group leader of the terminal and login is unable to
set the terminal as its control terminal. 

This means that attempts by login to read the terminal will fail when run
under strace as a non-root user.

Secondly, login will always have problems when not run as root, and you are
running it as a non-root user - if you want to run login, then run it as 
root and let it do the setuid/setgid, not mgetty. 

So, if you still have problems with the new mgetty version, please try 
running the real perl script, perhaps under debug mode while logging to 
a file. 
If you'd send me a copy the real script that you want mgetty to run for
the modem connections, then I can try to duplicate the problem here - I've
been unable to do so so far.

Thanks & Regards,
Jason 

Comment 12 Duncan Mercer 2005-11-08 19:15:43 UTC
Hi Jason

I've tried the new version of mgetty and I'm
still seeing the same problems as before :-(

I have been using an "empty login" C program to 
test these problems further. 

i.e. (exit.c)

int main(void){
    printf("Hello World\n");
    exit(0);
}

and the corresponding entry in login.config
is set to.... 

exit* - - /etc/mgetty+sendfax/testing/exit 

In normal operation I see that attempts to login as "exit"
will succeed 10% of the time.

If I heavily load my system by leaving a delay loop 
running at high priority, at the same time as mgetty
 e.g. "nice -n-10 /testing/delay". 
I find that the login attempts will succeed 80% of the time!

my simple delay.c prog....

int main(void){
  volatile long test=0;
  for(;;)
   { test++; }
}


I have also used strace to follow what happens when my
"exit" code is launched. I have uploaded two logs, one
when things have worked ok. The other when the process has
locked up. The thing that I find very confusing is that both
these logs look to me the same. However what I saw when attempting
the dialin session was very different.

In the sessions that worked ok... 
1. I dial in and enter "exit" as my login ID
2. "Hello World" is returned
3. the call is terminated immediately
4. mgetty restarts

In the sessions that didn't work...
1. I dial in and enter "exit" as my login ID
2. There is no returned text
3. the call does not terminate untill about 60secs have passed
4. mgetty will then restart

In those sessions that don't work properly, if I print the timestamps
into the strace log. The timestamps appear to show that "exit" ran
correctly and was finished long before the call terminated. It's
allmost as though my "exit" program is not having its STDIN, STDOUT
properly connected to ttyS0 when it is launched.

Any suggestions on how I can probe any further with this?

Best regards

Duncan


Comment 13 Duncan Mercer 2005-11-08 19:20:35 UTC
Created attachment 120822 [details]
strace logs of the login:- one worked, one failed

Two strace log files following my empty login handler "exit.c"
one of the traces shows a successful login, the other a failed one.

Comment 14 Duncan Mercer 2005-11-10 13:04:54 UTC
Jason

I've found a problem with your updated mgetty....

If I use login.config to launch a login handler, with a non-root
UID eg iridium, then I get the following error in mgetty_log.

    user iridium denied rw access to /dev/ttyS0: Permission denied

This is despite the fact that I have. The following in mgetty.config

  port-owner uucp
  port-group uucp
  port-mode 0664

And that my user iridium is a member of the uucp group. 

Doing an ls -l on the port whilst mgetty is sitting at the
login prompt shows that the port mode / ownership has been 
set correctly to uucp.uucp 0664

Best regards

Duncan







Comment 15 Christian Iseli 2007-01-22 10:11:42 UTC
This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it ?

Thanks.

Comment 16 Thomas Sprinkmeier 2007-05-17 02:53:46 UTC
I'm having similar symptoms on a vanilla FC5 system.

Once I get the system to talk to me again (i.e. get the local support guy to
reboot it for me) I'll  get the log/config files (assuming you want them?).

FWIW, the same system also suffered from Bug #64151.

Comment 17 petrosyan 2008-02-28 23:55:58 UTC
Fedora Core 5 is no longer maintained. Can anybody reproduce this bug on Fedora
7 or Fedora 8?

Comment 18 Bug Zapper 2008-04-04 02:01:24 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reporduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 19 Bug Zapper 2008-05-06 15:31:19 UTC
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen thus bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.