Bug 167830
Summary: | mgetty locks up when starting a dial in session:- probably kernel related | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Duncan Mercer <djlm> | ||||||||
Component: | mgetty | Assignee: | Martin Nagy <mnagy> | ||||||||
Status: | CLOSED WONTFIX | QA Contact: | |||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | 5 | CC: | hripps, mbarabas, triage, ts_sb_1 | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | i686 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | bzcl34nup | ||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2008-05-06 15:31:20 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Duncan Mercer
2005-09-08 17:13:08 UTC
I have tested PPP login, serial console /bin/login, voice and FAX reception
with mgetty on FC3 and Rawhide and have not been able to duplicate this problem.
You say:
> calls to 'ps' show that the login prog has replaced mgetty but never starts
If the mgetty spawned process is named 'login' and its executable is
/bin/login, then login has started; it may be hung for some reason.
It would be most helpful if you could append your mgetty configuration and
log files from a machine where this problem was reproduced to this bug report:
# tar -cpf - /var/log/mgetty.log.* /etc/mgetty+sendfax | gzip >
/tmp/mgetty.logs.tar.gz
Please either upload this file to bugzilla or send it to jvdias -
I'll ensure its contents remain confidential if this is of concern to you.
It would also be most helpful if you could gather an strace log:
o in /etc/mgetty+sendfax/login.config, set the 'login_program' used by mgetty
for dial-in users to be a executable script such as:
'
#!/bin/bash
exec /usr/bin/strace -v -t -f \
-o /tmp/mgetty.login.strace.`/bin/date +'%s.%N'`.log \
/bin/login $*
'
This would tell us exactly where the login process is hanging.
Do you have SELinux enabled on the machines where the problem occurs ?
If so: - what is the output of 'audit2allow < /var/log/audit/audit.log' ?
- does the problem still occur with selinux=permissive ?
Created attachment 118792 [details]
Log file snippet showing the lockup
The attached logfile snippet shows the last attempted login that locked up.
I was calling a login script that called strace as you suggested. I've attached
the output from that along with my mgetty config under a seperate email.
The log file shows the main "oddity" that I see in the logs after a lockup,
that is a whole series of spurious chars reported while waiting for line to
clear. I never see this when I have a login that works.
Hi Jason
Thanks for your swift response to my bug report...
Answers to your questions.
1. SELinux currently set to disabled
2. I've attached a file Mgetty.Log.Example, this shows the log output I get
when I have the system lockup.
3. sent seperately an strace output as requested.
4. sent seperately a zipped log/config file tar as requested.
5. You said
>If the mgetty spawned process is named 'login' and its executable is
>/bin/login, then login has started; it may be hung for some reason.
I've tried using other "empty" scripts and compiled 'c' progs to handle the
login all which show the same behaviour.
E.G. 'c prog'
int main(void){
return 0;
}
I see the problem with my own login scripts and with the default
/bin/login
I would be most greatful if you could shed some light on this problem.
Hi Duncan -
You said:
> 3. sent seperately an strace output as requested.
> 4. sent seperately a zipped log/config file tar as requested.
Sorry, but I still have not received them - did you send them to
'jvdias' ? Please try resending them.
These would be most useful in trying to reproduce this problem,
which I have so far been unable to do.
Thanks,
Jason.
Jason I've now sent two additional copies of my mail with the log attachments to jvdias. I haven't had any of these three mails returned to me so hopefully you will get at least one of them. Fingers crossed. If none of these work then let me know and I can edit the relevant files to remove any "sensitive" information and post them here. Duncan Hi Duncan - The mgetty.login.strace.log you sent shows that login does not have correct permissions for the /dev/ttyS0 device: 4964 10:59:53 lstat64("/dev/ttyS0", \ { st_dev=makedev(0, 13), st_ino=1258, \ st_mode=S_IFCHR|0600, st_nlink=1, \ st_uid=10, st_gid=14, \ st_blksize=4096, st_blocks=0, \ st_rdev=makedev(4, 64), \ st_atime=2005/09/14-10:59:53, \ st_mtime=2005/09/14-10:59:53, \ st_ctime=2005/09/14-10:59:47}) = 0 So the /dev/ttyS0 file is owned by uid 10 (uucp on my system), group uucp, and has mode "0600", which is read+write by owner only. mgetty runs as root, as shown by the uid of the mgetty.log.ttyS0 file in your mgetty.logs.tar.gz file: -rw-r--r-- root/root 63125 2005-09-14 05:42:23 var/log/mgetty.log.ttyS0 mgetty reads the user name, "iridium" in your case, and will setuid to "iridium" in your case, as shown in the /var/log/mgetty.log.ttyS0 file: 09/13 02:50:58 yS0 login: user id: iridium (uid 513, gid 513) login is execve'd with userid (uid 513, gid 513), and the strace shows it is not allowed to change the mode of /dev/ttyS0, which only permits the (uucp,uucp) userid to read/write the device: 4964 10:59:53 chown32("/dev/ttyS0", 0, 0) = -1 EPERM (Operation not permitted) 4964 10:59:53 chmod("/dev/ttyS0", 0620) = -1 EPERM (Operation not permitted) login then engages in the most unhelpful behaviour of infinitely trying to read the device, which will fail because it does not have permission to read it. To fix: Uncomment these lines in /etc/mgetty+sendfax/mgetty.config: " port-owner uucp port-group uucp port-mode 0664 " And do: # chmod 0664 /dev/ttyS0 Or, if the "iridium" user is the ONLY user which logs in with mgetty over /dev/ttyS0, then change the ownership of /dev/ttyS0 to iridium:iridium. Ensure the "iridium" user is a member of the "uucp" if /dev/ttyS0 is owned by uucp:uucp. Ensure mgetty is run as root, eg. by including this line in /etc/inittab: " S0:5:respawn:/sbin/mgetty ttyS0 ". Then you should have no further problems - if you do, please let me know. Jason I've tried as you suggested and made sure that the file permissions etc were set as you suggested and I still see the same problem. When I look at the strace file for the login trace it see that, as you showed above that the permissions on /dev/ttyS0 are 600 when login starts despite mgetty having set them previously. If I examine the permissions on the /dev/ttyS0 device when mgetty is started but BEFORE I start to dial into the system then the permissions are set as dictated by the mgetty.config file (eg 0664 or 0666). However as soon as mgetty picks up an incomming call the permissions on the device (as reported by 'll') change to 0600. This happens as soon as the login prompt is displayed, I.E. before mgetty has spawned the login process and terminated it's self. A couple of other things that suggest that the problem is more fundamental are things I've seen before when attempting to get to the bottom of this problem. 1. The lockup is not consistent. Under FC2 the first dial in session after a period of inactivity of ~ 1Hr would be handled correctly, thereafter the login attempt would hang. Under FC4 I'm seeing the lockup 90% of the time. 2. In an attempt to get to the bottom of the problem under FC2 I attempted to use strace to follow /bin/mgetty (not /bin/login) during a failed login. I found that 100% of the time I used strace then the login would succeed, if however I did not use strace it would fail. This was one of the things that lead me to conclude that the problem related to a kernel race condition. I've tried running strace against mgetty during the login number of times under FC4 but haven't yet establised a pattern as to when it works and when it doesn't. I've put a tar file on my webserver with a number of different strace outputs in. I've include 1. trace of the /bin/login prog when mgetty.config was set to set group and owner to uucp and modem 0664 (iridium being in the uucp group) 2. trace of /bin/login prog when I've changed /dev/ttyS0 permissions using chmod etc to be "0666" owner "iridium:iridium", and mgetty.config to be reflect this with owner & group iridium and mode 0666. 3. A trace of /sbin/mgetty when the login process worked normally 4. A trace of /sbin/mgetty when the login locked up. Hope the above makes sense. Regards Duncan Jason. I've been looking at my mgetty problem again and I've found the following... 1. The fact that the permissions of the serial port are set to 0600 when passing control to "login" is due to the explict call made by mgetty. See mgetty.c line 1013..... /* set permissions to "rw-------" for login */ (void) chmod(devname, 0600); 2. If I comment out this line for testing purposes, the process still locks up, but the serial port has more relaxed permissions when login starts. 3. If I strace the login attempt when this line is commented out. I see the following in the log generated (this mirrors the scrap that you posted before) 21225 13:08:32 lstat64("/dev/ttyS0", {st_dev=makedev(0, 13), \ st_ino=872, st_mode=S_IFCHR|0664, st_nlink=1, st_uid=10, st_gid=14, st_blksize=4096, st_blocks=0, st_rdev=makedev(4, 64), st_atime=2005/11/04-13:08:32, st_mtime=2005/11/04-13:08:32, st_ctime=2005/11/04-13:07:59}) = 0 21225 13:08:32 setpgid(0, 0) = 0 21225 13:08:32 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {c_iflags=0x106, c_oflags=0x1805, c_cflags=0x800004be, c_lflags=0x3b, c_line=0, c_cc="\x03\x1c\x7f\x15\x04\x00\x01\x00\x11\x13\x1a\x00\x12\x0f\x17\x16\x00\x00\x00"}) = 0 21225 13:08:32 chown32("/dev/ttyS0", 0, 0) = -1 EPERM (Operation not permitted) 21225 13:08:32 chmod("/dev/ttyS0", 0620) = -1 EPERM (Operation not permitted) 21 Created attachment 120726 [details]
Strace log when the permissions of the serial port are more relaxed
Please disregard comments 8 & 9, I've realised that the problem is a bit more involved than suggested in these latest two updates. Duncan Yes, the problem is a bit more involved. Here are some updated mgetty i386 RPMS to test: http://people.redhat.com/~jvdias/mgetty/FC-4/ This version of mgetty does NOT do the /* set permissions to "rw-------" for login */ (void) chmod(devname, 0600); But does: /* Let users control the tty device access mode with with 'port-mode'! */ if( ! c_isset(port_mode) ) (void) chmod(devname, 0600); ie. only if you have not explicitly set port-mode in mgetty.config: " port-mode 0664 " will the port mode be set to 0600 . It also flushes the terminal buffers before handing over the terminal to the login program, and sets the terminal as the new process' control terminal, which it was not doing before - the login program was expected to set the terminal as the control terminal. Please try this new mgetty version out and let me know of any issues with it. Now that these login bugs are fixed: Bug #168434 â login's timeout can fail - needs to call siginterrupt(SIGALRM,1) Bug #168436 â login will attempt to run if it has no read/write access to its terminal which I found and raised during investigation of this bug, things are slightly better with login, but login will still fail if it is not root and attempts to to 'initgroups' for the new logged in user. I understand that the process you're actually trying to get working is not login, but a perl script that reads the terminal data . I'd recommend that you do not use the real login program at all to test this bug . Firstly, there are problems using strace at all with the login program, because when strace is used, and fd0 (stdin) is a terminal, strace takes over as the process group leader of the terminal and login is unable to set the terminal as its control terminal. This means that attempts by login to read the terminal will fail when run under strace as a non-root user. Secondly, login will always have problems when not run as root, and you are running it as a non-root user - if you want to run login, then run it as root and let it do the setuid/setgid, not mgetty. So, if you still have problems with the new mgetty version, please try running the real perl script, perhaps under debug mode while logging to a file. If you'd send me a copy the real script that you want mgetty to run for the modem connections, then I can try to duplicate the problem here - I've been unable to do so so far. Thanks & Regards, Jason Hi Jason I've tried the new version of mgetty and I'm still seeing the same problems as before :-( I have been using an "empty login" C program to test these problems further. i.e. (exit.c) int main(void){ printf("Hello World\n"); exit(0); } and the corresponding entry in login.config is set to.... exit* - - /etc/mgetty+sendfax/testing/exit In normal operation I see that attempts to login as "exit" will succeed 10% of the time. If I heavily load my system by leaving a delay loop running at high priority, at the same time as mgetty e.g. "nice -n-10 /testing/delay". I find that the login attempts will succeed 80% of the time! my simple delay.c prog.... int main(void){ volatile long test=0; for(;;) { test++; } } I have also used strace to follow what happens when my "exit" code is launched. I have uploaded two logs, one when things have worked ok. The other when the process has locked up. The thing that I find very confusing is that both these logs look to me the same. However what I saw when attempting the dialin session was very different. In the sessions that worked ok... 1. I dial in and enter "exit" as my login ID 2. "Hello World" is returned 3. the call is terminated immediately 4. mgetty restarts In the sessions that didn't work... 1. I dial in and enter "exit" as my login ID 2. There is no returned text 3. the call does not terminate untill about 60secs have passed 4. mgetty will then restart In those sessions that don't work properly, if I print the timestamps into the strace log. The timestamps appear to show that "exit" ran correctly and was finished long before the call terminated. It's allmost as though my "exit" program is not having its STDIN, STDOUT properly connected to ttyS0 when it is launched. Any suggestions on how I can probe any further with this? Best regards Duncan Created attachment 120822 [details]
strace logs of the login:- one worked, one failed
Two strace log files following my empty login handler "exit.c"
one of the traces shows a successful login, the other a failed one.
Jason I've found a problem with your updated mgetty.... If I use login.config to launch a login handler, with a non-root UID eg iridium, then I get the following error in mgetty_log. user iridium denied rw access to /dev/ttyS0: Permission denied This is despite the fact that I have. The following in mgetty.config port-owner uucp port-group uucp port-mode 0664 And that my user iridium is a member of the uucp group. Doing an ls -l on the port whilst mgetty is sitting at the login prompt shows that the port mode / ownership has been set correctly to uucp.uucp 0664 Best regards Duncan This report targets the FC3 or FC4 products, which have now been EOL'd. Could you please check that it still applies to a current Fedora release, and either update the target product or close it ? Thanks. I'm having similar symptoms on a vanilla FC5 system. Once I get the system to talk to me again (i.e. get the local support guy to reboot it for me) I'll get the log/config files (assuming you want them?). FWIW, the same system also suffered from Bug #64151. Fedora Core 5 is no longer maintained. Can anybody reproduce this bug on Fedora 7 or Fedora 8? Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reporduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen thus bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed. |