Bug 243717 - Autofs hangs on some mountpoints
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs
Version: 5.0
Hardware: All Linux
Priority: medium  Severity: high
Assigned To: Ian Kent
QA Contact: Brock Organ
Depends On: 241780
Reported: 2007-06-11 10:52 EDT by Ian Kent
Modified: 2007-11-30 17:07 EST

Fixed In Version: RHBA-2007-0621
Doc Type: Bug Fix
Last Closed: 2007-11-07 12:56:17 EST

Attachments
Patch to resolve deadlock in alarm handler (1.39 KB, patch)
2007-06-11 10:52 EDT, Ian Kent

Description Ian Kent 2007-06-11 10:52:21 EDT
+++ This bug was initially created as a clone of Bug #241780 +++

Description of problem:

After upgrading to autofs-5.0.1-0.rc3.29 (and lots of other packages as
well :-( ), our systems tend to stop automounting some of the mountpoints
specified by NIS maps after some variable time (hours to several days).

Version-Release number of selected component (if applicable):

autofs-5.0.1-0.rc3.29

How reproducible:

Often 

Steps to Reproduce:
1. Use system for some time in such a way that automounting is triggered.
  
Actual results:

Automounter stops responding for some mountpoints

Expected results:

Automounting keeps responding.


Additional info:

-- Additional comment from anders.blomdell@control.lth.se on 2007-05-30 11:31 EST --
Created an attachment (id=155705)
Possibly relevant 'ps -eLf', 'gdb' and 'strace' data


-- Additional comment from ikent@redhat.com on 2007-05-30 22:53 EST --
Kernel version?
Could you post your maps?
Could we have a debug log as well please?

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 03:11 EST --
Created an attachment (id=155783)
/proc/kallsyms


-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 03:12 EST --
On this particular machine (my development machine):

2.6.21-rc7 

Problem most often has manifested itself on:

2.6.20-1.2948.fc6

I have enclosed /proc/kallsyms

What do you mean by a debug log? (Remember, the problem is not easily
reproducible, and I would hate to destroy a specimen of it that I can keep alive
as long as I see fit.)

-- Additional comment from ikent@redhat.com on 2007-05-31 05:10 EST --
(In reply to comment #4)
> On this particular machine (my development machine):
> 
> 2.6.21-rc7 

Oh really, and you've seen it using this kernel?

> 
> Problem most often has manifested itself on:
> 
> 2.6.20-1.2948.fc6
> 
> I have enclosed /proc/kallsyms
> 
> What do you mean by a debug log (remember, the problem is not easily
> reproducible, and I would hate to destroy a specimen of it that I can keep alive
> as long as I see fit).

Yep, I know it's a big ask and setting debug logging will most
likely prevent the bug from occurring but, if you have enough
disk space, give it a try if you can.

http://people.redhat.com/jmoyer describes what needs to be set.

btw, autofs-5.0.1-0.rc3.30 should be on its way to the testing
directory on the mirrors. I'm not sure this will help but please
give it a try when it arrives. Or grab the source rpm from Rawhide
or F7, build it, and give that a try.

Ian




-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 06:51 EST --
Created an attachment (id=155790)
strace of good mount

I'm still dodging the 'restart automounter' request, since I want to gather as
much info from the current hang (the machine has been up for 35 days, the
automounter for 18 days before the bug manifested itself). Usually this bug hits
various desktop machines for users who don't like me debugging their machines
for hours :-(.

So I have submitted two straces, one for a request that works (strace.ok) and
one for a failed request (strace.bad). Since the strace of the failed request
is totally silent, it seems like there is a pending mount request somewhere
deep down in the kernel that is stopping the request from reaching the automounter.

-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 06:52 EST --
Created an attachment (id=155792)
strace of failed mount


-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 07:22 EST --
Another point of possible interest (assuming it's a kernel bug):

An affected system stops auto-unmounting unused filesystems. I'll continue
trying to understand how that behaviour intersects with a stuck mount.


-- Additional comment from ikent@redhat.com on 2007-05-31 08:15 EST --
(In reply to comment #6)
> Created an attachment (id=155790)
> strace of good mount
> 
> I'm still dodging the 'restart automounter' request, since I want to gather as
> much info from the current hang (the machine has been up for 35 days, the
> automounter for 18 days before the bug manifested itself). Usually this bug hits
> various desktop machines for users who don't like me debugging their machines
> for hours :-(.

We probably won't get anything more that's useful from the
broken machine; thanks for getting the stuff so far.

We need to record the kernel version of machines that show
this problem. We need to confirm whether you see it on
machines with 2.6.21 or not. A patch for a race between
mount and expire is included in it. It may not be the
problem you're seeing, but it seems to have come up a
few times lately (actually suddenly, I don't know why;
it was really hard to reproduce when we were debugging
it). The patch has been present since 2.6.21-rc1.

> 
> So I have submitted two straces, one for a request that works (strace.ok) and
> one for a failed request (strace.bad). Since the strace of the failed request
> is totally silent, it seems like there is a pending mount request somewhere
> deep down in the kernel that is stopping the request from reaching the automounter.

Not sure if we got any extra evidence with these.
The symptoms of the race above are that you don't see mount
requests coming in when there should be. The straces do appear
to indicate that, but this should only happen with specific
configurations and specific usage patterns. In some cases it
will recover several seconds after it happens (maybe 10) and
just start working again. OTOH the gdb trace looks like we
have a new problem, a deadlock, which would have symptoms of
stopping any further mounts to that mount point and possibly
(it looks probable from the gdb output) other mount points
as well.

The package I mentioned above (0.rc3.30) has some significant
changes to the map re-reading and identification of changed
maps. The gdb trace shows a map re-read happening and that's
why I wanted you to use that for further testing. I'd hate for
us to spend heaps of time solving something that's fixed already.

btw, a sysrq-t log dump is often useful.

Lastly, if this is a new race then things get more difficult.
Installing the corresponding autofs-debuginfo package so we can
get slightly more useful gdb backtrace info would be good. I'll
dig up my instructions for what to get. Basically, list the
threads and then get a stack trace (bt) for threads that don't
look as they should (much like what you did already), essentially
ignoring anything waiting in poll and a couple of other worker
threads (if they look OK).

What's really important, though, is your setup, maps, nsswitch
and particularly usage patterns. Any of this info is really
useful.

So bummer, having this bug, sorry.
Ian

-- Additional comment from ikent@redhat.com on 2007-05-31 08:21 EST --
(In reply to comment #8)
> Another point of possible interest (assuming it's a kernel bug):
> 
> An affected system stops auto-unmounting unused filesystems. I'll continue
> trying to understand how that behaviour intersects with a stuck mount.
> 

Mmm .. I hadn't noticed that with the race I mentioned
above, but we were more interested in fixing the dead mount
problem, so it could be related to the same problem. But let's
not make assumptions; the gdb output looks like a pthreads
deadlock in the user space daemon.

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 10:54 EST --
Ahh, it was the automount maps (silly me)

# Master map for automounter
/home   auto.home --ghost
/work   auto.work --ghost
/auto.direct  /etc/auto.direct


# /etc/nsswitch.conf
passwd:     files nis
shadow:     files nis
group:      files nis
hosts:      files dns nis
ethers:     files nis
netmasks:   files 
networks:   files
protocols:  files nis
rpc:        files
services:   files nis
netgroup:   files nis
automount:  files nis
aliases:    files

Some more stray thoughts:

Restarting the automounter makes the problem go away (kill -9 required). This
indicates that the kernel is waiting for the stuck mount, blocking further
requests to that mountpoint from succeeding (back to pointing fingers at
automount, I guess).

If I understand you correctly, the problem (probably) resides in automount,
since 2.6.21-rc7 [SMP PREEMPT] exhibits the problem. I'm slowly bringing all
our computers over to autofs-5.0.1-0.rc3.30.

-- Additional comment from ikent@redhat.com on 2007-05-31 11:10 EST --
(In reply to comment #11)
> Ahh, it was the automount maps (silly me)
> 
> # Master map for automounter
> /home   auto.home --ghost
> /work   auto.work --ghost
> /auto.direct  /etc/auto.direct

Are the maps themselves simple maps, i.e. no multi-mount or
submount entries and no "+" map inclusion?

> 
> 
> # /etc/nsswitch.conf
> passwd:     files nis
> shadow:     files nis
> group:      files nis
> hosts:      files dns nis
> ethers:     files nis
> netmasks:   files 
> networks:   files
> protocols:  files nis
> rpc:        files
> services:   files nis
> netgroup:   files nis
> automount:  files nis
> aliases:    files

All normal.

> 
> Some more stray thoughts:
> 
> Restarting the automounter makes the problem go away (kill -9 required). This
> indicates that the kernel is waiting for the stuck mount, blocking further
> requests to that mountpoint from succeeding (back to pointing fingers at
> automount, I guess).
> 
> If I understand you correctly, the problem (probably) resides in automount,
> since 2.6.21-rc7 [SMP PREEMPT] exhibits the problem. I'm slowly bringing all
> our computers over to autofs-5.0.1-0.rc3.30.

Yep, there are no other known problems with the kernel module.
I'm keen to see how the update goes.
It's been pushed so it should find its way to the testing
directory on the mirrors soonish.

Ian




-- Additional comment from ikent@redhat.com on 2007-05-31 11:15 EST --
(In reply to comment #11)
> Restarting the automounter makes the problem go away (kill -9 required). This
> indicates that the kernel is waiting for the stuck mount, blocking further
> requests to that mountpoint from succeeding (back to pointing fingers at
> automount, I guess).

OK, that sounds more like a daemon problem, yes.

> 
> If I understand you correctly, the problem (probably) resides in automount,
> since 2.6.21-rc7 [SMP PREEMPT] exhibits the problem. I'm slowly bringing all
> our computers over to autofs-5.0.1-0.rc3.30.

It would be good to understand the workload better.
The best way for me to work out what's wrong is to be able
to duplicate it, so understanding the workload is the first
step. I've got a dual-core notebook here, so I may be able to
simulate the workload and just leave it running. That also
involves using maps similar to yours in the same way.

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-05-31 12:07 EST --
> Are the maps themselves simple maps, i.e. no multi-mount or
> submount entries and no "+" map inclusion?
Yes, mainly people's /home directories (coming from 2 different servers) and
/work directories residing on people's workstations (around 50), usually
mounted locally but accessible from all machines.

> Yep, there are no other known problems with the kernel module.
> I'm keen to see how the update goes.
> It's been pushed so it should find its way to the testing
> directory on the mirrors soonish.
Installed on my development machine :-)

> It would be good to understand the workload better.
Basically random mounts with a huge preference for one /home and one /work
directory per machine.

I have started the following script on the upgraded machine and one other (I
will report back if I get any interesting results):

#!/usr/bin/python
# Python 2: stat a random NIS-mapped directory every 30 seconds to
# keep triggering automounts.

import nis
import os
import random
import time

if __name__ == "__main__":
    # All keys from the two NIS automount maps.
    HOME = nis.cat("auto.home").keys()
    WORK = nis.cat("auto.work").keys()
    while True:
        # Pick a random /home or /work entry; the trailing "/." forces
        # the lookup to cross the automount point.
        dir = random.choice(["/home/%s/." % random.choice(HOME),
                             "/work/%s/." % random.choice(WORK)])
        try:
            print dir, os.stat(dir).st_mtime
        except OSError, e:
            print e

        time.sleep(30)





-- Additional comment from anders.blomdell@control.lth.se on 2007-06-04 04:47 EST --
Created an attachment (id=156053)
gdb + strace info of rc3.30 hung machine

A conclusive answer was reached after this weekend's stress test:

6 out of 39 computers running autofs-5.0.1-0.rc3.30 are stuck!


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-04 08:22 EST --
Created an attachment (id=156069)
Better gdb trace (with symbols)

Finally managed to get my hands on the debug symbols (despite the
recent release of Fedora 7).

Deadlock:

Thread 11:
  alarm_handler(void *arg):
    alarm_lock()
    state_mutex_lock(ap)

Thread 5:
  expire_cleanup(void *arg)
    state_mutex_lock(ap);
    alarm_add(struct autofs_point *ap, time_t seconds):
      alarm_lock()

And the following lock chain evolves:

  11 <-+- 4
  ^    |
  |    +- 6 <-+- 2
  v           |
  5           +- 3 <- 1


To be honest, there is a lot of locking going on, and it seems (to me)
like one big lock and condwaits could do the job; if that is not the case,
a strict locking hierarchy should be observed!
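
For the record, the inversion is easy to demonstrate in isolation. Here
is a self-contained demonstration with generic lock names (nothing
autofs-specific; compile with -lpthread and it will eventually hang):

/* Thread A takes lock1 then lock2; thread B takes lock2 then lock1.
   Repeat until they interleave badly and each holds the lock the other
   one wants -- the same shape as threads 11 and 5 above. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER; /* "alarm_lock"  */
static pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER; /* "state_mutex" */

static void *thread_a(void *arg)      /* the alarm_handler role */
{
  pthread_mutex_lock(&lock1);
  pthread_mutex_lock(&lock2);         /* blocks if B already holds lock2 */
  pthread_mutex_unlock(&lock2);
  pthread_mutex_unlock(&lock1);
  return NULL;
}

static void *thread_b(void *arg)      /* the expire_cleanup role */
{
  pthread_mutex_lock(&lock2);
  pthread_mutex_lock(&lock1);         /* blocks if A already holds lock1 */
  pthread_mutex_unlock(&lock1);
  pthread_mutex_unlock(&lock2);
  return NULL;
}

int main(void)
{
  pthread_t a, b;

  for (;;) {                          /* loop until the deadlock hits */
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    putchar('.');
    fflush(stdout);
  }
}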


-- Additional comment from ikent@redhat.com on 2007-06-04 09:34 EST --
(In reply to comment #16)
> Created an attachment (id=156069)
> Better gdb trace (with symbols)
> 
> Finally managed to get my hands on the debug symbols (despite the
> recent release of Fedora 7).
> 
> Deadlock:
> 
> Thread 11:
>   alarm_handler(void *arg):
>     alarm_lock()
>     state_mutex_lock(ap)
> 
> Thread 5:
>   expire_cleanup(void *arg)
>     state_mutex_lock(ap);
>     alarm_add(struct autofs_point *ap, time_t seconds):
>       alarm_lock()

Mmm .. I thought I got rid of that out of order locking.
Thanks for the detective work, I'll go through the code
and see what I can find. I'm not sure why I thought I
needed to add the alarm under the state mutex.

> 
> And the following lock chain evolves:
> 
>   11 <-+- 4
>   ^    |
>   |    +- 6 <-+- 2
>   v           |
>   5           +- 3 <- 1
> 
> 
> To be honest, there is a lot of locking going on, and it seems (to me)
> like one big lock and condwaits could do the job; if that is not the case,
> a strict locking hierarchy should be observed!

True, it is too complex and there is plenty of room for
improvement. It may be possible to use that approach;
I certainly thought about it and started down that path,
but it rapidly got very difficult. Perhaps in the next
version, when what we have settles down and fewer changes
are being made.

Ian




-- Additional comment from anders.blomdell@control.lth.se on 2007-06-04 09:48 EST --
Please send the result (patch or similar) as soon as you have something to test,
and I'll put it under stress...


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-04 11:00 EST --
As an extra layer of paranoia, how about adding something like this (to detect
locks that are not granted within TIMEOUT seconds)? If you think it's a good
idea, I could create an actual patch with something that might even compile :-)


#include <pthread.h>
#include <time.h>
#include <unistd.h>
#include "list.h"       /* the kernel-style list macros autofs already uses */

#define TIMEOUT 60      /* seconds before a pending lock is reported */

void report_or_die(void);       /* log the stuck thread, or abort() */

static pthread_mutex_t watch_mutex = PTHREAD_MUTEX_INITIALIZER;

struct watch_queue {
  struct list_head list;
  pthread_mutex_t *mutex;       /* the lock being waited for */
  pthread_t thread;             /* the thread waiting for it */
  time_t time;                  /* when the wait started */
};
static LIST_HEAD(watch_list);

/* Use this instead of pthread_mutex_lock() to monitor deadlocks. */
int watch_pthread_mutex_lock(pthread_mutex_t *m)
{
  int result;
  struct watch_queue w;

  w.mutex = m;
  w.time = time(NULL);
  w.thread = pthread_self();

  pthread_mutex_lock(&watch_mutex);
  list_add(&w.list, &watch_list);
  pthread_mutex_unlock(&watch_mutex);

  result = pthread_mutex_lock(m);

  pthread_mutex_lock(&watch_mutex);
  list_del(&w.list);
  pthread_mutex_unlock(&watch_mutex);

  return result;
}

/* To be run in a separate thread: complain about any lock that has
   been pending for more than TIMEOUT seconds. */
void *watch_thread(void *arg)
{
  for (;;) {
    struct list_head *p;
    time_t now;

    sleep(10);

    now = time(NULL);
    pthread_mutex_lock(&watch_mutex);
    list_for_each(p, &watch_list) {
      struct watch_queue *w = list_entry(p, struct watch_queue, list);
      if (now - w->time > TIMEOUT)
        report_or_die();
    }
    pthread_mutex_unlock(&watch_mutex);
  }
  return NULL;
}
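
The watchdog thread would then be started once at daemon startup,
something like this (again hypothetical, untested):

  pthread_t watchdog;

  /* Spawn the watchdog; it only ever scans the pending-lock list. */
  pthread_create(&watchdog, NULL, watch_thread, NULL);
  pthread_detach(watchdog);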

-- Additional comment from ikent@redhat.com on 2007-06-04 11:08 EST --
(In reply to comment #19)
> As an extra layer of paranoia, how about adding something like this (to detect
> locks that are not granted within TIMEOUT seconds)? If you think it's a good
> idea, I could create an actual patch with something that even might compile :-)

Ha .. thanks, good idea, but I'd really rather not add to the
complexity just now.

I've had a quick look and I have a simple patch; I haven't
even checked to see if it compiles. I do want to be able
to add alarms while holding the state mutex, at least for
now.

Bearing in mind that I still need to think more about this,
would you be willing to give the patch a try anyway? I'll
check it compiles and give it a quick run against the
autofs connectathon suite first.

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-04 11:17 EST --
Of course, but I'll be gone for today within 35 minutes, so it will probably be
tomorrow!

-- Additional comment from ikent@redhat.com on 2007-06-04 11:23 EST --
(In reply to comment #21)
> Of course, but I'll be gone for today within 35 minutes, so it will probably be
> tomorrow!

And I'm out much of tomorrow.
I'll post the patch after testing and catch up tomorrow
afternoon (probably near your lunch time).

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-04 11:32 EST --
Probably not, I'm on CET (GMT + 1), but since I don't know your timezone, what
do I know...

-- Additional comment from ikent@redhat.com on 2007-06-04 11:52 EST --
(In reply to comment #23)
> Probably not, I'm on CET (GMT + 1), but since I don't know your timezone, what
> do I know...

WST (GMT + 8).

-- Additional comment from ikent@redhat.com on 2007-06-04 11:58 EST --
Created an attachment (id=156089)
Test patch for the alarm deadlock problem

Can you give this patch a try when you get a chance, please?
I've run it against the autofs connectathon test suite, but
only on a single-CPU machine.

-- Additional comment from anders.blomdell@control.lth.se on 2007-06-05 10:59 EST --
Created an attachment (id=156232)
.tar file of various pieces to show presence of [possible] deadlock

I feel confident that it will deadlock (eventually). This is based on actual
traces of a short execution on my machine, and what I hope is a correct
analysis of the output of that trace. The following output from the
autofs_analyze tool on the automount.log file illustrates my point (last entry
in analyze.output, which is generated by './autofs_analyze automount.log'):

['0x8002fa68', '0x8004c2a4', '0x8002f740', '0x8004c2a4']
0xb7e886c0 :
   0x8002fa68			      0x8004c2a4			 
   &master_mutex		       &ap->state_mutex 		 
   master.c:1149 master_mount_mounts   master.c:1192 master_mount_mounts 
0xb7e87b90 :
   0x8002f740			      0x8004c2a4			 
   &mutex			       &ap->state_mutex 		 
   state.c:899 st_queue_handler        state.c:819 run_state_task	 
0xb7ab6b90 :
   0x8004c2a4			      0x8002f740			 
   &ap->state_mutex		       &mutex				 
   state.c:108 expire_cleanup	       state.c:876 st_set_done	   

The deadlock will occur because tasks 0xb7e87b90 and 0xb7ab6b90 take the
mutexes in reverse order. There are NO shortcuts to proper lock ordering
(that's my experience from more than 20 years of realtime programming),
but it can take years for it to actually come and bite you...

It might be that some of the possible deadlocks are impossible due to the
lifetimes of the tasks, but I wouldn't bet on it...
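
There are two standard cures: impose one global acquisition order, or
never hold both locks at once. The first is easiest to enforce by
funnelling every multi-lock path through helpers; a generic sketch
(nothing autofs-specific):

#include <pthread.h>

static pthread_mutex_t level1 = PTHREAD_MUTEX_INITIALIZER; /* always first  */
static pthread_mutex_t level2 = PTHREAD_MUTEX_INITIALIZER; /* always second */

/* All call sites that need both locks use these helpers, so the order
   cannot be violated by accident in some rarely travelled code path. */
static void lock_both(void)
{
  pthread_mutex_lock(&level1);
  pthread_mutex_lock(&level2);
}

static void unlock_both(void)
{
  pthread_mutex_unlock(&level2);      /* release in reverse order */
  pthread_mutex_unlock(&level1);
}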


-- Additional comment from ikent@redhat.com on 2007-06-05 11:44 EST --
(In reply to comment #26)
> Created an attachment (id=156232)
> .tar file of various pieces to show presence of [possible] deadlock
> 
> I feel confident that it will deadlock (eventually). This is based on actual
> traces of a short execution on my machine, and what I hope is a correct
> analysis of the output of that trace. The following output from the
> autofs_analyze tool on the automount.log file illustrates my point (last entry
> in analyze.output, which is generated by './autofs_analyze automount.log'):
> 
> ['0x8002fa68', '0x8004c2a4', '0x8002f740', '0x8004c2a4']
> 0xb7e886c0 :
>    0x8002fa68			      0x8004c2a4			 
>    &master_mutex		       &ap->state_mutex 		 
>    master.c:1149 master_mount_mounts   master.c:1192 master_mount_mounts 
> 0xb7e87b90 :
>    0x8002f740			      0x8004c2a4			 
>    &mutex			       &ap->state_mutex 		 
>    state.c:899 st_queue_handler        state.c:819 run_state_task	 
> 0xb7ab6b90 :
>    0x8004c2a4			      0x8002f740			 
>    &ap->state_mutex		       &mutex				 
>    state.c:108 expire_cleanup	       state.c:876 st_set_done	   
> 
> The deadlock will occur because tasks 0xb7e87b90 and 0xb7ab6b90 take the
> mutexes in reverse order. There are NO shortcuts to proper lock ordering
> (that's my experience from more than 20 years of realtime programming),
> but it can take years for it to actually come and bite you...
> 
> It might be that some of the possible deadlocks are impossible due to the
> lifetimes of the tasks, but I wouldn't bet on it...
> 

Good work Anders, give me a bit to work through this but,
on the face of it, it looks like your analyzer is a smart
little cookie!

Ian




-- Additional comment from anders.blomdell@control.lth.se on 2007-06-05 11:57 EST --
Created an attachment (id=156238)
lock reorder patch

Not smart enough :-(, with the attached patch I get a lot of warnings like:

['0x8004b6a4', '0x8002f660', '0x8002fa80', '0x8004c2a4', '0x8002f660']
0xb7df9b90 :
   0x8004b6a4                        0x8002f660                        0x8002fa80
   &ap->state_mutex                  &suc.mutex                        &instance_mutex
   automount.c:1165 handle_mounts    automount.c:1167 handle_mounts    master.c:353 master_find_source_instance
0xb7b64b90 :
   0x8004c2a4                        0x8002f660                        0x8002fa80
   &ap->state_mutex                  &suc.mutex                        &instance_mutex
   automount.c:1165 handle_mounts    automount.c:1167 handle_mounts    master.c:353 master_find_source_instance

Which I think are benign (even if I feel a bit queasy about them).
But I don't know if it is correct to temporarily drop the ap_lock and then
retake it in st_set_done (but this time after st_mutex_lock has been taken).

-- Additional comment from anders.blomdell@control.lth.se on 2007-06-05 12:31 EST --
Created an attachment (id=156240)
new autofs_analyze

OK, modified autofs_analyze tool. It reports the obvious deadlock problems and
is quiet about the ones I think are benign. This really needs to be cleaned up
and scrutinized if we are to rely on it...

-- Additional comment from ikent@redhat.com on 2007-06-05 12:50 EST --
(In reply to comment #28)
> Created an attachment (id=156238)
> lock reorder patch
> 
> Not smart enough :-(, with the attached patch I get a lot of warnings like:
> 
> ['0x8004b6a4', '0x8002f660', '0x8002fa80', '0x8004c2a4', '0x8002f660']
> 0xb7df9b90 :
>    0x8004b6a4                        0x8002f660                        0x8002fa80
>    &ap->state_mutex                  &suc.mutex                        &instance_mutex
>    automount.c:1165 handle_mounts    automount.c:1167 handle_mounts    master.c:353 master_find_source_instance
> 0xb7b64b90 :
>    0x8004c2a4                        0x8002f660                        0x8002fa80
>    &ap->state_mutex                  &suc.mutex                        &instance_mutex
>    automount.c:1165 handle_mounts    automount.c:1167 handle_mounts    master.c:353 master_find_source_instance
> 
> Which I think are benign (even if I feel a bit queasy about them).
> But I don't know if it is correct to temporarily drop the ap_lock and then
> retake it in st_set_done (but this time after st_mutex_lock has been taken).


OK, I'll have a look there as well.

The other trace is quite interesting, but I'm not sure that
deadlock is possible. The only thing I could see (assuming the
output from the analyzer is correct, and it looks to be) is
a suggested deadlock between st_queue_handler->st_run_task
and expire_cleanup->st_set_done. For a given ap (struct
autofs_point, one for each mount) st_run_task is run only
when a new task is present and doesn't dequeue another until that
one has finished, by calling st_set_done. At least that's the
way it's supposed to work. OTOH this assumption may be violated
at shutdown, so I'll go through that and check. OTOOH I think
there really isn't any need to call st_set_done from under
the ap state_mutex, so I'm going to move it out anyway.

So I guess what I'm saying is that maybe the analyzer picked
invocations of st_run_task and expire_cleanup that each
correspond to a different ap and hence a different state_mutex.

Thanks for your effort, I appreciate it very much.
btw, I have no idea how you produced the automount.log; what
did you use for that?

Ian

-- Additional comment from anders.blomdell@control.lth.se on 2007-06-05 13:03 EST --
> The other trace is quite interesting but I'm not sure that
> deadlock is possible. 
Neither am I, but the risk is that some very fast multiprocessor manages to
stumble over it :-)

> btw, I have no idea how you produced the automount.log, what
> did you use for that?
The hack.patch from the earlier .tar file (don't tell anyone I'm responsible for
that one :-) )

Good that you pointed to shutdown as an important thing to check; a shutdown
exposed the following possible problems:

['0x8002fa68', '0x8002f660', '0x8004c2a4', '0x8002fa68']
0xb7e0b6c0 :
   0x8002fa68                          0x8002f660
   &master_mutex                       &suc.mutex
   master.c:1149 master_mount_mounts   master.c:999 master_do_mount
0xb7b64b90 :
   0x8004c2a4                          0x8002f660                       0x8002fa80
   &ap->state_mutex                    &suc.mutex                       &instance_mutex
   automount.c:1165 handle_mounts      automount.c:1167 handle_mounts   master.c:353 master_find_source_instance
0xb7874b90 :
   0x8002fa68                                 0x8004c2a4
   &master_mutex                              &ap->state_mutex
   master.c:939 master_notify_state_change    master.c:948 master_notify_state_change

['0x8002fa68', '0x8002f660', '0x8004b6a4', '0x8002fa68']
0xb7e0b6c0 :
   0x8002fa68                          0x8002f660
   &master_mutex                       &suc.mutex
   master.c:1149 master_mount_mounts   master.c:999 master_do_mount
0xb7df9b90 :
   0x8004b6a4                          0x8002f660                       0x8002f7ac
   &ap->state_mutex                    &suc.mutex                       &parse_mutex
   automount.c:1165 handle_mounts      automount.c:1167 handle_mounts   nss_parse.y:142 parse_mutex_lock
0xb7874b90 :
   0x8002fa68                                 0x8004b6a4
   &master_mutex                              &ap->state_mutex
   master.c:939 master_notify_state_change    master.c:948 master_notify_state_change

['0x8002fa68', '0x8002f660', '0x8004bc74', '0x8002fa68']
0xb7e0b6c0 :
   0x8002fa68                          0x8002f660
   &master_mutex                       &suc.mutex
   master.c:1149 master_mount_mounts   master.c:999 master_do_mount
0xb7c65b90 :
   0x8004bc74                          0x8002f660                       0x8002f7ac
   &ap->state_mutex                    &suc.mutex                       &parse_mutex
   automount.c:1165 handle_mounts      automount.c:1167 handle_mounts   nss_parse.y:142 parse_mutex_lock
0xb7874b90 :
   0x8002fa68                                 0x8004bc74
   &master_mutex                              &ap->state_mutex
   master.c:939 master_notify_state_change    master.c:948 master_notify_state_change



-- Additional comment from ikent@redhat.com on 2007-06-05 13:21 EST --
(In reply to comment #31)
> > The other trace is quite interesting but I'm not sure that
> > deadlock is possible. 
> Neither am I, but the risk is that some very fast multiprocessor manages to
> stumble over it :-)

Maybe, you'd be talking about the window between the call to st_set_done and
state_mutex_unlock in expire_cleanup, since I've set the next alarm
above it. But it's worse, because the user can send a prune event with
a signal, so yes, I'll move the st_set_done out from under the lock.

I can't see any reason not to move it out, but I'm still thinking
about it.

> 
> > btw, I have no idea how you produced the automount.log, what
> > did you use for that?
> The hack.patch from the earlier .tar file (don't tell anyone I'm responsible for
> that one :-) )

Oh yes, I missed that.

> 
> Good that you pointed to shutdown as an important thing to check; a shutdown
> exposed the following possible problems:

And I'll have a look at these below.

snip ...

Ian


-- Additional comment from ikent@redhat.com on 2007-06-05 23:23 EST --
(In reply to comment #32)
> (In reply to comment #31)
> > > The other trace is quite interesting but I'm not sure that
> > > deadlock is possible. 
> > Neither am I, but the risk is that some very fast multiprocessor manages to
> > stumble over it :-)
> 
> Maybe, you'd be talking about the window between the call to st_set_done and
> state_mutex_unlock in expire_cleanup, since I've set the next alarm
> above it. But it's worse, because the user can send a prune event with
> a signal, so yes, I'll move the st_set_done out from under the lock.

Nope, this deadlock is not possible. As it stands, the code is
independent of execution order and timing. There's no need
to move st_set_done or add additional state mutex locking.

I think the reason for the false positive is that the mutex
watcher doesn't take account of the state queue handler
predicate (struct state_queue) control variables, task->busy
and task->done, and it assumes that the ap state_mutex
referred to in st_run_task is the same ap state_mutex
that is currently held. When st_run_task is called, the
ap can be any ap except one that has a currently
active task, and the one that holds the state mutex is
clearly an active task.

Now, onto the shutdown.

Ian

-- Additional comment from ikent@redhat.com on 2007-06-06 02:37 EST --
(In reply to comment #31)
> Good that you pointed to shutdown as an important thing to check; a shutdown
> exposed the following possible problems:
> 
> ['0x8002fa68', '0x8002f660', '0x8004c2a4', '0x8002fa68']
> 0xb7e0b6c0 :
>    0x8002fa68                          0x8002f660
>    &master_mutex                       &suc.mutex
>    master.c:1149 master_mount_mounts   master.c:999 master_do_mount
> 0xb7b64b90 :
>    0x8004c2a4                          0x8002f660                       0x8002fa80
>    &ap->state_mutex                    &suc.mutex                       &instance_mutex
>    automount.c:1165 handle_mounts      automount.c:1167 handle_mounts   master.c:353 master_find_source_instance
> 0xb7874b90 :
>    0x8002fa68                                 0x8004c2a4
>    &master_mutex                              &ap->state_mutex
>    master.c:939 master_notify_state_change    master.c:948 master_notify_state_change

I'm having a lot of trouble understanding how this sequence
could be a problem.

In order for master_notify_state_change to acquire the
master_mutex, master_mount_mounts must have completed,
in which case the master_mutex will have been released.
It is possible for the last mount (handle_mounts) not
to have completed its startup at this time, and if
there is an event which calls master_notify_state_change
and it is able to take the master_mutex, it may need to
wait on an ap state_mutex if it's held.

It looks like the analysis tool doesn't take account of
the passage of time and so is a little inaccurate in its
reporting.

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-06 14:21 EST --
> I'm having a lot of trouble understanding how this sequence
> could be a problem.
So have I, on closer inspection; once more barking up the wrong tree...

Let's hope that all deadlock chances are removed.

Sorry to bother you with non-problems...


-- Additional comment from ikent@redhat.com on 2007-06-07 01:10 EST --
(In reply to comment #35)
> > I'm having a lot of trouble understanding how this sequence
> > could be a problem.
> So have I, on closer inspection; once more barking up the wrong tree...
> 
> Let's hope that all deadlock chances are removed.
> 
> Sorry to bother you with non-problems...
> 

Absolutely no problem.
There's no need to be sorry.
I welcome your efforts to find problems, and it gives
me the opportunity to focus on validating specific
possible problem areas of the code.

I've tried to identify all the races and deadlocks,
but Murphy's law rules, so time will tell.

Thanks
Ian


-- Additional comment from ikent@redhat.com on 2007-06-07 01:18 EST --
btw, I'm going to go ahead with the patch we have for
this so far as it's certainly a bug that needs to be
fixed.

Ian

-- Additional comment from anders.blomdell@control.lth.se on 2007-06-07 03:58 EST --
Created an attachment (id=156433)
alternate implementation of alarm_handler

(In reply to comment #37)
> btw, I'm going to go ahead with the patch we have for
> this so far as it's certainly a bug that needs to be
> fixed.
If it's the patch to alarm.c, I have attached an alternate implementation of
alarm_handler:
  * no code duplication (less risk of reintroducing the bug in one branch :-)
  * one extra call to pthread_cond_timedwait per expired alarm,
    but no call to time(), so I believe that performance would be
    on par with the current implementation

-- Additional comment from ikent@redhat.com on 2007-06-07 04:56 EST --
(In reply to comment #38)
> Created an attachment (id=156433)
> alternate implementation of alarm_handler
> 
> (In reply to comment #37)
> > btw, I'm going to go ahead with the patch we have for
> > this so far as it's certainly a bug that needs to be
> > fixed.
> If it's the patch to alarm.c, I have attached an alternate implementation of
> alarm_handler:
>   * no code duplication (less risk of reintroducing the bug in one branch :-)
>   * one extra call to pthread_cond_timedwait per expired alarm,
>     but no call to time(), so I believe that performance would be
>     on par with the current implementation

I'm not sure about this alternate implementation.

Why incur the overhead of possibly unlocking the mutex,
waiting (I hope it's smarter than that) and locking the
mutex again for an alarm that's already passed?

We have to assume that the mutex will be unlocked
and then locked, even if the time has passed, since
that's the way condition waits are defined, so it's
possible for current to be invalid on return.

So it's not quite right as is.

Ian


-- Additional comment from ikent@redhat.com on 2007-06-07 05:09 EST --
(In reply to comment #39)
> (In reply to comment #38)
> > Created an attachment (id=156433)
> > alternate implementation of alarm_handler
> > 
> > (In reply to comment #37)
> > > btw, I'm going to go ahead with the patch we have for
> > > this so far as it's certainly a bug that needs to be
> > > fixed.
> > If it's the patch to alarm.c, I have attached an alternate implementation of
> > alarm_handler:
> >   * no code duplication (less risk of reintroducing the bug in one branch :-)
> >   * one extra call to pthread_cond_timedwait per expired alarm,
> >     but no call to time(), so I believe that performance would be
> >     on par with the current implementation
> 
> I'm not sure about this alternate implementation.
> 
> Why incur the overhead of possibly unlocking the mutex,
> waiting (I hope it's smarter than that) and locking the
> mutex again for an alarm that's already passed?
> 
> We have to assume that the mutex will be unlocked
> and then locked, even if the time has passed, since
> that's the way condition waits are defined, so it's
> possible for current to be invalid on return.
> 
> So it's not quite right as is.

Also, what happens if the pthread_cond_timedwait is
signaled, not timed out and an alarm has been added
with a timeout before the alarm we're waiting for?

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-07 05:22 EST --
Created an attachment (id=156437)
new try on alarm_handler

(In reply to comment #40)
> Also, what happens if the pthread_cond_timedwait is
> signaled, not timed out and an alarm has been added
> with a timeout before the alarm we're waiting for?
We start over from the top of the while loop. New version of alarm_handler
attached, addressing the [VERY] valid point that current may no longer be the
first element in the alarm list (which would lead to out-of-sequence firing).
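
In outline it is the textbook timed-wait pattern, re-reading the head of
the alarm list after every wakeup. A simplified sketch (reduced types;
fire() is a hypothetical dispatch hook, see the attachment for the real
thing):

#include <errno.h>
#include <pthread.h>
#include <time.h>
#include "list.h"       /* the kernel-style list macros autofs already uses */

struct alarm {
  struct list_head list;
  time_t time;          /* absolute expiry; list kept sorted ascending */
};

static pthread_mutex_t alarm_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t alarm_cond = PTHREAD_COND_INITIALIZER;
static LIST_HEAD(alarms);

void fire(struct alarm *alarm);       /* hypothetical dispatch helper */

static void *alarm_handler(void *arg)
{
  pthread_mutex_lock(&alarm_mutex);
  for (;;) {
    struct alarm *first;
    struct timespec expire;

    /* Sleep until at least one alarm is queued. */
    while (list_empty(&alarms))
      pthread_cond_wait(&alarm_cond, &alarm_mutex);

    first = list_entry(alarms.next, struct alarm, list);
    expire.tv_sec = first->time;
    expire.tv_nsec = 0;

    /* Wait for the earliest deadline; if we are signaled instead (an
       alarm was added), restart and re-read the head, which may now
       be an earlier alarm. */
    if (pthread_cond_timedwait(&alarm_cond, &alarm_mutex,
                               &expire) != ETIMEDOUT)
      continue;

    /* The list may have changed while the mutex was dropped inside
       the wait; only fire if the head is still the alarm we slept on
       (the real code must also cope with it having been deleted). */
    if (list_empty(&alarms) || alarms.next != &first->list)
      continue;

    list_del(&first->list);

    /* Dispatch without the alarm mutex held, so fire() can take a
       state mutex without recreating the alarm_lock -> state_mutex
       inversion that started this bug. */
    pthread_mutex_unlock(&alarm_mutex);
    fire(first);
    pthread_mutex_lock(&alarm_mutex);
  }
  return NULL;
}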

I probably should try to keep my mouth shut :-)

-- Additional comment from ikent@redhat.com on 2007-06-07 08:20 EST --
(In reply to comment #41)
> Created an attachment (id=156437)
> new try on alarm_handler
> 
> (In reply to comment #40)
> > Also, what happens if the pthread_cond_timedwait is
> > signaled, not timed out and an alarm has been added
> > with a timeout before the alarm we're waiting for?
> We start over from the top of the while loop. New version of alarm_handler
> attached, addressing the [VERY] valid point that current may no longer be the
> first element in the alarm list (which would lead to out-of-sequence firing).
> 

It looks like it's still not quite right.
But we're getting away from fixing a bug here now.

If you'd like to contribute to autofs please feel
welcome to join the autofs mailing list at
http://linux.kernel.org/mailman/listinfo/autofs

I prefer code submissions as unified diffs against
either the current release tarball (with all patches
from kernel.org applied) or the development git
repository.

Tarball:
http://www.kernel.org/pub/linux/daemons/autofs/v5/autofs-5.0.1.tar.gz

git:
git://git.kernel.org/pub/scm/linux/storage/autofs/autofs.git

Ian


-- Additional comment from anders.blomdell@control.lth.se on 2007-06-07 09:12 EST --
(In reply to comment #42)
> I prefer code submissions as unified diffs against
> either the current release tarball (with all patches
> from kernel.org applied) or the development git
> repository
OK, will do.
Comment 1 Ian Kent 2007-06-11 10:52:23 EDT
Created attachment 156718
Patch to resolve deadlock in alarm handler
Comment 2 RHEL Product and Program Management 2007-06-11 11:03:55 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 Ian Kent 2007-06-21 23:30:00 EDT
A correction for this issue is present in revision
autofs-5.0.1-0.rc2.48 and later.
Comment 6 Ian Kent 2007-06-26 07:40:28 EDT
A fix for this issue is present in autofs-5.0.1-0.rc2.48
and above.
Comment 11 errata-xmlrpc 2007-11-07 12:56:17 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0621.html
