How to get PAM LDAP local logins to work when networking is down

I recently ran into an issue where my servers using LDAP logins became inaccessible on the console during a network outage. It turned out that the LDAP client kept trying to reconnect to the server indefinitely, so the login process timed out and locked me out of the machine. The fix was to set a few nss_reconnect_* entries in /etc/ldap.conf, which appear to be undocumented.

Here are the relevant lines of my /etc/ldap.conf:

timelimit 120
bind_timelimit 120
idle_timelimit 3600
# Added to permit console login during network outages.
nss_reconnect_tries 2
nss_reconnect_sleeptime 1
nss_reconnect_maxsleeptime 1
nss_reconnect_maxconntries 1

I was able to find some pretty terse documentation in the source, which at least explains that these variables adjust the parameters of LDAP connection retries. In the above configuration, all servers are tried 2 times with a 1 second sleep between tries.

nss_reconnect_tries 5          # no. of times to double the sleep time
nss_reconnect_sleeptime 4      # initial sleep value
nss_reconnect_maxsleeptime 64  # max sleep value to cap at
nss_reconnect_maxconntries 2   # how many tries before sleeping
# This leads to a delay of 124 seconds (4+8+16+32+64=124)
# per lookup if the server is not available.
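
As a rough check on the arithmetic in that comment, here is a minimal Python sketch of the backoff it describes: the sleep starts at nss_reconnect_sleeptime, doubles after each try, and is capped at nss_reconnect_maxsleeptime. The function name backoff_schedule is made up for illustration, and this is a simplified model based only on the source comment, not nss_ldap's actual code. It ignores nss_reconnect_maxconntries and the time spent on each connection attempt itself.

```python
def backoff_schedule(tries, sleeptime, maxsleeptime):
    """Return the list of sleeps (in seconds) under a simple doubling model.

    Hypothetical helper: it models only the behaviour described in the
    source comment, not the real nss_ldap implementation.
    """
    delays = []
    delay = sleeptime
    for _ in range(tries):
        # Cap each sleep at maxsleeptime, then double it for the next try.
        delays.append(min(delay, maxsleeptime))
        delay *= 2
    return delays


# Example values from the source comment: 4+8+16+32+64 = 124 seconds.
example = backoff_schedule(tries=5, sleeptime=4, maxsleeptime=64)
print(example, sum(example))

# Values from the configuration shown earlier: two sleeps of 1 second each.
mine = backoff_schedule(tries=2, sleeptime=1, maxsleeptime=1)
print(mine, sum(mine))
```

Under this model, the settings shown earlier add about 2 seconds of sleeping per lookup, compared with 124 seconds for the example values.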


5 Responses to “How to get PAM LDAP local logins to work when networking is down”

  1. mrmccrac Says:

    I’m having similar problems even w/ setting nss_reconnect options. Do you run nscd?


  2. keith Says:

    I try to avoid it. It hasn’t done me any favors, and it has caused me more problems by holding on to stale data than it has helped. Are you using nscd when you see these problems? Also, do you see log entries indicating the results of your attempted LDAP queries on the system in question?


  3. mrmccrac Says:

    I’m actually running a version of unscd, which may or may not be related to the problem. I wrapped it up in my own RPM; I grabbed it here:

    You are not alone in your nscd troubles, that’s for sure, but this one doesn’t randomly crash on me at least and seems to do its job. I’m still doing more debugging to see what’s causing the timeout exactly. I turned on PAM debugging and the only message I saw was:

    May 10 20:52:22 hostname login: pam_localuser(login:account): checking “root:x:0:0:root:/root:/bin/bash ”

    My /etc/ldap.conf also has:

    nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus

    I’m running RHEL5, and the only way I was able to log in to a box that had lost networking (in this case, an incorrect default gateway) was to reboot it, choose Interactive startup on boot, and disable the networking service entirely. You could also boot into single user mode, but I was unable to get a grub prompt. This way, I was able to log in as root immediately and didn’t hit the 60 second timeout.


  4. Dick Visser Says:

    In my case it didn’t matter which nss_* options I used in /etc/ldap.conf, I would also hit a 120s limit.
    Turns out that this is a resolver issue.
    Without network, no DNS server, and hence the network address of our ldap server cannot be found.
    Hardcoding the LDAP server IP address into /etc/hosts made things work. But that’s a bit sucky.
    I suspect that the 120s is the result of the resolver trying each nameserver 2 times before giving up.
    In my case I have two servers, and each one has a timeout of 30s. So that makes 120s. I checked by configuring just one server, and indeed the timeout went down to 60s.
    I will try to see how I can configure the ‘attempts’ and ‘timeout’ values on Ubuntu 12.04.
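
The estimate in this comment can be written out as a small Python sketch. It assumes, as the comment does, that the resolver waits the full timeout on every nameserver and repeats the whole list once per attempt. The function name resolver_worst_case is hypothetical, and the numbers are the ones quoted in the comment, not measured values.

```python
def resolver_worst_case(nameservers, attempts, timeout):
    """Worst-case wait in seconds when no nameserver is reachable.

    Hypothetical helper: a simplified model in which every query to
    every nameserver runs into the full timeout.
    """
    return nameservers * attempts * timeout


# Two nameservers, 2 attempts each, 30 second timeout.
print(resolver_worst_case(nameservers=2, attempts=2, timeout=30))  # 120

# One nameserver with the same settings.
print(resolver_worst_case(nameservers=1, attempts=2, timeout=30))  # 60
```

Both results match the delays reported in the comment, which is consistent with the resolver, rather than the nss_reconnect_* settings, being the cause there.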


  5. William Adler Says:

    Would unscd work in an environment where the machines go offline for a day at a time?
    Or is there a better solution?

