Replication Troubleshooting
Created by Greg Noe, last modified on Nov 14, 2016
Table of Contents
- Basic troubleshooting items
- Troubleshooting by Problem
- Problem: Changes aren’t replicating to consumers
- Problem: Changes are replicating slowly
- Problem: The contextCSNs match between servers, but some entries don’t match
- Related articles
Reference
OpenLDAP [[Environment Configuration|Environment-Configuration_5177591]]
LDAP Content Synchronization Operation: RFC 4533
LDAP CSN Reference: [[LDAP CSNs|LDAP-CSNs_6193399]]
NTP Reference: [[[DRAFT] OpenLDAP, NTP and Replication|6193341]]
Basic troubleshooting items
When troubleshooting replication issues, the following items should always be checked first:
Are all servers up and running?
Sometimes, it’s just that simple ;-)
Can the LDAP servers ping each other?
Network and routing issues may keep providers and consumers from connecting. Note: Depending on your network/security policies, ping (ICMP) may be disabled or ignored.
Is the solserver/OpenLDAP service running?
- On Unix/Linux servers, use “service solserver status” or “ps -ef | grep slapd”
- On Windows servers, check the service controller or use the “Get-Service OpenLDAP” command with Windows PowerShell
If the service isn’t running, start the service.
- On Linux servers, start with “/opt/symas/etc/solserver start”
- On Windows servers, start with the service controller or use “Start-Service -v OpenLDAP” with Windows PowerShell
If the service fails to start, check the slapd log file for error messages and use the slaptest command to check your configuration.
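The service check above can be scripted. The following is a minimal sketch for a Linux server, not a Symas-provided tool: it assumes the process is named slapd and that pgrep is available, and the SLAPTEST path is an assumption that you should adjust to your installation.

```shell
#!/bin/sh
# Hypothetical helper: report whether slapd is running and, if it is not,
# run slaptest to look for configuration errors.
# SLAPTEST is an ASSUMED path; override it to match your installation.
SLAPTEST="${SLAPTEST:-/opt/symas/sbin/slaptest}"

slapd_running() {
    # pgrep -x matches the exact process name "slapd"
    pgrep -x slapd >/dev/null 2>&1
}

check_slapd() {
    if slapd_running; then
        echo "slapd is running"
    else
        echo "slapd is NOT running; checking the configuration"
        # -u is slaptest's dry-run mode: the configuration is checked
        # even if the databases cannot be opened
        "$SLAPTEST" -u
    fi
}
```

Call check_slapd on each server. If slaptest reports an error, fix the configuration before starting the service again.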
Can consumers bind to and search providers using the replication credentials or certificates?
- If the consumer syncrepl config uses “bindmethod=simple”, test binding/searching with the syncrepl binddn and credentials (password) and syncrepl searchbase using “/opt/symas/bin/ldapsearch -x -H ldap://<server>/ -W -D <syncrepl binddn> -LLL -s base -b <syncrepl searchbase> dn”
- If the consumer syncrepl config uses “bindmethod=SASL”, test binding/searching with the CA and client certificates. Use “/opt/symas/bin/ldapsearch -Z -H ldap://<server> -Y EXTERNAL -LLL -s base -b <syncrepl searchbase> dn”
If binding and searching are successful, the search returns the DN of the search base, i.e. the same value you passed with “-b” in your ldapsearch command.
If binding/searching fails, check the error message from the client and slapd logs.
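If you run these tests often, the two test searches can be wrapped in shell functions. This is a sketch only: the function names are made up for illustration, and LDAPSEARCH can be overridden (for example with “echo”) to preview the command line without contacting a server.

```shell
#!/bin/sh
# Path taken from the commands above; override LDAPSEARCH if yours differs.
LDAPSEARCH="${LDAPSEARCH:-/opt/symas/bin/ldapsearch}"

# test_simple_bind <server> <syncrepl binddn> <syncrepl searchbase>
# Simple bind; -W prompts for the syncrepl password.
test_simple_bind() {
    "$LDAPSEARCH" -x -H "ldap://$1/" -W -D "$2" -LLL -s base -b "$3" dn
}

# test_sasl_external_bind <server> <syncrepl searchbase>
# StartTLS (-Z) plus SASL EXTERNAL, i.e. the client certificate is the identity.
test_sasl_external_bind() {
    "$LDAPSEARCH" -Z -H "ldap://$1" -Y EXTERNAL -LLL -s base -b "$2" dn
}
```

Run the matching function from each consumer against each provider it replicates from.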
Error codes and messages associated with BIND failures:
- (err=13) confidentialityRequired : LDAP over TLS or LDAPS connection required
- (err=49) invalidCredentials : Bind credentials do not match the bind dn
- (TLS negotiation failure) : TLS keys or certificates are invalid or incorrectly configured
Error codes and messages associated with test SEARCH failures:
- (err=32) noSuchObject : The search base in the test does not exist
- (err=50) insufficientAccessRights : The test search bind dn does not have access to the search base
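The codes listed above can be turned into a quick lookup. This sketch assumes that the ldapsearch client exits with the LDAP result code as its exit status, which is how the OpenLDAP command-line tools normally behave; explain_ldap_rc is a hypothetical helper name.

```shell
#!/bin/sh
# explain_ldap_rc <exit status of ldapsearch>
# Prints a short hint for the result codes listed above.
# Example:  /opt/symas/bin/ldapsearch ... ; explain_ldap_rc $?
explain_ldap_rc() {
    case "$1" in
        0)  echo "success" ;;
        13) echo "confidentialityRequired: LDAP over TLS or LDAPS connection required" ;;
        32) echo "noSuchObject: the search base in the test does not exist" ;;
        49) echo "invalidCredentials: bind credentials do not match the bind dn" ;;
        50) echo "insufficientAccessRights: the bind dn does not have access to the search base" ;;
        *)  echo "result code $1: check the client error message and the slapd logs" ;;
    esac
}
```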
If the test returns nothing and/or appears to hang, this can indicate that:
- Connections between the consumer and provider are being blocked by a firewall (iptables or intermediate firewall)
- The solserver service on the provider is hung or too busy to accept connections
To test whether connections are being blocked, repeat the previous test search on the provider against itself. If that local search succeeds while the search from the consumer fails or hangs, a firewall may be restricting LDAP connections between the hosts, or there may be a network/routing issue. Check the firewall rules in iptables, Windows Firewall or any intermediate firewalls.
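The comparison described above can be sketched as follows. The sketch adds ldapsearch’s “-o nettimeout=5” option so that a blocked connection fails after about five seconds instead of hanging; the five-second value and the function names are assumptions chosen for illustration.

```shell
#!/bin/sh
LDAPSEARCH="${LDAPSEARCH:-/opt/symas/bin/ldapsearch}"

# probe_provider <server> <syncrepl binddn> <syncrepl searchbase>
# Same test search as before, with a network timeout so it cannot hang.
probe_provider() {
    "$LDAPSEARCH" -x -H "ldap://$1/" -o nettimeout=5 -W -D "$2" \
        -LLL -s base -b "$3" dn
}

# diagnose <exit status seen on the consumer> <exit status seen on the provider itself>
# Encodes the reasoning above for a search that could not connect or timed out.
diagnose() {
    if [ "$1" -eq 0 ]; then
        echo "consumer can reach the provider"
    elif [ "$2" -eq 0 ]; then
        echo "provider answers locally: suspect a firewall or network/routing issue"
    else
        echo "provider does not answer locally: suspect a hung or overloaded solserver service"
    fi
}
```

Run probe_provider once from the consumer and once on the provider itself, note each exit status ($?), and pass both to diagnose. A failure caused by bad credentials or access rights is a different problem and should be read from the error codes, not from this function.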
Are provider and consumer system clocks synchronized?
Having the system clocks on all LDAP servers synchronized is critical, especially in multi-master replication topologies. Ideally, all the server system clocks should have zero offset (no difference in time). This can be difficult to achieve, especially on virtual machines, which rely on the physical system clock of the virtual machine’s host. Because CSNs are based on timestamps, entry updates may be ignored by consumers if the system clocks are not in sync.
For information on clock synchronization, see [[[DRAFT] OpenLDAP, NTP and Replication|6193341]]
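For a rough first check, you can compare the epoch time reported by two servers. This is a coarse sketch: it only has one-second resolution and includes the ssh round trip, so it catches gross offsets only. The host name in the usage comment is hypothetical, and ssh access between the servers is an assumption.

```shell
#!/bin/sh
# clock_offset <epoch seconds A> <epoch seconds B>
# Prints the absolute difference between the two timestamps, in seconds.
clock_offset() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Usage (ldap2.example.com is a placeholder host name):
#   clock_offset "$(date -u +%s)" "$(ssh ldap2.example.com date -u +%s)"
```

Anything other than 0 or 1 deserves a closer look with your NTP tooling.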
Troubleshooting by Problem
Note: See the [[LDAP CSNs|LDAP-CSNs_6193399]] page for CSN details and instructions for getting contextCSNs.
Problem: Changes aren’t replicating to consumers
Are contextCSNs updating on the provider(s) and consumers?
After basic troubleshooting, the first items to check are the contextCSN values on each provider and consumer. Get the contextCSNs for each provider and consumer using the ldapsearch command. If all contextCSNs per serverID match, it can be assumed that all providers and consumers are properly synchronized. If the contextCSNs don’t match, try the following to see what isn’t getting updated:
- Get the contextCSNs for all providers and consumers with ldapsearch
- Add or update a dummy entry on the provider
- Get the contextCSNs for all providers and consumers again
- Compare the contextCSNs from the first and second searches
If the dummy entry update and its replication are successful, the second search returns a newer contextCSN than the first, and the contextCSNs are identical across all providers and consumers. In that case, replication is working correctly.
If the contextCSN on one consumer doesn’t update but the other consumers are updated, re-run the search to see whether that consumer is only lagging behind. If it is still behind, start looking at the slapd logs on the provider and on that consumer.
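The contextCSN comparison can be sketched as a small script. It assumes that the contextCSN attribute of the suffix entry is readable with an anonymous simple bind (add -D/-W if your ACLs require the replication identity); the function names are made up for illustration, and the dummy update itself is left as a manual step.

```shell
#!/bin/sh
LDAPSEARCH="${LDAPSEARCH:-/opt/symas/bin/ldapsearch}"

# get_csns <server> <suffix>
# Prints the contextCSN values of the suffix entry, one per line, sorted.
# contextCSN is an operational attribute, so it must be requested by name.
get_csns() {
    "$LDAPSEARCH" -x -H "ldap://$1/" -LLL -s base -b "$2" contextCSN |
        sed -n 's/^contextCSN: //p' | sort
}

# compare_csns <suffix> <reference server> <other server>...
# Compares the full set of contextCSNs (one per serverID) of each server
# against the reference server.
compare_csns() {
    suffix=$1
    ref=$2
    shift 2
    ref_csns=$(get_csns "$ref" "$suffix")
    for host in "$@"; do
        if [ "$(get_csns "$host" "$suffix")" = "$ref_csns" ]; then
            echo "$host: in sync with $ref"
        else
            echo "$host: contextCSN differs from $ref"
        fi
    done
}
```

Run compare_csns before and after the dummy update. A consumer that still differs after a second run is the one whose slapd logs to read first.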
Problem: Changes are replicating slowly
Problem: The contextCSNs match between servers, but some entries don’t match