Replication Setup Problems
When troubleshooting replication issues, the following items should always be checked first:
Check slapd status
systemctl status slapd
or
pgrep slapd
If the service isn’t running, start the service:
systemctl start slapd
If the service fails to start, check the slapd log file for error messages and use the slaptest command to check your configuration. If there’s nothing obvious in the slapd log file, try:
systemctl status slapd
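The status checks above can be combined into one helper. This is a minimal sketch, not a supported tool: the function names are made up for illustration, and the configuration directory passed to check_slapd is an assumption that depends on your installation.

```shell
# Return 0 if a process with exactly this name is running.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Report the slapd state; when slapd is down, dry-run the configuration
# with slaptest (-u) against the given cn=config directory (-F).
# The directory argument is an assumption; adjust it to your install.
check_slapd() {
    confdir=$1
    if is_running slapd; then
        echo "slapd is running"
    else
        echo "slapd is not running"
        slaptest -u -F "$confdir"
    fi
}

# Example (the path is a placeholder):
#   check_slapd /path/to/slapd.d
```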
Check network connectivity
Network and routing issues may keep providers and consumers from connecting. Note: Depending on your network/security policies, ping (ICMP) may be disabled or ignored.
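Because ping may be unavailable, a TCP connection attempt to the LDAP port is often a more useful test. The sketch below assumes bash and the coreutils timeout command are installed; the function name, default port, and default timeout are illustrative choices, not part of any product.

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Uses bash's /dev/tcp pseudo-device, so bash must be available even when
# this function is called from another shell.
port_open() {
    host=$1
    port=${2:-389}   # assumed default: the standard LDAP port
    secs=${3:-5}     # assumed default timeout in seconds
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (hypothetical host name):
#   port_open ldap-provider-01.example.com 389 && echo reachable || echo blocked
```

A failure here points at the network, a firewall, or a stopped listener rather than at the syncrepl configuration.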
Test bind and search functionality
If the consumer syncrepl config uses bindmethod=simple, test binding and searching with the syncrepl bindDN, credentials (password), and searchbase using:
/opt/symas/bin/ldapsearch -x -H ldap://<server>/ -W -D <syncrepl bindDN> -LLL -s base -b <syncrepl searchbase> dn
If the consumer syncrepl config uses bindmethod=SASL, test binding and searching with the CA and client certificates. Use:
/opt/symas/bin/ldapsearch -Z -H ldap://<server> -Y EXTERNAL -LLL -s base -b <syncrepl searchbase> dn
If binding and searching are successful, the returned DN will match the search base (-b) given in your ldapsearch command.
If binding/searching fails, check the error message from the client and slapd logs.
Error codes and messages associated with BIND failures:
(err=13) confidentialityRequired: LDAP over TLS or LDAPS connection required
Here, a syncrepl client is attempting to bind to a provider that requires the client to bind with TLS/LDAPS.
- Update the syncrepl configuration on the consumer to use TLS/LDAPS. Use one of the following options:
  - Set the provider URI to ldaps://<hostname>
  - Add the starttls=yes option
- If the syncrepl client uses a different CA certificate from the server’s globally configured CA certificate (TLSCACertificateFile <ca cert path>), add the tls_cacert=<ca cert path> option to the syncrepl configuration
(err=49) invalidCredentials: Bind credentials do not match the bind DN or the bind DN does not exist in the directory
- Ensure the syncrepl bindDN is correct and exists on the provider
- Ensure the bind credentials for the replication user are correct
TLS negotiation failure: TLS keys or certificates are invalid or incorrectly configured
- Check the paths of all CA and server certificate and key files
- Verify the CA certificate files on the provider and consumer are identical
- Verify the CA certificate has not expired:
  openssl x509 -in <certificate file> -noout -dates
- If using certificate-based (SASL) authentication:
  - Check the paths of the client certificates
  - Ensure the client certificate Subject and/or subjectAltName matches the hostname of the consumer
    - To check the Subject:
      openssl x509 -in <certificate file> -noout -subject
    - To check the subjectAltName:
      openssl x509 -in <certificate file> -noout -text | grep -A1 'Subject Alternative Name'
  - Verify the client certificate has not expired:
    openssl x509 -in <certificate file> -noout -dates
- If using Kerberos/GSSAPI authentication:
  - You should be fine.
See Securing Replication for more information on TLS/LDAPS configurations.
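The expiry checks above print dates that must be read by eye. As a hedged alternative, openssl x509 also accepts -checkend, which reports through its exit status whether a certificate expires within a given number of seconds. The function name and the 30-day default below are illustrative assumptions.

```shell
# Return 0 if the certificate is still valid N days from now (default 30),
# 1 if it expires sooner, and 2 if the file cannot be read.
# Relies on `openssl x509 -checkend`, which takes a number of seconds.
cert_valid_for() {
    cert=$1
    days=${2:-30}
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
    openssl x509 -in "$cert" -noout -checkend $((days * 86400)) >/dev/null
}

# Example:
#   cert_valid_for <certificate file> 14 || echo "expires within 14 days"
```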
Error codes and messages associated with test SEARCH failures:
(err=32) noSuchObject: The search base in the test does not exist
- Verify the DN in the searchbase option in the consumer’s syncrepl configuration.
(err=50) insufficientAccessRights: The test search bindDN does not have access to the search base
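When scripting the test search, the error codes listed above can be turned into hints. The sketch below assumes that the OpenLDAP client tools exit with the LDAP result code on failure, which is how they generally behave in practice; verify this on your version before relying on it. The function name is hypothetical.

```shell
# Map an LDAP result code to the troubleshooting hint from the lists above.
explain_ldap_rc() {
    case "$1" in
        0)  echo "success" ;;
        13) echo "confidentialityRequired: use ldaps:// or starttls=yes" ;;
        32) echo "noSuchObject: verify the syncrepl searchbase" ;;
        49) echo "invalidCredentials: verify the syncrepl bindDN and credentials" ;;
        50) echo "insufficientAccessRights: the bindDN cannot read the search base" ;;
        *)  echo "unhandled result code $1: check the client error and slapd logs" ;;
    esac
}

# Example: run the test search, then explain its exit status.
#   ldapsearch -x -H ldap://<server>/ -W -D <syncrepl bindDN> -LLL -s base -b <syncrepl searchbase> dn
#   explain_ldap_rc $?
```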
If the test returns nothing and/or appears to hang, this can indicate:
- Connections between the consumer and provider are being blocked by a firewall (iptables or intermediate firewall)
- The solserver service on the provider is hung or too busy to accept connections
To test if the connections are being blocked, repeat the previous test search from the provider to itself. If the test search from a provider to itself is successful, this is an indication that a firewall may be restricting LDAP connections between hosts.
- Check the firewall rules in iptables, Windows Firewall or other firewalls.
- If the server is situated behind a network load balancer, make sure the server is enabled in the server pool.
A successful search from a provider to itself may also indicate a network/routing issue.
Check system clock synchronization
Keeping the system clocks on all LDAP servers synchronized is critical, especially in multi-master replication topologies. Ideally, all server system clocks should have zero offset (no difference in time). This can be difficult to achieve, especially on virtual machines, which rely on the physical system clock of the virtual machine’s host. If the system clocks are not in sync, entry updates may be ignored by consumers.
For more information, see Replication and NTP.
Check configuration consistency
Compare the syncrepl and syncprov settings against each other.
- Diff tools such as diff, vimdiff, etc. are invaluable for comparing syncrepl/syncprov configurations
- When using cn=config, extract your slapd configuration with:
  ldapsearch -b cn=config -LLL > config.ldif
  This method excludes operational attributes, which vary from server to server and unnecessarily clutter your diff. To include operational attributes, use:
  ldapsearch -b cn=config -LLL "*" "+" > config.ldif
In most situations, the settings in the syncrepl stanzas and syncprov configs should be identical, except for server-specific settings like tls_cert and tls_key.
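To keep the diff focused, the replication attributes can be filtered out of each export before comparing. The following is a sketch under stated assumptions: both function names are hypothetical, the inputs are LDIF exports of cn=config, and the filter keeps only olcSyncrepl and the syncprov attributes whose names begin with olcSp.

```shell
# Print only the syncrepl/syncprov attributes from a cn=config LDIF export,
# joining LDIF continuation lines (which begin with one space) first.
repl_settings() {
    awk '
        /^ /  { buf = buf substr($0, 2); next }   # continuation: append to the previous line
        { if (buf != "") print buf; buf = $0 }    # a new logical line starts
        END   { if (buf != "") print buf }
    ' "$1" | grep -i -E '^olc(Syncrepl|Sp[A-Za-z]+):'
}

# Show the differences between two exports; exit status follows diff(1).
compare_repl_settings() {
    a=$(mktemp) && b=$(mktemp) || return 2
    repl_settings "$1" > "$a"
    repl_settings "$2" > "$b"
    diff -u "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example (file names are placeholders):
#   compare_repl_settings provider-config.ldif consumer-config.ldif
```

Differences in provider URIs, replica IDs, and certificate paths are expected between servers; anything else deserves a closer look.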
Check replication permissions and limits
The DN responsible for replication (if not the rootDN) must have:
- Read permission on the entire directory
- A search size limit of “unlimited”
- A search time limit of “unlimited”
To make sure the replicator DN (when not the rootDN) has full access to a DIT, place the following in each backend configuration that the replicator needs access to:
In slapd.conf:
access to * by dn.exact="<replicator DN>" read stop
In cn=config:
olcAccess: to * by dn.exact="<replicator DN>" read stop
If the backend has other ACLs, place this rule at or near the top of the ACLs. This prevents unnecessary ACL evaluation for the replicator DN, which improves performance.
If the replication DN does not have its search size limit set to “unlimited”, the number of search results will be limited to 500 entries. With this limitation, the consumer replicates the 500 entries retrieved by the initial synchronization query, then restarts the replication process and tries to replicate the same 500 entries over and over again.
To set size and time limits for the replication user:
In slapd.conf:
limits dn.exact="<replicator DN>" size=unlimited time=unlimited
In cn=config:
olcLimits: dn.exact="<replicator DN>" size=unlimited time=unlimited
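With cn=config, the olcLimits value is usually applied with ldapmodify. The sketch below only generates the LDIF; the function name, the database DN, and the replicator DN in the example are hypothetical and must be replaced with your own values.

```shell
# Print an LDIF change that grants the replicator DN unlimited size and
# time limits on one backend database. Both arguments are site-specific.
limits_ldif() {
    db_dn=$1
    repl_dn=$2
    cat <<EOF
dn: $db_dn
changetype: modify
add: olcLimits
olcLimits: dn.exact="$repl_dn" size=unlimited time=unlimited
EOF
}

# Example, applied over ldapi as the local root user (assumes that identity
# is allowed to modify cn=config on your server):
#   limits_ldif 'olcDatabase={1}mdb,cn=config' 'cn=replicator,dc=example,dc=com' \
#       | ldapmodify -Y EXTERNAL -H ldapi:///
```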
Check the contextCSNs
The contextCSN attribute in the root entry of a DIT is the entryCSN of the last entry to be successfully updated in the directory, and it is the attribute used to check replication status on providers and consumers. If the contextCSNs match on all providers and consumers, the LDAP environment is fully synchronized. If a provider’s contextCSN timestamp is newer than a consumer’s, the consumer is behind in replication.
See LDAP CSNs for more information on CSNs.
Obtaining contextCSNs
The following script does a simple check of all contextCSNs in an LDAP environment:
#!/bin/bash
# The URIS variable is an array of pipe-delimited strings. The pipe-delimited strings have three elements:
# * The URI of an LDAP server (equivalent to -H on the command line)
# * The DN of the root object of a directory (equivalent to -b on the command line)
# * The DN of the user used to bind to the directory (equivalent to -D on the command line)
URIS=("ldap://ldap-provider-01.example.com/|dc=example,dc=com|cn=replicator,dc=example,dc=com" \
"ldap://ldap-provider-01.example.com/|cn=accesslog|cn=replicator,dc=example,dc=com" \
"ldap://ldap-consumer-01.example.com/|dc=example,dc=com|cn=replicator,dc=example,dc=com" \
"ldap://ldap-consumer-02.example.com/|dc=example,dc=com|cn=replicator,dc=example,dc=com")
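read -r -s -p "LDAP Password: " PW
echo; echo
for URI in "${URIS[@]}"; do
    # Split the pipe-delimited string into URI, base DN, and bind DN
    IFS='|' read -ra P <<< "${URI}"
    echo "Host: ${P[0]} Base DN: ${P[1]}"
    # Quote every expansion so DNs containing spaces survive word splitting
    ldapsearch -x -D "${P[2]}" -w "$PW" -H "${P[0]}" -b "${P[1]}" -LLL -s base contextCSN | grep -v '^dn:'
    echo
done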
Interpreting contextCSNs
The following is the output of the above contextCSN script in a single-master environment:
Host: ldap://ldap-provider-01.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116183346.865339Z#000000#000#000000
Host: ldap://ldap-provider-01.example.com/ Base DN: cn=accesslog
contextCSN: 20161116183346.865339Z#000000#000#000000
Host: ldap://ldap-consumer-01.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116183346.865339Z#000000#000#000000
Host: ldap://ldap-consumer-02.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116183346.865339Z#000000#000#000000
All contextCSNs match so all servers are synchronized. Note that the third segment of the contextCSN is “000”. This is the default serverID for providers in single-provider environments (no MMR), unless set with a different serverID in the slapd configuration.
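The segments of a contextCSN are separated by "#", so they are easy to pull apart in a script. This is a small sketch with hypothetical function names; it assumes the four-field layout seen in the output above (timestamp, change count, serverID, modification number).

```shell
# Print one "#"-separated field of a CSN (fields are numbered from 1).
csn_field() {
    echo "$1" | cut -d'#' -f"$2"
}

# Convenience wrappers for the two fields used most when troubleshooting.
csn_time() { csn_field "$1" 1; }   # timestamp of the change
csn_sid()  { csn_field "$1" 3; }   # serverID of the originating provider

# Example:
#   csn_sid '20161116183346.865339Z#000000#000#000000'
```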
The following is the output of the contextCSN script in a multi-master environment (comments added after each contextCSN):
Host: ldap://ldap-provider-01.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116182398.003291Z#000000#001#000000 # contextCSN for ldap-provider-01 (self)
contextCSN: 20161116183346.865339Z#000000#002#000000 # contextCSN for ldap-provider-02
Host: ldap://ldap-provider-02.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116182398.003291Z#000000#001#000000 # contextCSN for ldap-provider-01
contextCSN: 20161116183346.865339Z#000000#002#000000 # contextCSN for ldap-provider-02 (self)
Host: ldap://ldap-consumer-01.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116182398.003291Z#000000#001#000000 # contextCSN for ldap-provider-01
contextCSN: 20161116183346.865339Z#000000#002#000000 # contextCSN for ldap-provider-02
Host: ldap://ldap-consumer-02.example.com/ Base DN: dc=example,dc=com
contextCSN: 20161116182398.003291Z#000000#001#000000 # contextCSN for ldap-provider-01
contextCSN: 20161116183346.865339Z#000000#002#000000 # contextCSN for ldap-provider-02
In multi-master environments, the providers and consumers will have more than one contextCSN. On consumers, there will be one contextCSN per syncrepl client. The contextCSNs are differentiated by the third segment of the contextCSN, which is a three-digit number. This number is the serverID of the provider that the syncrepl client connects to. Similarly, each MMR provider will have two or more contextCSNs: one for itself and one for each provider it mirrors.
Notice that contextCSNs for ldap-provider-01 and ldap-provider-02 are different. This is normal. Only the contextCSNs associated with the provider that received an entry modification will be updated. The contextCSNs from all other providers will remain the same.
Troubleshooting with contextCSNs
Note: This section assumes that the contextCSNs for all LDAP servers will be checked every time.
Isolating Problem Servers
The following routine can be used to see which LDAP servers are or aren’t replicating:
- Check the contextCSNs on all servers
- Add or update a dummy entry on the provider
- Re-check the contextCSNs on all servers
In MMR environments, execute updates on all providers to make sure updates are replicating from all providers.
If all contextCSNs update after the test, replication is functioning correctly. Any replica that doesn’t have the contextCSN updated can be isolated for further troubleshooting.
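The before-and-after comparison can be automated by saving the output of the contextCSN script to a file at each step. The sketch below assumes the output format shown earlier ("Host:" header lines followed by "contextCSN:" lines); the function name and file names are hypothetical.

```shell
# List the "Host: ..." headers whose contextCSN lines are identical in two
# saved outputs of the contextCSN script (before and after the test update).
unchanged_servers() {
    awk '
        /^Host:/ {
            host = $0
            if (NR == FNR) order[++n] = host   # remember header order from the first file
            next
        }
        /^contextCSN:/ {
            if (NR == FNR) before[host] = before[host] " " $2
            else           after[host]  = after[host]  " " $2
        }
        END {
            for (i = 1; i <= n; i++)
                if (before[order[i]] == after[order[i]]) print order[i]
        }
    ' "$1" "$2"
}

# Example:
#   unchanged_servers before.txt after.txt
```

Any server printed by this function did not pick up the test update and is a candidate for further troubleshooting. In MMR environments, an unchanged server is only suspicious if the update was made on a provider it replicates from.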