I was standing up a new set of clustered search heads the other day and found that one of the nodes couldn't talk to the indexer that stored the data. It reported this error:
Unable to distribute to peer named INDEXER.example.com at uri=INDEXER.example.com:8089 using the uri-scheme=https because peer has status="Down".
I confirmed that I could telnet from the search head to the indexer on port 8089 and connect successfully, and searches run at the same time from the other nodes worked fine, so it wasn’t actually a connectivity issue. 🤔
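If telnet isn't installed on the search head, the same reachability check can be sketched in Python. This is a minimal sketch, not anything Splunk ships: the function name port_is_open and the 3 second timeout are my own choices, and it only proves that a TCP connection opens. It says nothing about whether the peer will accept your credentials.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it checks reachability only,
    not TLS or Splunk authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example call from the search head (8089 is the management port from the error above):
# port_is_open("INDEXER.example.com", 8089)
```

A True result here, combined with working searches from the other nodes, points away from the network and towards something specific to this one search head.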
I found this error on the indexer:
08-31-2018 14:18:49.871 +1000 INFO KeyManagerSearchPeers - Reading public key for peer: /opt/splunk/etc/auth/distServerKeys/SEARCHHEAD.example.com/trusted.pem
08-31-2018 14:18:49.881 +1000 INFO KeyManagerSearchPeers - Failed to read public key for peer: Error opening /opt/splunk/etc/auth/distServerKeys/SEARCHHEAD.example.com/trusted.pem: No such file or directory
And these were the logs on the search head:
08-31-2018 14:18:49.854 +1000 INFO KeyManagerSearchPeers - Sending SEARCHHEAD.example.com public key to search peer: https://INDEXER.example.com:8089
127.0.0.1 - splunk-system-user [31/Aug/2018:14:18:49.854 +1000] "POST /services/shcluster/captain/members/SEARCHHEAD.example.com/send_peer_cert HTTP/1.1" 200 1875 - - - 29ms
08-31-2018 14:18:49.883 +1000 ERROR SHCMasterPeerHandler - Could not send public key to peer=https://INDEXER.example.com:8089 for server=SEARCHHEAD.example.com (reason='')
08-31-2018 14:18:49.883 +1000 INFO SHCMaster - Failed to send public key for peer: Status 401 while sending public key to search peer https://INDEXER.example.com:8089: call not properly authenticated
08-31-2018 14:18:49.883 +1000 ERROR KeyManagerSearchPeers - Status 401 while sending public key to search peer https://INDEXER.example.com:8089: call not properly authenticated
08-31-2018 14:18:49.923 +1000 WARN DistributedPeer - Peer:https://INDEXER.example.com:8089 Authentication Failed
08-31-2018 14:18:49.947 +1000 WARN DistributedPeer - Peer:https://INDEXER.example.com:8089 Authentication Failed
127.0.0.1 - splunk-system-user [31/Aug/2018:14:18:49.951 +1000] "POST /services/shcluster/captain/members/SEARCHHEAD.example.com/send_peer_cert HTTP/1.1" 200 1875 - - - 64ms
Basically, the search head couldn’t authenticate to the indexer because its public key had never made it onto the indexer (hence the missing trusted.pem), and every later attempt to send the key was rejected with a 401. Removing and re-adding the indexer as a search peer fixed the problem :)
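To spot the same pattern quickly next time, here is a rough Python sketch that scans splunkd.log lines for the two signatures seen above: the indexer failing to read trusted.pem, and the search head getting a 401 while sending its public key. The regular expressions are derived only from the log lines in this post, so treat them as assumptions; other Splunk versions may word these messages differently. The function name find_key_exchange_failures is hypothetical.

```python
import re

# Patterns based solely on the log lines quoted in this post (assumption:
# the wording is stable enough across versions to be worth matching).
MISSING_KEY = re.compile(
    r"Failed to read public key for peer: Error opening "
    r"\S+/distServerKeys/(?P<peer>[^/\s]+)/trusted\.pem"
)
AUTH_401 = re.compile(
    r"Status 401 while sending public key to search peer (?P<uri>https?://\S+?):\s"
)


def find_key_exchange_failures(lines):
    """Return (missing_keys, rejected_peers) as two sets of strings.

    missing_keys: search head names whose trusted.pem the indexer could not read.
    rejected_peers: peer URIs that answered the key upload with a 401.
    """
    missing_keys = set()
    rejected_peers = set()
    for line in lines:
        match = MISSING_KEY.search(line)
        if match:
            missing_keys.add(match.group("peer"))
        match = AUTH_401.search(line)
        if match:
            rejected_peers.add(match.group("uri"))
    return missing_keys, rejected_peers
```

Run it over the indexer's log to fill the first set and over the search head's log to fill the second, for example by passing an open file object as lines. If the same search head and indexer pair shows up on both sides, you are probably looking at the issue described here.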