Tuesday, 29 May 2018

PCSD - "Error: Unable to communicate with"

The pcsd daemon is running on all the nodes that are supposed to be part of the cluster, but an attempt to authenticate and generate a token fails with the following error.

[root@pcs1 ~]# pcs cluster auth cs1.internal cs2.internal cs3.internal -u hacluster
Password:
Error: Unable to communicate with cs1.internal
Error: Unable to communicate with cs3.internal
Error: Unable to communicate with cs2.internal

Why authenticate?

Well, the pcsd daemon is responsible for keeping the corosync configuration files synchronized across all the nodes and for starting/stopping cluster services. Each node in the cluster must be authenticated to every other node; this is what enables nodes to perform actions on one another.
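What a successful `pcs cluster auth` actually produces is a per-node token file kept by pcsd. A quick way to check whether any tokens exist is to look at that file (a minimal sketch; the path is the one used on RHEL/Oracle Linux 7):

```shell
# pcsd keeps node auth tokens in this JSON file on RHEL/OL 7.
TOKEN_FILE=/var/lib/pcsd/tokens

# Print the tokens if the file exists and is non-empty, else say so.
out=$(if [ -s "$TOKEN_FILE" ]; then cat "$TOKEN_FILE"; else echo "no tokens yet"; fi)
echo "$out"
```

An empty or missing file is consistent with the "no token available" errors seen in the debug output below.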


Debug


[root@pcs1 ~]# pcs cluster auth cs1.internal cs2.internal cs3.internal -u hacluster --debug

 XDG_SESSION_ID=54
  _=/usr/sbin/pcs
  http_proxy=http://proxy.test:80
  https_proxy=http://proxy.test:80

--Debug Input Start--
{"username": "hacluster", "local": false, "nodes": {"cs1.internal": null, "cs3.internal": null, "cs2.internal": null}, "password": "hapassword", "force": false}
--Debug Input End--

Finished running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth

Return value: 0

--Debug Stdout Start--
{
  "status": "ok",
  "data": {
    "auth_responses": {
      "cs3.internal": {
        "status": "noresponse"
      },
      "cs2.internal": {
        "status": "noresponse"
      },
      "cs1.internal": {
        "status": "noresponse"
      }
    },
    "sync_successful": true,
    "sync_nodes_err": [

    ],
    "sync_responses": {
    }

  },
.....

"I, [2018-05-29T16:54:06.303538 #13362]  INFO -- : SRWT Node: cs1.internal Request: check_auth\n",
    "E, [2018-05-29T16:54:06.303538 #13362] ERROR -- : Unable to connect to node cs1.internal, no token available\n",
    "I, [2018-05-29T16:54:06.303538 #13362]  INFO -- : SRWT Node: cs3.internal Request: check_auth\n",
    "E, [2018-05-29T16:54:06.303538 #13362] ERROR -- : Unable to connect to node cs3.internal, no token available\n",
    "I, [2018-05-29T16:54:06.303579 #13362]  INFO -- : SRWT Node: cs2.internal Request: check_auth\n",

    "E, [2018-05-29T16:54:06.303628 #13362] ERROR -- : Unable to connect to node cs2.internal, no token available\n"

The error message indicates a connection failure to the cluster nodes. Let's check the common blocking factors.


  • The firewall is running, but the high-availability service is allowed (this covers the Corosync/Pacemaker ports).

[root@pcs1 ~]# firewall-cmd --state
running
[root@pcs1 ~]#

[root@pcs1 ~]# firewall-cmd --zone=public --list-services
dhcpv6-client ssh high-availability
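If high-availability were missing from that list, it could be added like this (a sketch; assumes firewalld with the default public zone, as in the output above):

```shell
# Allow the high-availability service permanently (includes pcsd 2224/tcp
# and the corosync/pacemaker ports), then reload to apply it.
firewall-cmd --permanent --zone=public --add-service=high-availability
firewall-cmd --reload
```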

  • PCSD daemon status on all the nodes.
[root@pcs1 ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-05-27 08:53:25 IST; 2 days ago
     Docs: man:pcsd(8)
           man:pcs(8)
 Main PID: 612 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─612 /usr/bin/ruby /usr/lib/pcsd/pcsd > /dev/null &

May 27 08:53:05 pcs1.tux systemd[1]: Starting PCS GUI and remote configuration interface...
May 27 08:53:25 pcs1.tux systemd[1]: Started PCS GUI and remote configuration interface.


  • PCSD listening on port 2224

[root@pcs1 ~]# lsof -Pi :2224
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
pcsd    612 root    7u  IPv4  16876      0t0  TCP *:2224 (LISTEN)
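On systems without lsof, an equivalent check can be done with ss (illustrative; the filter syntax assumes the iproute2 ss tool):

```shell
# -t TCP, -l listening sockets, -n numeric ports, -p owning process
ss -tlnp | grep ':2224'
```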


Note: IPv6 is disabled here, so pcsd binds to IPv4 by default; otherwise lsof would show an IPv6 listener.

What else? Most of the usual blocking factors check out. While puzzling over other network factors, I noticed the proxy lines in the debug output.

Yes, I had exported a proxy in the bashrc profile, and pcsd didn't like that!

[root@pcs1 ~]# echo $http_proxy
http://proxy.test:80

[root@pcs1 ~]# echo $https_proxy
http://proxy.test:80

Further research revealed a known bug, and it appears to be a regression.
Let's leave that to the OS vendor (Red Hat/Oracle?!)
https://bugzilla.redhat.com/show_bug.cgi?id=1315627

Workaround

Unset the environment variables

[root@pcs1 ~]# unset http_proxy;unset https_proxy
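A runnable sketch of the workaround (the proxy URL is this post's example value). Depending on the pcs version, exporting `no_proxy` for the cluster node names might also help, but given the linked bug, unsetting the variables is the safer route:

```shell
# Simulate the environment inherited from the bashrc profile
export http_proxy="http://proxy.test:80"
export https_proxy="http://proxy.test:80"

# Workaround: drop the proxy variables for this shell
unset http_proxy https_proxy

# Optional alternative (may not be honored by all pcs versions):
# export no_proxy="cs1.internal,cs2.internal,cs3.internal"
```

To make this permanent, remove (or guard) the export lines in the bashrc profile so pcs commands run without a proxy.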

What next?
Try authenticating again.

[root@pcs1 ~]# pcs cluster auth cs1.internal cs2.internal cs3.internal -u hacluster

The above applies to Oracle Linux 7 and Red Hat Enterprise Linux 7.
