SRX Services Gateway

SRX cluster routing engine has GR Error gres-not-ready

‎03-04-2019 03:50 AM

Hi all,

 

I have a cluster problem and no idea what is causing it.

After some years of running, I had to stop one firewall node (SRX550); this was node1. After the reboot its interfaces were down (both in fpc0 and in fpc3), so I took it offline until the hardware could be replaced.

Later I tried starting the firewall node without any cables attached and the interfaces came up normally, so I tried to put it back into the cluster.

When it started, it immediately became the active node on RG0, but the reth interfaces remained down (with all the ge interfaces up), so I turned it off again. No preemption is configured, so the interfaces remained active on the other node, node1.

After that I discovered that node1 RG0 shows an error (GR); this is probably the reason why node0 took mastership when I plugged it back in.

Now node0 is turned off, node1 still shows this GR (GRES monitoring) error, and the firewall is working.

I would like to bring node0 back into service, but first I want to clear this GR error.

When I check show chassis cluster information detail, I can see gres-not-ready:

 

{primary:node1}
user@firewall-node1> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring

Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 0
node0  0        lost           n/a     n/a      n/a
node1  255      primary        no      yes      GR

Redundancy group: 1 , Failover count: 0
node0  0        lost           n/a     n/a      n/a
node1  0        primary        no      no       CS

 

 

{primary:node1}
user@firewall-node1> show chassis cluster information detail
node1:
--------------------------------------------------------------------------
Redundancy mode:
    Configured mode: active-active
    Operational mode: active-active
Cluster configuration:
    Heartbeat interval: 1000 ms
    Heartbeat threshold: 3
    Control link recovery: Disabled
    Fabric link down timeout: 66 sec
Node health information:
    Local node health: Not healthy
    Remote node health: Healthy

Redundancy group: 0, Threshold: 255, Monitoring failures: gres-not-ready

 

Please help me clear this GR error.

 

Thanks,

Balázs

Solution
Accepted by topic author BB
‎03-04-2019 11:40 PM

Re: SRX cluster routing engine has GR Error gres-not-ready

‎03-04-2019 06:50 PM

node1 is not in a healthy state. I think this is because a split-brain scenario occurred. Also, the node1 RG0 priority is 255, which means there was a manual failover. Reset it with "request chassis cluster failover reset redundancy-group 0".
You have to reboot node1 to recover from the unhealthy state.

First reboot node1 and power on node0 at the same time, so that downtime is reduced and the kernel state is not synced from node1 to node0.
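
If you want to check for these two conditions (manual failover flag and monitor failures) without reading the table by eye, something like the following Python sketch could work. It is not a Juniper tool: the function names parse_cluster_status and findings are made up for illustration, and the column layout is assumed from the "show chassis cluster status" output pasted in this thread, so other Junos releases may need a different pattern.

```python
import re

# Output copied from the post above (node rows only need the table part).
SAMPLE = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 0
node0  0        lost           n/a     n/a      n/a
node1  255      primary        no      yes      GR

Redundancy group: 1 , Failover count: 0
node0  0        lost           n/a     n/a      n/a
node1  0        primary        no      no       CS
"""

# Assumed row layout: node, priority, status, preempt, manual, monitor failures.
ROW = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.+)$")
GROUP = re.compile(r"^Redundancy group:\s*(\d+)")


def parse_cluster_status(text):
    """Return one dict per node row, tagged with its redundancy group."""
    rows, group = [], None
    for raw in text.splitlines():
        line = raw.strip()
        m = GROUP.match(line)
        if m:
            group = int(m.group(1))
            continue
        m = ROW.match(line)
        if m and group is not None:
            node, prio, status, preempt, manual, failures = m.groups()
            rows.append({
                "group": group,
                "node": node,
                "priority": int(prio),
                "status": status,
                "preempt": preempt,
                "manual": manual,
                # "None" and "n/a" both mean there is nothing to report.
                "failures": [] if failures in ("None", "n/a") else failures.split(),
            })
    return rows


def findings(rows):
    """List the conditions discussed in this thread for each node row."""
    notes = []
    for r in rows:
        where = f"RG{r['group']} {r['node']}"
        if r["manual"] == "yes":
            notes.append(f"{where}: manual failover flag set (priority {r['priority']})")
        if r["failures"]:
            notes.append(f"{where}: monitor failures {', '.join(r['failures'])}")
    return notes


if __name__ == "__main__":
    for note in findings(parse_cluster_status(SAMPLE)):
        print(note)
```

For the output in this thread it would report the manual failover flag and the GR failure on RG0 node1, plus the CS failure on RG1 node1.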

 

Thanks,
Nellikka
JNCIE x3 (SEC #321; SP #2839; ENT #790)

Re: SRX cluster routing engine has GR Error gres-not-ready

‎03-04-2019 11:34 PM

Hello,

 

Thanks for the reply. Yesterday I restarted the node and it cleared the error.

I left the other node turned off.

 

Finally, I found that jumbo frames were not enabled on the switch that carries the HA link between the nodes; that caused the original error.

 

Thanks,

 

Balázs