node1 goes from hold to secondary to disabled

1. node1 goes from hold to secondary to disabled

baldwizard

Posted 06-10-2020 04:21

After upgrading a pair of SRX320s to 15.1X49-D210, I cannot get the cluster to reform.

The primary node comes up ok but I cannot get the secondary online.

I've tried doing the following on the secondary:

set chassis cluster cluster-id 0 node 0 reboot

...

load factory-defaults

set chassis cluster cluster-id 1 node 1 reboot

But on the primary, the status goes "lost -> hold -> secondary -> disabled".

On the secondary, the only hint is in the chassisd log file:

LCC: send: fpc 0 pic 0 online ack
LCC: pic attach pic 0, flags 0x0, portcount 58, fpc 0
LCC: pic_set_online: i2c 0x689 pic 0 fpc 0 state 3 in_issu 0
LCC: pic_type=1673 pic_slot=0 fpc_slot=0 pic_i2c_id=1673
LCC: hwdb: entry for pic 1673 at slot 0 in fpc 0 inserted
LCC: FPC 0 PIC 0, attaching clean
LCC: not in vc mode
LCC: Forwarding pic attach to FWDD fpc 0, pic 0
LCC: Got a pic attach ack from fwdd fpc 0pic 0
LCC: FWDD pic attach ack recd fpc 0, pic 0
LCC: pic_copy_port_info:Got SFP Rev= , Pno=NON-JNPR, Sno=PG54Q4Q
LCC: SIGWINCH handler
LCC: Node entering disabled state
CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 0 offline: Chassis cluster disable
LCC: fpc_down slot 0 reason Chassis cluster disable cargs 0xfa6120
LCC: fpc_srxsme_disconnect slot is 0
LCC: fpc_offline_now - slot 0, reason: Chassis cluster disable, error OK transition state 1
CHASSISD_SNMP_TRAP3: ENTITY trap generated: entStateOperDisabled (entPhysicalIndex 7, entStateAdmin 3, entStateAlarm 0)
LCC: fpc_offline_now - slot 0, is_resync_ready cleared
LCC: mic_get_mic_slot: clp1: fpc_slot=0, pic_slot=0, i2c=0x689
LCC: hwdb: entry for fpc 1929 at slot 0 deleted
CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 1 offline: Chassis cluster disable
LCC: fpc_down slot 1 reason Removal cargs 0x0
LCC: fpc_offline_now - slot 1, reason: Chassis cluster disable, error OK transition state 1
CHASSISD_SNMP_TRAP3: ENTITY trap generated: entStateOperDisabled (entPhysicalIndex 8, entStateAdmin 1, entStateAlarm 0)
LCC: fpc_srxsme_is_mpim_present: slot 1, FPC not present
LCC: fpc_srxsme_init: slot 1, FPC not detected
CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 2 offline: Chassis cluster disable
LCC: fpc_down slot 2 reason Removal cargs 0x0
LCC: fpc_offline_now - slot 2, reason: Chassis cluster disable, error OK transition state 1
CHASSISD_SNMP_TRAP3: ENTITY trap generated: entStateOperDisabled (entPhysicalIndex 9, entStateAdmin 1, entStateAlarm 0)
LCC: fpc_srxsme_is_mpim_present: slot 2, FPC not present
LCC: fpc_srxsme_init: slot 2, FPC not detected
...
LCC: Unable to read FPC 6 ID EEPROM
LCC: I2C read error for slot 6
...

There are also errors in the jam_chassisd log, which refer to /usr/sbin/jam, but that path does not exist on either SRX:

jam_dso_find_open.776:dir: /usr/sbin/jam
jam_dso_find_open.799:Failed to Open Dir /usr/sbin/jam
jam_get_db_attribute.1013:DB Get failed for chasd.lc.modelinfo.711-062269 with ret 3
jam_get_modelnumstr.1176:Got model num str for partno: 711-062269
jam_dso_find_open.776:dir: /usr/sbin/jam
jam_dso_find_open.799:Failed to Open Dir /usr/sbin/jam
jam_get_db_attribute.1013:DB Get failed for chasd.lc.modelinfo.711-062269 with ret 3
jam_get_modelnumstr.1176:Got model num str for partno: 711-062269
jam_get_db_attribute.1011 ERR:DB Get failed for chasd.lc.modelinfo. with error 3
jam_get_modelnumstr.1176:Got model num str for partno:
jam_dso_find_open.776:dir: /usr/sbin/jam
jam_dso_find_open.799:Failed to Open Dir /usr/sbin/jam
jam_get_db_attribute.1013:DB Get failed for chasd.lc.modelinfo.711-062269 with ret 3
jam_get_modelnumstr.1176:Got model num str for partno: 711-062269

So I'm a bit confused about what to do next. Is the unit actually faulty?
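For reference, `load factory-defaults` in the steps above is a configuration-mode command, and Junos will not commit the factory-default configuration until a root password is set. A minimal sketch of the reset-and-rejoin sequence, assuming the node is already out of the cluster and reuses the cluster-id 1 / node 1 values from the post (lines starting with # are comments, not CLI input):

```text
# Enter configuration mode on the standalone (out-of-cluster) node
root> configure
# Replace the candidate configuration with the factory defaults
root# load factory-defaults
# A root password is mandatory; commit fails without it (prompts for the password)
root# set system root-authentication plain-text-password
root# commit and-quit
# Review what is configured on the interfaces before rejoining
root> show configuration interfaces
# Re-enable clustering as node 1 of cluster 1; the device reboots
root> set chassis cluster cluster-id 1 node 1 reboot
```

The interface check before the last step is worth doing because the factory-default configuration may include interface statements, and in this thread a leftover statement on an HA interface turned out to be the cause.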

2. RE: node1 goes from hold to secondary to disabled

baldwizard
Posted 06-10-2020 04:33

When the secondary is out of the cluster, all of the ge interfaces show up correctly as being up:

root> show interfaces terse | match ge-
ge-0/0/0    up    up
ge-0/0/1    up    up
ge-0/0/2    up    up
ge-0/0/3    up    up
ge-0/0/4    up    up
ge-0/0/5    up    up
ge-0/0/6    up    up
ge-0/0/7    up    down
ge-0/0/8    up    down
ge-0/0/9    up    down

So I'm not concerned about that.
3. RE: node1 goes from hold to secondary to disabled

shlinga
Posted 06-10-2020 04:54

Hello Baldwizard,

Greetings!

As I understand from your description, the secondary node is not coming online.

Could you share the output of the following commands:

> show chassis alarms no-forwarding

> show chassis cluster status

> show chassis cluster statistics
> show chassis cluster information
> show log jsrpd

Also, see the following KB article on verifying that chassis cluster nodes are configured and up on J-Series and SRX devices:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB15439&actp=METADATA

Best Regards,

Lingabasappa H

4. RE: node1 goes from hold to secondary to disabled

baldwizard
Posted 06-10-2020 05:07

Interestingly, there is one active alarm:

> show chassis alarms no-forwarding
1 alarms currently active
Alarm time Class Description
2020-06-10 21:41:02 EST Major NSD fails to restart because subcomponents fail
5. RE: node1 goes from hold to secondary to disabled

deekshap
Posted 06-10-2020 05:15

Hello baldwizard,

Regarding this alarm:

> show chassis alarms no-forwarding
1 alarms currently active
Alarm time Class Description
2020-06-10 21:41:02 EST Major NSD fails to restart because subcomponents fail

Starting in Junos OS Releases 12.3X48-D85, 15.1X49-D180, and 19.2R1, a system alarm is triggered when the Network Security Process (NSD) is unable to restart because one or more NSD subcomponents have failed. Log entries for this alarm are saved in the messages log. The alarm is cleared automatically when NSD restarts successfully. When NSD is unable to restart, the show chassis alarms and show system alarms commands display the following output: NSD fails to restart because subcomponents fail.
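To confirm the alarm on each node and look for the related entries in the messages log, something like the following should work. The `match NSD` pattern is only a guess at what the relevant log lines contain, so adjust it as needed (lines starting with # are comments, not CLI input):

```text
# Both commands list active alarms; the NSD alarm should appear while NSD cannot restart
root> show chassis alarms
root> show system alarms
# Search the messages log for NSD-related entries (pattern is an assumption)
root> show log messages | match NSD
```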

Please go through the following documents:

https://www.juniper.net/documentation/en_US/junos/topics/concept/security-alarm-overview.html

https://www.juniper.net/documentation/en_US/junos/information-products/topic-collections/release-notes/15.1x49-d180/junos-release-notes-15.1X49-D180.pdf

I hope this helps. Please mark my post as "Accept as solution" if that has answered your query.

Kudos are always appreciated!
6. RE: node1 goes from hold to secondary to disabled

shlinga
Posted 06-10-2020 05:20

Hello Baldwizard,

> show chassis alarms no-forwarding
1 alarms currently active
Alarm time Class Description
2020-06-10 21:41:02 EST Major NSD fails to restart because subcomponents fail

To clear the above alarm, run the following command during a maintenance window:

> restart network-security

I suspect that the daemon got stuck and needs to be restarted, but note that restarting the process could briefly impact your traffic.

I hope this helps. Please mark this post "Accept as solution" if this answers your query.

Kudos are always appreciated!

Best Regards,

Lingabasappa H

7. RE: node1 goes from hold to secondary to disabled

JoelNovans
Posted 06-10-2020 05:54

Hi baldwizard,

First, please check the active alarms on both nodes.

The alarm in the output you pasted indicates that NSD failed to restart because one of its subcomponents failed.

Please note that the NSD process handles all security-related configuration and pushes it to the PFE. Since you are seeing this alarm, I suspect the daemon may have gotten stuck and needs to be restarted, but keep in mind that restarting the process could briefly impact your traffic.

To restart the daemon:

> restart network-security

If the above command does not solve the issue, please restart the device:

> request system reboot

Please take precautions before rebooting a node; you probably do not want to reboot the node that is currently primary.
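One cautious ordering, assuming a two-node cluster like the one in this thread: check which node currently holds each redundancy group, and run the reboot only on the node reported as secondary (lines starting with # are comments, not CLI input):

```text
# Shows the primary/secondary status of node0 and node1 per redundancy group
root> show chassis cluster status
# Run this on the secondary node only, after confirming its role above
root> request system reboot
```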

Hope this helps 🙂

Please mark "Accepted Solution" if this helps you solve your query.

Kudos are always appreciated!
8. RE: node1 goes from hold to secondary to disabled
Best Answer

baldwizard
Posted 06-10-2020 05:04

OK, this appears to have been caused by an interface configuration for ge-0/0/0, one of the HA interfaces, that was present on the node while it was out of the cluster. I only found this deep in a log file; it wasn't visible anywhere else:

/var/log/dcd

I had to run "delete interfaces ge-0/0/0" on the secondary while it was out of the cluster (after that, its local configuration contained only the root password) and then reboot.
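A sketch of the fix described above, assuming the secondary is still out of the cluster, the leftover statement is on ge-0/0/0 as in this thread, and the node is rejoined with the same cluster-id 1 / node 1 command from the first post rather than a plain reboot (lines starting with # are comments, not CLI input):

```text
# Look for the interface complaint that was only visible in the dcd log
root> show log dcd | match ge-0/0/0
# Check what is configured locally on the HA interface
root> show configuration interfaces ge-0/0/0
# Remove the stray interface configuration and commit
root> configure
root# delete interfaces ge-0/0/0
root# commit and-quit
# Rejoin the cluster; the device reboots
root> set chassis cluster cluster-id 1 node 1 reboot
# Afterwards, on the primary: node1 should settle as secondary instead of disabled
root> show chassis cluster status
```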
9. RE: node1 goes from hold to secondary to disabled

shlinga
Posted 06-10-2020 05:13

Hello Baldwizard,

Thanks for the reply.

Did deleting the ge-0/0/0 configuration on the secondary node while it was out of the cluster, followed by a reboot, resolve the issue?

Please mark the reply that answered your question as the accepted solution.

This helps others with the same or similar questions find the right solution on the forum.

I hope this helps. Please mark my post as "Accept as solution" if that has answered your query.

Kudos are always appreciated!

Best Regards,

Lingabasappa H
10. RE: node1 goes from hold to secondary to disabled

deekshap
Posted 06-10-2020 05:07

Hello Baldwizard

Greetings!

Please provide the output of the following commands:

show chassis cluster status
show chassis fpc pic-status
show chassis alarms

show log jsrpd

show chassis cluster information no-forwarding

Meanwhile, the following KB articles may be helpful for troubleshooting:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB20641&actp=METADATA

https://kb.juniper.net/InfoCenter/index?page=content&id=KB15421&actp=METADATA

Please mark "Accept as solution" if this answers your query.

Kudos are appreciated too
