A few approaches come to mind:
- A SLAX/Python event script, scheduled to run daily, that checks for a discrepancy between the partitions and, if it finds one, triggers a snapshot.
- A SLAX/Python SNMP script that checks for a discrepancy and returns an OID for the state. This does require Junos 15.1 or higher, though.
An off-box solution is also possible. It could be driven by an off-box Python script using the Junos PyEZ library, by a tool such as Ansible or Salt, or even by off-box SLAX using JUISE.
I took a more detailed look at this, and the script further down demonstrates some of it. The approach should also work on fairly early versions of Junos: I suspect Junos 9.6 and higher, although I only tested it with 12.1.
It shows examples of sending SNMP traps and of updating OIDs in the Juniper Utility MIB, which can then be polled without much effort. With minimal changes it could just as easily send to syslog.
This is a SLAX example, but the same approach could be written in Python, depending on the Junos version and its capabilities.
In this example I was using a couple of SRX devices set up in a cluster, where each node of the cluster has dual partitions. The approach would be similar for other device types.
The event script was copied to both nodes of the cluster and stored as /var/db/scripts/event/snapshot-check.slax
I added the event script to the configuration, and defined the timer event and the policy for it within the script itself, so that less configuration needs to be applied. Alternatively, the event-definition could be removed from the script and an event policy configured instead.
Example:
show configuration event-options
event-script {
    file snapshot-check.slax;
}
With the above configuration applied, the event policy defined within the script is now active, as shown by the CLI command below.
show event-options event-scripts policies
## Last changed: 2024-04-09 16:24:15 UTC
event-options {
    generate-event {
        every-hour time-interval 3600;
    }
    policy snapshot-check {
        events every-hour;
        then {
            event-script snapshot-check.slax;
        }
    }
}
With this in place, the script is executed every hour. Don't forget that if the script is modified, it has to be reloaded via the CLI, e.g. request system scripts event-scripts reload
version 1.0;
ns junos = "http://xml.juniper.net/junos/*/junos";
ns xnm = "http://xml.juniper.net/xnm/1.1/xnm";
ns jcs = "http://xml.juniper.net/junos/commit-scripts/1.0";
import "../import/junos.xsl";
var $event-definition = {
    <event-options> {
        <generate-event> {
            <name> "every-hour";
            <time-interval> "3600";
        }
        <policy> {
            <name> "snapshot-check";
            <events> "every-hour";
            <then> {
                <event-script> {
                    <name> "snapshot-check.slax";
                }
            }
        }
    }
}
match / {
    <event-script-results> {
        /* check only 1 instance running */
        if (not(jcs:dampen("snapshot-check", 1, 1))) {
            expr jcs:syslog("external.notice", "snapshot-check", "dampen exit OK.");
            <xsl:message terminate="yes">;
        }
        /* open connection */
        var $conn = jcs:open();
        if ($conn/..//xnm:error) {
            call rpc_failure($rpc = $conn/.., $message = "Error connecting to mgd on this RE");
            <xsl:message terminate="yes">;
        }
        /* define rpc */
        var $get-snapshot-info-rpc = {
            <get-snapshot-information> {
                <media> "internal";
            }
        }
        /* get snapshot info */
        var $get-snapshot-results = jcs:execute($conn, $get-snapshot-info-rpc);
        if ($get-snapshot-results/..//xnm:error) {
            call rpc_failure($rpc = $get-snapshot-results/.., $message = "Error collecting snapshot information!");
        }
        /* parse results */
        for-each ($get-snapshot-results/multi-routing-engine-item) {
            if (re-name == "node0") {
                /* check for primary and backup partition */
                if (count(./snapshot-information/snapshot-medium) != 2) {
                    /* update utility mib */
                    call snmp_update(
                        $connection = $conn,
                        $instance = "node0",
                        $obj-type = "string",
                        $obj-value = "Only one partition found!"
                    );
                    call snmp_update(
                        $connection = $conn,
                        $instance = "node0_status",
                        $obj-type = "integer",
                        $obj-value = 1
                    );
                    /* generate a trap */
                    var $requestSnmpTrapNode0 = <request-snmp-generate-trap> {
                        <trap> "jnxEventTrap";
                        <variable-bindings> "jnxEventTrapDescr[0]=Event-Trap, "
                            _ "jnxEventAvAttribute[1]=event, "
                            _ "jnxEventAvValue[1]=NODE0-SNAPSHOT-ALARM, "
                            _ "jnxEventAvAttribute[2]=Desc, "
                            _ "jnxEventAvValue[2]=Not enough partitions, "
                            _ "jnxEventAvAttribute[3]=status-value, "
                            _ "jnxEventAvValue[3]=" _ 1;
                    }
                    var $res_trap_node0 = jcs:execute($conn, $requestSnmpTrapNode0);
                } else {
                    /* check version on primary and backup */
                    if (./snapshot-information/software-version[1]/package[package-name="junos"]/package-version !=
                        ./snapshot-information/software-version[2]/package[package-name="junos"]/package-version) {
                        /* update utility mib */
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node0",
                            $obj-type = "string",
                            $obj-value = "Version mismatch between partitions!"
                        );
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node0_status",
                            $obj-type = "integer",
                            $obj-value = 1
                        );
                        /* generate a trap */
                        var $requestSnmpTrapNode0 = <request-snmp-generate-trap> {
                            <trap> "jnxEventTrap";
                            <variable-bindings> "jnxEventTrapDescr[0]=Event-Trap, "
                                _ "jnxEventAvAttribute[1]=event, "
                                _ "jnxEventAvValue[1]=NODE0-SNAPSHOT-ALARM, "
                                _ "jnxEventAvAttribute[2]=Desc, "
                                _ "jnxEventAvValue[2]=Version mismatch, "
                                _ "jnxEventAvAttribute[3]=status-value, "
                                _ "jnxEventAvValue[3]=" _ 1;
                        }
                        var $res_trap_node0 = jcs:execute($conn, $requestSnmpTrapNode0);
                    } else {
                        /* update utility mib */
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node0",
                            $obj-type = "string",
                            $obj-value = "Both partitions have same version " _
                                ./snapshot-information/software-version[1]/package[package-name="junos"]/package-version
                        );
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node0_status",
                            $obj-type = "integer",
                            $obj-value = 0
                        );
                    }
                }
            }
            if (re-name == "node1") {
                if (count(./snapshot-information/snapshot-medium) != 2) {
                    /* update utility mib */
                    call snmp_update(
                        $connection = $conn,
                        $instance = "node1",
                        $obj-type = "string",
                        $obj-value = "Only one partition found!"
                    );
                    call snmp_update(
                        $connection = $conn,
                        $instance = "node1_status",
                        $obj-type = "integer",
                        $obj-value = 1
                    );
                    /* generate a trap */
                    var $requestSnmpTrapNode1 = <request-snmp-generate-trap> {
                        <trap> "jnxEventTrap";
                        <variable-bindings> "jnxEventTrapDescr[0]=Event-Trap, "
                            _ "jnxEventAvAttribute[1]=event, "
                            _ "jnxEventAvValue[1]=NODE1-SNAPSHOT-ALARM, "
                            _ "jnxEventAvAttribute[2]=Desc, "
                            _ "jnxEventAvValue[2]=Not enough partitions, "
                            _ "jnxEventAvAttribute[3]=status-value, "
                            _ "jnxEventAvValue[3]=" _ 1;
                    }
                    var $res_trap_node1 = jcs:execute($conn, $requestSnmpTrapNode1);
                } else {
                    if (./snapshot-information/software-version[1]/package[package-name="junos"]/package-version !=
                        ./snapshot-information/software-version[2]/package[package-name="junos"]/package-version) {
                        /* update utility mib */
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node1",
                            $obj-type = "string",
                            $obj-value = "Version mismatch between partitions!"
                        );
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node1_status",
                            $obj-type = "integer",
                            $obj-value = 1
                        );
                        /* generate a trap */
                        var $requestSnmpTrapNode1 = <request-snmp-generate-trap> {
                            <trap> "jnxEventTrap";
                            <variable-bindings> "jnxEventTrapDescr[0]=Event-Trap, "
                                _ "jnxEventAvAttribute[1]=event, "
                                _ "jnxEventAvValue[1]=NODE1-SNAPSHOT-ALARM, "
                                _ "jnxEventAvAttribute[2]=Desc, "
                                _ "jnxEventAvValue[2]=Version mismatch, "
                                _ "jnxEventAvAttribute[3]=status-value, "
                                _ "jnxEventAvValue[3]=" _ 1;
                        }
                        var $res_trap_node1 = jcs:execute($conn, $requestSnmpTrapNode1);
                    } else {
                        /* update utility mib */
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node1",
                            $obj-type = "string",
                            $obj-value = "Both partitions have same version " _
                                ./snapshot-information/software-version[1]/package[package-name="junos"]/package-version
                        );
                        call snmp_update(
                            $connection = $conn,
                            $instance = "node1_status",
                            $obj-type = "integer",
                            $obj-value = 0
                        );
                    }
                }
            }
        }
        expr jcs:close($conn);
    }
}
/* syslog */
template rpc_failure($rpc, $message = "Following errors occurred while trying to gather data:") {
    expr jcs:syslog("daemon.error", $message);
    for-each ($rpc//xnm:error) {
        expr jcs:syslog("daemon.error", message);
    }
}
/* snmp utility mib */
template snmp_update($connection, $instance, $obj-type, $obj-value) {
    var $rpc = {
        <request-snmp-utility-mib-set> {
            <instance> $instance;
            <object-type> $obj-type;
            <object-value> $obj-value;
        }
    }
    var $result = jcs:execute($connection, $rpc);
    if ($result/..//xnm:error) {
        call rpc_failure($rpc = $result/.., $message = "Error updating utility mib!");
    }
}
So the script does a few things, and repeats them for each node of the cluster. Some of the features are described here:
- jcs:dampen() is used to prevent the script from being executed multiple times at the same time.
- Make an RPC call to get-snapshot-information, which is equivalent to "show system snapshot media internal".
- Iterate through the response for each routing engine, e.g. node0 and node1.
- Verify that each node has 2 partitions (primary and backup).
- Verify that the partitions on each node have the same Junos version.
- Send SNMP traps when either the partition count check or the version check fails.
- Update the Juniper Utility MIB with a status value [0|1] for each node, where 0 is good and 1 is bad.
- Update the Juniper Utility MIB with a status string describing the state of the snapshots for each node.
Based on that, the utility MIB is expected to be updated with the following information, which can be retrieved via SNMP, e.g.
show snmp mib walk jnxUtil
jnxUtilIntegerValue.110.111.100.101.48.95.115.116.97.116.117.115 = 0
jnxUtilIntegerValue.110.111.100.101.49.95.115.116.97.116.117.115 = 0
jnxUtilIntegerTime.110.111.100.101.48.95.115.116.97.116.117.115 = 07 e8 04 09 0e 16 15 00 2b 00 00
jnxUtilIntegerTime.110.111.100.101.49.95.115.116.97.116.117.115 = 07 e8 04 09 0e 16 16 00 2b 00 00
jnxUtilStringValue.110.111.100.101.48 = Both partitions have same version 12.1X46-D86-domestic
jnxUtilStringValue.110.111.100.101.49 = Both partitions have same version 12.1X46-D86-domestic
jnxUtilStringTime.110.111.100.101.48 = 07 e8 04 09 0e 16 15 00 2b 00 00
jnxUtilStringTime.110.111.100.101.49 = 07 e8 04 09 0e 16 16 00 2b 00 00
or, using ascii output:
show snmp mib walk jnxUtil ascii
jnxUtilIntegerValue."node0_status" = 0
jnxUtilIntegerValue."node1_status" = 0
jnxUtilIntegerTime."node0_status" = 07 e8 04 09 0e 16 15 00 2b 00 00
jnxUtilIntegerTime."node1_status" = 07 e8 04 09 0e 16 16 00 2b 00 00
jnxUtilStringValue."node0" = Both partitions have same version 12.1X46-D86-domestic
jnxUtilStringValue."node1" = Both partitions have same version 12.1X46-D86-domestic
jnxUtilStringTime."node0" = 07 e8 04 09 0e 16 15 00 2b 00 00
jnxUtilStringTime."node1" = 07 e8 04 09 0e 16 16 00 2b 00 00
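If the poller only shows the numeric form, the instance name and the timestamp can be decoded by hand. The index in the first walk is simply the ASCII codes of the instance name, and the time values look like the standard 11-octet SNMPv2 DateAndTime encoding. A small Python sketch of both, with hypothetical function names:

```python
from datetime import datetime, timedelta, timezone


def decode_index(oid_suffix):
    """Turn a dotted index such as '110.111.100.101.48' into 'node0'."""
    return "".join(chr(int(part)) for part in oid_suffix.split("."))


def decode_date_and_time(hex_octets):
    """Decode an 11-octet DateAndTime value written as space-separated hex."""
    raw = bytes.fromhex(hex_octets)
    if len(raw) != 11:
        raise ValueError("expected 11 octets, got %d" % len(raw))
    year = int.from_bytes(raw[0:2], "big")
    month, day, hour, minute, second, deciseconds = raw[2:8]
    # Octet 9 is '+' or '-', octets 10 and 11 are the UTC offset.
    sign = 1 if chr(raw[8]) == "+" else -1
    offset = sign * timedelta(hours=raw[9], minutes=raw[10])
    return datetime(year, month, day, hour, minute, second,
                    deciseconds * 100000, tzinfo=timezone(offset))


if __name__ == "__main__":
    print(decode_index("110.111.100.101.48.95.115.116.97.116.117.115"))
    # node0_status
    print(decode_date_and_time("07 e8 04 09 0e 16 15 00 2b 00 00").isoformat())
    # 2024-04-09T14:22:21+00:00
```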
I don't have an example of the SNMP trap that was sent, but it shouldn't be too difficult to see how the event trap is defined. The MIB file can be found here:
https://www.juniper.net/documentation/en_US/junos/topics/reference/mibs/mib-jnx-event.txt
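Going by the script, the trap's variable bindings are just a comma-separated string of numbered attribute/value pairs. As a rough Python sketch of how that string is put together for either node (the function name is hypothetical, and nothing here actually sends a trap):

```python
def trap_variable_bindings(node, description, status):
    """Build the variable-bindings string in the same form as the SLAX script."""
    pairs = [
        ("event", node.upper() + "-SNAPSHOT-ALARM"),
        ("Desc", description),
        ("status-value", str(status)),
    ]
    parts = ["jnxEventTrapDescr[0]=Event-Trap"]
    # Attribute/value pairs are numbered from 1, as in the script.
    for index, (attribute, value) in enumerate(pairs, start=1):
        parts.append("jnxEventAvAttribute[%d]=%s" % (index, attribute))
        parts.append("jnxEventAvValue[%d]=%s" % (index, value))
    return ", ".join(parts)


if __name__ == "__main__":
    print(trap_variable_bindings("node0", "Version mismatch", 1))
```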
Regards
------------------------------
Andy Sharp
------------------------------
Original Message:
Sent: 04-05-2024 03:22
From: ANDREAS WESTERGAARD ANDERSEN
Subject: Dual root, but different JUNOS versions
Hello Erdem
Have you found a solution for monitoring this issue?
We have the same problem..
------------------------------
ANDREAS WESTERGAARD ANDERSEN
Original Message:
Sent: 03-11-2013 13:31
From: Erdem
Subject: Dual root, but different JUNOS versions
Is there a way to catch when the partitions are not the same via an SNMP OID, via polling or a trap?
We've seen the error "WARNING: JUNOS versions running on dual partitions are not same" in the logs but that's it.