Junos

EX2300-48T Stop working after random time after upgrade to 19.1R1.6

05-05-2019 06:24 AM

I'm having some trouble:

After upgrading my EX2300-48 switches (10 in total) to Junos 19.1R1.6, I have some problems:

1) 3 of the 10 switches stop working after some time (one a day after the update, another after a week), and only a reboot helps. Unfortunately, I can't see what is happening at that moment; the core dumps are empty.

2) "show chassis environment" doesn't show the second fan, and "show chassis alarms" shows no alarms. Is this a software bug, or does the second fan really not work? On Junos 18.2R1.9 the second fan is visible.

3) And of course J-Web doesn't work, but this problem is listed in the release notes.

[Attachments: 1.PNG, EX2300-48T with Junos 18.2R1.9; 2.PNG, EX2300-48T with Junos 19.1R1.6]

 

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

05-05-2019 11:56 AM

What do you mean by "stops working"? What troubleshooting have you done?

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

05-17-2019 04:01 AM

Good day,

 

Is it switching off with an overheat message?

This could be a problem with the "show" command (a display issue), or it could be a real problem with the fans.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

06-01-2019 04:19 AM

No, the temperature is normal. The switch just goes down: the L2 domain keeps working, but nothing else does. I can't see anything from the console connection or the management port. Only a reboot helps. On 18.2R1 and 18.2R2 everything is OK.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

06-03-2019 05:18 AM

Good Day,

 

According to the following document, EX2300 switches without PoE capability have only one fan.

https://www.juniper.net/documentation/en_US/release-independent/junos/topics/reference/general/cooli...

 

Basically, the fact that a second fan is visible on 18.2R1.9 is a bug, fixed in public PR# 1361696:

https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1361696

PR# 1361696 is fixed in Junos 19.1R1, but not in 18.2R1.9.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

06-03-2019 06:32 AM

We have been seeing the same problem with our EX2300-C units since upgrading to 18.1R3-S4.2 (from 15.1X53-D590.1). I have a case open and just escalated it, since we haven't gotten anywhere in the last two weeks. My fear is that this will hit our production EX3400 Virtual Chassis installations.

We see the same behavior as described: the switch doesn't respond to management traffic and the console port is unresponsive. It also stops forwarding syslog traffic. It really looks like the RE is "dead", but the switch keeps forwarding L2 traffic.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

07-16-2019 06:24 AM

Hi,

I'm seeing the same behavior on an EX2300-24T running 18.3R2.7.

Symptoms:
My laptop and a bunch of Juniper MX and QFX management interfaces are connected to the switch in the same VLAN. After a while (a few hours), new SSH sessions from my laptop to any other device connected to the switch in the same VLAN fail, while an existing SSH session keeps working until I disconnect.
The switch itself, however, is unresponsive to management: no output on the serial console port, and no SSH connection possible. I needed to pull the plug to reboot the switch and get it working again (I checked the serial console and SSH again; after the reboot they were fine).
The funky part is that we have two EX2300 switches configured almost exactly the same way, one in each of two racks.

Did you get any feedback from JTAC regarding the matter?

Regards,
Dante

 

 

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

07-16-2019 06:30 AM

JTAC closed my case since they couldn't reproduce the issue (with a switch sitting idle). We went back to 15.1X53-D591.1.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

07-28-2019 12:07 AM

This is probably some obscure bug. Since I cleaned up the config (my colleague had left a lot of default stuff in it; it's a lab setup), the switch hasn't stalled on me for at least a week.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

08-22-2019 09:02 AM

I've seen the same problem on an EX2300-C-PoE running 19.2R1.8. It's new, has been running for a month now, and I've seen it twice.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

08-26-2019 11:27 AM

Hello RJTaylor, 

We were about to deploy 250 EX2300-C switches in June, but they started exhibiting the behavior you and others in this thread described:

- No SSH access

- No ping

- No console access

- The SPD/DX/EN switch on the right does not respond

- When an interface cable is disconnected, the LEDs keep blinking

- The only way to reset the switch was to disconnect and reconnect the AC feed

- Traffic on the connected interfaces seems to keep flowing. You can't connect a new interface, though.

The issue affects at least the 19.1 and 18.1R3 images. The switches were delivered with 15.1 and we had upgraded them.

Not sure if going back to 15.1 would resolve the issue.

I opened a case in June and escalated it. We received a beta image to try in July, and no switch went zombie with it. It's not a production image yet, though.

What are you doing about it?

Michel

 

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

08-27-2019 12:34 AM

@LapointeMichel wrote:

<snip>

- No SSH access

- No ping

- No console access

- The SPD/DX/EN switch on the right does not respond

- When an interface cable is disconnected, the LEDs keep blinking

- The only way to reset the switch was to disconnect and reconnect the AC feed

- Traffic on the connected interfaces seems to keep flowing. You can't connect a new interface, though.

</snip>


This describes exactly the issue I'm seeing. I even had the "privilege" of seeing an existing SSH session keep working when the issue started. Any new connection (physical or logical) fails. It seems the switch keeps working with the state it already has, but isn't processing any new information, such as an interface being connected or disconnected, a new address being added to the MAC or ARP table, etc.
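Because the zombie state shows up as existing sessions surviving while every new connection fails, a monitor that reuses one long-lived session could miss it. A minimal sketch, assuming you poll each switch's management address from an external host: open a fresh TCP connection to the SSH port on every check. The function name, default port and timeout are illustrative choices, not anything from JTAC, and a failed handshake only flags a candidate; it does not prove the switch is in the zombie state.

```python
import socket

def accepts_new_connection(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Open a brand-new TCP connection to host:port and close it again.

    Returns True if the TCP handshake completes within `timeout` seconds,
    False on refusal, timeout or any other socket error. A new connection
    is used on purpose: in the reported zombie state, sessions that already
    exist keep working while new ones fail.
    """
    try:
        # create_connection performs the full TCP handshake (or raises).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket timeouts
        return False
```

For example, calling accepts_new_connection() against each switch's management IP every few minutes and alerting after several consecutive False results would catch a switch that has stopped accepting new sessions even though user traffic is still flowing.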

I've got a case open, but there hasn't been much progress yet. The engineer might have found a related PR, and he's asking me to open a new case 8-). Do you have a case number? I'd like to mention it in our case.

 

Regards,

Dante

 

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

08-27-2019 05:02 AM

Hello,

My pleasure: Service Request 2019-0603-0600, "4 EX2300 Switches went into an unreachable state for no particular reason", was opened on June 3rd, 2019. It is now high priority, and was moved to a PR sometime in mid-June. We were about to deploy the switches at 245 sites, and while we were configuring them in the lab they kept going down, roughly one per day. Since then the case has grown to 161 notes and 87 attached files. The highlight came on July 1st, when I installed a beta image on 60, then on all 245 EX2300s, and none of them ever went into the zombie state. I don't have any date for a production release, so we are now planning to go back to 15.1X53-D591.1, which was released on May 17, 2019. I plan to install it on 100 switches today and wait for failures. I'll keep you posted.

This weekend I tried 18.1R3-S7.1, which was released in mid-August. I installed it on 109 switches and by Monday I had 4 zombies!

By the way, we had 2 switches in production go zombie (they were running 19.1 and 18.3), and traffic kept flowing with no complaints from users. But once a switch is in that state, you can't connect a new interface.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

08-27-2019 12:36 PM

Hello everyone,

As some of you have stated, this is an ongoing PR investigation. The PR has not been published yet, so I cannot share the details here, but you should ask your JTAC engineer or account team. This is known to affect 2300s only, not 3400s.

Hope that helps in some way.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

09-03-2019 05:11 AM

Carlos wrote:

This is known to affect 2300s only, not 3400s.

Carlos, 

Thanks for posting this!  It's a big relief for us.  We've been entirely unsure what to do with our EX3400s since May.

 

RJ

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

09-03-2019 05:18 AM

@LapointeMichel wrote:

Not sure if going back to 15.1 would resolve the issue.

I opened a case in June and escalated it.

Hi Michel,

 

I've been out of the office for a few weeks, so I'm only seeing your message now. I'm glad to hear that you have a working beta. We're looking forward to its release to production.

To answer your question, we began getting watchdog/swizzle reboots with 15.1X53-D591.1. We were told this should be fixed somewhere around 18.3R3, but it was unclear whether that includes the fix for the zombie state we're seeing or only the swizzle reboot.

 

We're currently running 18.1R3-S6.1 and haven't had a zombie in three weeks.  That said, our sample size is *way* smaller than yours.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

09-03-2019 11:04 AM

Happy Tuesday,

Before the holiday weekend, I finished downgrading roughly 130 switches to 15.1X53-D591.1. No zombies yet, and since June that number of switches has been a guarantee of, on average, one going zombie per day on 18.x or 19.1 images. I have the "luck" of having 245 switches plugged in and available for testing until I get a reliable image, so that I can deploy them and be sure I won't have to drive 90 minutes to unplug the AC and re-establish communication. Deployment has been on hold since June, and I can tell you that it is not easy to squat unused spaces with 250 switches!!!

I am trying to finish the remaining ones today, and if I can get 5 days with no zombies, we'll keep this image. Just this weekend we had a 19.1 switch go zombie, and we plan to downgrade it once we are sure that our configuration made on 19.1 is accepted by 15.1.
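As a rough sanity check of the "5 days with no zombies" criterion: if zombies arrived independently at the rate reported above (about one per day across ~130 switches on 18.x/19.1), how likely is a clean 5-day run by luck alone? A minimal sketch, assuming a Poisson failure process with a constant fleet-wide rate; that model is an assumption for illustration, not something JTAC stated, and the function name is hypothetical.

```python
import math

def prob_clean_run(failures_per_day: float, days: float) -> float:
    """Probability of zero failures in `days` days, assuming failures form a
    Poisson process with constant fleet-wide mean `failures_per_day`."""
    return math.exp(-failures_per_day * days)

# Rate reported in the thread for ~130 switches on 18.x/19.1 images: about 1/day.
old_rate = 1.0
p = prob_clean_run(old_rate, days=5)
print(f"Chance of 5 clean days if nothing had changed: {p:.4f}")  # 0.0067
```

Under these assumptions, a 5-day clean run would be strong evidence that the downgrade changed the failure rate, as long as the fleet and workload stay comparable. It says little about much rarer failures: at one zombie per month across the fleet, a 5-day clean run would still happen about 85% of the time.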

We are going this way since I can't get a production release date for the test version we have.

I'll keep you posted on the success or failure of the test. Question for you: I don't think I have experienced the swizzle reboot you mention. How does it happen, and what traces does it leave? I can access the switch logs, and also run Junos Space and Network Director.

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

09-03-2019 11:20 AM

@LapointeMichel wrote:

Question for you: I don't think I have experienced the swizzle reboot you mention. How does it happen, and what traces does it leave?


We have not found a pattern causing the swizzle reboot.  It's a known bug but there doesn't seem to be anything in our environment that we could change to prevent it.

 

You can tell you've had a Swizzle (watchdog) reboot by checking "show chassis routing-engine". The last reboot reason will read: "0x8000:swizzle reboot".
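With a fleet of hundreds of switches, it may help to save the "show chassis routing-engine" output for each switch and scan it. A minimal sketch, assuming the output contains a line labelled "Last reboot reason" followed by the reason string quoted above; the function names and the sample text are hypothetical, and the check is only a case-insensitive search for "swizzle" in that field.

```python
from typing import Optional

FIELD = "last reboot reason"  # assumed label of the field in the CLI output

def last_reboot_reason(output: str) -> Optional[str]:
    """Return the value of the 'Last reboot reason' line, or None if absent."""
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(FIELD):
            # Everything after the label, with the column padding removed.
            return stripped[len(FIELD):].strip()
    return None

def is_swizzle_reboot(output: str) -> bool:
    """True if the last reboot reason mentions a swizzle (watchdog) reboot."""
    reason = last_reboot_reason(output)
    return reason is not None and "swizzle" in reason.lower()

# Hypothetical excerpt; real output has more fields and may be laid out differently.
SAMPLE = """Routing Engine status:
    Last reboot reason          0x8000:swizzle reboot
"""

print(last_reboot_reason(SAMPLE))  # 0x8000:swizzle reboot
print(is_swizzle_reboot(SAMPLE))   # True
```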

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

09-03-2019 12:48 PM

@RJTaylor wrote:

You can tell you've had a Swizzle (watchdog) reboot by checking "show chassis routing-engine". The last reboot reason will read: "0x8000:swizzle reboot".

 

Thanks RJTaylor, I'll keep an eye out.

184 switches are now running 15.1X53-D591.1 (we call it the "May 2019 release"). I'll finish the last ones tomorrow. The minute one goes zombie, that's the end of the test for me. If you don't hear from me, it means it's still going. Five straight days with no failures will reassure me.

I am a bit concerned about how long Juniper will support 15.1. But it's going to take a lot to convince me to upgrade.

By the way, if you upgrade, you may want to check for an issue we had: no LEDs light up when the optical interface connectors are plugged in, even though traffic flows. 15.1 is OK. This is on the EX2300-C.

 

 


 

Re: EX2300-48T Stop working after random time after upgrade to 19.1R1.6

09-03-2019 01:14 PM

Hi Michel,

Thanks a lot for sharing!

We have about 15 EX2300-24Ts that will be rolled out gradually during September and October in an OOB setup for a customer's new (Juniper) MPLS network. We've downgraded them all to the latest 15.1X release to avoid the "freezing" issue, which we encountered several times in the lab setup. There are only 2 switches in that setup, and we hit the issue multiple times per week before deciding on the downgrade at the end of last week. Unfortunately, we don't have the option to postpone the roll-out and test more. If any of them show the issue after the downgrade, I'll share it in this thread. Hopefully they don't, in which case I'll be able to give a (positive) update at the end of the roll-out plus some extra time, say mid-November.

Regards,

Dante