I've been scratching my head for a week over the best solution to my problem. I haven't found any other discussion on this topic, which I think is pretty common:
We have 2 BGP sessions with our upstreams, each announcing a default route. If one of the ISPs performs an upgrade on their router, BGP goes down and life is good. However, as their router reloads, the BGP session with us (announcing a default route) comes back up before their router has loaded the full table, resulting in a ~30-second outage after the BGP neighbor is back up.
** I've described the situation as most people would see it. In our case we have an advantage: we are also the upstream provider.
Here are my thoughts:
1. Have the upstream (still us in our case) receive the default route from the internet and propagate it to the customer. In this case the default route wouldn't be generated by the PE, avoiding the blackhole situation on reload.
2. Use a recursive default route whose next-hop sits inside a summary (e.g. 192.0.2.0/24) learned from the upstream (e.g. static route 0.0.0.0/0 next-hop 192.0.2.0), so the default is only usable while that summary is present.
I don't like the first one as it means introducing a default-route in our public ASN.
I don't like the second one as it is still a static route in a fully dynamic environment.
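
To make option 2 a bit more concrete, here is a rough Python sketch of the recursive next-hop idea as I understand it. It is only a toy model, not router code: the function name `default_is_usable` and the list-of-prefixes "RIB" are made up for illustration. It also assumes two things that may not hold on every platform: that the upstream only announces the 192.0.2.0/24 summary once it has converged, and that the static default is not allowed to resolve its next-hop through a default route.

```python
import ipaddress


def default_is_usable(learned_prefixes, next_hop):
    """Toy model: is a static default via `next_hop` installable?

    `learned_prefixes` is a list of prefix strings currently learned
    from the upstream (hypothetical stand-in for the routing table).
    The static default counts as usable only if `next_hop` is covered
    by a learned prefix other than 0.0.0.0/0 (assumption: the default
    must not resolve recursively through a default route).
    """
    hop = ipaddress.ip_address(next_hop)
    for prefix in learned_prefixes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen == 0:
            # Skip any default route: it says nothing about convergence.
            continue
        if hop in net:
            return True
    return False


# Session is back up, but the upstream only sends its default so far.
during_reload = ["0.0.0.0/0"]
# Upstream has converged and now also announces the summary.
after_convergence = ["0.0.0.0/0", "192.0.2.0/24"]

usable_during_reload = default_is_usable(during_reload, "192.0.2.0")
usable_after_convergence = default_is_usable(after_convergence, "192.0.2.0")
```

In this model the static default stays out of the table during the reload window and only comes back once the summary is learned, which is the behavior I'm after with option 2.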