CGNAT (napt44) on MS-DPC randomly stops working for some customers.

‎03-09-2020 07:32 AM

I've been working on setting up CGNAT on our MX480, which has two MS-DPC cards. I've read the Day One book and scoured a lot of forums, so I think I have it set up OK. I must be missing something, though, because about 2-3 hours after turning it on, certain customers stop passing any traffic at all. I've checked our NAT pool and there aren't any errors. We're NAT'ing about 1700 customers behind about 110 IPs using NAPT, giving each customer port blocks (PBA) of 1024 ports with a limit of 16 blocks. Our flows never go above ~190k.

 

Here's our setup: 

 

services {
  service-set napt44-svc {
      syslog {
          host local {
              services any;
              log-prefix napt44-svc;
              class {
                  stateful-firewall-logs;
                  alg-logs;
                  nat-logs;
              }
          }
      }
      nat-rules napt44-rule;
      interface-service {
          service-interface rsp1;
      }
  }
}

 nat {
  pool napt44-pool {
      address 1.2.2.0/26;   # < dummy public IP
      address 1.2.3.128/27; # < dummy public IP
      address 1.2.3.48/28;  # < dummy public IP
      port {
          automatic {
              random-allocation;
          }
          secured-port-block-allocation block-size 1024 max-blocks-per-address 16;
      }
      address-allocation round-robin;
      mapping-timeout 120;
  }

  rule napt44-rule {
      match-direction output;
      term nat-term1 {
          from {
              source-prefix-list {
                  napt44-private;
              }
          }
          then {
              translated {
                  source-pool napt44-pool;
                  translation-type {
                      napt-44;
                  }
                  address-pooling paired;
              }
              syslog;
          }
      }
  }

 }


ae1 {
  unit 0 {
      family inet {
          filter {
              input-list [ block-ip-ingress block-port-ingress static-src-filter default-accept ];
              output-list [ block-ip-egress block-port-egress default-accept ];
          }
          service {
              input {
                  service-set napt44-svc service-filter napt44-ingress-filter;
              }
              output {
                  service-set napt44-svc service-filter napt44-egress-filter;
              }
          }
          # address omitted
      }
  }
}

When a customer stops passing traffic from their private IP, I do the usual and check for AP-P port allocation errors, etc., but the counters show zero. I checked the stateful-firewall flows for the customer and see a lot of DNS requests, but not much else. Ports in use top out at about 95,257, and unique pool users at around 1710.

 

I just can't figure out why after a few hours, a ton of random customers simply stop working, even though they have flow data. If it was port exhaustion, it would show up in "show services nat pool detail" as an error. 

 

How can I troubleshoot why certain IPs simply stop working? Right now our old MikroTik is doing NAT just fine, but I reeeaally want to get away from MikroTik on our edge.

 


Re: CGNAT (napt44) on MS-DPC randomly stops working for some customers.

‎03-10-2020 05:56 AM

After another early morning of testing, it /may/ be stable so far. I turned off PBA, narrowed our NAT pool to a /26 (as per guideline recommendations), and added some ALG support for DNS/ICMP. With APP (address-pooling paired) enabled on top of PBA with large blocks and a high max-blocks count, I suspect an IP had all of its port blocks assigned, but because of APP the customers paired with it couldn't fetch a block from another IP, so their packets were simply dropped. Since each customer was still under their own PBA limit, I never saw any AP-P port allocation errors. PBA + APP seems to have some hidden cases where it just drops packets. 😕
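
For anyone wanting to sanity-check that suspicion, here is a rough back-of-the-envelope sketch in Python using the numbers from the original post. It assumes automatic port allocation draws from ports 1024-65535, that every address in each pool prefix is usable, and that round-robin spreads customers evenly across addresses. None of that is confirmed for this platform, so treat the result as showing only that the scenario is arithmetically possible, not that it happened.

```python
import ipaddress

# Pool prefixes and PBA settings from the original post (dummy public IPs).
POOL_PREFIXES = ["1.2.2.0/26", "1.2.3.128/27", "1.2.3.48/28"]
BLOCK_SIZE = 1024
MAX_BLOCKS_PER_CUSTOMER = 16
CUSTOMERS = 1700

# Assumption: automatic port allocation uses ports 1024-65535.
PORT_LOW, PORT_HIGH = 1024, 65535


def pool_size(prefixes):
    """Count every address in the pool prefixes (assumes all are usable)."""
    return sum(ipaddress.ip_network(p).num_addresses for p in prefixes)


def blocks_per_address(block_size, low=PORT_LOW, high=PORT_HIGH):
    """Whole port blocks that fit in the port range of one public address."""
    return (high - low + 1) // block_size


addresses = pool_size(POOL_PREFIXES)
blocks_each = blocks_per_address(BLOCK_SIZE)

# Assumption: round-robin pairs customers evenly across addresses.
customers_per_address = CUSTOMERS / addresses
worst_case_demand = customers_per_address * MAX_BLOCKS_PER_CUSTOMER

print(f"pool addresses:            {addresses}")
print(f"blocks per address:        {blocks_each}")
print(f"customers per address:     {customers_per_address:.1f}")
print(f"worst-case blocks wanted:  {worst_case_demand:.0f}")
print(f"oversubscribed:            {worst_case_demand > blocks_each}")
```

Under those assumptions each address holds far fewer blocks than its paired customers are allowed to request in total, so an address could fill up while every individual customer stays below the 16-block limit.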

 

I'm going to keep testing and see where we get in a few days. 


Re: CGNAT (napt44) on MS-DPC randomly stops working for some customers.

‎03-11-2020 06:25 AM

Another morning of testing, and the issue is still present. Turning off PBA didn't help. As far as I can tell, there might be a hardware/software error that happens, after which the firewall or NAT system fails to create new flows. I tried logging as much as I could. Any clues as to where I should look?

Solution
Accepted by topic author EchoB

Re: CGNAT (napt44) on MS-DPC randomly stops working for some customers.

‎03-12-2020 09:36 AM

Alright, insert "I am an idiot sandwich" pic here. Turns out we had one IP in our NAT pool range that we had blacklisted due to a DDoS event. With APP on, any customer paired with that IP had their DNS traffic translated to it, and all DNS subsequently stopped for that customer. 😛  Finding it (other than remembering that we had done that) involved turning on syslog for the NAT service set and comparing downed clients with the logs. With ~1,700 clients being NAT'd, we were getting 50k events a minute (~2 GB/hour of logs)! But we finally got it!
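
In case it saves someone else the 2 GB/hour of logs, here is a minimal Python sketch of the two checks described above. Everything in it is hypothetical: the function names, the sample blocklist, and the (private IP, public IP) pairs are made up for illustration, and it assumes you have already pulled those pairs out of your own NAT logs by whatever means. It does not parse any real Junos log format.

```python
import ipaddress
from collections import defaultdict


def blocked_addresses_in_pool(pool_prefixes, blocked):
    """Return blocklist entries that overlap any NAT pool prefix."""
    pool = [ipaddress.ip_network(p) for p in pool_prefixes]
    hits = []
    for entry in blocked:
        # strict=False accepts entries with host bits set, e.g. 1.2.3.50/28.
        net = ipaddress.ip_network(entry, strict=False)
        if any(net.overlaps(p) for p in pool):
            hits.append(str(net))
    return hits


def suspect_public_ips(mappings, downed_clients):
    """Return public IPs whose paired private clients are all down."""
    users = defaultdict(set)
    for private_ip, public_ip in mappings:
        users[public_ip].add(private_ip)
    downed = set(downed_clients)
    return sorted(pub for pub, privs in users.items() if privs <= downed)


# Pool prefixes from the original post (dummy public IPs).
pool = ["1.2.2.0/26", "1.2.3.128/27", "1.2.3.48/28"]

# Hypothetical blocklist: one entry inside the pool, one outside it.
blocklist = ["1.2.3.50", "203.0.113.7"]
print(blocked_addresses_in_pool(pool, blocklist))

# Hypothetical pairings extracted from NAT logs, plus reported outages.
pairs = [
    ("100.64.0.1", "1.2.3.50"),
    ("100.64.0.2", "1.2.3.50"),
    ("100.64.0.3", "1.2.2.9"),
]
down = ["100.64.0.1", "100.64.0.2"]
print(suspect_public_ips(pairs, down))
```

The first check is the cheap one: running the pool against every filter and blocklist on the edge before turning NAT on would have flagged the overlap without any logging. The second only narrows things down once customers are already affected.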
