Juniper vMX – Getting Started Guide

I’m excited to finally have the opportunity to play with Juniper’s vMX! Since it was announced last year I’ve been eagerly waiting for its release – a couple of client projects have already passed by where the vMX would have been a perfect fit. vMX already won an award earlier this year at Interop Tokyo 2015!

In this post I’ll give a bit of background on the vMX architecture and licensing, and then walk through a lab-based configuration of vMX.

The vMX is a virtual MX Series Router that is optimized to run as software on x86 servers. Like other MX routers, it runs Junos, and Trio has been compiled for x86! Yes, that means the sophisticated L2, L2.5 and L3 forwarding features we are used to on the MX are present on the vMX.

Architecture

vMX can be installed on server hardware of your choice, so long as it is x86 based and running Linux (although I’m sure a version to run on VMware won’t be too far away).

vMX itself actually consists of two separate VMs – a virtual forwarding plane (VFP) running the vTrio, and a virtual control plane (VCP) running Junos.

The Linux virtualisation solution KVM is what Juniper are using to spin up the virtual instances of the control and forwarding planes, and multiple instances of vMX can be run on the same hardware. To see Juniper using Linux and KVM is no surprise as this is what we are used to on Juniper’s other products such as the QFX.

The VMs are managed by a simple orchestration script which is used to create, stop and start the vMX instances. A simple configuration file defines parameters such as memory and vCPUs to allocate to the VCP and VFP.

A couple of Linux bridges are created by the orchestration script. Clearly the VCP and VFP need to be able to communicate directly, so an “internal” bridge is automatically created for each vMX instance to enable this communication. An “external” bridge is also created; this allows the management interface on the physical Linux host to be used for the virtual management interfaces on the VCP and VFP.

For data interfaces, there are a couple of techniques available for packet I/O depending on the required vMX throughput –

  • Paravirtualisation using KVM’s virtio drivers
  • PCI passthrough using single root I/O virtualisation (SR-IOV), enabling packets to bypass the hypervisor and therefore increase I/O.

Juniper recommend virtio or SR-IOV up to 3Gbps, and SR-IOV over 3Gbps (using a minimum of 2 x 10GE interfaces).

Which one you choose will ultimately depend on your use case for the vMX.
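As a rough sketch of that guidance, the choice can be written as a tiny Python helper. The function name and the returned labels are my own and purely illustrative (nothing here comes from the vMX tooling); the 3Gbps threshold is the one quoted above.

```python
def packet_io_options(throughput_gbps):
    """Return the packet I/O techniques suggested for a target throughput.

    Hypothetical helper: it encodes only the guidance quoted above
    (virtio or SR-IOV up to 3Gbps, SR-IOV above that).
    """
    if throughput_gbps <= 0:
        raise ValueError("throughput must be positive")
    if throughput_gbps <= 3:
        return ["virtio", "sr-iov"]
    return ["sr-iov"]


print(packet_io_options(0.1))  # a lab-sized instance
print(packet_io_options(10))   # a multi-gigabit instance
```

For a lab like the one later in this post, either option is in range, which is why I go with virtio.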

Licensing

Now this is what I really like about vMX! Licensing is based on a combination of throughput and features, and the lowest available throughput license is 100Mbps! Yes – you don’t need to be shifting multi-Gigabits of traffic to start with vMX. You can start small and pay-as-you-grow with vMX.

Below 1Gbps there are only 3 options – 100Mbps, 250Mbps and 500Mbps. Full scale features are included! List price on the 100Mbps option is a very reasonable $750.

At 1Gbps and above, licences are a combination of features (Base, Advance, and Premium) and full duplex throughput (1G, 5G, 10G, 40G).

  • Base – IP routing with 32,000 routes in the forwarding table. Basic Layer 2 functionality, Layer 2 bridging and switching.
  • Advance – Features in the BASE application package, plus IP routing with routes up to platform scale in the forwarding table. IP and MPLS switching for unicast and multicast applications. Layer 2 features include Layer 2 VPN, VPLS, EVPN, and Layer 2 Circuit. VXLAN.
  • Premium – Features in the BASE and ADVANCE application packages. Layer 3 VPN for IP and multicast.
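To make the banding concrete, here is a minimal Python sketch that picks the smallest throughput band covering a requirement. The helper name and return shape are hypothetical, and it only knows the bands listed above; it ignores pricing and which features you actually need.

```python
SUB_GIG_MBPS = [100, 250, 500]   # below 1Gbps: full-scale features included
GIG_TIERS_GBPS = [1, 5, 10, 40]  # 1Gbps and above: combined with a feature tier


def smallest_throughput_licence(required_mbps):
    """Pick the smallest throughput band that covers required_mbps.

    Returns (band_label, needs_feature_tier). Hypothetical helper based
    only on the bands listed in this post.
    """
    if required_mbps <= 0:
        raise ValueError("required_mbps must be positive")
    for mbps in SUB_GIG_MBPS:
        if required_mbps <= mbps:
            return (f"{mbps}Mbps", False)
    for gbps in GIG_TIERS_GBPS:
        if required_mbps <= gbps * 1000:
            return (f"{gbps}G", True)
    raise ValueError("no single licence band listed covers this throughput")


print(smallest_throughput_licence(80))   # fits the entry-level band
print(smallest_throughput_licence(600))  # too big for 500Mbps, so a 1G licence
```

The second value tells you whether you also have to choose between Base, Advance and Premium.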

Setting up vMX on Ubuntu

Now I’m going to walk through setting up vMX on Ubuntu 14.04 LTS server (Juniper’s recommended flavour of Linux for vMX). Just for fun this is actually running as a nested VMware VM on my MacBook Pro – fine for a lab, but don’t try this in production! 🙂 I have allocated 8GB RAM, 4 vCPUs and two vNICs to the Ubuntu VM. The VM is also enabled to support hypervisor applications within the VM.

At this point Ubuntu Server has been freshly installed, and the option to install virtualisation was selected during setup.

First things first, let’s update all packages, install the prerequisite packages and restart the system.

mdinham@ubuntu:~$ sudo apt-get upgrade
<snip>
mdinham@ubuntu:~$ sudo apt-get install bridge-utils qemu-kvm libvirt-bin python python-netifaces vnc4server libyaml-dev python-yaml numactl libparted0-dev libpciaccess-dev libnuma-dev libyajl-dev libxml2-dev libglib2.0-dev libnl-dev python-pip python-dev libxml2-dev libxslt-dev
<snip>
mdinham@ubuntu:~$ sudo reboot

Configuring vMX

As this is a lab-based build, I will be using virtio for the virtual NIC. There are two options on the VFP – a “Lite” PFE for labs and a performance version for normal operation.

Note: Ubuntu 14.04 provides libvirt 1.2.2 which works for VFP lite version. However for the VFP performance version you must upgrade to libvirt 1.2.8.
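If you want to script that check, a minimal sketch in Python might look like the following. The function names are made up for illustration, and I’m assuming that only the performance VFP carries the 1.2.8 requirement, as per the note above.

```python
def parse_version(text):
    """Turn '1.2.8' into (1, 2, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Version quoted in the note above for the performance VFP.
PERFORMANCE_MIN_LIBVIRT = (1, 2, 8)


def libvirt_needs_upgrade(installed, vfp_mode):
    """Return True if libvirt should be upgraded before using vfp_mode.

    Assumption: only the 'performance' VFP has the 1.2.8 requirement; the
    lite VFP is treated as fine on whatever libvirt is installed.
    """
    if vfp_mode not in ("lite", "performance"):
        raise ValueError("vfp_mode must be 'lite' or 'performance'")
    if vfp_mode == "lite":
        return False
    return parse_version(installed) < PERFORMANCE_MIN_LIBVIRT


print(libvirt_needs_upgrade("1.2.2", "lite"))         # False
print(libvirt_needs_upgrade("1.2.2", "performance"))  # True
```

Comparing tuples rather than strings matters here: as strings, "1.2.10" would sort before "1.2.8".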

Let’s extract the vMX application bundle and get going!

mdinham@ubuntu:~$ tar xzf vmx-14.1R5.4-1.tgz
mdinham@ubuntu:~$ cd vmx-14.1R5.4-1/
mdinham@ubuntu:~/vmx-14.1R5.4-1$ ls
config drivers env images scripts vmx.sh

First of all we need to set up the vMX config file, which is done by editing config/vmx.conf.

I start by setting an instance name for the vMX and pointing to the correct vMX images. I’m using vPFE-lite.

---
#Configuration on the host side - management interface, VM images etc.
HOST:
    identifier                : vmx1   # Maximum 4 characters
    host-management-interface : eth0
    routing-engine-image      : "/home/mdinham/vmx-14.1R5.4-1/images/jinstall64-vmx-14.1R5.4-domestic.img"
    routing-engine-hdd        : "/home/mdinham/vmx-14.1R5.4-1/images/vmxhdd.img"
    forwarding-engine-image   : "/home/mdinham/vmx-14.1R5.4-1/images/vPFE-lite-20150707.img"

Now the parameters for the control plane and forwarding plane.

I’ve allocated 1 vCPU to vRE and 3 vCPU to vPFE. 1GB RAM to the RE and 6GB to the forwarding plane, as per the defaults for 14.1

UPDATE: Feb 2016
For vMX on 15.1, allocate 1 vCPU to vRE and 3 vCPU to vPFE. 2GB RAM to the RE and 8GB to the forwarding plane.

I have also tried vMX with 2GB allocated to the vPFE and the forwarding plane loaded, which could be fine for lab purposes. I’d expect 1GB to be the minimum on the vRE. 3 x vCPU seems to be the minimum for the vPFE.

Note that device-type is set to “virtio” for the interfaces.

---
#External bridge configuration
BRIDGES:
    - type  : external
      name  : br-ext                  # Max 10 characters

---
#vRE VM parameters
CONTROL_PLANE:
    vcpus       : 1
    memory-mb   : 2048
    console_port: 8601

    interfaces  :
      - type      : static
        ipaddr    : 10.102.144.94
        macaddr   : "0A:00:DD:C0:DE:0E"

---
#vPFE VM parameters
FORWARDING_PLANE:
    memory-mb   : 6144
    vcpus       : 3
    console_port: 8602
    device-type : virtio

    interfaces  :
      - type      : static
        ipaddr    : 10.102.144.98
        macaddr   : "0A:00:DD:C0:DE:10"

---
#Interfaces
JUNOS_DEVICES:
   - interface            : ge-0/0/0
     mac-address          : "02:06:0A:0E:FF:F0"
     description          : "ge-0/0/0 interface"

I will only be using one interface in this lab, but up to 10 can be configured. For SR-IOV, things are done slightly differently – see this vMX doc for reference.
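Before deploying, it can be handy to sanity check the values against the limits mentioned so far. Below is a minimal Python sketch; the `check_vmx_config` helper and the hand-built dict mirroring vmx.conf are my own illustration (the real file is multi-document YAML, which this does not parse). The 3 vCPU floor for the vPFE is just my observation from this lab, not a documented limit.

```python
def check_vmx_config(cfg):
    """Return a list of problems found in a vmx.conf-like dict.

    Limits come from the config comments and text above: identifier of at
    most 4 characters, bridge name of at most 10 characters, at most 10
    data interfaces. The 3 vCPU floor is an observation, not a spec.
    """
    problems = []
    if len(cfg["HOST"]["identifier"]) > 4:
        problems.append("identifier longer than 4 characters")
    for bridge in cfg["BRIDGES"]:
        if len(bridge["name"]) > 10:
            problems.append(f"bridge name {bridge['name']!r} longer than 10 characters")
    if cfg["FORWARDING_PLANE"]["vcpus"] < 3:
        problems.append("vPFE has fewer than 3 vCPUs")
    if len(cfg["JUNOS_DEVICES"]) > 10:
        problems.append("more than 10 data interfaces")
    return problems


# Hand-built mirror of the lab values shown above.
lab = {
    "HOST": {"identifier": "vmx1"},
    "BRIDGES": [{"type": "external", "name": "br-ext"}],
    "FORWARDING_PLANE": {"vcpus": 3, "memory-mb": 6144, "device-type": "virtio"},
    "JUNOS_DEVICES": [{"interface": "ge-0/0/0"}],
}
print(check_vmx_config(lab))  # an empty list means nothing was flagged
```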

I now need to deploy the vMX instance using the orchestration script. “-lv” provides verbose logging. My vMX instance will be created by the script and automatically started.

mdinham@ubuntu:~/vmx-14.1R5.4-1$ sudo ./vmx.sh -lv --install
==================================================
 Welcome to VMX
==================================================
Date..............................................07/18/15 13:19:03
VMX Identifier....................................vmx1
Config file......................................./home/mdinham/vmx-14.1R5.4-1/config/vmx.conf
Build Directory.................................../home/mdinham/vmx-14.1R5.4-1/build/vmx1
Environment file................................../home/mdinham/vmx-14.1R5.4-1/env/ubuntu_virtio.env
Junos Device Type.................................virtio
Initialize scripts................................[OK]
Copy images to build directory....................[OK]
==================================================
 VMX Environment Setup Completed
==================================================
==================================================
 VMX Install & Start
==================================================
Linux distribution................................ubuntu
Check GRUB........................................[Disabled]
Installation status of qemu-kvm...................[OK]
Installation status of libvirt-bin................[OK]
Installation status of bridge-utils...............[OK]
Installation status of python.....................[OK]
Installation status of libyaml-dev................[OK]
Installation status of python-yaml................[OK]
Installation status of numactl....................[OK]
Installation status of libnuma-dev................[OK]
Installation status of libparted0-dev.............[OK]
Installation status of libpciaccess-dev...........[OK]
Installation status of libyajl-dev................[OK]
Installation status of libxml2-dev................[OK]
Installation status of libglib2.0-dev.............[OK]
Installation status of libnl-dev..................[OK]
Check Kernel Version..............................[Disabled]
Check Qemu Version................................[Disabled]
Check libvirt Version.............................[Disabled]
Check virsh connectivity..........................[OK]
IXGBE Enabled.....................................[Disabled]
==================================================
 Pre-Install Checks Completed
==================================================
Check for VM vcp-vmx1.............................[Not Running]
Check for VM vfp-vmx1.............................[Not Running]
Cleanup VM states.................................[OK]
Check if bridge br-ext exists.....................[No]
Cleanup VM bridge br-ext..........................[OK]
Cleanup VM bridge br-int-vmx1.....................[OK]
==================================================
 VMX Stop Completed
==================================================
Check VCP image...................................[OK]
Check VFP image...................................[OK]
VMX Model.........................................Lite
Check VCP Config image............................[OK]
Check management interface........................[OK]
Setup huge pages to 8192..........................[OK]
Attempt to kill libvirt...........................[OK]
Attempt to start libvirt..........................[OK]
Sleep 2 secs......................................[OK]
Check libvirt support for hugepages...............[OK]
==================================================
 System Setup Completed
==================================================
Get Management Address of eth0....................[OK]
Generate libvirt files............................[OK]
Sleep 2 secs......................................[OK]
Find configured management interface..............eth0
Find existing management gateway..................eth0
Check if eth0 is already enslaved to br-ext.......[No]
Gateway interface needs change....................[Yes]
Create br-ext.....................................[OK]
Get Management Gateway............................192.168.100.254
Flush eth0........................................[OK]
Start br-ext......................................[OK]
Bind eth0 to br-ext...............................[OK]
Get Management MAC................................00:0c:29:76:a8:15
Assign Management MAC 00:0c:29:76:a8:15...........[OK]
Add default gw 192.168.100.254....................[OK]
Create br-int-vmx1................................[OK]
Start br-int-vmx1.................................[OK]
Check and start default bridge....................[OK]
Define vcp-vmx1...................................[OK]
Define vfp-vmx1...................................[OK]
Wait 2 secs.......................................[OK]
Start vcp-vmx1....................................[OK]
Start vfp-vmx1....................................[OK]
Wait 2 secs.......................................[OK]
==================================================
 VMX Bringup Completed
==================================================
Check if br-ext is created........................[Created]
Check if br-int-vmx1 is created...................[Created]
Check if VM vcp-vmx1 is running...................[Running]
Check if VM vfp-vmx1 is running...................[Running]
Check if tap interface vcp_ext-vmx1 exists........[OK]
Check if tap interface vcp_int-vmx1 exists........[OK]
Check if tap interface vfp_ext-vmx1 exists........[OK]
Check if tap interface vfp_int-vmx1 exists........[OK]
==================================================
 VMX Status Verification Completed.
==================================================
Log file..........................................
 /home/mdinham/vmx-14.1R5.4-1/build/vmx1/logs/vmx_1437221943.log
==================================================
 Thankyou for using VMX
==================================================
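
That is a lot of output to eyeball. As a rough sketch, the status lines can be pulled apart with a few lines of Python. The regular expression is simply inferred from the layout of the output above (a step name, a run of dots, then a value); it is not something the script documents, so treat it as an assumption.

```python
import re

# "Step name.......value", as printed by the install run above.
STEP_LINE = re.compile(r"^(?P<name>.+?)\.{2,}(?P<value>.*)$")


def parse_steps(output):
    """Return (step, value) pairs from vmx.sh-style status output.

    Banner lines ("====", " VMX Bringup Completed") have no dotted leader
    and are skipped. Square brackets around a value are stripped.
    """
    steps = []
    for line in output.splitlines():
        match = STEP_LINE.match(line.strip())
        if match:
            value = match.group("value").strip()
            # "[OK]" -> "OK"; plain values such as "eth0" are kept as-is.
            if value.startswith("[") and value.endswith("]"):
                value = value[1:-1]
            steps.append((match.group("name"), value))
    return steps


sample = """\
Check VCP image...................................[OK]
VMX Model.........................................Lite
Installation status of libglib2.0-dev.............[OK]
Check if VM vcp-vmx1 is running...................[Running]
==================================================
"""
for step, value in parse_steps(sample):
    print(f"{step}: {value}")
```

From there it is easy to filter for whichever values you care about, for example anything that is not OK, Running or Created.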

Connecting to the console port on the VMs

We can now connect to the vMX control plane! This is done using the vmx.sh script again.

Specify vcp (control plane – Junos) or vfp (vPFE) and the instance name.

mdinham@ubuntu:~/vmx-14.1R5.4-1$ ./vmx.sh --console vcp vmx1
--
Login Console Port For vcp-vmx1 - 8601
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


Amnesiac (ttyd0)

login:

After a while, the FPC and interfaces will come online

root> show interfaces terse | match ge-0/0/0
ge-0/0/0 up up

root> show chassis fpc
                     Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online         Absent   100      0            512       14         0

I’ll go ahead and add an IP address to ge-0/0/0. Note: if I were using the management interface I could also configure interface fxp0 now. Remember fxp0 will be bridged to the host eth0 adapter (or an adapter you specify).

root# set interfaces ge-0/0/0.0 family inet address 192.168.100.5/24

Can I ping anything?

root> ping 192.168.100.5
PING 192.168.100.5 (192.168.100.5): 56 data bytes
64 bytes from 192.168.100.5: icmp_seq=0 ttl=64 time=0.059 ms
^C
--- 192.168.100.5 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.059/0.059/0.059/0.000 ms

root> ping 192.168.100.254
PING 192.168.100.254 (192.168.100.254): 56 data bytes
^C
--- 192.168.100.254 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

OK, so I can ping the interface but nothing else on the host. As I’m using virtio I need to create a device binding between the host physical NIC and the vMX interface.

Creating a virtio binding

This is done in the config file config/vmx-junosdev.conf.

virtio bindings are flexible and can be used to map multiple vMX instances to the same physical host interface, or to connect vMX instances together.

A new Linux bridge will be created between host interface eth1 and ge-0/0/0 on vmx1.

##############################################################
#
#  vmx-junos-dev.conf
#  - Config file for junos device bindings.
#  - Uses YAML syntax.
#  - Leave a space after ":" to specify the parameter value.
#  - For physical NIC, set the 'type' as 'host_dev'
#  - For junos devices, set the 'type' as 'junos_dev' and
#    set the mandatory parameter 'vm-name' to the name of
#    the vPFE where the device exists
#  - For bridge devices, set the 'type' as 'bridge_dev'
#
##############################################################
interfaces :

     - link_name  : vmx_link
       endpoint_1 :
         - type        : junos_dev
           vm_name     : vmx1
           dev_name    : ge-0/0/0
       endpoint_2 :
         - type        : host_dev
           dev_name    : eth1

If eth1 is not already up on the Linux host, bring it up

sudo ifconfig eth1 up

Again the orchestration script vmx.sh is used to create the device bindings

mdinham@ubuntu:~/vmx-14.1R5.4-1$ sudo ./vmx.sh --bind-dev
Bind Link vmx_link(ge-0.0.0-vmx1, eth1)...........[OK]

And we can see a new bridge has been created called “vmx_link” as referenced in the bindings configuration file

mdinham@ubuntu:~/vmx-14.1R5.4-1$ brctl show
bridge name     bridge id               STP enabled     interfaces
br-ext          8000.000c2976a815       yes             br-ext-nic
                                                        eth0
                                                        vcp_ext-vmx1
                                                        vfp_ext-vmx1
br-int-vmx1             8000.52540050c859       yes             br-int-vmx1-nic
                                                        vcp_int-vmx1
                                                        vfp_int-vmx1
virbr0          8000.fe060a0efff1       yes             ge-0.0.1-vmx1
                                                        ge-0.0.2-vmx1
                                                        ge-0.0.3-vmx1
vmx_link                8000.000c2976a81f       no              eth1
                                                        ge-0.0.0-vmx1
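
If you have several bindings, checking the bridges by eye gets tedious. Here is a minimal Python sketch that parses `brctl show` output and checks that both ends of a link landed in the same bridge. The host-side naming (ge-0/0/0 on vmx1 becoming ge-0.0.0-vmx1) is inferred from the output in this post rather than from Juniper documentation, and both function names are my own.

```python
def tap_name(junos_if, vm_name):
    """Map a Junos interface to the host-side name seen in the bridge output.

    Inferred from this post's output (ge-0/0/0 on vmx1 -> ge-0.0.0-vmx1).
    """
    return f"{junos_if.replace('/', '.')}-{vm_name}"


def parse_brctl_show(output):
    """Parse `brctl show` output into {bridge: [member interfaces]}."""
    bridges = {}
    current = None
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        if line[0].isspace() and current is not None:
            # Continuation row: only an interface name.
            bridges[current].append(fields[0])
        else:
            # New bridge row: name, id, STP flag, optional first interface.
            current = fields[0]
            bridges[current] = fields[3:4]
    return bridges


sample = """\
bridge name     bridge id               STP enabled     interfaces
vmx_link                8000.000c2976a81f       no              eth1
                                                        ge-0.0.0-vmx1
"""
members = parse_brctl_show(sample)["vmx_link"]
print("eth1" in members and tap_name("ge-0/0/0", "vmx1") in members)  # True
```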

Now to retry that ping!

mdinham@ubuntu:~/vmx-14.1R5.4-1$ ./vmx.sh --console vcp vmx1
--
Login Console Port For vcp-vmx1 - 8601
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


root> ping 192.168.100.254
PING 192.168.100.254 (192.168.100.254): 56 data bytes
64 bytes from 192.168.100.254: icmp_seq=0 ttl=64 time=4.951 ms
64 bytes from 192.168.100.254: icmp_seq=1 ttl=64 time=2.081 ms
^C
--- 192.168.100.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.081/3.516/4.951/1.435 ms

Success! At this point I’ve a working vMX with an interface mapped to a NIC on the Ubuntu host. What happens if I turn on OSPF and LDP?

root> show ospf neighbor
Address          Interface              State     ID               Pri  Dead
192.168.100.254  ge-0/0/0.0             Full      10.0.0.2           1    37
192.168.100.1    ge-0/0/0.0             Full      10.0.0.1         128    39

root> show ldp neighbor
Address            Interface          Label space ID         Hold time
192.168.100.254    ge-0/0/0.0         10.0.0.2:0               13

Excellent, now the fun can really begin, but I’ll save that for another time!

vPFE

One last thing – what does the VFP look like?

mdinham@ubuntu:~/vmx-14.1R5.4-1$ sudo ./vmx.sh --console vfp vmx1
--
Login Console Port For vfp-vmx1 - 8602
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


Wind River Linux 6.0.0.12 localhost console

localhost login: root
Password:
Last login: Sat Jul 18 12:20:49 UTC 2015 on console

The riot process is where all the magic happens!

top - 12:52:55 up 33 min,  1 user,  load average: 0.38, 0.74, 0.62
Tasks: 102 total,   1 running, 101 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  2.1%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:   5824060k total,  4454308k used,  1369752k free,    12184k buffers
Swap:        0k total,        0k used,        0k free,    44552k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1019 root      20   0 36.6g  37m  10m S   12  0.7   4:25.04 riot
<snip>

Further reference

I hope you enjoyed this vMX post! For further reference on any of the above material please see the Juniper Release Notes for vMX

Junos – securing the RE (filter order is important – eBGP running slow?)

First of all – the Juniper Day One books are a superb resource for learning Junos. If you’ve not checked out the library already – do it now! 😉

Recently a client of mine found a bit of a gotcha with the framework filters discussed in the Day One – Securing the Routing Engine (and also the O’Reilly MX book which references the same material).

These books are both great references for deriving a set of RE filters to secure your Junos routers, so I won’t go into the detail here – just check out the books!

Now on to the gotcha… I’m also mailing a link to this post to the Juniper community folks, so hopefully a revised copy can be linked on the website soon 🙂

UPDATE August 2015: there is now a revised version of the Day One that mentions the issue discussed in this post and I have also reviewed the O’Reilly MX book and suggested an edit.

The books provide a detailed framework set of filters for protecting the routing engine, allowing you to pick and choose and chain filters as necessary. But if you chain the filters exactly as shown in the example material, you might have some problems. The example material is great, just remember to adapt it to your environment and only include what you need.

Below is an example of the filter order

input-list [ accept-common-services accept-ospf accept-rip accept-bfd accept-bgp accept-ldp accept-rsvp discard-all ];

Filter “accept-common-services” is first in the chain, which essentially references a further set of filters to be checked in order. Let’s take a look at what that is doing. Pretty self explanatory 🙂

filter accept-common-services {
   apply-flags omit;
   term accept-icmp {
      filter accept-icmp;
   }
   term accept-traceroute {
      filter accept-traceroute;
   }
   term accept-ssh {
      filter accept-ssh;
   }
   term accept-snmp {
      filter accept-snmp;
   }
   term accept-ntp {
      filter accept-ntp;
   }
   term accept-web {
      filter accept-web;
   }
   term accept-dns {
      filter accept-dns;
   }
}

The filter we need to look at is “accept-traceroute”. In the Day One, there are terms for UDP, ICMP and TCP, but I’ve only shown the TCP term below.

filter accept-traceroute {
   apply-flags omit;
   /* omitted text - UDP and ICMP terms */
   term accept-traceroute-tcp {
      from {
         destination-prefix-list {
            router-ipv4;
            router-ipv4-logical-systms;
         }
         protocol tcp;
         ttl 1;
      }
      then {
         policer management-1m;
         count accept-traceroute-tcp;
         accept;
      }
   }
}

Let’s take a look at what this is doing – matching TCP traffic to IPv4 interfaces on the router, with a TTL of 1. If the traffic matches, then we accept the traffic, count the packets and police to 1Mbps. At first glance this sounds good – traceroute traffic (e.g. tcptraceroute) using TCP is rate limited (the accept-traceroute filter also does the same for UDP and ICMP).

But what about TCP traffic with a legitimate TTL of 1?

Which routing protocol uses TCP, a TTL of 1 and might send a lot of data? eBGP!

So the unintended effect appears if the filters are chained as shown in the examples, i.e. as below:

[ accept-common-services accept-ospf accept-rip accept-bfd accept-bgp accept-ldp accept-rsvp discard-all ]

then the “accept-bgp” filter will not be matched for eBGP traffic – the “accept-common-services” filter will match eBGP, and eBGP traffic will be rate limited to 1Mbps! Clearly you don’t want this if you are synchronising a full routing table.

The quick way to look for this is to run the command “show firewall filter lo0.0-i”, and look for the counter “accept-traceroute-tcp-lo0.0-i” and the policer “management-1m-accept-traceroute-tcp-lo0.0-i”. If either of these is showing high counter values, then either you are experiencing excessive TCP traceroute, or perhaps you might want to look at the filter order! 🙂

There is a simple fix of course – either put the “accept-bgp” filter first in the chain, or if TCP traceroute isn’t of interest then remove the configuration from the “accept-traceroute” filter.
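
To see why the order matters, here is a toy first-match model in Python. It is a sketch only: the two match functions are heavily simplified stand-ins for the Day One filters (the real ones also match on prefix lists, and I’m assuming BGP can be identified by TCP port 179), and `first_match` is my own helper, not anything Junos provides.

```python
def first_match(chain, packet):
    """Return the name of the first filter in the chain that accepts the packet."""
    for name, matches in chain:
        if matches(packet):
            return name
    return "discard-all"


def accept_common_services(pkt):
    # Only the accept-traceroute-tcp term is modelled: TCP with a TTL of 1.
    return pkt["protocol"] == "tcp" and pkt["ttl"] == 1


def accept_bgp(pkt):
    # Assumption: BGP is identified by TCP port 179 on either side.
    return pkt["protocol"] == "tcp" and 179 in (pkt["src_port"], pkt["dst_port"])


book_order = [("accept-common-services", accept_common_services),
              ("accept-bgp", accept_bgp)]
bgp_first = [("accept-bgp", accept_bgp),
             ("accept-common-services", accept_common_services)]

# A single-hop eBGP packet: TCP/179 sent with a TTL of 1.
ebgp = {"protocol": "tcp", "ttl": 1, "src_port": 54321, "dst_port": 179}

print(first_match(book_order, ebgp))  # accept-common-services (policed to 1Mbps)
print(first_match(bgp_first, ebgp))   # accept-bgp
```

With the book order, the eBGP packet never reaches “accept-bgp”; with “accept-bgp” first, it is accepted before the traceroute policer gets a look at it.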

So if you’ve used these filters as a basis for your own RE filters, it might be worth taking a look at the filter chain order.

Mapping traffic to an LSP on Junos – BGP and table inet.3 (part 2)

Now that we know about IGP based LSP forwarding on Junos, the 2nd part in this series focuses on BGP and table inet.3. 

We also continue from where part 1 left off, by looking at how traffic-engineering bgp-igp and mpls-forwarding can affect route redistribution from OSPF into BGP.

Lab Topology

For this lab, I’ll be using the topology below.

[Image: lsplab lab topology diagram]

Software revisions are as follows

  • CE Routers (CE1, CE2): IOS (Cisco 7200 12.4(24)T)
  • P Routers (R1, R2, R3, R4, R5): IOS (Cisco 7200 12.4(24)T)
  • PE Routers (Junos1, Junos2): Junos (Olive 12.3R5.7)

As with Part 1, the base configurations are using OSPF as the routing protocol and LDP to exchange transport labels.

Route redistribution (bgp-igp and mpls-forwarding)

Here’s what I changed on Junos1. The route 102.102.102.102, learnt via OSPF from CE2, will be redistributed into BGP.

By the way, I’m not suggesting that your CEs should be part of your core IGP, but for the purposes of this lab test… 🙂

root@R6-Junos1# show | compare
[edit protocols bgp group internal]
+    export ospf2bgp;
[edit policy-options]
+   policy-statement ospf2bgp {
+       from {
+           protocol ospf;
+           route-filter 102.102.102.102/32 exact;
+       }
+       then accept;
+   }

The configuration on Junos1 is still running with “traffic-engineering mpls-forwarding” so the routing table has OSPF as the active route for routing, and the LDP route is active for forwarding

root@R6-Junos1> show route 102.102.102.102

inet.0: 21 destinations, 35 routes (21 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 @[OSPF/150] 00:00:28, metric 0, tag 0
                    > to 192.168.46.4 via em0.0
                   #[LDP/9] 00:00:28, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

Hence you would definitely expect the routing policy to match on the OSPF route 102.102.102.102, and therefore we’ll see the route in BGP, right? Sure enough if I hop over to Junos2, the route is there:

root@R7-Junos2> show route receive-protocol bgp 6.6.6.6

inet.0: 22 destinations, 23 routes (22 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  102.102.102.102/32      192.168.46.4         0       100        I

OK, so what happens if I change over to traffic-engineering bgp-igp on Junos1? The LDP route becomes the active route for routing and forwarding, and isn’t matched by my policy, so is not advertised to Junos2.

inet.0: 22 destinations, 36 routes (22 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:00:04, metric 1
                    > to 192.168.46.4 via em0.0, Push 28
                    [OSPF/150] 00:04:54, metric 0, tag 0
                    > to 192.168.46.4 via em0.0

root@R7-Junos2> show route receive-protocol bgp 6.6.6.6

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
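
The behaviour above boils down to the export policy only seeing the route that is active for routing. A toy Python model of that, using the flags from the two “show route” outputs, is sketched below. The data structures and function names are mine, purely for illustration; this is not how Junos implements policy evaluation.

```python
def routing_active(entries):
    """Return the entry that is active for routing ('@' or '*' in show route)."""
    for entry in entries:
        if entry["flag"] in ("@", "*"):
            return entry
    return None


def ospf2bgp_exports(entries):
    """Toy version of the ospf2bgp policy: accept only if the routing-active
    entry for the prefix was learnt via OSPF."""
    active = routing_active(entries)
    return active is not None and active["protocol"] == "OSPF"


# 102.102.102.102/32 as shown in the two outputs above.
mpls_forwarding = [
    {"protocol": "OSPF", "preference": 150, "flag": "@"},  # routing use only
    {"protocol": "LDP", "preference": 9, "flag": "#"},     # forwarding use only
]
bgp_igp = [
    {"protocol": "LDP", "preference": 9, "flag": "*"},     # active for both
    {"protocol": "OSPF", "preference": 150, "flag": ""},   # inactive
]

print(ospf2bgp_exports(mpls_forwarding))  # True  -> advertised to Junos2
print(ospf2bgp_exports(bgp_igp))          # False -> not advertised
```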

BGP (LSP forwarding) and table inet.3

We’ll now have a look at how BGP operates. I’ve set the “traffic-engineering” on Junos1 back to the defaults. We should expect BGP to recursively resolve its next-hop via inet.3 and therefore MPLS route the traffic. Let’s see!

iBGP Peering

There is an iBGP peering session between Junos1 and Junos2. No other routers are running iBGP

eBGP Peering

Junos2 has an eBGP peering with CE2. CE2 has a second loopback, 112.112.112.112, which is advertised via this eBGP session.

root@R7-Junos2> show configuration protocols bgp
group as102 {
    peer-as 102;
    neighbor 192.168.102.1;
}
group internal {
    local-address 7.7.7.7;
    peer-as 1;
    neighbor 6.6.6.6;
}

root@R7-Junos2> show route receive-protocol bgp 192.168.102.1

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 112.112.112.112/32      192.168.102.1        0                  102 I

LSP Forwarding and Routing

So how does Junos1 route to CE2’s IP address 112.112.112.112? Let’s take a look at the routing tables.

root@R6-Junos1> show route 112.112.112.112

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

112.112.112.112/32 *[BGP/170] 00:05:00, MED 0, localpref 100, from 7.7.7.7
                      AS path: 102 I, validation-state: unverified
                    > to 192.168.46.4 via em0.0, Push 27

OK, so we see 112.112.112.112/32 in table inet.0 as expected, and it looks like label 27 is going to be pushed. Let’s take a look at this in more detail:

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
112.112.112.112/32 (1 entry, 1 announced)
TSI:
KRT in-kernel 112.112.112.112/32 -> {indirect(131070)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Address: 0x9378ba4
                Next-hop reference count: 3
                Source: 7.7.7.7
                Next hop type: Router, Next hop index: 561
                Next hop: 192.168.46.4 via em0.0, selected
                Label operation: Push 27
                Label TTL action: prop-ttl
                Session Id: 0x1
                Protocol next hop: 192.168.102.1
                Indirect next hop: 93b8000 131070 INH Session ID: 0x2
                State: 
                Local AS:     1 Peer AS:     1
                Age: 5:41       Metric: 0       Metric2: 1
                Validation State: unverified
                Task: BGP_1.7.7.7.7+179
                Announcement bits (2): 0-KRT 6-Resolve tree 2
                AS path: 102 I
                Accepted
                Localpref: 100
                Router ID: 7.7.7.7
                Indirect next hops: 1
                        Protocol next hop: 192.168.102.1 Metric: 1
                        Indirect next hop: 93b8000 131070 INH Session ID: 0x2
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.168.46.4 via em0.0
                                Session Id: 0x1
                        192.168.102.0/24 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 192.168.46.4 via em0.0

The key here is the protocol next hop – 192.168.102.1.

192.168.102.1 isn’t directly attached to Junos1 – it is CE2’s address on the Junos2<->CE2 segment. Therefore BGP will recursively resolve this next hop via tables inet.3 and inet.0. As the inet.3 LDP route has a lower (better) preference value (9) than the inet.0 OSPF route (10), the inet.3 route will be chosen and traffic will be placed on the LSP automatically, pushing label 27 in this case.

root@R6-Junos1> show route 192.168.102.1

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

192.168.102.0/24   *[OSPF/10] 00:06:30, metric 5
                    > to 192.168.46.4 via em0.0

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.102.0/24   *[LDP/9] 00:06:30, metric 1
                    > to 192.168.46.4 via em0.0, Push 27

root@R6-Junos1> show route forwarding-table destination 112.112.112.112
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
112.112.112.112/32 user     0                    indr 131070     2
                              192.168.46.4      Push 27   561     2 em0.0

root@R6-Junos1> traceroute 112.112.112.112
traceroute to 112.112.112.112 (112.112.112.112), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  28.026 ms  27.884 ms  28.510 ms
     MPLS Label=27 CoS=0 TTL=1 S=1
 2  192.168.34.3 (192.168.34.3)  29.767 ms  25.848 ms  28.571 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  29.831 ms  26.455 ms  28.586 ms
     MPLS Label=27 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  29.478 ms  25.518 ms  29.075 ms
 5  192.168.102.1 (192.168.102.1)  32.961 ms  31.147 ms  33.398 ms

Traffic is labelled!

But what about IGP traffic to the protocol next hop? Well that won’t follow the LSP of course because we don’t have “mpls traffic-engineering” configured.

root@R6-Junos1> show route forwarding-table destination 192.168.102.1
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
192.168.102.0/24   user     0 192.168.46.4       ucst   555    32 em0.0

Exactly as expected!

I’ve shown that BGP is using table inet.3 to resolve next hops, whereas normal IGP routing is using inet.0.

Another thing to remember with BGP & inet.3… if inet.0 contains a better route (i.e. one with a lower preference value) then BGP would use the inet.0 route and traffic would not be forwarded on the LSP.

In this case, as none of the P routers are running BGP, this would break the connectivity (the P routers don’t know how to get to 112.112.112.112 so would drop the traffic). Hence, the traffic has to follow the LSP for the traffic to reach CE2.

Mapping traffic to an LSP on Junos (part 1)

It’s been a while since I wrote a Junos post… there are a few things that I’ve been meaning to write up for a while, so this series will focus on MPLS label switched paths (LSP) and how to put traffic on to an LSP.

By default IGPs won’t use an LSP, so this 1st part focuses on what you’ll need to configure should you want both BGP and IGP traffic forwarding to use LSPs.

Later parts will look at other options available in Junos to forward traffic via an LSP, and more advanced topics e.g. LSP selection based on BGP extended community.

I’ll start with a quick recap of the Junos routing tables – inet.0, inet.3 and mpls.0

Junos Routing Tables (IPv4)

inet.0

Table inet.0 is the primary unicast routing table used by IPv4. It’s where IGPs will resolve next hops.

inet.3

Table inet.3 is the routing table populated by MPLS protocols such as RSVP or LDP. A lookup here will result in a label being pushed.

It’s also worth noting that inet.3 is used by BGP to resolve BGP next-hops. BGP examines both inet.3 and inet.0, choosing a next-hop based on the lowest Junos preference value. In case of a tie, inet.3 is used.
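To make that selection rule concrete, here is a minimal Python sketch of it. It is only a model of the behaviour described above, not how Junos implements it; the `Route` class and `resolve_bgp_next_hop` function are names I’ve made up for illustration, and the preference values are the OSPF (10) and LDP (9) ones that show up in the lab outputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    table: str                   # "inet.0" or "inet.3"
    protocol: str                # e.g. "OSPF", "LDP"
    preference: int              # Junos route preference, lower wins
    next_hop: str
    label: Optional[int] = None  # label pushed, if any


def resolve_bgp_next_hop(candidates):
    """Pick the route used to resolve a BGP protocol next hop.

    Lowest preference wins; on a tie the inet.3 route is preferred.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.preference, r.table != "inet.3"))


ospf = Route("inet.0", "OSPF", 10, "192.168.46.4")
ldp = Route("inet.3", "LDP", 9, "192.168.46.4", label=27)
chosen = resolve_bgp_next_hop([ospf, ldp])
print(chosen.table, chosen.protocol, chosen.label)
```

With these two candidates the LDP route in inet.3 is picked, so the next hop resolves over the LSP. Swap in an inet.0 route with a preference below 9 and the model picks that one instead.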

mpls.0

This is the MPLS label switching table and is used by label switch routers. Routers along the LSP will use this table to swap and pop labels as appropriate.

Lab Topology

For this lab, I’ll be using the topology below.

lsplab

Software revisions are as follows

  • CE Routers (CE1, CE2): IOS (Cisco 7200 12.4(24)T)
  • P Routers (R1, R2, R3, R4, R5): IOS (Cisco 7200 12.4(24)T)
  • PE Routers (Junos1, Junos2): Junos (Olive 12.3R5.7)

The base configurations are using OSPF as the routing protocol and LDP to exchange transport labels.

LSP Traffic Forwarding

Default MPLS forwarding on Cisco and Juniper

Let’s have a look at Junos1 and R4 and see how traffic would be forwarded by default.

Junos1

102.102.102.102 is an IP address assigned to Loopback0 on CE2. Let’s take a look at the routing table on Junos1:

root@R6-Junos1> show route 102.102.102.102

inet.0: 21 destinations, 21 routes (21 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[OSPF/150] 00:08:21, metric 0, tag 0
                    > to 192.168.46.4 via em0.0

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:04:32, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

Table inet.0 contains only the OSPF learnt route. Remember that although there is an LDP route in table inet.3, it won’t be used for IGP route lookups, and we can verify this with a look at the forwarding table for prefix 102.102.102.102.

root@R6-Junos1> show route forwarding-table destination 102.102.102.102
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
102.102.102.102/32 user     0 192.168.46.4       ucst   555    30 em0.0

Note that the type is “ucst”. If the traffic was to be labelled, it would say “Push” followed by the label number to be pushed.

Therefore IPv4 unicast traffic by default on Junos will not be labelled.

R4

Now let’s take a look at R4

R4#show mpls forwarding-table 102.102.102.102
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface
28     29            102.102.102.102/32   \
                                       0             Fa1/0      192.168.34.3

R4#sh ip cef 102.102.102.102 detail
102.102.102.102/32, epoch 0
  local label info: global/28
  nexthop 192.168.34.3 FastEthernet1/0 label 29

R4 shows a very different story – traffic will be labelled, and will push label 29.

Also note that R4 expects labelled traffic going to 102.102.102.102 to be received with label 28.

Verification

Let’s verify this situation with a traceroute on Junos1

root@R6-Junos1> traceroute 102.102.102.102
traceroute to 102.102.102.102 (102.102.102.102), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  8.855 ms  2.253 ms  4.572 ms
 2  192.168.34.3 (192.168.34.3)  28.783 ms  26.034 ms  28.780 ms
     MPLS Label=29 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  27.265 ms  25.306 ms  28.669 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  29.321 ms  26.853 ms  28.707 ms
 5  192.168.102.1 (192.168.102.1)  34.871 ms  30.649 ms  33.596 ms

Well, it’s pretty clear that the 1st hop traffic from Junos1 was not labelled, while traffic forwarded on by the IOS box R4 was labelled. The egress label applied for the R4->R3 traffic was label 29, as expected.

So what if we want Junos to forward IGP traffic via an LSP? Well there are a couple of MPLS configuration options: traffic-engineering bgp-igp and mpls-forwarding.

traffic-engineering bgp-igp

Traffic-engineering bgp-igp configures BGP and the IGPs to use LSPs for forwarding traffic destined for egress routers. The bgp-igp option causes all inet.3 routes to be moved to the inet.0 routing table.

root@R6-Junos1# show | compare
[edit protocols mpls]
+   traffic-engineering bgp-igp;

root@R6-Junos1> show route 102.102.102.102

inet.0: 21 destinations, 35 routes (21 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:00:33, metric 1
                    > to 192.168.46.4 via em0.0, Push 28
                    [OSPF/150] 00:00:33, metric 0, tag 0
                    > to 192.168.46.4 via em0.0

The LDP route and the OSPF route are now in table inet.0, and the LDP route with preference 9 is now the best route.

The traceroute should now show the 1st hop being labelled – let’s see:

root@R6-Junos1> traceroute 102.102.102.102
traceroute to 102.102.102.102 (102.102.102.102), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  30.523 ms  28.792 ms  28.561 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 2  192.168.34.3 (192.168.34.3)  28.502 ms  26.692 ms  28.756 ms
     MPLS Label=29 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  28.296 ms  26.501 ms  28.403 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  28.205 ms  26.365 ms  28.581 ms
 5  192.168.102.1 (192.168.102.1)  33.717 ms  30.332 ms  33.996 ms

Spot on – and the 1st hop label is 28, as expected.
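The label values in that traceroute can be reproduced with a small Python sketch of hop-by-hop label swapping. Treat it as an illustration only: the first table entry matches R4’s “show mpls forwarding-table” output, while the swap at 192.168.34.3 and the pop at 192.168.35.5 are my inference from the labels shown in the traceroute, and the `LABEL_TABLE` and `labels_seen` names are hypothetical.

```python
# Label operations per hop, keyed by the address seen in the traceroute.
# The first entry matches R4's "show mpls forwarding-table" output; the other
# two are inferred from the labels in the traceroute and are assumptions.
LABEL_TABLE = {
    "192.168.46.4": {28: ("swap", 29)},
    "192.168.34.3": {29: ("swap", 28)},
    "192.168.35.5": {28: ("pop", None)},
}


def labels_seen(hops, first_label):
    """Return (hop, label) pairs for the label each hop receives."""
    seen = []
    label = first_label
    for hop in hops:
        seen.append((hop, label))
        if label is None:
            continue  # already unlabelled: plain IP forwarding from here on
        _action, label = LABEL_TABLE[hop][label]
    return seen


path = ["192.168.46.4", "192.168.34.3", "192.168.35.5", "192.168.57.7"]
for hop, label in labels_seen(path, first_label=28):
    print(hop, label)
```

The labels received per hop come out as 28, 29, 28 and then none, which lines up with the MPLS lines in the traceroute.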

traffic-engineering mpls-forwarding

traffic-engineering bgp-igp will allow high-priority LSPs to supersede IGP routes in the inet.0 routing table.  Essentially this means that the active route might not be the IGP route, and therefore IGP routing might not be as expected, e.g. routing policy routes may not be matched.

The mpls-forwarding option enables LSPs to be used for forwarding but not route selection. Routes are added to both the inet.0 and inet.3 routing tables.

Let’s take a look at the routing table with the mpls-forwarding option in place:

root@R6-Junos1> show route 102.102.102.102

inet.0: 21 destinations, 35 routes (21 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 @[OSPF/150] 00:03:56, metric 0, tag 0
                    > to 192.168.46.4 via em0.0
                   #[LDP/9] 00:00:02, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:00:02, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

Note that both the OSPF route and the LDP route are present in table inet.0, but the OSPF route is marked “Routing Use Only” and the LDP route “Forwarding Use Only”.

Therefore the outcome for traffic forwarding will be identical to “bgp-igp”:

root@R6-Junos1> traceroute 102.102.102.102
traceroute to 102.102.102.102 (102.102.102.102), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  28.526 ms  28.594 ms  29.658 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 2  192.168.34.3 (192.168.34.3)  28.161 ms  26.584 ms  28.419 ms
     MPLS Label=29 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  28.882 ms  25.807 ms  28.394 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  28.521 ms  26.170 ms  28.564 ms
 5  192.168.102.1 (192.168.102.1)  34.301 ms  31.283 ms  34.266 ms

The difference can be noted when we are matching against a protocol in a routing policy.
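As a rough way to picture that difference, here is a minimal Python sketch of what inet.0 offers for routing (and so for policy matching) versus forwarding under each option. It is a simplified model built from the outputs above, not Junos behaviour in full; the `inet0_view` function and the mode strings are names I’ve picked for illustration, and the preferences are the OSPF/150 and LDP/9 values from the route output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    protocol: str
    preference: int  # lower wins


OSPF = Route("OSPF", 150)
LDP = Route("LDP", 9)


def inet0_view(mode):
    """Return (route used for routing/policy, route used for forwarding) in inet.0.

    mode is None (default), "bgp-igp" or "mpls-forwarding".
    """
    if mode is None:
        # The LDP route stays in inet.3 only.
        return OSPF, OSPF
    best = min((OSPF, LDP), key=lambda r: r.preference)
    if mode == "bgp-igp":
        # The LDP route competes in inet.0 and becomes the active route.
        return best, best
    if mode == "mpls-forwarding":
        # OSPF stays the routing route ("@"); LDP is forwarding-only ("#").
        return OSPF, best
    raise ValueError(f"unknown mode: {mode}")


for mode in (None, "bgp-igp", "mpls-forwarding"):
    routing, forwarding = inet0_view(mode)
    matches_ospf = routing.protocol == "OSPF"
    print(mode, routing.protocol, forwarding.protocol, matches_ospf)
```

In this model a policy term matching on protocol OSPF only misses the route under bgp-igp, because that is the one case where the route used for routing is the LDP route.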

In the second part of this series, I will look at BGP forwarding via an LSP, and will also demonstrate how these two MPLS options affect route redistribution from OSPF to BGP.

IPv6: So I have my /32… now what?

I was recently doing some work at an ISP. They had been assigned their /32 IPv6 prefix a while ago, but other than a few internal test networks hadn’t done much with it since it was assigned.

I was pretty much asked “So I have my /32 … now what?” OK, so that’s not quite the question I was asked 🙂 but essentially they wanted some guidance on how to cut up the /32 for their own infrastructure, customers, etc and clarification on the RIPE IPv6 assignment policy.

Some people reading this post might not be familiar with how IP prefixes are allocated globally, so let’s start with a few definitions.

So how is all this IP stuff allocated anyway?

The Internet Assigned Numbers Authority (IANA) has authority over all IP address space and Autonomous System (AS) numbers allocated and in use on the Internet, and it is IANA that makes the allocations to the Regional Internet Registries.

“RIPE NCC” provides the IPv4, IPv6 and AS Number resources to its members in Europe, Central Asia and the Middle East.

Internet Registry (IR) – An Internet Registry is an organisation that is responsible for distributing IP address space to its members or customers and for registering those distributions. IRs are classified according to their primary function and territorial scope.

Regional Internet Registry (RIR) – Regional Internet Registries are established and authorised by respective regional communities and recognised by the IANA to serve and represent large geographical regions. The primary role of RIRs is to manage and distribute public Internet address space within their respective regions.

Local Internet Registry (LIR) – A Local Internet Registry is an IR that primarily assigns address space to the users of the network services that it provides. LIRs are generally ISPs whose customers are primarily End Users and possibly other ISPs.

Allocate – To “allocate” means to distribute address space to IRs for the purpose of subsequent distribution by them.

Assign – To “assign” means to delegate address space to an ISP or End User for specific use within the Internet infrastructure they operate. Assignments must only be made for specific purposes documented by specific organisations and are not to be sub-assigned to other parties.

End Site – An End Site is defined as an End User (subscriber) who has a business or legal relationship (same or associated entities) with a service provider that involves:

  • that service provider assigning address space to the End User
  • that service provider providing transit service for the End User to other sites
  • that service provider carrying the End User’s traffic
  • that service provider advertising an aggregate prefix route that contains the End User’s assignment

The definitions above are taken from RIPE document 589: IPv6 Address Allocation and Assignment Policy.

From Allocation to Assignment

An ISP (an LIR) applies for its IP space from the Regional Registry, in this case RIPE. Subject to meeting the criteria, the LIR will be provided with the current minimum IPv6 allocation size of a /32.

The LIR must then divide up their /32 and make IPv6 assignments in accordance with RIPE policy and network operator current best practice. It is at this point where I got involved.

Let’s start with the RIPE policy on assigning IPv6 to the ISP’s own infrastructure…

ISP Infrastructure

RIPE document 589: IPv6 Address Allocation and Assignment Policy states:

“An organisation (i.e. ISP/LIR) may assign a network prefix per PoP as the service infrastructure of an IPv6 service operator. Each assignment to a PoP is regarded as one assignment regardless of the number of users using the PoP. A separate assignment can be obtained for the in-house operations of the operator.”

This means that an LIR can assign a prefix per PoP that provides IPv6 addressing for all infrastructure in that PoP, i.e. routers, switches, servers, backbone p2p links, etc.

RIPE will allow a /48 to be assigned per PoP without sending a request to RIPE.

Whilst a large ISP might assign a /48 per PoP, a smaller ISP might choose to use only a single /48 for its own infrastructure.

It’s important to keep separate blocks for infrastructure and customer assignments. Customer ranges are not “trusted”, so it’s not desirable to assign customer ranges from the infrastructure block. For example, addressing customers from a separate block enables simpler ingress/egress edge filtering.
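As a quick illustration of keeping infrastructure and customer space apart, here is a minimal Python sketch using the standard `ipaddress` module. Everything specific in it is an assumption for the example: 2001:db8::/32 is the documentation prefix standing in for a real allocation, the PoP names are made up, and reserving the first /36 for infrastructure is just one possible split, not something RIPE prescribes.

```python
import ipaddress

# Documentation prefix, standing in for the LIR's real /32.
allocation = ipaddress.ip_network("2001:db8::/32")

# Assumption for illustration: infrastructure comes from the first /36 and
# customers from the remaining /36 blocks.
infra_block, *customer_blocks = allocation.subnets(new_prefix=36)

pops = ["pop-a", "pop-b", "pop-c"]  # hypothetical PoP names
pop_prefixes = dict(zip(pops, infra_block.subnets(new_prefix=48)))

for pop, prefix in pop_prefixes.items():
    print(pop, prefix)
print("customer blocks:", len(customer_blocks))
```

Because every PoP /48 sits inside the one infrastructure block, an edge filter only has to reference that single covering prefix.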

More detail on address planning in part 2 of this series.

End Users / End Sites

Basically, for a /48 or longer, the LIR can assign prefixes to an End Site however it feels is appropriate. The current guidelines are documented in RFC6177 (IPv6 Address Assignment to End Sites), which obsoletes the previous RFC3177 recommendation of a /48 to all End Sites: “a one-size-fits-all recommendation of /48 is not nuanced enough for the broad range of end sites and is no longer recommended as a single default.”

RFC6177 makes recommendations as follows:

  • The minimum allocation to sites should be a /64. Even where only a single IP address is needed, a /128 should no longer be allocated to End Sites (as a site implies multiple devices)
  • RFC3177 recommended prefix lengths of /48, /64 and /128 and this raised some concerns that operational practice and implementation may become “hard coded” around these fixed boundaries. This has never been the actual intention and CIDR continues to apply to IPv6.
  • A /48 is no longer the recommended default assignment size. End sites are all different and the assignment should be appropriate to their needs (be it a /48, /52, /56, etc)

But RFC6177 does not make a formal recommendation on what assignment sizes should be made. This is now left to the discretion of the LIR to allocate as appropriate, but it does reaffirm that the allocation should allow for growth:

“A key principle for address management is that end sites always be able to obtain a reasonable amount of address space for their actual and planned usage, and over time ranges specified in years rather than just months. In practice, that means at least one /64, and in most cases significantly more.”

What this really means is that an ISP should assign IPv6 space to End Sites based on their needs for the next few years. For ease of operation and to make things easier for us humans, the prefix length should fall on a nibble boundary (a multiple of 4 bits, so that it lines up with a whole hex digit), i.e. a /48, /52, /56, /60 or /64. More on this in part 2.
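To show why nibble boundaries are friendlier to read, here is a small Python sketch. The helper names are my own and the prefix is taken from the 2001:db8::/32 documentation range, so treat it as an illustration rather than a planning tool.

```python
import ipaddress


def on_nibble_boundary(prefix_len):
    """True when the prefix length is a multiple of 4 bits (one hex digit)."""
    return prefix_len % 4 == 0


def fixed_hex_digits(network):
    """Hex digits of the address that are fully covered by the prefix."""
    digits = network.network_address.exploded.replace(":", "")
    return digits[: network.prefixlen // 4]


site = ipaddress.ip_network("2001:db8:0:ab00::/56")  # documentation prefix
print(on_nibble_boundary(site.prefixlen), fixed_hex_digits(site))
print([plen for plen in range(48, 65) if on_nibble_boundary(plen)])
```

For the /56 above, the fixed part is exactly 14 hex digits, so every subnet of that site starts with the same visible characters. With a /57 the boundary would fall in the middle of a hex digit.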

An ISP might end up assigning IPv6 prefixes to its customers like this:

  • /64 (1 subnet needed)
  • /60 (16 subnets needed)
  • /56 (256 subnets)
  • /52 (4096 subnets)
  • /48 (65536 subnets)

At the time of writing, an LIR can assign a /48 or longer prefix to a single End Site without asking RIPE for approval. Anything bigger than a /48 (i.e. a shorter prefix length) needs approval.
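The subnet counts in the list above are just powers of two: a prefix of length n contains 2^(64 - n) /64 subnets. Here is a minimal Python sketch of that arithmetic; the function name is mine, and the cross-check uses a prefix from the 2001:db8::/32 documentation range.

```python
import ipaddress


def count_64s(prefix_len):
    """Number of /64 subnets contained in a prefix of the given length."""
    if not 0 <= prefix_len <= 64:
        raise ValueError("prefix length must be between 0 and 64")
    return 2 ** (64 - prefix_len)


for plen in (64, 60, 56, 52, 48):
    print(f"/{plen}: {count_64s(plen)} x /64")

# Cross-check one size by enumerating the subnets of a documentation prefix.
site = ipaddress.ip_network("2001:db8:0:ab00::/56")
enumerated = sum(1 for _ in site.subnets(new_prefix=64))
print(enumerated == count_64s(56))
```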

And now all these allocations need to be documented in the RIPE database. More on this in another part.

And finally, yes, this blog is accessible via IPv6 🙂