Juniper vMX – Lab Setup (2 vMX, EVPN, Logical Systems)

Following my Juniper vMX getting started guide post, I thought it would be useful to show how vMX could be used to create a lab environment.

This post picks up immediately where the last one finished. I will create a multi-router topology on a vMX instance using Logical Systems, and then go on to configure EVPN on this topology. As with the previous post, this is all running on my MacBook Pro on a nested Ubuntu VM.

Lab topology

In this post I will create the following simple topology of 4 MX routers. You will be able to extend the principles shown here to expand your own topology to be as large and complex as you like.

vmx lab

The topology will consist of 2 x vMX running on the same Ubuntu host.

I will configure EVPN, but EVPN is unfortunately not supported within a Logical System, so R2 and R3 will be the main routers on each vMX and will be my EVPN PEs.

R1 and R4 will be created as Logical System routers.

I will connect ge-0/0/1 and ge-0/0/2 on each vMX back to back using a Linux bridge. These interfaces will then provide the interconnection between the main router and the Logical System using VLANs. I could use LT interfaces, but where is the fun in that?

ge-0/0/3 on vMX1 and vMX2 will be interconnected using a Linux virtio bridge on the host.

vMX2 instance setup

First things first, let’s get the second instance of vMX running. If you remember from my first vMX post, there is a configuration file for the vMX instance. Running a second vMX instance is no different, and it has its own settings file. I will copy vMX1’s config file and use that as the basis for vMX2.

[email protected]:~/vmx-14.1R5.4-1$ cd config/
[email protected]:~/vmx-14.1R5.4-1/config$ cp vmx.conf vmx2.conf

Now let’s have a look at what settings need to be changed in vmx2.conf

The vMX identifier is changed to vmx2. I am using the same host management interface for both vMX1 and vMX2 and no changes are needed to the images.

HOST:
    identifier                : vmx2   # Maximum 4 characters
    host-management-interface : eth0
    routing-engine-image      : "/home/mdinham/vmx-14.1R5.4-1/images/jinstall64-vmx-14.1R5.4-domestic.img"
    routing-engine-hdd        : "/home/mdinham/vmx-14.1R5.4-1/images/vmxhdd.img"
    forwarding-engine-image   : "/home/mdinham/vmx-14.1R5.4-1/images/vPFE-lite-20150707.img"

The external bridge can be used by both vMX1 and vMX2 so no need to change this setting. This is used to bridge the management interfaces on vMX to the host management interface defined above.

BRIDGES:
    - type  : external
      name  : br-ext                  # Max 10 characters

For the vRE and vPFE I will need to change the console port, management IP address and MAC address. The MAC addresses are taken from the locally administered MAC address ranges, so it is no problem to choose my own, taking care not to overlap with vMX1. Likewise, choose a console port number and management IP address that do not overlap with vMX1.

---
#vRE VM parameters
CONTROL_PLANE:
    vcpus       : 1
    memory-mb   : 2048
    console_port: 8603

    interfaces  :
      - type      : static
        ipaddr    : 192.168.100.52
        macaddr   : "0A:00:DD:C0:DE:0F"

---
#vPFE VM parameters
FORWARDING_PLANE:
    memory-mb   : 6144
    vcpus       : 3
    console_port: 8604
    device-type : virtio

    interfaces  :
      - type      : static
        ipaddr    : 192.168.100.53
        macaddr   : "0A:00:DD:C0:DE:11"

We also need to adjust the MAC addresses on each vMX2 interface.

---
#Interfaces
JUNOS_DEVICES:
   - interface            : ge-0/0/0
     mac-address          : "02:06:0A:0E:FF:F4"
     description          : "ge-0/0/0 interface"

   - interface            : ge-0/0/1
     mac-address          : "02:06:0A:0E:FF:F5"
     description          : "ge-0/0/1 interface"

   - interface            : ge-0/0/2
     mac-address          : "02:06:0A:0E:FF:F6"
     description          : "ge-0/0/2 interface"

   - interface            : ge-0/0/3
     mac-address          : "02:06:0A:0E:FF:F7"
     description          : "ge-0/0/3 interface"
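
A quick way to sanity-check hand-picked MAC addresses is to test the locally administered bit (0x02 in the first octet) and to look for clashes with vMX1. The following is a minimal Python sketch and is not part of the vMX tooling. The function names are illustrative, and the vMX1 list only contains the values visible in vmx.conf from the getting started post, so the remaining vMX1 data interfaces would need adding.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02 of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def overlapping_macs(macs_a, macs_b):
    """Return the MAC addresses that appear in both lists (case-insensitive)."""
    return sorted({m.upper() for m in macs_a} & {m.upper() for m in macs_b})


# Values copied from the configuration files shown in these posts.
VMX1_MACS = ["0A:00:DD:C0:DE:0E", "0A:00:DD:C0:DE:10", "02:06:0A:0E:FF:F0"]
VMX2_MACS = [
    "0A:00:DD:C0:DE:0F", "0A:00:DD:C0:DE:11",
    "02:06:0A:0E:FF:F4", "02:06:0A:0E:FF:F5",
    "02:06:0A:0E:FF:F6", "02:06:0A:0E:FF:F7",
]

if __name__ == "__main__":
    for mac in VMX2_MACS:
        print(mac, "locally administered:", is_locally_administered(mac))
    print("overlap with vMX1:", overlapping_macs(VMX1_MACS, VMX2_MACS))
```

The same comparison could be extended to console ports and management IP addresses, which also must not clash between instances.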

vMX2 is now ready to be built. The same orchestration script that I used to create vMX1 is again used for vMX2, but this time I need to specify the configuration file.

Note: each time I use “vmx.sh” to perform stop/start operations on vMX2, I must specify the configuration file for vMX2.

The script will create the new vMX instance and automatically start it.

[email protected]:~/vmx-14.1R5.4-1$ sudo ./vmx.sh -lv --install --cfg config/vmx2.conf

I’m now ready to connect to the console on vMX2. This is done the same way for vMX1 and vMX2; we simply reference the correct vMX instance when running the script.

[email protected]:~/vmx-14.1R5.4-1$ ./vmx.sh --console vcp vmx2
--
Login Console Port For vcp-vmx2 - 8603
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


Amnesiac (ttyd0)

login:

If I look at the Linux bridges the script automatically created, I can see that another internal bridge is present to enable the RE and PFE communication on vMX2. The external bridge (management bridge) is shared by all vMX management interfaces.

[email protected]:~/vmx-14.1R5.4-1/config$ brctl show
bridge name     bridge id               STP enabled     interfaces
br-ext          8000.000c2976a815       yes             br-ext-nic
                                                        eth0
                                                        vcp_ext-vmx1
                                                        vcp_ext-vmx2
                                                        vfp_ext-vmx1
                                                        vfp_ext-vmx2
br-int-vmx1     8000.525400866237       yes             br-int-vmx1-nic
                                                        vcp_int-vmx1
                                                        vfp_int-vmx1
br-int-vmx2     8000.5254006ec6d9       yes             br-int-vmx2-nic
                                                        vcp_int-vmx2
                                                        vfp_int-vmx2
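
When several vMX instances share a host, it can be handy to turn this output into something a script can check. Below is a rough Python sketch that parses `brctl show` text into a mapping of bridge name to member interfaces. The function name is illustrative, the sample is a shortened copy of the outputs in this post, and the parser assumes the layout shown here (one header line, bridge rows starting in the first column, continuation rows indented).

```python
def parse_brctl_show(text: str) -> dict:
    """Map each bridge name to the list of interfaces attached to it."""
    bridges = {}
    current = None
    for line in text.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():
            # New bridge row: name, bridge id, STP flag, optional first interface
            current = fields[0]
            bridges[current] = fields[3:]
        elif current is not None:
            # Continuation row: more interfaces for the current bridge
            bridges[current].extend(fields)
    return bridges


SAMPLE = """\
bridge name     bridge id               STP enabled     interfaces
br-ext          8000.000c2976a815       yes             br-ext-nic
                                                        eth0
                                                        vcp_ext-vmx2
br-int-vmx2     8000.5254006ec6d9       yes             br-int-vmx2-nic
                                                        vcp_int-vmx2
                                                        vfp_int-vmx2
virbr0          8000.000000000000       yes
"""

if __name__ == "__main__":
    for bridge, members in parse_brctl_show(SAMPLE).items():
        print(bridge, members)
```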

virtio bindings

As I did in my vMX getting started post, for the Ethernet connectivity to the vMX I will be using KVM virtio paravirtualisation.

virtio bindings are flexible and can be used to map multiple vMX instances to a physical host interface, or to connect vMX instances or vMX interfaces together, which is what we will be doing here. Linux bridges are used to stitch everything together.

At this point both vMX1 and vMX2 are running, but I need to create the virtio bindings to enable the communication between each MX.

For both vMX1 and vMX2 this is done in the same configuration file – config/vmx-junosdev.conf

I’ll create a link between vMX1 interfaces ge-0/0/1 and ge-0/0/2.

     - link_name  : vmx_link_ls
       endpoint_1 :
         - type        : junos_dev
           vm_name     : vmx1
           dev_name    : ge-0/0/1
       endpoint_2 :
         - type        : junos_dev
           vm_name     : vmx1
           dev_name    : ge-0/0/2

The same is done for vMX2

     - link_name  : vmx2_link_ls
       endpoint_1 :
         - type        : junos_dev
           vm_name     : vmx2
           dev_name    : ge-0/0/1
       endpoint_2 :
         - type        : junos_dev
           vm_name     : vmx2
           dev_name    : ge-0/0/2

Finally I will create a link between ge-0/0/3 on vMX1 and vMX2. I could use the same technique as shown above, but what if I wanted to connect more than 2 vMX together on the same Ethernet segment? It would be done like this with an additional bridge being defined and shared by each vMX.

     - link_name  : bridge_vmx_12
       endpoint_1 :
         - type        : junos_dev
           vm_name     : vmx1
           dev_name    : ge-0/0/3
       endpoint_2 :
         - type        : bridge_dev
           dev_name    : bridge_vmx12

     - link_name  : bridge_vmx_12
       endpoint_1 :
         - type        : junos_dev
           vm_name     : vmx2
           dev_name    : ge-0/0/3
       endpoint_2 :
         - type        : bridge_dev
           dev_name    : bridge_vmx12

Again the orchestration script vmx.sh is used to create the device bindings

[email protected]:~/vmx-14.1R5.4-1$ sudo ./vmx.sh --bind-dev

Now let’s look at what bridges we have!

  • br-ext – the external bridge for management traffic
  • br-int-vmx1 – the internal bridge for vMX1 RE to PFE traffic
  • br-int-vmx2 – the internal bridge for vMX2 RE to PFE traffic
  • bridge_vmx12 – to enable the communication between ge-0/0/3 on vMX1 and vMX2
  • virbr0 – unused as all vMX interfaces are defined
  • vmx_link_ls – connects ge-0/0/1 and ge-0/0/2 on vMX1
  • vmx2_link_ls – connects ge-0/0/1 and ge-0/0/2 on vMX2
  • vmx_link – connects ge-0/0/0 on vMX1 and vMX2 to eth1 on the host
[email protected]:~/vmx-14.1R5.4-1$ brctl show
bridge name     bridge id            STP enabled     interfaces
br-ext          8000.000c2976a815    yes             br-ext-nic
                                                     eth0
                                                     vcp_ext-vmx1
                                                     vcp_ext-vmx2
                                                     vfp_ext-vmx1
                                                     vfp_ext-vmx2
br-int-vmx1     8000.525400866237    yes             br-int-vmx1-nic
                                                     vcp_int-vmx1
                                                     vfp_int-vmx1
br-int-vmx2     8000.5254006ec6d9    yes             br-int-vmx2-nic
                                                     vcp_int-vmx2
                                                     vfp_int-vmx2
bridge_vmx12    8000.fe060a0efff3    no              ge-0.0.3-vmx1
                                                     ge-0.0.3-vmx2
virbr0          8000.000000000000    yes
vmx2_link_ls    8000.fe060a0efff5    no              ge-0.0.1-vmx2
                                                     ge-0.0.2-vmx2
vmx_link        8000.000c2976a81f    no              eth1
                                                     ge-0.0.0-vmx1
                                                     ge-0.0.0-vmx2
vmx_link_ls     8000.fe060a0efff1    no              ge-0.0.1-vmx1
                                                     ge-0.0.2-vmx1
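
The relationship between the link definitions and the resulting bridges can be sketched in a few lines of Python. This is only a model inferred from the output above and is not how vmx.sh is implemented: a link between two junos_dev endpoints appears as a bridge named after link_name, a link to a bridge_dev endpoint joins the bridge named in dev_name, and each vMX interface shows up as a host device such as ge-0.0.3-vmx1. The function names are illustrative, and the endpoints are simplified to plain dictionaries rather than the one-item lists used in vmx-junosdev.conf.

```python
from collections import defaultdict


def host_device_name(vm_name: str, dev_name: str) -> str:
    """ge-0/0/3 on vmx1 appears on the host as ge-0.0.3-vmx1."""
    return dev_name.replace("/", ".") + "-" + vm_name


def expected_bridges(links):
    """Predict bridge membership from a list of link definitions."""
    bridges = defaultdict(set)
    for link in links:
        endpoints = [link["endpoint_1"], link["endpoint_2"]]
        bridge_eps = [e for e in endpoints if e["type"] == "bridge_dev"]
        # A bridge_dev endpoint names the bridge; otherwise the link name is used
        name = bridge_eps[0]["dev_name"] if bridge_eps else link["link_name"]
        for e in endpoints:
            if e["type"] == "junos_dev":
                bridges[name].add(host_device_name(e["vm_name"], e["dev_name"]))
    return {name: sorted(members) for name, members in bridges.items()}


LINKS = [
    {"link_name": "vmx_link_ls",
     "endpoint_1": {"type": "junos_dev", "vm_name": "vmx1", "dev_name": "ge-0/0/1"},
     "endpoint_2": {"type": "junos_dev", "vm_name": "vmx1", "dev_name": "ge-0/0/2"}},
    {"link_name": "bridge_vmx_12",
     "endpoint_1": {"type": "junos_dev", "vm_name": "vmx1", "dev_name": "ge-0/0/3"},
     "endpoint_2": {"type": "bridge_dev", "dev_name": "bridge_vmx12"}},
    {"link_name": "bridge_vmx_12",
     "endpoint_1": {"type": "junos_dev", "vm_name": "vmx2", "dev_name": "ge-0/0/3"},
     "endpoint_2": {"type": "bridge_dev", "dev_name": "bridge_vmx12"}},
]

if __name__ == "__main__":
    for bridge, members in expected_bridges(LINKS).items():
        print(bridge, members)
```

Comparing this prediction with the real `brctl show` output is a cheap way to spot a typo in a vm_name or dev_name before troubleshooting inside Junos.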

At this point vMX1 and vMX2 are ready to be configured.

EVPN Lab

EVPN is defined in RFC 7432. It provides a number of enhancements over VPLS, particularly as MAC address learning now occurs in the control plane, with MAC addresses advertised between PEs using MP-BGP MAC routes. Compared to VPLS, which uses data plane flooding to learn MAC addresses, this BGP based approach enables EVPN to limit the flooding of unknown unicast. MAC addresses are now being routed, which in multihomed scenarios enables all active links to be utilised. Neat stuff. Also look up the Juniper Day One book on EVPN.

I’ve already configured a base configuration on R2 and R3. Note I changed the chassis network-services mode to enhanced-ip from the vMX default of enhanced-ethernet.

[email protected]# show | compare
[edit]
+  chassis {
+      network-services enhanced-ip;
+  }
+  interfaces {
+      ge-0/0/3 {
+          unit 0 {
+              family inet {
+                  address 192.168.23.2/24;
+              }
+              family mpls;
+          }
+      }
+      lo0 {
+          unit 0 {
+              family inet {
+                  address 2.2.2.2/32;
+              }
+          }
+      }
+  }
+  protocols {
+      mpls {
+          interface ge-0/0/3.0;
+      }
+      ospf {
+          area 0.0.0.0 {
+              interface lo0.0 {
+                  passive;
+              }
+              interface ge-0/0/3.0;
+          }
+      }
+      ldp {
+          interface ge-0/0/3.0;
+      }
+  }

Comms are up between vMX1 and vMX2

[email protected]> ping 192.168.23.3 rapid
PING 192.168.23.3 (192.168.23.3): 56 data bytes
!!!!!
--- 192.168.23.3 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.031/2.253/2.805/0.281 ms

[email protected]> show ospf neighbor
Address          Interface              State     ID               Pri  Dead
192.168.23.3     ge-0/0/3.0             Full      3.3.3.3          128    37

[email protected]> show ldp neighbor
Address            Interface          Label space ID         Hold time
192.168.23.3       ge-0/0/3.0         3.3.3.3:0                14

Now that I have reachability between R2 and R3, I can go ahead and add the required base config for EVPN.

Note: EVPN is unfortunately not supported within a Logical System so I am configuring EVPN on the main routers.

From Junos 14.1R4 the chained composite next hop features for EVPN will automatically be configured. Chained composite next hops are required for EVPN and allow the ingress PE to take multiple actions before forwarding.

[email protected]> ...configuration routing-options | display inheritance defaults
autonomous-system 65000;
##
## 'forwarding-table' was inherited from group 'junos-defaults'
##
forwarding-table {
    ##
    ## 'evpn-pplb' was inherited from group 'junos-defaults'
    ##
    export evpn-pplb;
    ##
    ## 'chained-composite-next-hop' was inherited from group 'junos-defaults'
    ##
    chained-composite-next-hop {
        ##
        ## 'ingress' was inherited from group 'junos-defaults'
        ##
        ingress {
            ##
            ## 'evpn' was inherited from group 'junos-defaults'
            ##
            evpn;
        }
    }
}

We require the evpn and inet-vpn MP-BGP address families. Here I am configuring an iBGP peering with R3.

[email protected]# show | compare
[edit]
+  routing-options {
+      autonomous-system 65000;
+  }
[edit protocols]
+   bgp {
+       group internal {
+           type internal;
+           local-address 2.2.2.2;
+           family inet-vpn {
+               unicast;
+           }
+           family evpn {
+               signaling;
+           }
+           neighbor 3.3.3.3;
+       }
+   }

At this point the core configuration for EVPN is complete.

Logical Systems

My configuration gets a little more complicated here, because I need to create R1 and R4 as Logical Systems on my vMX. I will do this now.

Remember that ge-0/0/1 and ge-0/0/2 have been connected back to back by the virtio bridge. I will use ge-0/0/1 as the interface on R2/R3 and ge-0/0/2 as the interface on the Logical System routers R1/R4.

[email protected]# show | compare
[edit]
+ logical-systems {
+     R1 {
+         interfaces {
+             ge-0/0/2 {
+                 unit 100 {
+                     vlan-id 100;
+                     family inet {
+                         address 192.168.14.1/24;
+                     }
+                 }
+             }
+         }
+     }
+ }
[edit interfaces]
+   ge-0/0/2 {
+       vlan-tagging;
+   }

Not required for this lab, but if you wanted to create multiple Logical System routers on the same vMX, this can of course be done. In the example below I have created two routers, R5 and R6. They are linked together via ge-0/0/1 (R5) and ge-0/0/2 (R6), with 56 being used as the VLAN ID for this point to point link. You can of course configure OSPF/BGP/MPLS etc directly between these routers. The configuration is defined in the appropriate logical system stanza.

logical-systems {
    R5 {
        interfaces {
            ge-0/0/1 {
                unit 56 {
                    vlan-id 56;
                    family inet {
                        address 192.168.56.5/24;
                    }
                }
            }
            lo0 {
                unit 5 {
                    family inet {
                        address 5.5.5.5/32;
                    }
                }
            }
        }
    }
    R6 {
        interfaces {
            ge-0/0/2 {
                unit 56 {
                    vlan-id 56;
                    family inet {
                        address 192.168.56.6/24;
                    }
                }
            }
            lo0 {
                unit 6 {
                    family inet {
                        address 6.6.6.6/32;
                    }
                }
            }
        }
    }
}

Working with Logical Systems is simple and commands can be entered in a couple of ways. Configuration can also be entered directly when the CLI is set to a Logical System.

[email protected]> set cli logical-system R1
Logical system: R1

[email protected]:R1> ping 192.168.14.1 rapid
PING 192.168.14.1 (192.168.14.1): 56 data bytes
!!!!!
--- 192.168.14.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.012/0.091/0.242/0.085 ms

[email protected]:R1> clear cli logical-system
Cleared default logical system

[email protected]> ping logical-system R1 192.168.14.1 rapid
PING 192.168.14.1 (192.168.14.1): 56 data bytes
!!!!!
--- 192.168.14.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.009/0.013/0.026/0.007 ms

Completing the EVPN configuration

I’m going to be configuring the EVPN VLAN based service. This requires a separate EVI per VLAN. An EVI is an EVPN instance spanning the PEs participating in a particular EVPN.

There isn’t too much to the configuration. I configure the interface facing R1, and then define the evpn routing-instance.

[email protected]# show | compare
[edit interfaces]
+   ge-0/0/1 {
+       flexible-vlan-tagging;
+       encapsulation flexible-ethernet-services;
+       unit 100 {
+           encapsulation vlan-bridge;
+           vlan-id 100;
+       }
+   }
[edit]
+  routing-instances {
+      EVPN100 {
+          instance-type evpn;
+          vlan-id 100;
+          interface ge-0/0/1.100;
+          route-distinguisher 2.2.2.2:1;
+          vrf-target target:1:1;
+          protocols {
+              evpn;
+          }
+      }
+  }
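
Because the VLAN based service needs one EVI per VLAN, the same stanza ends up being repeated with different values. As a rough illustration, the Python sketch below renders the routing-instance above as Junos `set` commands. The function name is illustrative, and the second instance in the usage section (EVPN200, route distinguisher 2.2.2.2:2, target:1:2) uses hypothetical values that are not part of this lab.

```python
def evpn_instance_commands(name, vlan_id, interface, rd, route_target):
    """Render a VLAN based EVPN routing-instance as Junos set commands."""
    base = f"set routing-instances {name}"
    return [
        f"{base} instance-type evpn",
        f"{base} vlan-id {vlan_id}",
        f"{base} interface {interface}",
        f"{base} route-distinguisher {rd}",
        f"{base} vrf-target {route_target}",
        f"{base} protocols evpn",
    ]


if __name__ == "__main__":
    # The instance configured on R2 in this post
    for cmd in evpn_instance_commands(
            "EVPN100", 100, "ge-0/0/1.100", "2.2.2.2:1", "target:1:1"):
        print(cmd)
    # A hypothetical second EVI for another VLAN (values are assumptions)
    for cmd in evpn_instance_commands(
            "EVPN200", 200, "ge-0/0/1.200", "2.2.2.2:2", "target:1:2"):
        print(cmd)
```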

Note: If you try to configure an evpn routing-instance on a logical system, you won’t see the option for evpn.

[email protected]> set cli logical-system R1
Logical system: R1

[email protected]:R1> configure
Entering configuration mode

[edit]
[email protected]:R1# set routing-instances evpn instance-type ?
Possible completions:
  forwarding           Forwarding instance
  l2backhaul-vpn       L2Backhaul/L2Wholesale routing instance
  l2vpn                Layer 2 VPN routing instance
  layer2-control       Layer 2 control protocols
  mpls-internet-multicast  Internet Multicast over MPLS routing instance
  no-forwarding        Nonforwarding instance
  virtual-router       Virtual routing instance
  virtual-switch       Virtual switch routing instance
  vpls                 VPLS routing instance
  vrf                  Virtual routing forwarding instance
[edit]

Verification

Let’s see if I can ping across the EVI from R1 to R4.

[email protected]> set cli logical-system R1
Logical system: R1

[email protected]:R1> ping 192.168.14.4 rapid
PING 192.168.14.4 (192.168.14.4): 56 data bytes
!!!!!
--- 192.168.14.4 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 3.644/26.570/97.943/36.259 ms

[email protected]:R1> show arp
MAC Address       Address         Name                      Interface           Flags
02:06:0a:0e:ff:f6 192.168.14.4    192.168.14.4              ge-0/0/2.100        none

Excellent!

Now what does this look like from R2’s perspective? We see 2 BGP paths received.

[email protected]> show bgp summary
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
bgp.evpn.0
                       2          2          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
3.3.3.3               65000         93         94       0       0       36:29 Establ
  bgp.evpn.0: 2/2/2/0
  EVPN100.evpn.0: 2/2/2/0
  __default_evpn__.evpn.0: 0/0/0/0

Looking more deeply we can see MAC addresses in the EVPN100 table, both the directly attached device and also the device attached to R3.

[email protected]> show route table EVPN100.evpn.0

EVPN100.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2:2.2.2.2:1::100::02:06:0a:0e:ff:f2/304
                   *[EVPN/170] 00:03:27
                      Indirect
2:3.3.3.3:1::100::02:06:0a:0e:ff:f6/304
                   *[BGP/170] 00:03:27, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 192.168.23.3 via ge-0/0/3.0
3:2.2.2.2:1::100::2.2.2.2/304
                   *[EVPN/170] 00:20:27
                      Indirect
3:3.3.3.3:1::100::3.3.3.3/304
                   *[BGP/170] 00:18:38, localpref 100, from 3.3.3.3
                      AS path: I, validation-state: unverified
                    > to 192.168.23.3 via ge-0/0/3.0
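
The route prefixes in this table pack several fields together: the EVPN route type, the route distinguisher, the Ethernet tag (which matches VLAN 100 here) and then a MAC or IP address. As a reading aid, here is a small Python sketch that splits a prefix as displayed above into those parts. The function name is illustrative, the type names come from RFC 7432, and the parser only handles the display format seen in this output (it would not cope with an IPv6 address containing `::`).

```python
# Route type names as defined in RFC 7432
ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery",
    2: "MAC/IP Advertisement",
    3: "Inclusive Multicast Ethernet Tag",
    4: "Ethernet Segment",
}


def parse_evpn_prefix(prefix: str) -> dict:
    """Split a prefix such as 2:2.2.2.2:1::100::02:06:0a:0e:ff:f2/304."""
    body, _, length = prefix.rpartition("/")
    head, tag, address = body.split("::")
    route_type, rd = head.split(":", 1)
    return {
        "type": int(route_type),
        "type_name": ROUTE_TYPES.get(int(route_type), "other"),
        "rd": rd,
        "tag": int(tag),
        "address": address,
        "length": int(length),
    }


if __name__ == "__main__":
    for prefix in (
        "2:3.3.3.3:1::100::02:06:0a:0e:ff:f6/304",
        "3:3.3.3.3:1::100::3.3.3.3/304",
    ):
        print(parse_evpn_prefix(prefix))
```

Read this way, the type 2 routes carry the MAC addresses learned behind each PE, and the type 3 routes are the inclusive multicast routes each PE originates for the EVI.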

Here we can see EVPN database and MAC table information.

[email protected]> show evpn database
Instance: EVPN100
VLAN  MAC address        Active source                  Timestamp        IP address
100   02:06:0a:0e:ff:f2  ge-0/0/1.100                   Jul 28 17:11:14
100   02:06:0a:0e:ff:f6  3.3.3.3                        Jul 28 17:11:15

[email protected]> show evpn mac-table

MAC flags       (S -static MAC, D -dynamic MAC, L -locally learned, C -Control MAC
    O -OVSDB MAC, SE -Statistics enabled, NM -Non configured MAC, R -Remote PE MAC)

Routing instance : EVPN100
 Bridging domain : __EVPN100__, VLAN : 100
   MAC                 MAC      Logical          NH     RTR
   address             flags    interface        Index  ID
   02:06:0a:0e:ff:f2   D        ge-0/0/1.100
   02:06:0a:0e:ff:f6   DC                        1048575 1048575

Local MAC addresses are being advertised from R2 to R3.

[email protected]> show route advertising-protocol bgp 3.3.3.3

EVPN100.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  2:2.2.2.2:1::100::02:06:0a:0e:ff:f2/304
*                         Self                         100        I
  3:2.2.2.2:1::100::2.2.2.2/304
*                         Self                         100        I

Here we can see detailed information about the EVPN routing instance. This output is taken from R3, hence the route distinguisher of 3.3.3.3:1 and the neighbor 2.2.2.2.

[email protected]> show evpn instance EVPN100 extensive
Instance: EVPN100
  Route Distinguisher: 3.3.3.3:1
  VLAN ID: 100
  Per-instance MAC route label: 299792
  MAC database status                Local  Remote
    Total MAC addresses:                 1       1
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ge-0/0/1.100    00:00:00:00:00:00:00:00:00:00  single-homed     Up
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   299872
  Number of neighbors: 1
    2.2.2.2
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 0

Summary

In this post I showed how multiple vMX can be configured and interconnected on the same Linux host. I also built a topology of 4 logical routers on the two vMX and used EVPN to demonstrate the capability of vMX.

I’ve also completed a VPLS lab with 5 x Logical System routers running on a single vMX. If you would like to see a post on this type of configuration please mention in the comments or tweet @mattdinham.

Thanks for reading 🙂

Juniper vMX – Getting Started Guide

I’m excited to finally have the opportunity to play with Juniper’s vMX! Since it was announced last year I’ve been eagerly waiting for release – a couple of client projects already have passed by where the vMX would have been a perfect fit. vMX already won an award earlier this year at Interop Tokyo 2015!

In this post I’ll be giving a bit of background on the vMX architecture and licensing, and then go on to walk through a lab based configuration of vMX.

The vMX is a virtual MX Series Router that is optimized to run as software on x86 servers. Like other MX routers, it runs Junos, and Trio has been compiled for x86! Yes, that means the sophisticated L2, L2.5 and L3 forwarding features we are used to on the MX are present on the vMX.

Architecture

vMX can be installed on server hardware of your choice, so long as it is x86 based and running Linux (although I’m sure a version to run on VMware won’t be too far away).

vMX itself actually consists of two separate VMs – a virtual forwarding plane (VFP) running the vTrio, and a virtual control plane (VCP) running Junos.

The Linux virtualisation solution KVM is what Juniper are using to spin up the virtual instances of the control and forwarding planes, and multiple instances of vMX can be run on the same hardware. To see Juniper using Linux and KVM is no surprise as this is what we are used to on Juniper’s other products such as the QFX.

The VMs are managed by a simple orchestration script which is used to create, stop and start the vMX instances. A simple configuration file defines parameters such as memory and vCPUs to allocate to the VCP and VFP.

A couple of Linux bridges are created by the orchestration script. Clearly the VCP and VFP need to be able to communicate directly, so an “internal” bridge is automatically created for each vMX instance to enable this communication. An “external” bridge is also created. This is used to enable the management interface on the physical Linux host to be used for the virtual management interfaces on the VCP and VFP.

For data interfaces, there are a couple of techniques available for packet I/O depending on the required vMX throughput –

  • Paravirtualisation using KVM’s virtio drivers
  • PCI passthrough using single root I/O virtualisation (SR-IOV), enabling packets to bypass the hypervisor and therefore increase I/O.

Juniper recommend virtio or SR-IOV up to 3Gbps, and SR-IOV over 3Gbps (using a minimum of 2 x 10GE interfaces).

Which you choose will ultimately depend on your use case for the vMX.

Licensing

Now this is what I really like about vMX! Licensing is based on a combination of throughput and features, and the lowest available throughput license is 100Mbps! Yes – you don’t need to be shifting multi-Gigabits of traffic to start with vMX. You can start small and pay-as-you-grow with vMX.

Below 1Gbps there are only 3 options – 100Mbps, 250Mbps and 500Mbps. Full scale features are included! List price on the 100Mbps option is a very reasonable $750.

At 1Gbps and above, licences are a combination of features (Base, Advance, and Premium) and full duplex throughput (1G, 5G, 10G, 40G)

  • Base – IP routing with 32,000 routes in the forwarding table. Basic Layer 2 functionality, Layer 2 bridging and switching.
  • Advance – Features in the BASE application package. IP routing with routes up to platform scale in the forwarding table. IP and MPLS switching for unicast and multicast applications. Layer 2 features include Layer 2 VPN, VPLS, EVPN, and Layer 2 Circuit. VXLAN.
  • Premium – Features in the BASE and ADVANCE application packages. Layer 3 VPN for IP and multicast

Setting up vMX on Ubuntu

Now I’m going to walk through setting up vMX on Ubuntu 14.04 LTS server (Juniper’s recommended flavour of Linux for vMX). Just for fun this is actually running as a nested VMware VM on my MacBook Pro – fine for a lab, but don’t try this in production! 🙂 I have allocated 8GB RAM, 4 vCPUs and two vNICs to the Ubuntu VM. The VM is also enabled to support hypervisor applications within the VM.

At this point Ubuntu Server has been freshly installed, and the option to install virtualisation was selected during setup.

First things first, let’s update all packages, install the prerequisite packages and restart the system.

[email protected]:~$ sudo apt-get upgrade
<snip>
[email protected]:~$ sudo apt-get install bridge-utils qemu-kvm libvirt-bin python python-netifaces vnc4server libyaml-dev python-yaml numactl libparted0-dev libpciaccess-dev libnuma-dev libyajl-dev libxml2-dev libglib2.0-dev libnl-dev python-pip python-dev libxml2-dev libxslt-dev
<snip>
[email protected]:~$ sudo reboot

Configuring vMX

As this is a lab based build, I will be using virtio for the virtual NIC. There are two options on the VFP – a “Lite” version PFE for labs and performance version for normal operation.

Note: Ubuntu 14.04 provides libvirt 1.2.2 which works for VFP lite version. However for the VFP performance version you must upgrade to libvirt 1.2.8.

Let’s extract the vMX application bundle and get going!

[email protected]:~$ tar xzf vmx-14.1R5.4-1.tgz
[email protected]:~$ cd vmx-14.1R5.4-1/
[email protected]:~/vmx-14.1R5.4-1$ ls
config drivers env images scripts vmx.sh

First of all we need to set up the vMX config file, which is done by editing config/vmx.conf

I start by setting an instance name for the vMX and pointing to the correct vMX images. I’m using vPFE-lite.

---
#Configuration on the host side - management interface, VM images etc.
HOST:
    identifier                : vmx1   # Maximum 4 characters
    host-management-interface : eth0
    routing-engine-image      : "/home/mdinham/vmx-14.1R5.4-1/images/jinstall64-vmx-14.1R5.4-domestic.img"
    routing-engine-hdd        : "/home/mdinham/vmx-14.1R5.4-1/images/vmxhdd.img"
    forwarding-engine-image   : "/home/mdinham/vmx-14.1R5.4-1/images/vPFE-lite-20150707.img"

Now the parameters for the control plane and forwarding plane.

I’ve allocated 1 vCPU to vRE and 3 vCPU to vPFE. 1GB RAM to the RE and 6GB to the forwarding plane, as per the defaults for 14.1

UPDATE: Feb 2016
For vMX on 15.1, allocate 1 vCPU to vRE and 3 vCPU to vPFE. 2GB RAM to the RE and 8GB to the forwarding plane.

I have also tried vMX with 2GB allocated to the vPFE and the forwarding plane loaded, which could be fine for lab purposes. I’d expect 1GB to be the minimum on the vRE. 3 x vCPU seems to be the minimum for the vPFE.

Note that device-type is set to “virtio” for the interfaces.

---
#External bridge configuration
BRIDGES:
    - type  : external
      name  : br-ext                  # Max 10 characters

---
#vRE VM parameters
CONTROL_PLANE:
    vcpus       : 1
    memory-mb   : 2048
    console_port: 8601

    interfaces  :
      - type      : static
        ipaddr    : 10.102.144.94
        macaddr   : "0A:00:DD:C0:DE:0E"

---
#vPFE VM parameters
FORWARDING_PLANE:
    memory-mb   : 6144
    vcpus       : 3
    console_port: 8602
    device-type : virtio

    interfaces  :
      - type      : static
        ipaddr    : 10.102.144.98
        macaddr   : "0A:00:DD:C0:DE:10"

---
#Interfaces
JUNOS_DEVICES:
   - interface            : ge-0/0/0
     mac-address          : "02:06:0A:0E:FF:F0"
     description          : "ge-0/0/0 interface"

I will only be using one interface in this lab, but up to 10 can be configured. For SR-IOV, things are done slightly differently – see this vMX doc for reference.

I now need to deploy the vMX instance using the orchestration script. “-lv” provides verbose logging. My vMX instance will be created by the script and automatically started.

[email protected]:~/vmx-14.1R5.4-1$ sudo ./vmx.sh -lv --install
==================================================
 Welcome to VMX
==================================================
Date..............................................07/18/15 13:19:03
VMX Identifier....................................vmx1
Config file......................................./home/mdinham/vmx-14.1R5.4-1/config/vmx.conf
Build Directory.................................../home/mdinham/vmx-14.1R5.4-1/build/vmx1
Environment file................................../home/mdinham/vmx-14.1R5.4-1/env/ubuntu_virtio.env
Junos Device Type.................................virtio
Initialize scripts................................[OK]
Copy images to build directory....................[OK]
==================================================
 VMX Environment Setup Completed
==================================================
==================================================
 VMX Install & Start
==================================================
Linux distribution................................ubuntu
Check GRUB........................................[Disabled]
Installation status of qemu-kvm...................[OK]
Installation status of libvirt-bin................[OK]
Installation status of bridge-utils...............[OK]
Installation status of python.....................[OK]
Installation status of libyaml-dev................[OK]
Installation status of python-yaml................[OK]
Installation status of numactl....................[OK]
Installation status of libnuma-dev................[OK]
Installation status of libparted0-dev.............[OK]
Installation status of libpciaccess-dev...........[OK]
Installation status of libyajl-dev................[OK]
Installation status of libxml2-dev................[OK]
Installation status of libglib2.0-dev.............[OK]
Installation status of libnl-dev..................[OK]
Check Kernel Version..............................[Disabled]
Check Qemu Version................................[Disabled]
Check libvirt Version.............................[Disabled]
Check virsh connectivity..........................[OK]
IXGBE Enabled.....................................[Disabled]
==================================================
 Pre-Install Checks Completed
==================================================
Check for VM vcp-vmx1.............................[Not Running]
Check for VM vfp-vmx1.............................[Not Running]
Cleanup VM states.................................[OK]
Check if bridge br-ext exists.....................[No]
Cleanup VM bridge br-ext..........................[OK]
Cleanup VM bridge br-int-vmx1.....................[OK]
==================================================
 VMX Stop Completed
==================================================
Check VCP image...................................[OK]
Check VFP image...................................[OK]
VMX Model.........................................Lite
Check VCP Config image............................[OK]
Check management interface........................[OK]
Setup huge pages to 8192..........................[OK]
Attempt to kill libvirt...........................[OK]
Attempt to start libvirt..........................[OK]
Sleep 2 secs......................................[OK]
Check libvirt support for hugepages...............[OK]
==================================================
 System Setup Completed
==================================================
Get Management Address of eth0....................[OK]
Generate libvirt files............................[OK]
Sleep 2 secs......................................[OK]
Find configured management interface..............eth0
Find existing management gateway..................eth0
Check if eth0 is already enslaved to br-ext.......[No]
Gateway interface needs change....................[Yes]
Create br-ext.....................................[OK]
Get Management Gateway............................192.168.100.254
Flush eth0........................................[OK]
Start br-ext......................................[OK]
Bind eth0 to br-ext...............................[OK]
Get Management MAC................................00:0c:29:76:a8:15
Assign Management MAC 00:0c:29:76:a8:15...........[OK]
Add default gw 192.168.100.254....................[OK]
Create br-int-vmx1................................[OK]
Start br-int-vmx1.................................[OK]
Check and start default bridge....................[OK]
Define vcp-vmx1...................................[OK]
Define vfp-vmx1...................................[OK]
Wait 2 secs.......................................[OK]
Start vcp-vmx1....................................[OK]
Start vfp-vmx1....................................[OK]
Wait 2 secs.......................................[OK]
==================================================
 VMX Bringup Completed
==================================================
Check if br-ext is created........................[Created]
Check if br-int-vmx1 is created...................[Created]
Check if VM vcp-vmx1 is running...................[Running]
Check if VM vfp-vmx1 is running...................[Running]
Check if tap interface vcp_ext-vmx1 exists........[OK]
Check if tap interface vcp_int-vmx1 exists........[OK]
Check if tap interface vfp_ext-vmx1 exists........[OK]
Check if tap interface vfp_int-vmx1 exists........[OK]
==================================================
 VMX Status Verification Completed.
==================================================
Log file..........................................
 /home/mdinham/vmx-14.1R5.4-1/build/vmx1/logs/vmx_1437221943.log
==================================================
 Thankyou for using VMX
==================================================

Connecting to the console port on the VMs

We can now connect to the vMX control plane! This is done using the vmx.sh script again.

Specify vcp (control plane – Junos) or vfp (vPFE) and the instance name.

[email protected]:~/vmx-14.1R5.4-1$ ./vmx.sh --console vcp vmx1
--
Login Console Port For vcp-vmx1 - 8601
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


Amnesiac (ttyd0)

login:

After a while, the FPC and interfaces will come online:

root> show interfaces terse | match ge-0/0/0
ge-0/0/0 up up

root> show chassis fpc
                     Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online        Absent    100          0        512       14          0

I’ll go ahead and add an IP address to ge-0/0/0. Note: if I were using the management interface, I could also configure fxp0 at this point. Remember that fxp0 is bridged to the host eth0 adapter (or whichever adapter you specify).

root# set interfaces ge-0/0/0.0 family inet address 192.168.100.5/24

Can I ping anything?

root> ping 192.168.100.5
PING 192.168.100.5 (192.168.100.5): 56 data bytes
64 bytes from 192.168.100.5: icmp_seq=0 ttl=64 time=0.059 ms
^C
--- 192.168.100.5 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.059/0.059/0.059/0.000 ms

root> ping 192.168.100.254
PING 192.168.100.254 (192.168.100.254): 56 data bytes
^C
--- 192.168.100.254 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

OK, so I can ping the interface’s own address but nothing beyond it. As I’m using virtio, I need to create a device binding between the host physical NIC and the vMX interface.

Creating a virtio binding

This is done in the config file config/vmx-junosdev.conf.

virtio bindings are flexible and can be used to map multiple vMX instances to the same physical host interface, or to connect vMX instances together.

A new Linux bridge will be created between host interface eth1 and ge-0/0/0 on vmx1.

##############################################################
#
#  vmx-junos-dev.conf
#  - Config file for junos device bindings.
#  - Uses YAML syntax.
#  - Leave a space after ":" to specify the parameter value.
#  - For physical NIC, set the 'type' as 'host_dev'
#  - For junos devices, set the 'type' as 'junos_dev' and
#    set the mandatory parameter 'vm-name' to the name of
#    the vPFE where the device exists
#  - For bridge devices, set the 'type' as 'bridge_dev'
#
##############################################################
interfaces :

     - link_name  : vmx_link
       endpoint_1 :
         - type        : junos_dev
           vm_name     : vmx1
           dev_name    : ge-0/0/0
       endpoint_2 :
         - type        : host_dev
           dev_name    : eth1

If eth1 is not already up on the Linux host, bring it up

sudo ifconfig eth1 up

Again the orchestration script vmx.sh is used to create the device bindings

[email protected]:~/vmx-14.1R5.4-1$ sudo ./vmx.sh --bind-dev
Bind Link vmx_link(ge-0.0.0-vmx1, eth1)...........[OK]

And we can see a new bridge has been created called “vmx_link” as referenced in the bindings configuration file

[email protected]:~/vmx-14.1R5.4-1$ brctl show
bridge name     bridge id               STP enabled     interfaces
br-ext          8000.000c2976a815       yes             br-ext-nic
                                                        eth0
                                                        vcp_ext-vmx1
                                                        vfp_ext-vmx1
br-int-vmx1             8000.52540050c859       yes             br-int-vmx1-nic
                                                        vcp_int-vmx1
                                                        vfp_int-vmx1
virbr0          8000.fe060a0efff1       yes             ge-0.0.1-vmx1
                                                        ge-0.0.2-vmx1
                                                        ge-0.0.3-vmx1
vmx_link                8000.000c2976a81f       no              eth1
                                                        ge-0.0.0-vmx1
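
The tap device names in the brctl output above appear to follow a simple pattern: the Junos interface name with "/" replaced by ".", followed by the vMX identifier. The small Python sketch below reproduces that pattern. The function name `tap_name` is hypothetical, and the rule is inferred from the output shown here rather than taken from the vmx.sh source, so treat it as an assumption.

```python
def tap_name(interface: str, vmx_id: str) -> str:
    """Map a Junos interface name to the host tap name seen in `brctl show`.

    Assumption: the pattern is inferred from the lab output, not from vmx.sh.
    """
    return f"{interface.replace('/', '.')}-{vmx_id}"


print(tap_name("ge-0/0/0", "vmx1"))  # ge-0.0.0-vmx1
```

This can be handy when checking which bridge a given vMX interface has ended up in.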

Now to retry that ping!

[email protected]:~/vmx-14.1R5.4-1$ ./vmx.sh --console vcp vmx1
--
Login Console Port For vcp-vmx1 - 8601
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


root> ping 192.168.100.254
PING 192.168.100.254 (192.168.100.254): 56 data bytes
64 bytes from 192.168.100.254: icmp_seq=0 ttl=64 time=4.951 ms
64 bytes from 192.168.100.254: icmp_seq=1 ttl=64 time=2.081 ms
^C
--- 192.168.100.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.081/3.516/4.951/1.435 ms

Success! At this point I’ve a working vMX with an interface mapped to a NIC on the Ubuntu host. What happens if I turn on OSPF and LDP?

root> show ospf neighbor
Address          Interface              State     ID               Pri  Dead
192.168.100.254  ge-0/0/0.0             Full      10.0.0.2           1    37
192.168.100.1    ge-0/0/0.0             Full      10.0.0.1         128    39

root> show ldp neighbor
Address            Interface          Label space ID         Hold time
192.168.100.254    ge-0/0/0.0         10.0.0.2:0               13

Excellent, now the fun can really begin, but I’ll save that for another time!

vPFE

One last thing – what does the VFP look like?

[email protected]:~/vmx-14.1R5.4-1$ sudo ./vmx.sh --console vfp vmx1
--
Login Console Port For vfp-vmx1 - 8602
Press Ctrl-] to exit anytime
--
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.


Wind River Linux 6.0.0.12 localhost console

localhost login: root
Password:
Last login: Sat Jul 18 12:20:49 UTC 2015 on console

The riot process is where all the magic happens!

top - 12:52:55 up 33 min,  1 user,  load average: 0.38, 0.74, 0.62
Tasks: 102 total,   1 running, 101 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  2.1%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:   5824060k total,  4454308k used,  1369752k free,    12184k buffers
Swap:        0k total,        0k used,        0k free,    44552k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1019 root      20   0 36.6g  37m  10m S   12  0.7   4:25.04 riot
<snip>

Further reference

I hope you enjoyed this vMX post! For further reference on any of the above material, please see the Juniper Release Notes for vMX.

Mapping traffic to an LSP on Junos – BGP and table inet.3 (part 2)

Now that we know about IGP based LSP forwarding on Junos, the 2nd part in this series focuses on BGP and table inet.3. 

We also continue from where part 1 left off, by looking at how traffic-engineering bgp-igp and mpls-forwarding can affect route redistribution from OSPF into BGP.

Lab Topology

For this lab, I’ll be using the topology below.

lsplab

Software revisions are as follows

  • CE Routers (CE1, CE2): IOS (Cisco 7200 12.4(24)T)
  • P Routers (R1, R2, R3, R4, R5): IOS (Cisco 7200 12.4(24)T)
  • PE Routers (Junos1, Junos2): Junos (Olive 12.3R5.7)

As with Part 1, the base configurations are using OSPF as the routing protocol and LDP to exchange transport labels.

Route redistribution (bgp-igp and mpls-forwarding)

Here’s what I changed on Junos1. The route 102.102.102.102, learnt via OSPF from CE2, will be redistributed into BGP.

By the way, I’m not suggesting that your CEs should be part of your core IGP, but for the purposes of this lab test… 🙂

[email protected]# show | compare
[edit protocols bgp group internal]
+    export ospf2bgp;
[edit policy-options]
+   policy-statement ospf2bgp {
+       from {
+           protocol ospf;
+           route-filter 102.102.102.102/32 exact;
+       }
+       then accept;
+   }

The configuration on Junos1 is still running with “traffic-engineering mpls-forwarding”, so the routing table has the OSPF route active for routing and the LDP route active for forwarding:

[email protected]> show route 102.102.102.102

inet.0: 21 destinations, 35 routes (21 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 @[OSPF/150] 00:00:28, metric 0, tag 0
                    > to 192.168.46.4 via em0.0
                   #[LDP/9] 00:00:28, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

Hence you would definitely expect the routing policy to match on the OSPF route 102.102.102.102, and therefore we’ll see the route in BGP, right? Sure enough, if I hop over to Junos2, the route is there:

[email protected]> show route receive-protocol bgp 6.6.6.6

inet.0: 22 destinations, 23 routes (22 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  102.102.102.102/32      192.168.46.4         0       100        I

OK, so what happens if I change over to traffic-engineering bgp-igp on Junos1? The LDP route becomes the active route for both routing and forwarding. It isn’t matched by my policy, so the prefix is no longer advertised to Junos2.

inet.0: 22 destinations, 36 routes (22 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:00:04, metric 1
                    > to 192.168.46.4 via em0.0, Push 28
                    [OSPF/150] 00:04:54, metric 0, tag 0
                    > to 192.168.46.4 via em0.0

[email protected]> show route receive-protocol bgp 6.6.6.6

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)

BGP (LSP forwarding) and table inet.3

We’ll now have a look at how BGP operates. I’ve set “traffic-engineering” on Junos1 back to the default. We should expect BGP to recursively resolve its next hop via inet.3 and therefore MPLS-route the traffic. Let’s see!

iBGP Peering

There is an iBGP peering session between Junos1 and Junos2. No other routers are running iBGP

eBGP Peering

Junos2 has an eBGP peering with CE2. CE2 has a second loopback, 112.112.112.112, which is advertised via this eBGP session.

[email protected]> show configuration protocols bgp
group as102 {
    peer-as 102;
    neighbor 192.168.102.1;
}
group internal {
    local-address 7.7.7.7;
    peer-as 1;
    neighbor 6.6.6.6;
}

[email protected]> show route receive-protocol bgp 192.168.102.1

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 112.112.112.112/32      192.168.102.1        0                  102 I

LSP Forwarding and Routing

So how does Junos1 route to CE2’s IP address 112.112.112.112? Let’s take a look at the routing tables.

[email protected]> show route 112.112.112.112

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

112.112.112.112/32 *[BGP/170] 00:05:00, MED 0, localpref 100, from 7.7.7.7
                      AS path: 102 I, validation-state: unverified
                    > to 192.168.46.4 via em0.0, Push 27

OK, so we see 112.112.112.112/32 in table inet.0 as expected, and it looks like label 27 is going to be pushed. Let’s take a look at this in more detail:

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
112.112.112.112/32 (1 entry, 1 announced)
TSI:
KRT in-kernel 112.112.112.112/32 -> {indirect(131070)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Address: 0x9378ba4
                Next-hop reference count: 3
                Source: 7.7.7.7
                Next hop type: Router, Next hop index: 561
                Next hop: 192.168.46.4 via em0.0, selected
                Label operation: Push 27
                Label TTL action: prop-ttl
                Session Id: 0x1
                Protocol next hop: 192.168.102.1
                Indirect next hop: 93b8000 131070 INH Session ID: 0x2
                State: 
                Local AS:     1 Peer AS:     1
                Age: 5:41       Metric: 0       Metric2: 1
                Validation State: unverified
                Task: BGP_1.7.7.7.7+179
                Announcement bits (2): 0-KRT 6-Resolve tree 2
                AS path: 102 I
                Accepted
                Localpref: 100
                Router ID: 7.7.7.7
                Indirect next hops: 1
                        Protocol next hop: 192.168.102.1 Metric: 1
                        Indirect next hop: 93b8000 131070 INH Session ID: 0x2
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.168.46.4 via em0.0
                                Session Id: 0x1
                        192.168.102.0/24 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 192.168.46.4 via em0.0

The key here is the protocol next hop – 192.168.102.1.

192.168.102.1 isn’t directly attached to Junos1; it is CE2’s address on the Junos2<->CE2 segment. BGP will therefore recursively resolve this next hop via tables inet.3 and inet.0. As the inet.3 LDP route has a lower (better) preference (9) than the inet.0 OSPF route (10), the inet.3 route is chosen and traffic is placed on the LSP automatically, pushing label 27 in this case.

[email protected]> show route 192.168.102.1

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

192.168.102.0/24   *[OSPF/10] 00:06:30, metric 5
                    > to 192.168.46.4 via em0.0

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.102.0/24   *[LDP/9] 00:06:30, metric 1
                    > to 192.168.46.4 via em0.0, Push 27

[email protected]> show route forwarding-table destination 112.112.112.112
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
112.112.112.112/32 user     0                    indr 131070     2
                              192.168.46.4      Push 27   561     2 em0.0

[email protected]> traceroute 112.112.112.112
traceroute to 112.112.112.112 (112.112.112.112), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  28.026 ms  27.884 ms  28.510 ms
     MPLS Label=27 CoS=0 TTL=1 S=1
 2  192.168.34.3 (192.168.34.3)  29.767 ms  25.848 ms  28.571 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  29.831 ms  26.455 ms  28.586 ms
     MPLS Label=27 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  29.478 ms  25.518 ms  29.075 ms
 5  192.168.102.1 (192.168.102.1)  32.961 ms  31.147 ms  33.398 ms

Traffic is labelled!

But what about IGP traffic to the protocol next hop? Well, that won’t follow the LSP of course, because we don’t have an MPLS “traffic-engineering” option (bgp-igp or mpls-forwarding) configured.

[email protected]> show route forwarding-table destination 192.168.102.1
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
192.168.102.0/24   user     0 192.168.46.4       ucst   555    32 em0.0

Exactly as expected!

I’ve shown that BGP is using table inet.3 to resolve next hops, whereas normal IGP routing is using inet.0.

Another thing to remember with BGP & inet.3… if inet.0 contains a better route (e.g. better preference) then BGP would use the inet.0 route and traffic would not be forwarded on the LSP.

In this case, as none of the P routers are running BGP, this would break the connectivity (the P routers don’t know how to get to 112.112.112.112 so would drop the traffic). Hence, the traffic has to follow the LSP for the traffic to reach CE2.

Mapping traffic to an LSP on Junos (part 1)

It’s been a while since I wrote a Junos post… there are a few things that I’ve been meaning to write up for a while, so this series will focus on MPLS label switched paths (LSP) and how to put traffic on to an LSP.

By default IGPs won’t use an LSP, so this 1st part focuses on what you’ll need to configure should you want both BGP and IGP traffic forwarding to use LSPs.

Later parts will look at other options available in Junos to forward traffic via an LSP, and more advanced topics e.g. LSP selection based on BGP extended community.

I’ll start with a quick recap of the Junos routing tables – inet.0, inet.3 and mpls.0

Junos Routing Tables (IPv4)

inet.0

Table inet.0 is the primary unicast routing table used by IPv4. It’s where IGPs will resolve next hops.

inet.3

Table inet.3 is the routing table populated by MPLS protocols such as RSVP or LDP. A lookup here will result in a label being pushed.

It’s also worth noting that inet.3 is used by BGP to resolve BGP next-hops. BGP examines both inet.3 and inet.0, choosing a next-hop based on the lowest Junos preference value. In case of a tie, inet.3 is used.
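
To make the resolution rule above concrete, here is a minimal Python sketch. It is a simplified model of the behaviour described in this post, not how Junos is implemented: it takes the longest match for the BGP next hop in each table, then picks the lower preference, with inet.3 winning a tie. The `Route` class and function names are hypothetical. The sample routes use the lab's 192.168.102.0/24 segment with the default preferences of 9 for LDP and 10 for internal OSPF.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    table: str       # "inet.0" or "inet.3"
    prefix: str
    protocol: str
    preference: int  # lower is better


def longest_match(routes, table, next_hop):
    """Return the most specific route in `table` covering next_hop, or None."""
    addr = ipaddress.ip_address(next_hop)
    candidates = [
        r for r in routes
        if r.table == table and addr in ipaddress.ip_network(r.prefix)
    ]
    return max(
        candidates,
        key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
        default=None,
    )


def resolve_bgp_next_hop(routes, next_hop):
    """Pick the lower preference of the two table matches; inet.3 wins a tie."""
    in3 = longest_match(routes, "inet.3", next_hop)
    in0 = longest_match(routes, "inet.0", next_hop)
    if in3 is None:
        return in0
    if in0 is None:
        return in3
    return in3 if in3.preference <= in0.preference else in0


routes = [
    Route("inet.0", "192.168.102.0/24", "OSPF", 10),
    Route("inet.3", "192.168.102.0/24", "LDP", 9),
]
best = resolve_bgp_next_hop(routes, "192.168.102.1")
print(best.table, best.protocol)  # inet.3 LDP
```

If the inet.0 route had a better preference than the LDP route, the same function would return the inet.0 entry and the traffic would not be placed on the LSP.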

mpls.0

This is the MPLS label switching table and is used by label switch routers. Routers along the LSP will use this table to swap and pop labels as appropriate.
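
As a rough illustration of swap and pop along an LSP, the Python sketch below walks a packet through per-router label tables. The label values come from the bgp-igp traceroute towards 102.102.102.102 later in this post (28 arriving at R4, 29 at R3, 28 at R5). The pop at R5 is an assumption (penultimate-hop popping), inferred from the last labelled hop being R5 and Junos2 receiving the packet unlabelled. The `LFIB` dictionary and `walk_lsp` function are hypothetical names for this example only.

```python
# Per-router label tables: incoming label -> (action, outgoing label, next router).
# Assumption: R5 pops the label (penultimate-hop popping) before Junos2.
LFIB = {
    "R4": {28: ("swap", 29, "R3")},
    "R3": {29: ("swap", 28, "R5")},
    "R5": {28: ("pop", None, "Junos2")},
}


def walk_lsp(first_router, first_label):
    """Follow a packet hop by hop, returning (router, label seen on arrival)."""
    hops = []
    router, label = first_router, first_label
    while label is not None:
        hops.append((router, label))
        _action, out_label, next_router = LFIB[router][label]
        router, label = next_router, out_label
    hops.append((router, None))  # the egress router receives an unlabelled packet
    return hops


for router, label in walk_lsp("R4", 28):
    print(router, label)
```

Each printed pair corresponds to one labelled (or unlabelled) hop in the traceroute output.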

Lab Topology

For this lab, I’ll be using the topology below.

lsplab

Software revisions are as follows

  • CE Routers (CE1, CE2): IOS (Cisco 7200 12.4(24)T)
  • P Routers (R1, R2, R3, R4, R5): IOS (Cisco 7200 12.4(24)T)
  • PE Routers (Junos1, Junos2): Junos (Olive 12.3R5.7)

The base configurations are using OSPF as the routing protocol and LDP to exchange transport labels.

LSP Traffic Forwarding

Default MPLS forwarding on Cisco and Juniper

Let’s have a look at Junos1 and R4 and see how traffic would be forwarded by default.

Junos1

102.102.102.102 is an IP address assigned to Loopback0 on CE2. Let’s take a look at the routing table on Junos1:

[email protected]> show route 102.102.102.102

inet.0: 21 destinations, 21 routes (21 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[OSPF/150] 00:08:21, metric 0, tag 0
                    > to 192.168.46.4 via em0.0

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:04:32, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

Table inet.0 contains only the OSPF-learnt route. Remember that although there is an LDP route in table inet.3, it won’t be used for IGP route lookups. We can verify this with a look at the forwarding table for prefix 102.102.102.102.

[email protected]> show route forwarding-table destination 102.102.102.102
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
102.102.102.102/32 user     0 192.168.46.4       ucst   555    30 em0.0

Note that the type is “ucst”. If the traffic were to be labelled, it would say “Push” followed by the label number to be pushed.

Therefore IPv4 unicast traffic by default on Junos will not be labelled.

R4

Now let’s take a look at R4

R4#show mpls forwarding-table 102.102.102.102
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface
28     29            102.102.102.102/32   \
                                       0             Fa1/0      192.168.34.3

R4#sh ip cef 102.102.102.102 detail
102.102.102.102/32, epoch 0
  local label info: global/28
  nexthop 192.168.34.3 FastEthernet1/0 label 29

R4 shows a very different story – traffic will be labelled, and will push label 29.

Also note that R4 expects labelled traffic going to 102.102.102.102 to be received with label 28.

Verification

Let’s verify this situation with a traceroute on Junos1

[email protected]> traceroute 102.102.102.102
traceroute to 102.102.102.102 (102.102.102.102), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  8.855 ms  2.253 ms  4.572 ms
 2  192.168.34.3 (192.168.34.3)  28.783 ms  26.034 ms  28.780 ms
     MPLS Label=29 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  27.265 ms  25.306 ms  28.669 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  29.321 ms  26.853 ms  28.707 ms
 5  192.168.102.1 (192.168.102.1)  34.871 ms  30.649 ms  33.596 ms

Well, it’s pretty clear that the 1st-hop traffic from Junos1 was not labelled, whereas traffic from the IOS box R4 was. The egress label applied to the R4->R3 traffic was 29, as expected.

So what if we want Junos to forward IGP traffic via an LSP? Well there are a couple of MPLS configuration options: traffic-engineering bgp-igp and mpls-forwarding.

traffic-engineering bgp-igp

Traffic-engineering bgp-igp configures BGP and the IGPs to use LSPs for forwarding traffic destined for egress routers. The bgp-igp option causes all inet.3 routes to be moved to the inet.0 routing table.

[email protected]# show | compare
[edit protocols mpls]
+   traffic-engineering bgp-igp;

[email protected]> show route 102.102.102.102

inet.0: 21 destinations, 35 routes (21 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:00:33, metric 1
                    > to 192.168.46.4 via em0.0, Push 28
                    [OSPF/150] 00:00:33, metric 0, tag 0
                    > to 192.168.46.4 via em0.0

The LDP route and the OSPF route are now in table inet.0, and the LDP route with preference 9 is now the best route.

The traceroute should now show the 1st hop being labelled – let’s see:

[email protected]> traceroute 102.102.102.102
traceroute to 102.102.102.102 (102.102.102.102), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  30.523 ms  28.792 ms  28.561 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 2  192.168.34.3 (192.168.34.3)  28.502 ms  26.692 ms  28.756 ms
     MPLS Label=29 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  28.296 ms  26.501 ms  28.403 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  28.205 ms  26.365 ms  28.581 ms
 5  192.168.102.1 (192.168.102.1)  33.717 ms  30.332 ms  33.996 ms

Spot on – and the 1st hop label is 28, as expected.

traffic-engineering mpls-forwarding

traffic-engineering bgp-igp allows high-priority LSPs to supersede IGP routes in the inet.0 routing table. Essentially this means that the active route might not be the IGP route, so IGP routing might not behave as expected; for example, a routing policy that matches on the IGP protocol may no longer match the route.

The mpls-forwarding option enables LSPs to be used for forwarding but not route selection. Routes are added to both the inet.0 and inet.3 routing tables.

Let’s take a look at the routing table with the mpls-forwarding option in place:

[email protected]> show route 102.102.102.102

inet.0: 21 destinations, 35 routes (21 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 @[OSPF/150] 00:03:56, metric 0, tag 0
                    > to 192.168.46.4 via em0.0
                   #[LDP/9] 00:00:02, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

inet.3: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

102.102.102.102/32 *[LDP/9] 00:00:02, metric 1
                    > to 192.168.46.4 via em0.0, Push 28

Note that both the OSPF route and the LDP route are present in table inet.0, but the OSPF route is marked “Routing Use Only” and the LDP route “Forwarding Use Only”.

Therefore the outcome for traffic forwarding will be identical to “bgp-igp”:

[email protected]> traceroute 102.102.102.102
traceroute to 102.102.102.102 (102.102.102.102), 30 hops max, 40 byte packets
 1  192.168.46.4 (192.168.46.4)  28.526 ms  28.594 ms  29.658 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 2  192.168.34.3 (192.168.34.3)  28.161 ms  26.584 ms  28.419 ms
     MPLS Label=29 CoS=0 TTL=1 S=1
 3  192.168.35.5 (192.168.35.5)  28.882 ms  25.807 ms  28.394 ms
     MPLS Label=28 CoS=0 TTL=1 S=1
 4  192.168.57.7 (192.168.57.7)  28.521 ms  26.170 ms  28.564 ms
 5  192.168.102.1 (192.168.102.1)  34.301 ms  31.283 ms  34.266 ms

The difference can be noted when we are matching against a protocol in a routing policy.
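
The contrast between the two options can be sketched in a few lines of Python. This is a simplified model of the behaviour described in this post, not of the Junos implementation: it returns which route in inet.0 is used for routing (and so is what a "from protocol" policy term sees) and which is used for forwarding. The `Route` class, `inet0_view` and `policy_matches` are hypothetical names. The preferences are the lab values, OSPF external 150 and LDP 9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    protocol: str
    preference: int  # lower is better


def inet0_view(ospf, ldp, mode):
    """Return (routing route, forwarding route) as seen in inet.0.

    mode is "default", "bgp-igp" or "mpls-forwarding".
    """
    if mode == "default":
        # The LDP route stays in inet.3 only; inet.0 holds just the IGP route.
        return ospf, ospf
    best = min((ospf, ldp), key=lambda r: r.preference)
    if mode == "bgp-igp":
        # The LDP route competes in inet.0 and wins on preference for both roles.
        return best, best
    if mode == "mpls-forwarding":
        # IGP route kept for routing (@), LSP route used for forwarding (#).
        return ospf, best
    raise ValueError(f"unknown mode: {mode}")


def policy_matches(route, protocol):
    """Model of a 'from protocol X' term evaluated against the routing route."""
    return route.protocol == protocol


OSPF = Route("OSPF", 150)
LDP = Route("LDP", 9)
for mode in ("default", "bgp-igp", "mpls-forwarding"):
    routing, forwarding = inet0_view(OSPF, LDP, mode)
    print(mode, routing.protocol, forwarding.protocol, policy_matches(routing, "OSPF"))
```

In this model only bgp-igp stops a policy matching on OSPF from seeing the route, while both options forward via the LDP route.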

In the second part of this series, I will look at BGP forwarding via an LSP, and will also demonstrate how these two MPLS options affect route redistribution from OSPF to BGP.

IPv6: So I have my /32… now what?

I was recently doing some work at an ISP. They had been assigned their /32 IPv6 prefix a while ago, but other than a few internal test networks hadn’t done much with it since it was assigned.

I was pretty much asked “So I have my /32 … now what?” OK, so that’s not quite the question I was asked 🙂 but essentially they wanted some guidance on how to cut up the /32 for their own infrastructure, customers, etc and clarification on the RIPE IPv6 assignment policy.

Some people reading this post might not be familiar with how IP prefixes are allocated globally, so let’s start with a few definitions.

So how is all this IP stuff allocated anyway?

The Internet Assigned Numbers Authority (IANA) has authority over all IP address space and Autonomous System (AS) numbers allocated and in use on the Internet, and it is IANA that makes the allocations to the Regional Internet Registries.

The RIPE NCC provides IPv4, IPv6 and AS Number resources to its members in Europe, Central Asia and the Middle East.

Internet Registry (IR) – An Internet Registry is an organisation that is responsible for distributing IP address space to its members or customers and for registering those distributions. IRs are classified according to their primary function and territorial scope.

Regional Internet Registry (RIR) – Regional Internet Registries are established and authorised by respective regional communities and recognised by the IANA to serve and represent large geographical regions. The primary role of RIRs is to manage and distribute public Internet address space within their respective regions.

Local Internet Registry (LIR) – A Local Internet Registry is an IR that primarily assigns address space to the users of the network services that it provides. LIRs are generally ISPs whose customers are primarily End Users and possibly other ISPs.

Allocate – To “allocate” means to distribute address space to IRs for the purpose of subsequent distribution by them.

Assign – To “assign” means to delegate address space to an ISP or End User for specific use within the Internet infrastructure they operate. Assignments must only be made for specific purposes documented by specific organisations and are not to be sub-assigned to other parties.

End Site – An End Site is defined as an End User (subscriber) who has a business or legal relationship (same or associated entities) with a service provider that involves:

  • that service provider assigning address space to the End User
  • that service provider providing transit service for the End User to other sites
  • that service provider carrying the End User’s traffic
  • that service provider advertising an aggregate prefix route that contains the End User’s assignment

The definitions above are taken from RIPE document 589: IPv6 Address Allocation and Assignment Policy.

From Allocation to Assignment

An ISP (an LIR) applies to the Regional Internet Registry, in this case RIPE, to get its IP space. Subject to meeting the criteria, the LIR will be provided with the current minimum IPv6 allocation size of a /32.

The LIR must then divide up its /32 and make IPv6 assignments in accordance with RIPE policy and current network operator best practice. It is at this point that I got involved.

Let’s start with the RIPE policy on assigning IPv6 space to the ISP’s own infrastructure…
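To get a feel for how much room a /32 gives the LIR, the arithmetic can be sketched with Python's standard `ipaddress` module. This is only an illustration: `2001:db8::/32` is the IPv6 documentation prefix standing in for a real allocation, and `count_prefixes` is a helper name made up for this example.

```python
import ipaddress

# Documentation prefix, used here as a stand-in for a real LIR allocation.
allocation = ipaddress.ip_network("2001:db8::/32")


def count_prefixes(network, new_prefix):
    """Return how many prefixes of length new_prefix fit inside network."""
    if new_prefix < network.prefixlen:
        raise ValueError("new_prefix must not be shorter than the network prefix")
    return 2 ** (new_prefix - network.prefixlen)


for length in (48, 56, 64):
    print(f"/{length} prefixes in {allocation}: {count_prefixes(allocation, length):,}")
# /48 prefixes in 2001:db8::/32: 65,536
# /56 prefixes in 2001:db8::/32: 16,777,216
# /64 prefixes in 2001:db8::/32: 4,294,967,296
```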

ISP Infrastructure

RIPE document 589: IPv6 Address Allocation and Assignment Policy states:

“An organisation (i.e. ISP/LIR) may assign a network prefix per PoP as the service infrastructure of an IPv6 service operator. Each assignment to a PoP is regarded as one assignment regardless of the number of users using the PoP. A separate assignment can be obtained for the in-house operations of the operator.”

This means that an LIR can assign a prefix per PoP that provides IPv6 addressing to all infrastructure in that PoP, i.e. to routers, switches, servers, backbone p2p links, etc.

RIPE allows the LIR to assign a /48 per PoP without sending a request to RIPE.

Whilst a large ISP might assign a /48 per PoP, a smaller ISP might choose to use only a single /48 for its own infrastructure.
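As a rough sketch of the "one /48 per PoP" approach, the snippet below hands out consecutive /48s from an infrastructure block. The `2001:db8::/40` infrastructure block and the PoP names are assumptions made purely for illustration; they are not taken from RIPE policy or from any real address plan.

```python
import ipaddress

# Hypothetical infrastructure block carved from the documentation /32.
infra_block = ipaddress.ip_network("2001:db8::/40")

# Hypothetical PoP names.
pops = ["lon1", "man1", "ams1"]

# Pair each PoP with the next free /48; zip() stops when the PoP list ends.
pop_prefixes = dict(zip(pops, infra_block.subnets(new_prefix=48)))

for pop, prefix in pop_prefixes.items():
    print(f"{pop}: {prefix}")
# lon1: 2001:db8::/48
# man1: 2001:db8:1::/48
# ams1: 2001:db8:2::/48
```

A smaller ISP using a single /48 for everything would simply stop after the first prefix.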

It’s important to keep separate blocks for infrastructure and customer assignments. Customer ranges are not “trusted”, so it’s not desirable to assign customer ranges from the infrastructure block. For example, addressing these ranges from separate blocks enables simpler ingress/egress edge filtering.
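The filtering benefit is easy to see in a small sketch: when infrastructure lives in one contiguous block, deciding whether a source address is "trusted" is a single prefix match. The two blocks below are hypothetical choices inside the documentation /32, and `is_infrastructure` is a made-up helper, not a real router feature.

```python
import ipaddress

# Hypothetical split of the documentation /32 into two non-overlapping blocks.
INFRA_BLOCK = ipaddress.ip_network("2001:db8::/40")
CUSTOMER_BLOCK = ipaddress.ip_network("2001:db8:8000::/33")


def is_infrastructure(address):
    """Return True if the address falls inside the infrastructure block."""
    return ipaddress.ip_address(address) in INFRA_BLOCK


print(is_infrastructure("2001:db8:0:1::1"))     # True  (router loopback, say)
print(is_infrastructure("2001:db8:8000:1::1"))  # False (customer range)
```

On a real network the same idea becomes one filter term matching the infrastructure block, rather than a list of scattered prefixes.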

More detail on address planning in part 2 of this series.

End Users / End Sites

Basically, for a /48 or longer, the LIR can assign prefixes to an End Site however it feels is appropriate. The current guidelines are documented in RFC6177 (IPv6 Address Assignment to End Sites), which obsoletes the previous RFC3177 recommendation of a /48 to all End Sites – “a one-size-fits-all recommendation of /48 is not nuanced enough for the broad range of end sites and is no longer recommended as a single default.”

RFC6177 makes recommendations as follows:

  • The minimum allocation to sites should be a /64. Even where only a single IP address is needed, a /128 should no longer be allocated to End Sites (as a site implies multiple devices)
  • RFC3177 recommended prefix lengths of /48, /64 and /128 and this raised some concerns that operational practice and implementation may become “hard coded” around these fixed boundaries. This has never been the actual intention and CIDR continues to apply to IPv6.
  • A /48 is no longer the recommended default assignment size. End sites are all different and the assignment should be appropriate to their needs (be it a /48, /52, /56, etc)

RFC6177 does not, however, make a formal recommendation on what assignment sizes should be made. This is now left to the discretion of the LIR, but the RFC does reaffirm that the assignment should allow for growth:

“A key principle for address management is that end sites always be able to obtain a reasonable amount of address space for their actual and planned usage, and over time ranges specified in years rather than just months. In practice, that means at least one /64, and in most cases significantly more.”

What this really means is that an ISP should assign IPv6 space to End Sites based on their needs for the next few years. For ease of operation and to make things easier for us humans, the prefix should fall on a nibble boundary (a prefix length that is a multiple of 4 bits, so that it lines up with a single hex digit), i.e. a /48, /52, /56, /60 or /64. More on this in part 2.

An ISP might end up assigning IPv6 prefixes to its customers like this:
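A nibble is 4 bits, i.e. one hex digit of the address, so a prefix sits on a nibble boundary when its length is a multiple of 4. A minimal check, with `on_nibble_boundary` being a name invented for this sketch:

```python
def on_nibble_boundary(prefix_length):
    """Return True if the prefix length ends on a whole hex digit (4 bits)."""
    return prefix_length % 4 == 0


# Candidate End Site prefix lengths between /48 and /64.
aligned = [n for n in range(48, 65) if on_nibble_boundary(n)]
print(aligned)
# [48, 52, 56, 60, 64]
```

With a /56, for example, the customer's subnets are numbered by exactly two hex digits, whereas a /57 would split a hex digit between the ISP's part and the customer's part.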

  • /64 (1 subnet needed)
  • /60 (16 subnets needed)
  • /56 (256 subnets)
  • /52 (4096 subnets)
  • /48 (65536 subnets)

At the time of writing, an LIR can assign a /48 or longer prefix to a single End Site without asking RIPE for approval. Anything larger than a /48 (i.e. a shorter prefix) needs approval.
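The subnet counts in the list above follow from the number of bits between the assigned prefix and a /64: each extra bit doubles the number of /64 subnets. A small sketch, where `subnets_available` is a hypothetical helper name:

```python
def subnets_available(prefix_length, subnet_length=64):
    """Return how many subnets of subnet_length fit in one assigned prefix."""
    if prefix_length > subnet_length:
        raise ValueError("assigned prefix is longer than the subnet size")
    return 2 ** (subnet_length - prefix_length)


for length in (64, 60, 56, 52, 48):
    print(f"/{length}: {subnets_available(length)} x /64")
# /64: 1 x /64
# /60: 16 x /64
# /56: 256 x /64
# /52: 4096 x /64
# /48: 65536 x /64
```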

And now all these allocations need to be documented in the RIPE database. More on this in another part.

And finally, yes, this blog is accessible via IPv6 🙂