Experimenting with Gatekeeper

This page describes how to test Gatekeeper and experiment with its functionality.

Setting Up a Gatekeeper Environment

There are multiple options for setting up Gatekeeper, e.g., on a physical server or in a virtual machine.

Once the environment is set up, follow the instructions in the README to compile, configure, and run Gatekeeper.

Configuring Gatekeeper

For more details about how to configure Gatekeeper before it runs and at runtime, see the Configuration page.

Debugging Gatekeeper

To generate packets from an interface bound to a DPDK-compatible driver and to debug Gatekeeper, we recommend the tips on our Tips for Debugging page.

Testing Gatekeeper DoS Defense Algorithms

This section describes how to test the functional blocks that compose Gatekeeper's main denial-of-service defense capabilities: the GK (Gatekeeper), GT (Grantor), GT-GK Unit (GGU), and SOL (Solicitor) blocks. Once you have successfully compiled and configured Gatekeeper, you can generate packets using pktgen to test each functional block. Instructions on how to set up the pktgen project are available in its documentation.
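
For reference, below is a minimal sketch of setting up pktgen-dpdk. It assumes DPDK has already been built; the repository URL, environment variables, and build steps are typical of pktgen-dpdk releases of this era and may differ for newer ones.

 $ git clone http://dpdk.org/git/apps/pktgen-dpdk
 $ cd pktgen-dpdk
 $ export RTE_SDK=/path/to/dpdk
 $ export RTE_TARGET=x86_64-native-linuxapp-gcc
 $ make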

First, open two terminals. In one terminal (T1), run the following command to enter the Gatekeeper directory:

 $ cd gatekeeper

In the other terminal (T2), run the following command to enter the pktgen directory:

 $ cd pktgen-dpdk/app/x86_64-native-linuxapp-gcc/

Testing GK block

To test the GK block, one needs to specify the packets' IP destinations, which are used to look up policies in the LPM table. Note that the policies maintained in the LPM table can be dynamically configured using Lua scripts; for simplicity, we do not configure any policy here. On terminal T1, we can start the Gatekeeper program with port C as the front port and port D as the back port by running:

 $ sudo ./build/gatekeeper -c 0xff -b 83:00.0 -b 83:00.1 --socket-mem 256
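
In this command, -c 0xff is the EAL core mask, each -b blacklists a PCI device (ports A and B, leaving them free for pktgen), and --socket-mem preallocates hugepage memory. If ports C and D are not yet bound to a DPDK-compatible driver, a sketch like the following may help; it assumes ports C and D are 0000:85:00.0 and 0000:85:00.1 (the enp133s0f* entries in the if_map.lua below) and that your machine uses the igb_uio driver rather than, say, vfio-pci:

 $ sudo dpdk-devbind.py --bind=igb_uio 0000:85:00.0 0000:85:00.1
 $ dpdk-devbind.py --status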

Since we blacklisted ports A and B, we need to modify lua/if_map.lua to comment out the interface mappings for ports A and B:

 return {
        ["ens2f0"] = "0000:04:00.0",
        ["ens2f1"] = "0000:04:00.1",
        ["ens2f2"] = "0000:04:00.2",
        ["ens2f3"] = "0000:04:00.3",
 --     ["enp131s0f0"] = "0000:83:00.0",
 --     ["enp131s0f1"] = "0000:83:00.1",
        ["enp133s0f1"] = "0000:85:00.1",
        ["enp133s0f0"] = "0000:85:00.0",
 }

Note that if you are using different port names on your machine (e.g., in a virtual machine), you need to manually adjust the network ports (i.e., front_ports and back_ports) in lua/net.lua.

On terminal T2, we can run the pktgen program to generate test packets on port A using the following command:

 $ sudo ./pktgen -c 0xf00 --socket-mem 256 --file-prefix pg1 -b 83:00.1 -b 85:00.0 -b 85:00.1 -- -T -P -m "[9:10].0"

To generate a test packet and send it to port C (i.e., the front port), one can use the following commands:

 Pktgen> set 0 count 1
 Pktgen> set 0 src ip 10.0.0.2/24
 Pktgen> set 0 dst ip 10.0.0.1
 Pktgen> set 0 dst mac e8:ea:6a:06:21:b2
 Pktgen> start 0

Note that this packet will be dropped by the GK block. One can construct more complex test packets given more knowledge of the GK block; the commands are similar, only with different packet parameters.
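
For example, here is a sketch of a slightly more elaborate test that sends a burst of larger TCP packets to the same destination, using standard pktgen commands (the count, size, and protocol values are arbitrary illustrations):

 Pktgen> set 0 count 1000
 Pktgen> set 0 size 128
 Pktgen> set 0 proto tcp
 Pktgen> set 0 dst ip 10.0.0.1
 Pktgen> set 0 dst mac e8:ea:6a:06:21:b2
 Pktgen> start 0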

Testing GGU block

The procedure is similar to the one for testing the GK block. However, one needs to send packets to the back port (i.e., port D), since these test packets would be generated by the GT block running on a Grantor server inside the ISP. Specifically, one can use the following commands:

 $ sudo ./pktgen -c 0xf00 --socket-mem 256 --file-prefix pg1 -b 83:00.0 -b 85:00.0 -b 85:00.1 -- -T -P -m "[9:10].0"
 Pktgen> set 0 count 1
 Pktgen> set 0 proto udp
 Pktgen> set 0 dport 45232
 Pktgen> set 0 sport 41120
 Pktgen> set 0 src ip 66.9.149.187/32
 Pktgen> set 0 dst ip 10.0.1.1
 Pktgen> set 0 dst mac e8:ea:6a:06:21:b3
 Pktgen> start 0

Note that 10.0.1.1 is the configured IP address of the back port, specified in lua/net.lua. 45232 (0xB0B0) and 41120 (0xA0A0) are the destination and source ports, respectively, as configured in lua/gt.lua. This UDP packet is just for illustration and does not carry any GGU policy decisions.

Testing Gatekeeper Configuration and Setup

This section describes how to test the functional blocks that enable Gatekeeper to be set up and to function in a network. These include the CPS (Control Plane Services) and LLS (Link Layer Services) blocks.

TODO

Stress Testing Gatekeeper on XIA1

Those with access can perform stress testing of the GK block on the XIA1 server.

You can find a Git branch configured to work in this environment here:

https://github.com/cjdoucette/gatekeeper/tree/scale_tests

This branch:

  • Is configured to work with only a Gatekeeper server. All requests are dropped (as if they are declined) and declined decisions that are expiring are not attempted as requests.
  • Includes a dynamic configuration script (lua/examples/add.lua) that adds the FIB entry to be repeatedly looked up.
  • Provides GK measurements every 30 seconds.
  • Disables VLAN tagging.
  • Contains code that reports how specific addresses are mapped to lcores, so that flows can be targeted at specific lcores if desired.

We also have scripts for generating client traffic, changing configuration files, and collecting measurements:

https://github.com/cjdoucette/gk-test-files

This repository should be placed alongside the gatekeeper directory (not inside of it). Follow the README in the gk-test-files repository to use it.
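
For example, assuming your current directory is the gatekeeper directory:

 $ cd ..
 $ git clone https://github.com/cjdoucette/gk-test-files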

Gatekeeper Profiling

To improve the performance of Gatekeeper, one may need to profile it. One option is the Linux perf command; detailed usage is documented on the perf wiki.

CPU cycles profiling

To record the call-graph of Gatekeeper, one can use the following command:

 $ sudo perf record -g ./build/gatekeeper
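
To fix the running time instead of stopping Gatekeeper by hand, one can wrap it in the coreutils timeout command. This sketch assumes Gatekeeper exits cleanly on the SIGTERM that timeout sends, so that perf can write its data:

 $ sudo perf record -g -- timeout 300 ./build/gatekeeper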

After running Gatekeeper for a period of time, one can terminate it with Ctrl + C. Note that one can also script the running time of the test, as in the sketch above. Linux perf records the Gatekeeper profile into perf.data. One can then use the following command to inspect the call graph:

 $ sudo perf report

Below is an example output after running this command:

Samples: 864K of event 'cycles:ppp', Event count (approx.): 8360255027836
  Children      Self  Command          Shared Object       Symbol
+   22.15%     0.00%  lcore-slave-4    gatekeeper          [.] eal_thread_loop
+   19.97%    10.06%  lcore-slave-4    gatekeeper          [.] ggu_proc
+   17.56%     0.00%  gatekeeper       [unknown]           [.] 0x0000000000000005
+   16.09%     1.77%  gatekeeper       gatekeeper          [.] lls_proc
+   11.28%     9.46%  lcore-slave-4    gatekeeper          [.] ixgbe_recv_scattered_pkts_vec
+   10.80%     0.00%  lcore-slave-5    gatekeeper          [.] eal_thread_loop
+   10.51%     6.65%  gatekeeper       gatekeeper          [.] process_pkts
+    8.52%     6.64%  lcore-slave-5    gatekeeper          [.] sol_proc
+    6.70%     0.00%  lcore-slave-2    gatekeeper          [.] eal_thread_loop
+    6.32%     1.31%  lcore-slave-2    gatekeeper          [.] gk_proc
+    5.86%     0.00%  lcore-slave-3    gatekeeper          [.] eal_thread_loop
+    5.57%     0.65%  lcore-slave-3    gatekeeper          [.] gk_proc
+    4.75%     0.00%  lcore-slave-1    gatekeeper          [.] eal_thread_loop
+    4.52%     0.49%  lcore-slave-1    gatekeeper          [.] gk_proc
+    4.08%     3.39%  lcore-slave-1    gatekeeper          [.] rte_hash_cuckoo_make_space_mw
......

Note that the first two columns represent the children and self overhead, respectively. The overhead is the percentage of overall samples collected in the corresponding function. According to the perf manual page, the self overhead is calculated by adding all period values of the entry, usually a function (symbol); this is the value that perf shows traditionally, and the sum of all self overhead values should be 100%. The children overhead is calculated by adding all period values of the child functions, so that it can show the total overhead of higher-level functions even if they don't directly execute much; "children" here means functions that are called from another (parent) function. For example, gk_proc above shows 4.52% children overhead but only 0.49% self overhead, which means most of its cycles are spent in callees such as rte_hash_cuckoo_make_space_mw.

We are particularly interested in how the GK blocks perform, as they are on the critical path. We can zoom into the lcore-slave-1 thread, which runs the core function gk_proc(); the results are shown below:

Samples: 864K of event 'cycles:ppp', Event count (approx.): 8360255027836, Thread: lcore-slave-1
  Children      Self  Command        Shared Object       Symbol                                                                                                                  ◆
+    4.75%     0.00%  lcore-slave-1  gatekeeper          [.] eal_thread_loop                                                                                                     ▒
+    4.52%     0.49%  lcore-slave-1  gatekeeper          [.] gk_proc                                                                                                             ▒
+    4.08%     3.39%  lcore-slave-1  gatekeeper          [.] rte_hash_cuckoo_make_space_mw                                                                                       ▒
+    2.40%     1.02%  lcore-slave-1  gatekeeper          [.] process_pkts_front                                                                                                  ▒
......
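
The per-thread view above can be reproduced from the same perf.data by filtering perf report on the command name (inside the interactive TUI, zooming into a thread achieves the same effect):

 $ sudo perf report --comms lcore-slave-1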

CPU memory loads profiling

To record Gatekeeper's memory-load profile, one can use the following command:

 $ sudo perf mem record ./build/gatekeeper

Similarly, one can use the following command to inspect the recorded memory-load profile:

 $ sudo perf report

Below is an example output after running this command:

Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 295025233
Overhead  Command         Shared Object       Symbol
  11.14%  lcore-slave-5   gatekeeper          [.] sol_proc
   8.96%  lcore-slave-1   gatekeeper          [.] rte_hash_cuckoo_make_space_mw
   7.26%  lcore-slave-3   gatekeeper          [.] rte_hash_cuckoo_make_space_mw
   4.82%  lcore-slave-5   gatekeeper          [.] common_ring_mp_enqueue
   4.30%  lcore-slave-5   gatekeeper          [.] ixgbe_xmit_pkts
   2.15%  lcore-slave-2   gatekeeper          [.] __rte_hash_del_key_with_hash
   1.93%  lcore-slave-1   gatekeeper          [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29
   1.90%  lcore-slave-3   gatekeeper          [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29
   1.87%  lcore-slave-2   gatekeeper          [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29

We can also zoom into the lcore-slave-1 thread, as the GK block's core function gk_proc() was running on it:

Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 295025233, Thread: lcore-slave-1
Overhead  Command        Shared Object      Symbol                                                                                                                               ◆
   8.96%  lcore-slave-1  gatekeeper         [.] rte_hash_cuckoo_make_space_mw                                                                                                    ▒
   1.93%  lcore-slave-1  gatekeeper         [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29                                                                                  ▒
   1.35%  lcore-slave-1  gatekeeper         [.] ixgbe_recv_pkts_vec                                                                                                              ▒
   1.32%  lcore-slave-1  gatekeeper         [.] __rte_hash_del_key_with_hash                                                                                                     ▒
   1.13%  lcore-slave-1  gatekeeper         [.] gk_process_request.isra.15                                                                                                       ▒
   1.11%  lcore-slave-1  gatekeeper         [.] __rte_hash_add_key_with_hash                                                                                                     ▒
   0.89%  lcore-slave-1  gatekeeper         [.] common_ring_mc_dequeue                                                                                                           ▒
   0.87%  lcore-slave-1  gatekeeper         [.] extract_packet_info                                                                                                              ▒
   0.87%  lcore-slave-1  gatekeeper         [.] encapsulate                                                                                                                      ▒
   0.82%  lcore-slave-1  gatekeeper         [.] process_pkts_front                                                                                                               ▒
   0.79%  lcore-slave-1  gatekeeper         [.] adjust_pkt_len                                                                                                                   ▒
   0.50%  lcore-slave-1  gatekeeper         [.] memcmp@plt
......

Single function profiling

We may also be interested in annotating the overhead within a single function. Assuming we are interested in encapsulate(), we can choose to annotate it.
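
One way to produce this view, assuming the perf.data recorded earlier is still in the current directory, is the perf annotate subcommand (pressing 'a' on a symbol inside perf report reaches the same screen):

 $ sudo perf annotate encapsulate

This interleaves the source with its disassembly and per-instruction sample percentages, as in the output below: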

encapsulate  /home/qiaobin/gatekeeper/build/gatekeeper
Percent│                     outer_ip4hdr->src_addr = iface->ip4_addr.s_addr;
  0.53 │       mov    0x258(%rbp),%edx
       │       mov    %edx,0xc(%rax)
       │                     outer_ip4hdr->dst_addr = gt_addr->ip.v4.s_addr;
  1.01 │       mov    0x4(%r13),%edx
  0.00 │       mov    %edx,0x10(%rax)
       │                             rte_cpu_to_be_16(pkt->pkt_len - iface->l2_len_out);
 26.21 │       movzwl 0x24(%r12),%edx
       │       sub    %ecx,%edx
       │     rte_arch_bswap16():
       │       xchg   %dl,%dh
       │     encapsulate():
       │                     outer_ip4hdr->hdr_checksum = 0;
       │       xor    %ecx,%ecx
       │                     outer_ip4hdr->total_length =
       │       mov    %dx,0x2(%rax)
       │             return 0;
  0.01 │       xor    %r14d,%r14d
       │                     outer_ip4hdr->hdr_checksum = 0;
       │       mov    %cx,0xa(%rax)
       │                     pkt->l3_len = sizeof(struct rte_ipv4_hdr);
 68.42 │       movzwl 0x58(%r12),%eax
       │     }
......

Note that the numbers on the left side of the bar indicate the percentage of total samples recorded against each particular instruction. For example, 68.42% of the samples in encapsulate() were recorded on the movzwl 0x58(%r12),%eax instruction. Perf also colors these numbers according to how hot the instruction is.

CPU performance counter statistics

Similarly, one can profile Gatekeeper's cache misses, an important metric for evaluating the effectiveness of memory prefetching techniques:

 $ sudo perf stat -d -d -d ./build/gatekeeper

An example output is shown below:

 Performance counter stats for './build/gatekeeper --file-prefix gk -w 0000:85:00.0 -w 0000:85:00.1 -- -l testx/gk_2.25_3lcore_16bots_1.log':

    1882191.927242      task-clock (msec)         #    5.641 CPUs utilized          
            96,075      context-switches          #    0.051 K/sec                  
                32      cpu-migrations            #    0.000 K/sec                  
             3,028      page-faults               #    0.002 K/sec                  
 5,626,153,554,672      cycles                    #    2.989 GHz                      (26.67%)
 3,504,193,848,377      stalled-cycles-frontend   #   62.28% frontend cycles idle     (26.67%)
 6,434,258,804,285      instructions              #    1.14  insn per cycle         
                                                  #    0.54  stalled cycles per insn  (33.34%)
   869,134,076,984      branches                  #  461.767 M/sec                    (33.33%)
    10,250,878,156      branch-misses             #    1.18% of all branches          (33.33%)
 2,140,112,178,105      L1-dcache-loads           # 1137.032 M/sec                    (26.66%)
    46,108,065,164      L1-dcache-load-misses     #    2.15% of all L1-dcache hits    (13.33%)
    23,588,717,202      LLC-loads                 #   12.533 M/sec                    (13.33%)
    12,093,886,730      LLC-load-misses           #   51.27% of all LL-cache hits     (20.00%)
   <not supported>      L1-icache-loads                                             
       970,829,847      L1-icache-load-misses                                         (26.67%)
 2,140,378,400,472      dTLB-loads                # 1137.173 M/sec                    (26.66%)
     7,915,973,786      dTLB-load-misses          #    0.37% of all dTLB cache hits   (13.33%)
       184,116,114      iTLB-loads                #    0.098 M/sec                    (13.33%)
         9,409,649      iTLB-load-misses          #    5.11% of all iTLB cache hits   (20.00%)
   <not supported>      L1-dcache-prefetches                                        
     9,212,035,328      L1-dcache-prefetch-misses #    4.894 M/sec                    (26.67%)

     333.635510550 seconds time elapsed

Note that the percentage in the last column of this report is the scaling factor. As documented on the Linux perf wiki under multiplexing and scaling events: if there are more events than counters, the kernel uses time multiplexing (switch frequency = HZ, generally 100 or 1000) to give each event a chance to access the monitoring hardware. Multiplexing only applies to PMU events. With multiplexing, an event is not measured all the time; at the end of the run, the tool scales the count based on total time enabled vs. time running.
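
When exact counts matter more than breadth, one can sidestep multiplexing by requesting no more events than the PMU has hardware counters, e.g. (the event list here is only an illustration):

 $ sudo perf stat -e cycles,instructions,LLC-loads,LLC-load-misses ./build/gatekeeper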