Experimenting with Gatekeeper
This page describes how to test Gatekeeper and experiment with its functionality.
There are multiple options for setting up Gatekeeper:
- Use dedicated hardware to natively support Gatekeeper. See our Supported Hardware and Hardware Requirements to obtain NICs that have the properties that DPDK requires.
- Use Amazon Elastic Compute Cloud (EC2) to set up a cloud instance. See Setup on EC2.
- Use a virtual machine with KVM. See Setup on Virtual Machine.
- Use Chameleon testbed to set up Gatekeeper at scale. See Setup on Chameleon.
For more details about how to configure Gatekeeper before it runs and at runtime, see the Configuration page.
To generate packets from an interface bound to a DPDK-compatible driver and to debug Gatekeeper, we recommend the tips on our Tips for Debugging page.
This section describes how to test the functional blocks that compose Gatekeeper's main denial-of-service defense capabilities. These include the GK (Gatekeeper), GT (Grantor), GT-GK Unit (GGU), and SOL (Solicitor) blocks. Once you have successfully configured and compiled Gatekeeper, you can generate packets using pktgen to test each functional block. One can find instructions on how to set up the pktgen project in its repository.
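If pktgen is not already available, a typical way to obtain and build it is sketched below. This sketch assumes the legacy DPDK make-based build with RTE_SDK and RTE_TARGET exported, which matches the x86_64-native-linuxapp-gcc path used in the next steps; follow the pktgen README for the exact steps for your DPDK version.
$ git clone https://github.com/pktgen/Pktgen-DPDK.git pktgen-dpdk
$ cd pktgen-dpdk
$ make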
First, open two terminals. In one terminal (T1), run the following command to open the Gatekeeper directory:
$ cd gatekeeper
In the other terminal (T2), run the following command to open the pktgen directory:
$ cd pktgen-dpdk/app/x86_64-native-linuxapp-gcc/
To test the GK block, one needs to specify the packets' IP destinations, which will be used to look up the policies in the LPM table. Note that the policies maintained by the LPM table can be dynamically configured using Lua scripts. For simplicity, we don't configure any policy here. On terminal T1, we can start the Gatekeeper program with port C as the front port and port D as the back port by running the command:
$ sudo ./build/gatekeeper -c 0xff -b 83:00.0 -b 83:00.1 --socket-mem 256
Since we blacklisted ports A and B, we need to modify lua/if_map.lua to comment out the interface mappings for ports A and B:
return {
    ["ens2f0"] = "0000:04:00.0",
    ["ens2f1"] = "0000:04:00.1",
    ["ens2f2"] = "0000:04:00.2",
    ["ens2f3"] = "0000:04:00.3",
    -- ["enp131s0f0"] = "0000:83:00.0",
    -- ["enp131s0f1"] = "0000:83:00.1",
    ["enp133s0f1"] = "0000:85:00.1",
    ["enp133s0f0"] = "0000:85:00.0",
}
Note that if you are using different port names on your machine (e.g., a virtual machine), you need to manually adjust the network ports (i.e., front_ports and back_ports) in lua/net.lua, as illustrated below.
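For example, a minimal sketch of what those settings might look like is shown here; the interface names are hypothetical and the surrounding contents of lua/net.lua may differ from this excerpt:
-- Hypothetical excerpt of lua/net.lua; adjust the interface names to your machine.
front_ports = {"enp133s0f0"}
back_ports = {"enp133s0f1"}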
On terminal T2, we can run the pktgen program to generate test packets on port A using the following command:
$ sudo ./pktgen -c 0xf00 --socket-mem 256 --file-prefix pg1 -b 83:00.1 -b 85:00.0 -b 85:00.1 -- -T -P -m "[9:10].0"
To generate a test packet and send it to port C (i.e., the front port), one can use the following commands at the pktgen prompt:
Pktgen> set 0 count 1
Pktgen> set 0 src ip 10.0.0.2/24
Pktgen> set 0 dst ip 10.0.0.1
Pktgen> set 0 dst mac e8:ea:6a:06:21:b2
Pktgen> start 0
Note that this packet will be dropped by the GK block. One can construct more complex test packets with more knowledge of the GK block; the commands are similar, just with different packet parameters, as in the sketch below.
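For example, a purely illustrative variation that sends a small burst of larger TCP packets toward the front port might look like the following; the field values are arbitrary and only meant to show which pktgen knobs are available:
Pktgen> set 0 count 10
Pktgen> set 0 size 128
Pktgen> set 0 proto tcp
Pktgen> set 0 sport 5678
Pktgen> set 0 dport 80
Pktgen> set 0 src ip 10.0.0.3/24
Pktgen> set 0 dst ip 10.0.0.1
Pktgen> set 0 dst mac e8:ea:6a:06:21:b2
Pktgen> start 0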
The procedure is similar to the one for testing the GK block. However, one needs to send packets to the back port (i.e., port D), since such packets are generated by the GT block running on a Grantor server inside the ISP. Specifically, one can use the following commands:
$ sudo ./pktgen -c 0xf00 --socket-mem 256 --file-prefix pg1 -b 83:00.0 -b 85:00.0 -b 85:00.1 -- -T -P -m "[9:10].0"
Pktgen> set 0 count 1
Pktgen> set 0 proto udp
Pktgen> set 0 dport 45232
Pktgen> set 0 sport 41120
Pktgen> set 0 src ip 66.9.149.187/32
Pktgen> set 0 dst ip 10.0.1.1
Pktgen> set 0 dst mac e8:ea:6a:06:21:b3
Pktgen> start 0
Note that 10.0.1.1 is the configured IP address of the back port, as specified in lua/net.lua. 45232 (0xB0B0) and 41120 (0xA0A0) are the destination and source ports, respectively, which are configured in lua/gt.lua. This UDP packet is just for illustration and doesn't carry any GGU policy decisions.
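For reference, those port numbers come from lua/gt.lua. A minimal, hypothetical sketch of such settings is shown below; the variable names are illustrative only, so check the actual file for the real identifiers:
-- Hypothetical excerpt; lua/gt.lua may use different variable names.
local ggu_src_port = 0xA0A0  -- 41120, source port used by the Grantor side
local ggu_dst_port = 0xB0B0  -- 45232, destination port expected by the GGU block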
This section describes how to test the functional blocks that enable Gatekeeper to be set up and to function in a network. These include the CPS (Control Plane Services) and LLS (Link Layer Services) blocks.
TODO
For those with access, you can perform stress testing of the GK block on the XIA1 server.
You can find a Git branch configured to work in this environment here:
https://github.com/cjdoucette/gatekeeper/tree/scale_tests
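To obtain it, one can clone that branch directly with a standard git command, for example:
$ git clone -b scale_tests https://github.com/cjdoucette/gatekeeper.git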
This branch:
- Is configured to work with only a Gatekeeper server. All requests are dropped (as if they are declined) and declined decisions that are expiring are not attempted as requests.
- Includes a dynamic configuration script (lua/examples/add.lua) that adds the FIB entry to be repeatedly looked up.
- Provides GK measurements every 30 seconds.
- Disables VLAN tagging.
- Contains code that reports how specific addresses are mapped to lcores so that flows can be targeted for lcores, if desired.
The test files for these experiments are available in a separate repository:
https://github.com/cjdoucette/gk-test-files
This should be placed alongside the gatekeeper directory (not inside of it). Follow the README in the gk-test-files repository to use it.
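A minimal sketch of this layout, assuming both repositories live side by side under the same parent directory:
$ cd ..    # from inside gatekeeper/, move to its parent directory
$ git clone https://github.com/cjdoucette/gk-test-files.git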
To improve the performance of Gatekeeper, one may need to profile it. One option is to use the Linux perf command; the detailed command usage is available on its wiki.
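If perf is not yet installed on the test machine, it typically ships with your distribution's kernel tools package. For example, on Ubuntu (an assumption about the distribution; adjust for your system) it can be installed with:
$ sudo apt-get install linux-tools-common linux-tools-$(uname -r)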
To record the call-graph of Gatekeeper, one can use the following command:
$ sudo perf record -g ./build/gatekeeper
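If you prefer to bound the duration of the test rather than stopping Gatekeeper by hand, one simple option (using the standard coreutils timeout wrapper, not a Gatekeeper-provided script) is:
$ sudo perf record -g -- timeout 300 ./build/gatekeeper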
Alternatively, after running Gatekeeper for a period of time, one can terminate it with Ctrl + C; the running time of the test can also be fixed using scripts or wrappers, as sketched above. Either way, Linux perf records the Gatekeeper profile into perf.data. One can use the following command to inspect the details of the call graph:
$ sudo perf report
Below is an example output after running this command:
Samples: 864K of event 'cycles:ppp', Event count (approx.): 8360255027836
Children Self Command Shared Object Symbol
+ 22.15% 0.00% lcore-slave-4 gatekeeper [.] eal_thread_loop
+ 19.97% 10.06% lcore-slave-4 gatekeeper [.] ggu_proc
+ 17.56% 0.00% gatekeeper [unknown] [.] 0x0000000000000005
+ 16.09% 1.77% gatekeeper gatekeeper [.] lls_proc
+ 11.28% 9.46% lcore-slave-4 gatekeeper [.] ixgbe_recv_scattered_pkts_vec
+ 10.80% 0.00% lcore-slave-5 gatekeeper [.] eal_thread_loop
+ 10.51% 6.65% gatekeeper gatekeeper [.] process_pkts
+ 8.52% 6.64% lcore-slave-5 gatekeeper [.] sol_proc
+ 6.70% 0.00% lcore-slave-2 gatekeeper [.] eal_thread_loop
+ 6.32% 1.31% lcore-slave-2 gatekeeper [.] gk_proc
+ 5.86% 0.00% lcore-slave-3 gatekeeper [.] eal_thread_loop
+ 5.57% 0.65% lcore-slave-3 gatekeeper [.] gk_proc
+ 4.75% 0.00% lcore-slave-1 gatekeeper [.] eal_thread_loop
+ 4.52% 0.49% lcore-slave-1 gatekeeper [.] gk_proc
+ 4.08% 3.39% lcore-slave-1 gatekeeper [.] rte_hash_cuckoo_make_space_mw
......
Note that the first two columns represent the children and self overhead, respectively. The overhead is the percentage of all collected samples attributed to the corresponding function. According to the perf manual page, the self overhead is calculated by adding all period values of the entry, usually a function (symbol); this is the value perf shows traditionally, and the sum of all self overhead values should be 100%. The children overhead is calculated by adding all period values of the child functions, so that it can show the total overhead of higher-level functions even if they don't directly execute much; children here means functions that are called from another (parent) function. For example, in the output above, eal_thread_loop on lcore-slave-4 has 22.15% children overhead but 0.00% self overhead: essentially all of its time is spent in the functions it calls (such as ggu_proc), not in its own code.
We are particularly interested in how the GK blocks perform, as they are on the critical path. We can zoom into the lcore-slave-1 thread, which runs the core function gk_proc(); the results are shown as follows:
Samples: 864K of event 'cycles:ppp', Event count (approx.): 8360255027836, Thread: lcore-slave-1
Children Self Command Shared Object Symbol ◆
+ 4.75% 0.00% lcore-slave-1 gatekeeper [.] eal_thread_loop ▒
+ 4.52% 0.49% lcore-slave-1 gatekeeper [.] gk_proc ▒
+ 4.08% 3.39% lcore-slave-1 gatekeeper [.] rte_hash_cuckoo_make_space_mw ▒
+ 2.40% 1.02% lcore-slave-1 gatekeeper [.] process_pkts_front ▒
......
To record the call-graph of Gatekeeper for memory loads, one can use the following command:
$ sudo perf mem record ./build/gatekeeper
Similarly, one can use the following command to examine the recorded memory-load profile of Gatekeeper:
$ sudo perf report
Below is an example output after running this command:
Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 295025233
Overhead Command Shared Object Symbol
11.14% lcore-slave-5 gatekeeper [.] sol_proc
8.96% lcore-slave-1 gatekeeper [.] rte_hash_cuckoo_make_space_mw
7.26% lcore-slave-3 gatekeeper [.] rte_hash_cuckoo_make_space_mw
4.82% lcore-slave-5 gatekeeper [.] common_ring_mp_enqueue
4.30% lcore-slave-5 gatekeeper [.] ixgbe_xmit_pkts
2.15% lcore-slave-2 gatekeeper [.] __rte_hash_del_key_with_hash
1.93% lcore-slave-1 gatekeeper [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29
1.90% lcore-slave-3 gatekeeper [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29
1.87% lcore-slave-2 gatekeeper [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29
We can also zoom into the lcore-slave-1 thread, as the GK block's core function gk_proc() was running on it:
Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 295025233, Thread: lcore-slave-1
Overhead Command Shared Object Symbol ◆
8.96% lcore-slave-1 gatekeeper [.] rte_hash_cuckoo_make_space_mw ▒
1.93% lcore-slave-1 gatekeeper [.] __rte_hash_lookup_bulk_with_hash_l.constprop.29 ▒
1.35% lcore-slave-1 gatekeeper [.] ixgbe_recv_pkts_vec ▒
1.32% lcore-slave-1 gatekeeper [.] __rte_hash_del_key_with_hash ▒
1.13% lcore-slave-1 gatekeeper [.] gk_process_request.isra.15 ▒
1.11% lcore-slave-1 gatekeeper [.] __rte_hash_add_key_with_hash ▒
0.89% lcore-slave-1 gatekeeper [.] common_ring_mc_dequeue ▒
0.87% lcore-slave-1 gatekeeper [.] extract_packet_info ▒
0.87% lcore-slave-1 gatekeeper [.] encapsulate ▒
0.82% lcore-slave-1 gatekeeper [.] process_pkts_front ▒
0.79% lcore-slave-1 gatekeeper [.] adjust_pkt_len ▒
0.50% lcore-slave-1 gatekeeper [.] memcmp@plt
......
We may also be interested in annotating the overhead within a particular function. Assuming we are interested in encapsulate(), we can annotate it as shown below.
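One way to reach the annotation view (a standard perf workflow, not Gatekeeper-specific) is to select the symbol in the perf report interface and press 'a', or to run the standalone annotate command with the symbol name as a filter:
$ sudo perf annotate encapsulate
The annotated view interleaves the source with the generated assembly and attributes a percentage of the samples to each instruction, as in the following example.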
encapsulate /home/qiaobin/gatekeeper/build/gatekeeper
Percent│ outer_ip4hdr->src_addr = iface->ip4_addr.s_addr;
0.53 │ mov 0x258(%rbp),%edx
│ mov %edx,0xc(%rax)
│ outer_ip4hdr->dst_addr = gt_addr->ip.v4.s_addr;
1.01 │ mov 0x4(%r13),%edx
0.00 │ mov %edx,0x10(%rax)
│ rte_cpu_to_be_16(pkt->pkt_len - iface->l2_len_out);
26.21 │ movzwl 0x24(%r12),%edx
│ sub %ecx,%edx
│ rte_arch_bswap16():
│ xchg %dl,%dh
│ encapsulate():
│ outer_ip4hdr->hdr_checksum = 0;
│ xor %ecx,%ecx
│ outer_ip4hdr->total_length =
│ mov %dx,0x2(%rax)
│ return 0;
0.01 │ xor %r14d,%r14d
│ outer_ip4hdr->hdr_checksum = 0;
│ mov %cx,0xa(%rax)
│ pkt->l3_len = sizeof(struct rte_ipv4_hdr);
68.42 │ movzwl 0x58(%r12),%eax
│ }
......
Note that the numbers on the left side of the bar indicate the percentage of total samples recorded against that particular instruction. For example, 68.42% of the samples for encapsulate() were recorded on the movzwl 0x58(%r12),%eax instruction. Perf also colors these numbers according to how hot the instruction is.
Similarly, one can profile cache misses of Gatekeeper, which are important metrics for evaluating the effectiveness of memory prefetching techniques:
$ sudo perf stat -d -d -d ./build/gatekeeper
An example output is shown below:
Performance counter stats for './build/gatekeeper --file-prefix gk -w 0000:85:00.0 -w 0000:85:00.1 -- -l testx/gk_2.25_3lcore_16bots_1.log':
1882191.927242 task-clock (msec) # 5.641 CPUs utilized
96,075 context-switches # 0.051 K/sec
32 cpu-migrations # 0.000 K/sec
3,028 page-faults # 0.002 K/sec
5,626,153,554,672 cycles # 2.989 GHz (26.67%)
3,504,193,848,377 stalled-cycles-frontend # 62.28% frontend cycles idle (26.67%)
6,434,258,804,285 instructions # 1.14 insn per cycle
# 0.54 stalled cycles per insn (33.34%)
869,134,076,984 branches # 461.767 M/sec (33.33%)
10,250,878,156 branch-misses # 1.18% of all branches (33.33%)
2,140,112,178,105 L1-dcache-loads # 1137.032 M/sec (26.66%)
46,108,065,164 L1-dcache-load-misses # 2.15% of all L1-dcache hits (13.33%)
23,588,717,202 LLC-loads # 12.533 M/sec (13.33%)
12,093,886,730 LLC-load-misses # 51.27% of all LL-cache hits (20.00%)
<not supported> L1-icache-loads
970,829,847 L1-icache-load-misses (26.67%)
2,140,378,400,472 dTLB-loads # 1137.173 M/sec (26.66%)
7,915,973,786 dTLB-load-misses # 0.37% of all dTLB cache hits (13.33%)
184,116,114 iTLB-loads # 0.098 M/sec (13.33%)
9,409,649 iTLB-load-misses # 5.11% of all iTLB cache hits (20.00%)
<not supported> L1-dcache-prefetches
9,212,035,328 L1-dcache-prefetch-misses # 4.894 M/sec (26.67%)
333.635510550 seconds time elapsed
Note that the percentage in the last column of this report is the scaling factor. As documented on the Linux perf wiki under multiplexing and scaling events: if there are more events than counters, the kernel uses time multiplexing (switch frequency = HZ, generally 100 or 1000) to give each event a chance to access the monitoring hardware. Multiplexing only applies to PMU events. With multiplexing, an event is not measured all the time; at the end of the run, the tool scales the count based on total time enabled vs. time running.
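As a concrete reading of that scaling, the reported value is approximately
final_count ≈ raw_count × (time_enabled / time_running)
For example, the LLC-load-misses line above shows a 20.00% scaling factor, meaning that event was actually counted during only about a fifth of the run, so its raw count was scaled up by roughly a factor of five to produce the reported number.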