Merge pull request #152 from bytedance/add-bpf-built-in-rule
Add the `disallow-load-all-bpf-prog` rule for the Seccomp enforcer to prohibit loading any type of eBPF program.
Danny-Wei authored Dec 26, 2024
2 parents 1672a92 + 156544f commit 2f0932f
Showing 18 changed files with 261 additions and 87 deletions.
5 changes: 3 additions & 2 deletions docs/guides/policies_and_rules/built_in_rules.md
@@ -18,15 +18,16 @@ Note:<br />- The built-in rules supported by different enforcers are still under
| | |Disable the mount system call<br /><br />`disallow-mount`|Privileged|[MOUNT(2)](https://man7.org/linux/man-pages/man2/mount.2.html) is often used for privilege escalation, container escapes, and other attacks. Most microservices applications do not require mount operations. Therefore, it is recommended to use this rule to restrict container processes from using the `mount()` system call.<br /><br />Note: The mount system call will be disabled by default if the `spec.policy.privileged` field is false.|Disable the mount system call.|AppArmor<br />BPF
| | |Disable the umount system call<br /><br />`disallow-umount`|ALL|[UMOUNT(2)](https://man7.org/linux/man-pages/man2/umount.2.html) can be used to remove the attachment of topmost mount points (such as maskedPaths), leading to privilege escalation and information disclosure. Most microservices applications do not require umount operations. Therefore, it is recommended to use this rule to restrict container processes from using the `umount()` system call.|Disable the umount system call.|AppArmor<br />BPF
| | |Prohibit loading kernel modules<br /><br />`disallow-insmod`|Privileged|Attackers may attempt to inject code into the kernel within a container (**w/ CAP_SYS_MODULE**) by executing a kernel module loading command.|Disable CAP_SYS_MODULE|AppArmor<br />BPF
| | |Prohibit loading eBPF programs<br /><br />`disallow-load-ebpf`|ALL|Attackers may load eBPF programs within a container (**w/ CAP_SYS_ADMIN & CAP_BPF**) to steal data or create rootkits.<br /><br />Note: CAP_BPF was introduced in Linux 5.8.|Disable CAP_SYS_ADMIN & CAP_BPF|AppArmor<br />BPF
| | |Prohibit loading eBPF programs, except for those of the BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB types.<br /><br />`disallow-load-bpf-prog`, `disallow-load-ebpf`|ALL|Attackers may load eBPF programs within a container (**w/ CAP_SYS_ADMIN, CAP_BPF**) to steal data or create rootkits.<br /><br />Before Linux 5.8, loading eBPF programs, except for those of the BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB types, requires CAP_SYS_ADMIN. Since Linux 5.8, loading eBPF programs, except for those types, requires CAP_SYS_ADMIN or CAP_BPF. Some types of eBPF programs additionally require CAP_NET_ADMIN or CAP_PERFMON.<br /><br />It is recommended to use the `disallow-load-all-bpf-prog` rule to prohibit loading any type of eBPF program and reduce the kernel's attack surface.|Disable CAP_SYS_ADMIN & CAP_BPF|AppArmor<br />BPF
| | |Prohibit accessing process's root directory<br /><br />`disallow-access-procfs-root`|ALL|This policy prohibits processes within containers from accessing the root directory of the process filesystem (i.e., /proc/[PID]/root), preventing attackers from exploiting shared PID namespaces to launch attacks.<br /><br />Attackers may attempt to access the process filesystem outside the container by reading and writing to /proc/*/root in environments where the PID namespace is shared with the host or other containers. This could lead to information disclosure, privilege escalation, lateral movement, and other attacks.|Disable PTRACE_MODE_READ permission |AppArmor<br />BPF
| | |Prohibit accessing kernel exported symbols<br /><br />`disallow-access-kallsyms`|ALL|Attackers may attempt to leak the base address of kernel modules from containers (**w/ CAP_SYSLOG**) by reading the kernel's exported symbol definitions file. This helps attackers bypass KASLR protection and exploit kernel vulnerabilities more easily.|Disallow reading the /proc/kallsyms file|AppArmor<br />BPF
| |Disable Capabilities|Disable all capabilities<br /><br />`disable-cap-all`|ALL|Disable all capabilities|-|AppArmor<br />BPF
| | |Disable all capabilities except for NET_BIND_SERVICE<br /><br />`disable-cap-all-except-net-bind-service`|ALL|Disable all capabilities except for NET_BIND_SERVICE.<br /><br />This rule complies with the [*Restricted Policy*](https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted) of the Pod Security Standards.|-|AppArmor<br />BPF
| | |Disable privileged capabilities<br /><br />`disable-cap-privileged`|ALL|Disable all privileged capabilities (those that can directly lead to escapes or affect host availability). Only allow the [default capabilities](https://github.com/containerd/containerd/blob/release/1.7/oci/spec.go#L115).<br /><br />This rule complies with the [*Baseline Policy*](https://kubernetes.io/docs/concepts/security/pod-security-standards/#baseline) of the Pod Security Standards, except for the net_raw capability.|-|AppArmor<br />BPF
| | |Disable specified capability<br /><br />`disable-cap-XXXX`|ALL|Disable any specified capability. Replace XXXX with a value from capabilities(7), for example, disable-cap-net-raw.|-|AppArmor<br />BPF
| |Blocking Exploit Vectors|Prohibit abusing user namespaces<br /><br />`disallow-abuse-user-ns`|ALL|User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can use a container to create a user namespace, gaining full privileges and thereby expanding the kernel's attack surface.<br /><br />Disallowing container processes from abusing CAP_SYS_ADMIN privileges via user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.<br /><br />This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set.| Disable CAP_SYS_ADMIN |AppArmor<br />BPF
| | |Prohibit creating user namespace<br /><br />`disallow-create-user-ns`|ALL|User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can use a container to create a user namespace, gaining full privileges and thereby expanding the kernel's attack surface.<br /><br />Disallowing container processes from creating new user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.<br /><br />This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set.| Disallow creating user namespace |Seccomp
| | |Prohibit creating user namespace<br /><br />`disallow-create-user-ns`|ALL|User namespaces can be used to enhance container isolation. However, they also increase the kernel's attack surface, making certain kernel vulnerabilities easier to exploit. Attackers can use a container to create a user namespace, gaining full privileges and thereby expanding the kernel's attack surface.<br /><br />Disallowing container processes from creating new user namespaces can reduce the kernel's attack surface and block certain exploitation paths for kernel vulnerabilities.<br /><br />This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set.| Disallow creating user namespace. |Seccomp
| | |Prohibit loading any type of eBPF program<br /><br />`disallow-load-all-bpf-prog`|ALL|Attackers can load eBPF programs of the BPF_PROG_TYPE_SOCKET_FILTER or BPF_PROG_TYPE_CGROUP_SKB types without any privileges. They may use these program types to sniff network packets, or exploit vulnerabilities in the BPF verifier or JIT engine to escape the container.<br /><br />This rule can be used to harden containers on systems where kernel.unprivileged_bpf_disabled=0.|Disallow loading any type of eBPF program.|Seccomp
|**Attack Protection**|Mitigating Information Leakage|Mitigating ServiceAccount token leakage.<br /><br />`mitigate-sa-leak`|ALL|This rule prohibits container processes from reading sensitive Service Account-related information, including tokens, namespaces, and CA certificates. It helps prevent security risks arising from the leakage of Default ServiceAccount or misconfigured ServiceAccount. In the event that attackers gain access to a container through an RCE vulnerability, they often seek to further infiltrate by leaking ServiceAccount information.<br /><br />In most user scenarios, there is no need for Pods to communicate with the API Server using ServiceAccounts. However, by default, Kubernetes still sets up default ServiceAccounts for Pods that do not require communication with the API Server.|Disallow reading ServiceAccount-related files.|AppArmor<br />BPF
| | |Mitigating host disk device number leakage<br /><br />`mitigate-disk-device-number-leak`|ALL|Attackers may attempt to obtain host disk device numbers for subsequent container escape by reading the container process's mount information.|Disallow reading /proc/[PID]/mountinfo and /proc/partitions files|AppArmor<br />BPF
| | |Mitigating container overlayfs path leakage<br /><br />`mitigate-overlayfs-leak`|ALL|Attackers may attempt to obtain the overlayfs path of the container's rootfs on the host by accessing the container process's mount information, which could be used for subsequent container escape.|Disallow reading /proc/mounts, /proc/[PID]/mounts, and /proc/[PID]/mountinfo files.<br /><br />This rule may impact some functionality of the 'mount' command or syscall within containers|AppArmor<br />BPF
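
The capability requirements described in the `disallow-load-bpf-prog` and `disallow-load-all-bpf-prog` rows can be read as a small decision procedure. The Python sketch below is only an illustrative model of those rows, not vArmor's implementation or the kernel's full logic. The names `ProgType`, `may_load`, `may_load_with_rules`, and the `unprivileged_bpf_disabled` parameter are hypothetical, and the additional CAP_NET_ADMIN / CAP_PERFMON requirements of some program types are deliberately ignored.

```python
from enum import Enum


class ProgType(Enum):
    # Only a few program types are listed; KPROBE stands in for "any other type".
    SOCKET_FILTER = "BPF_PROG_TYPE_SOCKET_FILTER"
    CGROUP_SKB = "BPF_PROG_TYPE_CGROUP_SKB"
    KPROBE = "BPF_PROG_TYPE_KPROBE"


# Types the table describes as loadable without privileged permission.
UNPRIVILEGED_TYPES = {ProgType.SOCKET_FILTER, ProgType.CGROUP_SKB}


def may_load(prog_type, caps, kernel=(5, 8), unprivileged_bpf_disabled=0):
    """Simplified model: may a process holding `caps` load `prog_type`?"""
    if prog_type in UNPRIVILEGED_TYPES and unprivileged_bpf_disabled == 0:
        return True
    # Before Linux 5.8 only CAP_SYS_ADMIN is accepted; since 5.8 CAP_BPF also is.
    accepted = {"CAP_SYS_ADMIN"}
    if kernel >= (5, 8):
        accepted.add("CAP_BPF")
    return bool(accepted & set(caps))


def may_load_with_rules(prog_type, caps, rules, kernel=(5, 8),
                        unprivileged_bpf_disabled=0):
    """Apply the two built-in rules as the table describes them (assumed effect)."""
    if "disallow-load-all-bpf-prog" in rules:
        return False  # every program type is blocked
    if "disallow-load-bpf-prog" in rules:
        # Modelled as "Disable CAP_SYS_ADMIN & CAP_BPF".
        caps = set(caps) - {"CAP_SYS_ADMIN", "CAP_BPF"}
    return may_load(prog_type, caps, kernel, unprivileged_bpf_disabled)
```

In this model, `disallow-load-bpf-prog` still leaves the two unprivileged program types loadable when `kernel.unprivileged_bpf_disabled=0`, which is the gap the new `disallow-load-all-bpf-prog` rule is meant to close.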
5 changes: 3 additions & 2 deletions docs/guides/policies_and_rules/built_in_rules.zh_CN.md
@@ -18,15 +18,16 @@
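| | |Disable the mount system call<br /><br />`disallow-mount`|Privileged|[MOUNT(2)](https://man7.org/linux/man-pages/man2/mount.2.html) is often used for privilege escalation, container escape, and other attacks. Almost no microservice application needs mount operations, so it is recommended to use this rule to restrict container processes from calling the mount system call.<br /><br />Note: When spec.policy.privileged is false, the mount system call is disabled by default.|Disable the mount system call|AppArmor<br />BPF
| | |Disable the umount system call<br /><br />`disallow-umount`|ALL|[UMOUNT(2)](https://man7.org/linux/man-pages/man2/umount.2.html) can be used to unmount sensitive mount points (such as maskedPaths), leading to privilege escalation and information disclosure. Almost no microservice application needs umount operations, so it is recommended to use this rule to restrict container processes from calling the umount system call.|Disable the umount system call|AppArmor<br />BPF
| | |Prohibit loading kernel modules<br /><br />`disallow-insmod`|Privileged|Attackers may inject code into the kernel from a privileged container (**w/ CAP_SYS_MODULE**) by running the kernel module loading command insmod.|Disable CAP_SYS_MODULE|AppArmor<br />BPF
| | |Prohibit loading eBPF programs<br /><br />`disallow-load-ebpf`|ALL|Attackers may load eBPF programs in a privileged container (**w/ CAP_SYS_ADMIN & CAP_BPF**) to steal data and hide their activity.<br /><br />Note: CAP_BPF was introduced in Linux 5.8.|Disable CAP_SYS_ADMIN, CAP_BPF|AppArmor<br />BPF
| | |Prohibit loading eBPF programs other than those of the BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB types.<br /><br />`disallow-load-bpf-prog`, `disallow-load-ebpf`|ALL|Attackers may load eBPF programs in a privileged container (**w/ CAP_SYS_ADMIN, CAP_BPF**) to steal data and create rootkit backdoors.<br /><br />Before Linux 5.8, CAP_SYS_ADMIN is required to load eBPF programs other than those of the BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB types. Since Linux 5.8, CAP_SYS_ADMIN or CAP_BPF is required to load these eBPF programs. In addition, loading some types of eBPF programs also requires CAP_NET_ADMIN or CAP_PERFMON.<br /><br />It is recommended to use the built-in rule `disallow-load-all-bpf-prog` to prohibit containers from loading any type of eBPF program, thereby reducing the kernel's attack surface.|Disable CAP_SYS_ADMIN CAP_BPF|AppArmor<br />BPF
| | |Prohibit accessing the root directory of the process filesystem<br /><br />`disallow-access-procfs-root`|ALL|This policy prohibits container processes from accessing the root directory of the process filesystem (i.e., /proc/[PID]/root), preventing attackers from abusing processes that share a PID namespace.<br /><br />In container environments that share the PID namespace with the host or with other containers, attackers may read and write /proc/*/root to access the process filesystem outside the container, leading to information disclosure, privilege escalation, lateral movement, and other attacks.|Disable the PTRACE_MODE_READ permission|AppArmor<br />BPF
| | |Prohibit reading the kernel symbol file<br /><br />`disallow-access-kallsyms`|ALL|Attackers may obtain kernel module addresses from a privileged container (**w/ CAP_SYSLOG**) by reading the kernel symbol file. This lets them bypass KASLR protection and lowers the difficulty and cost of exploiting kernel vulnerabilities.|Disallow reading the /proc/kallsyms file|AppArmor<br />BPF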
| |Disable capabilities|Disable all capabilities<br /><br />`disable-cap-all`|ALL|Disable all capabilities|-|AppArmor<br />BPF
| | |Disable all capabilities except net_bind_service<br /><br />`disable-cap-all-except-net-bind-service`|ALL|Disable all capabilities except net-bind-service.<br /><br />This rule complies with the [*Restricted Policy*](https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted) of the Pod Security Standards.|-|AppArmor<br />BPF
| | |Disable privileged capabilities<br /><br />`disable-cap-privileged`|ALL|Disable all privileged capabilities (those that can directly lead to escapes or affect host availability). Only the runtime's [default capabilities](https://github.com/containerd/containerd/blob/release/1.7/oci/spec.go#L115) are allowed.<br /><br />This rule complies with the [*Baseline Policy*](https://kubernetes.io/docs/concepts/security/pod-security-standards/#baseline) of the Pod Security Standards, except for the net_raw capability.|-|AppArmor<br />BPF
| | |Disable any capability<br /><br />`disable-cap-XXXX`|ALL|Disable any specified capability. Replace XXXX with a value from man capabilities, for example disable-cap-net-raw|-|AppArmor<br />BPF
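| |Blocking some kernel exploit vectors|Prohibit abusing user namespaces<br /><br />`disallow-abuse-user-ns`|ALL|User namespaces can be used to enhance container isolation. However, they also enlarge the kernel's attack surface or make certain kernel vulnerabilities easier to exploit. Attackers inside a container can create a user namespace to gain full privileges, thereby expanding the kernel's attack surface.<br /><br />Prohibiting container processes from abusing CAP_SYS_ADMIN privileges via user namespaces reduces the kernel's attack surface and blocks some exploitation paths for kernel vulnerabilities.<br />This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set.|Restrict abusing CAP_SYS_ADMIN via user namespaces |AppArmor<br />BPF
| | |Prohibit creating user namespaces<br /><br />`disallow-create-user-ns`|ALL|User namespaces can be used to enhance container isolation. However, they also enlarge the kernel's attack surface or make certain kernel vulnerabilities easier to exploit. Attackers inside a container can create a user namespace to gain full privileges, thereby expanding the kernel's attack surface.<br /><br />Prohibiting container processes from creating new user namespaces to gain CAP_SYS_ADMIN privileges reduces the kernel's attack surface and blocks some exploitation paths for kernel vulnerabilities.<br />This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set.|Disallow creating user namespaces|Seccomp
| | |Prohibit creating user namespaces<br /><br />`disallow-create-user-ns`|ALL|User namespaces can be used to enhance container isolation. However, they also enlarge the kernel's attack surface or make certain kernel vulnerabilities easier to exploit. Attackers inside a container can create a user namespace to gain full privileges, thereby expanding the kernel's attack surface.<br /><br />Prohibiting container processes from creating new user namespaces to gain CAP_SYS_ADMIN privileges reduces the kernel's attack surface and blocks some exploitation paths for kernel vulnerabilities.<br />This rule can be used to harden containers on systems where kernel.unprivileged_userns_clone=0 or user.max_user_namespaces=0 is not set.|Disallow creating user namespaces|Seccomp
| | |Prohibit loading any type of eBPF program<br /><br />`disallow-load-all-bpf-prog`|ALL|Attackers can load eBPF programs of the BPF_PROG_TYPE_SOCKET_FILTER or BPF_PROG_TYPE_CGROUP_SKB types without any privileges. They may therefore try to use these program types to sniff network packets, or exploit vulnerabilities in the eBPF verifier and JIT engine to escape the container.<br /><br />Prohibiting container processes from loading eBPF programs reduces the kernel's attack surface and blocks some exploitation paths for kernel vulnerabilities. This rule can be used to harden containers on systems where kernel.unprivileged_bpf_disabled=0.|Disallow loading any type of eBPF program|Seccomp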
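|**Attack Protection**|Mitigating information leakage|Mitigating ServiceAccount leakage<br /><br />`mitigate-sa-leak`|ALL|This rule prohibits container processes from reading sensitive ServiceAccount-related information, including the token, namespace, and CA certificate. It avoids the security risks caused by leakage of the default SA or of a misconfigured SA. After gaining access to a k8s container through an RCE vulnerability, attackers often leak its SA information to penetrate further.<br /><br />In most user scenarios, there is no need to use an SA to communicate with the API Server. By default, however, k8s still sets up the default SA for Pods that do not need to communicate with the API Server.|Disallow reading ServiceAccount files|AppArmor<br />BPF
| | |Mitigating host disk device number leakage<br /><br />`mitigate-disk-device-number-leak`|ALL|This rule prohibits container processes from reading /proc/[PID]/mountinfo and /proc/partitions.<br /><br />Attackers may read the container process's mount information to obtain the device numbers of host disk devices for use in a subsequent container escape.|Disallow reading mountinfo and partitions|AppArmor<br />BPF
| | |Mitigating container overlayfs path leakage<br /><br />`mitigate-overlayfs-leak`|ALL|Disallow reading the /proc/mounts, /proc/[PID]/mounts, and /proc/[PID]/mountinfo files.<br /><br />Attackers may use the container process's mount information to obtain the overlayfs path of the container's rootfs on the host for use in a subsequent container escape.|Disallow reading the mounts and mountinfo files<br /><br />This rule may affect some functionality of the mount command within containers|AppArmor<br />BPF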
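
The user namespace rows say the rules help on systems where neither `kernel.unprivileged_userns_clone=0` nor `user.max_user_namespaces=0` is set. A minimal Python sketch of that condition follows. The function name `userns_creation_unrestricted` and the dictionary input are hypothetical, and the sketch assumes a sysctl missing from the dictionary means the host has not set that restriction.

```python
def userns_creation_unrestricted(sysctls):
    """Return True when neither sysctl from the table disables user namespaces.

    `sysctls` maps sysctl names to integer values; keys that are absent are
    treated as "not set" (an assumption made for this sketch).
    """
    if sysctls.get("user.max_user_namespaces") == 0:
        return False  # no user namespaces may be created at all
    if sysctls.get("kernel.unprivileged_userns_clone") == 0:
        return False  # unprivileged processes cannot create user namespaces
    return True


# Example: a host that sets neither restriction is a candidate for the rule.
host = {"user.max_user_namespaces": 15000}
needs_rule = userns_creation_unrestricted(host)
```

When the function returns True, the host itself does not restrict user namespace creation, which is the situation the `disallow-create-user-ns` and `disallow-abuse-user-ns` rows are written for.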
5 changes: 3 additions & 2 deletions docs/guides/policy_advisor.md
@@ -44,12 +44,13 @@ optional arguments:
* dind: The target application will create a docker in docker container.
* require-sa: The target application needs to interact with API Server.
* bind-privileged-socket-port: The target application needs to listen on a socket port less than 1024.
* load-bpf: The target application needs to load eBPF programs in the container.
For Example: "privileged-container,require-sa,bind-privileged-socket-port"
-c CAPABILITIES The capabilities required by the target application and its containers. Providing the capabilities
needed for the application explicitly helps generate more precise policy templates. For example,
before Linux 5.8, loading BPF programs required sys_admin capability. Since Linux 5.8, loading BPF
programs requires bpf, perfmon or net_admin capabilities. If your application needs to load BPF
before Linux 5.8, loading BPF programs requires sys_admin capability. Since Linux 5.8, loading BPF
programs requires sys_admin or bpf capabilities. If your application needs to load BPF
programs, please add both sys_admin and bpf, that is "sys_admin,bpf". See CAPABILITIES(7).
Available Values: CAPABILITIES(7) without 'CAP_' prefix (they should be combined with commas).
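
The help text above asks for capability names without the 'CAP_' prefix, joined by commas, as in "sys_admin,bpf". The Python sketch below shows one way to build such a value from CAPABILITIES(7) names. The helper `format_capabilities` is hypothetical and not part of the policy advisor, and lowercasing the names is an assumption based on the "sys_admin,bpf" example.

```python
def format_capabilities(caps):
    """Turn names such as 'CAP_SYS_ADMIN' into a '-c' style value.

    Strips an optional 'CAP_' prefix, lowercases each name, drops empty
    entries and duplicates, and joins the result with commas.
    """
    names = []
    for cap in caps:
        name = cap.strip()
        if name.upper().startswith("CAP_"):
            name = name[len("CAP_"):]
        name = name.lower()
        if name and name not in names:
            names.append(name)
    return ",".join(names)


# The combination the help text recommends for applications that load BPF programs.
bpf_caps = format_capabilities(["CAP_SYS_ADMIN", "CAP_BPF"])
```

Passing both capabilities follows the help text's advice to add sys_admin and bpf together when the application needs to load BPF programs.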