Overview

This topic describes the management of the overall cluster network, including project isolation and outbound traffic control.

Pod-level networking features, such as per-pod bandwidth limits, are discussed in Managing Pods.

Managing Pod Networks

When your cluster is configured to use the ovs-multitenant SDN plug-in, you can manage the separate pod overlay networks for projects using the administrator CLI. See the Configuring the SDN section for plug-in configuration steps, if necessary.

Joining Project Networks

To join projects to an existing project network:

$ oc adm pod-network join-projects --to=<project1> <project2> <project3>

In the above example, all the pods and services in <project2> and <project3> can now access any pods and services in <project1> and vice versa. Services can be accessed either by IP address or by fully qualified DNS name (<service>.<pod_namespace>.svc.cluster.local). For example, to access a service named db in a project myproject, use db.myproject.svc.cluster.local.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

To verify the networks you have joined together:

$ oc get netnamespaces

Then look at the NETID column. Projects in the same pod network have the same NETID.
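When many projects are joined, checking the NETID column by eye gets tedious. The following Python sketch groups project names by NETID from saved oc get netnamespaces output. The sample table is hypothetical, and the parser assumes that NAME and NETID are the first two columns. In an ovs-multitenant cluster, projects that have been made global share NETID 0.

```python
# Group projects by NETID, using hypothetical `oc get netnamespaces` output.
from __future__ import annotations

from collections import defaultdict

# Hypothetical sample; real NETID values and extra columns may differ.
SAMPLE_OUTPUT = """\
NAME       NETID     EGRESS IPS
default    0         []
project1   4153089   []
project2   4153089   []
project3   4153089   []
project4   9912371   []
"""


def group_by_netid(table: str) -> dict[str, list[str]]:
    """Map each NETID to the sorted names of the projects that share it."""
    groups = defaultdict(list)
    rows = table.strip().splitlines()[1:]  # drop the header row
    for row in rows:
        fields = row.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        name, netid = fields[0], fields[1]
        groups[netid].append(name)
    return {netid: sorted(names) for netid, names in groups.items()}


if __name__ == "__main__":
    for netid, projects in group_by_netid(SAMPLE_OUTPUT).items():
        print(f"NETID {netid}: {', '.join(projects)}")
```

With the sample above, project1, project2, and project3 share a NETID because they were joined into one pod network.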

Isolating Project Networks

To isolate projects from the rest of the cluster, run:

$ oc adm pod-network isolate-projects <project1> <project2>

In the above example, the pods and services in <project1> and <project2> cannot access any pods and services in other non-global projects in the cluster, and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Making Project Networks Global

To allow projects to access all pods and services in the cluster and vice versa:

$ oc adm pod-network make-projects-global <project1> <project2>

In the above example, all the pods and services in <project1> and <project2> can now access any pods and services in the cluster and vice versa.

Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option.

Disabling Host Name Collision Prevention For Routes and Ingress Objects

In OKD, host name collision prevention for routes and ingress objects is enabled by default. This means that users without the cluster-admin role can set the host name in a route or ingress object only on creation and cannot change it afterwards. However, you can relax this restriction on routes and ingress objects for some or all users.

Because OKD uses the object creation timestamp to determine the oldest route or ingress object for a given host name, a route or ingress object can hijack the host name of a newer route if the older route changes its host name, or if an ingress object is introduced.
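The following Python sketch makes the ownership rule concrete: for each host name, the object with the oldest creation timestamp wins. It only illustrates the behavior described above and is not the router's implementation. The object names, hosts, and timestamps are hypothetical.

```python
# Model host name ownership: the oldest object claiming a host wins.
# Object names, hosts, and timestamps are hypothetical.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class HostClaim:
    name: str          # route or ingress object name
    host: str          # host name the object requests
    created: datetime  # the object's creation timestamp


def host_owners(claims: list[HostClaim]) -> dict[str, str]:
    """Return host -> name of the oldest object claiming that host."""
    oldest = {}
    for claim in claims:
        current = oldest.get(claim.host)
        # Strictly older wins; on a tie the first claim seen is kept.
        if current is None or claim.created < current.created:
            oldest[claim.host] = claim
    return {host: claim.name for host, claim in oldest.items()}


if __name__ == "__main__":
    old = HostClaim("route-a", "a.example.com", datetime(2020, 1, 1))
    new = HostClaim("route-b", "b.example.com", datetime(2020, 6, 1))
    print(host_owners([old, new]))
    old.host = "b.example.com"  # the older route changes its host name...
    print(host_owners([old, new]))  # ...and now owns b.example.com
```

In the usage block, route-a is older than route-b. When route-a switches its host to b.example.com, it takes over that host name from the newer route.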

As an OKD cluster administrator, you can edit the host name in a route even after creation. You can also create a role to allow specific users to do so:

$ oc create clusterrole route-editor --verb=update --resource=routes.route.openshift.io/custom-host

You can then bind the new role to a user:

$ oc adm policy add-cluster-role-to-user route-editor user

You can also disable host name collision prevention for ingress objects. Doing so lets users without the cluster-admin role edit a host name for ingress objects after creation. This is useful for OKD installations that depend upon Kubernetes behavior, including allowing the host names in ingress objects to be edited.

  1. Add the following to the master.yaml file:

    admissionConfig:
      pluginConfig:
        openshift.io/IngressAdmission:
          configuration:
            apiVersion: v1
            allowHostnameChanges: true
            kind: IngressAdmissionConfig
          location: ""
  2. Restart the master services for the changes to take effect:

    $ master-restart api
    $ master-restart controllers

Controlling Egress Traffic

As a cluster administrator, you can allocate a number of static IP addresses to a specific node at the host level. If an application developer needs a dedicated IP address for their application service, they can request one during the process they use to ask for firewall access. They can then deploy an egress router from the developer’s project, using a nodeSelector in the deployment configuration to ensure that the pod lands on the host with the pre-allocated static IP address.

The egress pod’s deployment declares one of the source IPs, the destination IP of the protected service, and a gateway IP to reach the destination. After the pod is deployed, you can create a service to access the egress router pod, then add that source IP to the corporate firewall. The developer then has access information to the egress router service that was created in their project, for example, service.project.cluster.domainname.com.

When the developer needs to access the external, firewalled service, they can call out to the egress router pod’s service (service.project.cluster.domainname.com) in their application (for example, the JDBC connection information) rather than the actual protected service URL.

You can also assign static IP addresses to projects, ensuring that all outgoing external connections from the specified project have recognizable origins. This is different from the default egress router, which is used to send traffic to specific destinations.

See the Enabling Fixed IPs for External Project Traffic section for more information.

As an OKD cluster administrator, you can control egress traffic in these ways:

Firewall

Using an egress firewall allows you to enforce the acceptable outbound traffic policies, so that specific endpoints or IP ranges (subnets) are the only acceptable targets for the dynamic endpoints (pods within OKD) to talk to.

Router

Using an egress router allows you to create identifiable services to send traffic to certain destinations, ensuring those external destinations treat traffic as though it were coming from a known source. This helps with security, because it allows you to secure an external database so that only specific pods in a namespace can talk to a service (the egress router), which proxies the traffic to your database.

iptables

In addition to the above OKD-internal solutions, it is also possible to create iptables rules that will be applied to outgoing traffic. These rules allow for more possibilities than the egress firewall, but cannot be limited to particular projects.

Using an Egress Firewall to Limit Access to External Resources

As an OKD cluster administrator, you can use egress firewall policy to limit the external IP addresses that some or all pods can access from within the cluster. Egress firewall policy supports the following scenarios:

  • A pod can only connect to internal hosts, and cannot initiate connections to the public Internet.

  • A pod can only connect to the public Internet, and cannot initiate connections to internal hosts that are outside the OKD cluster.

  • A pod cannot reach specified internal subnets or hosts that should be unreachable.

Egress policies can be set by specifying an IP address range in CIDR format or by specifying a DNS name. For example, you can allow <project_A> access to a specified IP range but deny the same access to <project_B>. Alternatively, you can restrict application developers from updating from (Python) pip mirrors, and force updates to only come from approved sources.

You must have the ovs-multitenant or ovs-networkpolicy plug-in enabled in order to limit pod access via egress policy.

If you are using the ovs-networkpolicy plug-in, egress policy is compatible with only one policy per project, and will not work with projects that share a network, such as global projects.

Project administrators can neither create EgressNetworkPolicy objects, nor edit the ones you create in their project. There are also several other restrictions on where EgressNetworkPolicy can be created:

  • The default project (and any other project that has been made global via oc adm pod-network make-projects-global) cannot have egress policy.

  • If you merge two projects together (via oc adm pod-network join-projects), then you cannot use egress policy in any of the joined projects.

  • No project may have more than one egress policy object.

Violating any of these restrictions results in broken egress policy for the project, and may cause all external network traffic to be dropped.
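Before you create a policy, you can check the restrictions above against your cluster's project data. The following Python sketch assumes an ovs-multitenant cluster, where global projects have NETID 0 and joined projects share a NETID. The project names, NETIDs, and policy counts are hypothetical inputs that you would gather yourself, for example from oc get netnamespaces and by listing the EgressNetworkPolicy objects in each project.

```python
# Flag projects whose EgressNetworkPolicy would break, per the restrictions above.
# Assumes ovs-multitenant NETIDs: global projects use NETID 0 and joined
# projects share one NETID. Input data is hypothetical.
from __future__ import annotations

from collections import Counter

GLOBAL_NETID = 0


def egress_policy_problems(
    netids: dict[str, int], policy_counts: dict[str, int]
) -> dict[str, list[str]]:
    """Return project -> reasons its egress policy would be broken."""
    sharing = Counter(netids.values())  # number of projects per NETID
    problems = {}
    for project, count in policy_counts.items():
        if count == 0:
            continue  # no egress policy, nothing to check
        reasons = []
        netid = netids.get(project)
        if netid == GLOBAL_NETID:
            reasons.append("project is global")
        elif netid is not None and sharing[netid] > 1:
            reasons.append("project network is joined with another project")
        if count > 1:
            reasons.append("more than one EgressNetworkPolicy object")
        if reasons:
            problems[project] = reasons
    return problems


if __name__ == "__main__":
    netids = {"default": 0, "shared-a": 7, "shared-b": 7, "solo": 9, "busy": 11}
    policies = {"default": 1, "shared-a": 1, "solo": 1, "busy": 2}
    for project, reasons in egress_policy_problems(netids, policies).items():
        print(project, "->", "; ".join(reasons))
```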

Use the oc command or the REST API to configure egress policy. You can use oc [create|replace|delete] to manipulate EgressNetworkPolicy objects. The api/swagger-spec/oapi-v1.json file has API-level details on how the objects actually work.

To configure egress policy:

  1. Navigate to the project you want to affect.

  2. Create a JSON file with the policy configuration you want to use, as in the following example:

    {
        "kind": "EgressNetworkPolicy",
        "apiVersion": "v1",
        "metadata": {
            "name": "default"
        },
        "spec": {
            "egress": [
                {
                    "type": "Allow",
                    "to": {
                        "cidrSelector": "1.2.3.0/24"
                    }
                },
                {
                    "type": "Allow",
                    "to": {
                        "dnsName": "www.foo.com"
                    }
                },
                {
                    "type": "Deny",
                    "to": {
                        "cidrSelector": "0.0.0.0/0"
                    }
                }
            ]
        }
    }

    When the example above is added to a project, it allows traffic to IP range 1.2.3.0/24 and domain name www.foo.com, but denies access to all other external IP addresses. Traffic to other pods is not affected because the policy only applies to external traffic.

    The rules in an EgressNetworkPolicy are checked in order, and the first one that matches takes effect. If the three rules in the above example were reversed, then traffic would not be allowed to 1.2.3.0/24 and www.foo.com because the 0.0.0.0/0 rule would be checked first, and it would match and deny all traffic.

    Domain name updates are polled based on the TTL (time to live) value of the domain returned by the local non-authoritative servers. The pod should also resolve the domain from the same local nameservers when necessary. Otherwise, the egress network policy controller and the pod will see different IP addresses for the domain, and the egress network policy may not be enforced as expected. Because the egress network policy controller and the pod poll the same local nameserver asynchronously, a race condition is possible in which the pod gets the updated IP address before the egress controller does. Due to this limitation, using domain names in an EgressNetworkPolicy is recommended only for domains whose IP addresses change infrequently.

The egress firewall always allows pods access to the external interface of the node the pod is on for DNS resolution. If your DNS resolution is not handled by something on the local node, then you will need to add egress firewall rules allowing access to the DNS server’s IP addresses if you are using domain names in your pods.

  3. Use the JSON file to create an EgressNetworkPolicy object:

    $ oc create -f <policy>.json
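The first-match behavior described in step 2 can be modeled in a few lines of Python. This sketch reuses the rules from the example JSON file. Because the controller's DNS polling is out of scope, dnsName rules are checked against a hypothetical, caller-supplied mapping of names to resolved IP addresses. In the sketch, traffic that matches no rule is allowed, which is why the example policy ends with a catch-all Deny rule.

```python
# First-match evaluation of EgressNetworkPolicy rules, mirroring step 2.
# dnsName rules use a caller-supplied name -> IPs mapping, a hypothetical
# stand-in for the controller's DNS polling. Unmatched traffic is allowed.
from __future__ import annotations

import ipaddress

POLICY = {
    "kind": "EgressNetworkPolicy",
    "apiVersion": "v1",
    "metadata": {"name": "default"},
    "spec": {
        "egress": [
            {"type": "Allow", "to": {"cidrSelector": "1.2.3.0/24"}},
            {"type": "Allow", "to": {"dnsName": "www.foo.com"}},
            {"type": "Deny", "to": {"cidrSelector": "0.0.0.0/0"}},
        ]
    },
}


def egress_decision(policy: dict, dest_ip: str, resolved: dict[str, list[str]]) -> str:
    """Return "Allow" or "Deny" for external traffic to dest_ip."""
    ip = ipaddress.ip_address(dest_ip)
    for rule in policy["spec"]["egress"]:
        to = rule["to"]
        if "cidrSelector" in to:
            matched = ip in ipaddress.ip_network(to["cidrSelector"])
        else:
            addrs = {ipaddress.ip_address(a) for a in resolved.get(to["dnsName"], [])}
            matched = ip in addrs
        if matched:
            return rule["type"]  # the first matching rule takes effect
    return "Allow"  # no rule matched
```

If you pass a policy whose rules are in reverse order, 1.2.3.4 returns Deny, because the 0.0.0.0/0 rule is checked first.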

Services exposed through routes bypass EgressNetworkPolicy. Egress network policy filtering of service endpoints is done by kube-proxy on the node. When the router is involved, kube-proxy is bypassed and egress network policy enforcement is not applied. Administrators can prevent this bypass by limiting who can create routes.

Using an Egress Router to Allow External Resources to Recognize Pod Traffic

The OKD egress router runs a service that redirects traffic to a specified remote server, using a private source IP address that is not used for anything else. The service allows pods to talk to servers that are set up to only allow access from whitelisted IP addresses.

The egress router is not intended for every outgoing connection. Creating large numbers of egress routers can push the limits of your network hardware. For example, creating an egress router for every project or application could exceed the number of local MAC addresses that the network interface can handle before falling back to filtering MAC addresses in software.

Currently, the egress router is not compatible with Amazon AWS, Azure Cloud, or any other cloud platform that does not support layer 2 manipulations due to their incompatibility with macvlan traffic.

Deployment Considerations

The egress router adds a second IP address and MAC address to the node’s primary network interface. If you are not running OKD on bare metal, you may need to configure your hypervisor or cloud provider to allow the additional address.

Red Hat OpenStack Platform

If you are deploying OKD on Red Hat OpenStack Platform, you need to whitelist the IP and MAC addresses on your OpenStack environment, otherwise communication will fail:

neutron port-update $neutron_port_uuid \
  --allowed_address_pairs list=true \
  type=dict mac_address=<mac_address>,ip_address=<ip_address>

Red Hat Enterprise Virtualization

If you are using Red Hat Enterprise Virtualization, you should set EnableMACAntiSpoofingFilterRules to false.

VMware vSphere

If you are using VMware vSphere, see the VMware documentation for securing vSphere standard switches. View and change VMware vSphere default settings by selecting the host’s virtual switch from the vSphere Web Client.

Specifically, ensure that the following are enabled:

  • MAC Address Changes

  • Forged Transmits

  • Promiscuous Mode Operation

Egress Router Modes

The egress router can run in three different modes: redirect mode, HTTP proxy mode and DNS proxy mode. Redirect mode works for all services except for HTTP and HTTPS. For HTTP and HTTPS services, use HTTP proxy mode. For TCP-based services with IP addresses or domain names, use DNS proxy mode.

Deploying an Egress Router Pod in Redirect Mode

In redirect mode, the egress router sets up iptables rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that want to make use of the reserved source IP address must be modified to connect to the egress router rather than connecting directly to the destination IP.

  1. Create a pod configuration using the following:

    apiVersion: v1
    kind: Pod
    metadata:
      name: egress-1
      labels:
        name: egress-1
      annotations:
        pod.network.openshift.io/assign-macvlan: "true" (1)
    spec:
      initContainers:
      - name: egress-router
        image: openshift/origin-egress-router
        securityContext:
          privileged: true
        env:
        - name: EGRESS_SOURCE (2)
          value: 192.168.12.99/24
        - name: EGRESS_GATEWAY (3)
          value: 192.168.12.1
        - name: EGRESS_DESTINATION (4)
          value: 203.0.113.25
        - name: EGRESS_ROUTER_MODE (5)
          value: init
      containers:
      - name: egress-router-wait
        image: openshift/origin-pod
      nodeSelector:
        site: springfield-1 (6)
    1 Creates a Macvlan network interface on the primary network interface and moves it into the pod’s network namespace before starting the egress-router container. Preserve the quotation marks around "true". Omitting them results in errors. To create the Macvlan interface on a network interface other than the primary one, set the annotation value to the name of that interface, for example, eth1.
    2 IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
    3 Same value as the default gateway used by the node.
    4 The external server to direct traffic to. Using this example, connections to the pod are redirected to 203.0.113.25, with a source IP address of 192.168.12.99.
    5 This tells the egress router image that it is being deployed as an "init container". Previous versions of OKD (and the egress router image) did not support this mode and had to be run as an ordinary container.
    6 The pod is only deployed to nodes with the label site=springfield-1.
  2. Create the pod using the above definition:

    $ oc create -f <pod_name>.json

    To check whether the pod has been created:

    $ oc get pod <pod_name>
  3. Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-1
    spec:
      ports:
      - name: http
        port: 80
      - name: https
        port: 443
      type: ClusterIP
      selector:
        name: egress-1

    Your pods can now connect to this service. Their connections are redirected to the corresponding ports on the external server, using the reserved egress IP address.

The egress router setup is performed by an "init container" created from the openshift/origin-egress-router image, and that container is run privileged so that it can configure the Macvlan interface and set up iptables rules. After it finishes setting up the iptables rules, it exits and the openshift/origin-pod container will run (doing nothing) until the pod is killed.

The environment variables tell the egress-router image what addresses to use; it will configure the Macvlan interface to use EGRESS_SOURCE as its IP address, with EGRESS_GATEWAY as its gateway.
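The effect of the subnet length on EGRESS_SOURCE (callout 2 in the pod definition above) can be illustrated with Python's ipaddress module. This sketch only models the documented behavior, that is, which hosts the egress router can reach directly on its local link. It is not the setup script in the egress router image.

```python
# Which hosts the egress router reaches directly (on-link), per the
# EGRESS_SOURCE callout: with a subnet length, the whole local subnet;
# without one, only EGRESS_GATEWAY. A model of the documented behavior only.
import ipaddress


def directly_reachable(egress_source: str, egress_gateway: str, host: str) -> bool:
    """Return True if host is reachable without going through the gateway."""
    target = ipaddress.ip_address(host)
    if target == ipaddress.ip_address(egress_gateway):
        return True  # the gateway is always reachable
    if "/" in egress_source:
        # ip_interface keeps the prefix, so .network is the local subnet.
        return target in ipaddress.ip_interface(egress_source).network
    return False  # no subnet length: no route to other local hosts
```

Destinations outside the local subnet, such as 203.0.113.25, are reached through EGRESS_GATEWAY in either case.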

NAT rules are set up so that connections to any TCP or UDP port on the pod’s cluster IP address are redirected to the same port on EGRESS_DESTINATION.

If only some of the nodes in your cluster are capable of claiming the specified source IP address and using the specified gateway, you can specify a nodeName or nodeSelector indicating which nodes are acceptable.

Redirecting to Multiple Destinations

In the previous example, connections to the egress pod (or its corresponding service) on any port are redirected to a single destination IP. You can also configure different destination IPs depending on the port:

apiVersion: v1
kind: Pod
metadata:
  name: egress-multi
  labels:
    name: egress-multi
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  initContainers:
  - name: egress-router
    image: openshift/origin-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE (1)
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY
      value: 192.168.12.1
    - name: EGRESS_DESTINATION (2)
      value: |
        80   tcp 203.0.113.25
        8080 tcp 203.0.113.26 80
        8443 tcp 203.0.113.26 443
        203.0.113.27
    - name: EGRESS_ROUTER_MODE
      value: init
  containers:
  - name: egress-router-wait
    image: openshift/origin-pod
1 IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
2 EGRESS_DESTINATION uses YAML syntax for its values, and can be a multi-line string. See the following for more information.

Each line of EGRESS_DESTINATION can be one of three types:

  • <port> <protocol> <IP_address> - This says that incoming connections to the given <port> should be redirected to the same port on the given <IP_address>. <protocol> is either tcp or udp. In the example above, the first line redirects traffic from local port 80 to port 80 on 203.0.113.25.

  • <port> <protocol> <IP_address> <remote_port> - As above, except that the connection is redirected to a different <remote_port> on <IP_address>. In the example above, the second and third lines redirect local ports 8080 and 8443 to remote ports 80 and 443 on 203.0.113.26.

  • <fallback_IP_address> - If the last line of EGRESS_DESTINATION is a single IP address, then connections on any other port are redirected to the corresponding port on that IP address (for example, 203.0.113.27 in the example above). If there is no fallback IP address, connections on other ports are rejected.
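A small parser can capture the three line formats. The following Python sketch is not the parser used by the openshift/origin-egress-router image. It builds a lookup table from an EGRESS_DESTINATION value and reports where a connection on a given local port and protocol would be redirected, or None if the connection would be rejected. It skips blank lines and lines that start with #, which the ConfigMap file format described below permits.

```python
# Parse EGRESS_DESTINATION and look up where a connection is redirected.
# A sketch of the line formats above, not the egress router image's parser.
# Blank lines and lines starting with # are skipped.
from __future__ import annotations

import ipaddress

# The value from the egress-multi example above.
EXAMPLE = """\
80   tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27
"""


def parse_destination(text: str) -> tuple[dict, str | None]:
    """Return ({(port, protocol): (ip, remote_port)}, fallback_ip)."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    rules, fallback = {}, None
    for i, line in enumerate(lines):
        fields = line.split()
        if len(fields) == 1:
            if i != len(lines) - 1:
                raise ValueError(f"fallback IP must be the last line: {line!r}")
            fallback = str(ipaddress.ip_address(fields[0]))
        elif len(fields) in (3, 4):
            port, protocol, ip = int(fields[0]), fields[1], fields[2]
            if protocol not in ("tcp", "udp"):
                raise ValueError(f"protocol must be tcp or udp: {line!r}")
            remote_port = int(fields[3]) if len(fields) == 4 else port
            rules[(port, protocol)] = (str(ipaddress.ip_address(ip)), remote_port)
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return rules, fallback


def redirect_target(rules: dict, fallback: str | None, port: int, protocol: str):
    """Return (ip, port) for a connection, or None if it would be rejected."""
    if (port, protocol) in rules:
        return rules[(port, protocol)]
    if fallback is not None:
        return (fallback, port)  # same port on the fallback address
    return None
```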

Using a ConfigMap to specify EGRESS_DESTINATION

For a large or frequently changing set of destination mappings, you can use a ConfigMap to maintain the list externally and have the egress router pod read it from there. The advantage is that project administrators can edit the ConfigMap, whereas they may not be able to edit the pod definition directly because it contains a privileged container.

  1. Create a file containing the EGRESS_DESTINATION data:

    $ cat my-egress-destination.txt
    # Egress routes for Project "Test", version 3
    
    80   tcp 203.0.113.25
    
    8080 tcp 203.0.113.26 80
    8443 tcp 203.0.113.26 443
    
    # Fallback
    203.0.113.27

    You can include blank lines and comments in this file.

  2. Create a ConfigMap object from the file:

    $ oc delete configmap egress-routes --ignore-not-found
    $ oc create configmap egress-routes \
      --from-file=destination=my-egress-destination.txt

    Here egress-routes is the name of the ConfigMap object being created and my-egress-destination.txt is the name of the file the data is being read from.

  3. Create an egress router pod definition as above, but specify the ConfigMap for EGRESS_DESTINATION in the environment section:

        ...
        env:
        - name: EGRESS_SOURCE (1)
          value: 192.168.12.99/24
        - name: EGRESS_GATEWAY
          value: 192.168.12.1
        - name: EGRESS_DESTINATION
          valueFrom:
            configMapKeyRef:
              name: egress-routes
              key: destination
        - name: EGRESS_ROUTER_MODE
          value: init
        ...
    1 IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.

The egress router does not automatically update when the ConfigMap changes. Restart the pod to get updates.

Deploying an Egress Router HTTP Proxy Pod

In HTTP proxy mode, the egress router runs as an HTTP proxy on port 8080. This only works for clients talking to HTTP or HTTPS-based services, but usually requires fewer changes to the client pods to get them to work. Programs can be told to use an HTTP proxy by setting an environment variable.

  1. Create the pod using the following as an example:

    apiVersion: v1
    kind: Pod
    metadata:
      name: egress-http-proxy
      labels:
        name: egress-http-proxy
      annotations:
        pod.network.openshift.io/assign-macvlan: "true" (1)
    spec:
      initContainers:
      - name: egress-router-setup
        image: openshift/origin-egress-router
        securityContext:
          privileged: true
        env:
        - name: EGRESS_SOURCE (2)
          value: 192.168.12.99/24
        - name: EGRESS_GATEWAY (3)
          value: 192.168.12.1
        - name: EGRESS_ROUTER_MODE (4)
          value: http-proxy
      containers:
      - name: egress-router-proxy
        image: openshift/origin-egress-http-proxy
        env:
        - name: EGRESS_HTTP_PROXY_DESTINATION (5)
          value: |
            !*.example.com
            !192.168.1.0/24
            *
    1 Creates a Macvlan network interface on the primary network interface, then moves it into the pod’s network namespace before starting the egress-router container. Preserve the quotation marks around "true". Omitting them results in errors.
    2 IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
    3 Same value as the default gateway used by the node itself.
    4 This tells the egress router image that it is being deployed as part of an HTTP proxy, and so it should not set up iptables redirecting rules.
    5 A string or YAML multi-line string specifying how to configure the proxy. Note that this is specified as an environment variable in the HTTP proxy container, not with the other environment variables in the init container.

    You can specify any of the following for the EGRESS_HTTP_PROXY_DESTINATION value. You can also use *, meaning "allow connections to all remote destinations". Each line in the configuration specifies one group of connections to allow or deny:

    • An IP address (for example, 192.168.1.1) allows connections to that IP address.

    • A CIDR range (for example, 192.168.1.0/24) allows connections to that CIDR range.

    • A host name (for example, www.example.com) allows proxying to that host.

    • A domain name preceded by *. (for example, *.example.com) allows proxying to that domain and all of its subdomains.

    • A ! followed by any of the above denies connections rather than allowing them.

    • If the last line is *, then anything that hasn’t been denied will be allowed. Otherwise, anything that hasn’t been allowed will be denied.

  2. Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-1
    spec:
      ports:
      - name: http-proxy
        port: 8080 (1)
      type: ClusterIP
      selector:
        name: egress-http-proxy
    1 Ensure the http port is always set to 8080.
  3. Configure the client pod (not the egress proxy pod) to use the HTTP proxy by setting the http_proxy or https_proxy variables:

        ...
        env:
        - name: http_proxy
          value: http://egress-1:8080/ (1)
        - name: https_proxy
          value: http://egress-1:8080/
        ...
    1 The service created in step 2.

    Using the http_proxy and https_proxy environment variables is not necessary for all setups. If the above does not create a working setup, then consult the documentation for the tool or software you are running in the pod.

You can also specify the EGRESS_HTTP_PROXY_DESTINATION using a ConfigMap, similarly to the redirecting egress router example above.
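The allow and deny rules for EGRESS_HTTP_PROXY_DESTINATION can also be sketched in Python. The sketch assumes that lines are evaluated in order and that the first match wins, which is consistent with the rule about a final * line. It performs no DNS lookups, so IP address and CIDR lines match only destinations given as IP addresses. It is not the proxy image's implementation.

```python
# Decide whether the HTTP proxy lets a request through, following the
# EGRESS_HTTP_PROXY_DESTINATION rules above. Assumes lines are checked in
# order and the first match wins. No DNS lookups: IP and CIDR lines only
# match destinations given as IP addresses. Not the proxy image's code.
from __future__ import annotations

import ipaddress


def _matches(pattern: str, dest: str) -> bool:
    """Return True if a single allow/deny pattern covers dest."""
    if pattern == "*":
        return True
    try:
        dest_ip = ipaddress.ip_address(dest)
    except ValueError:
        dest_ip = None  # dest is a host name
    if "/" in pattern:  # CIDR range
        return dest_ip is not None and dest_ip in ipaddress.ip_network(pattern, strict=False)
    if pattern.startswith("*."):  # a domain and all of its subdomains
        domain, host = pattern[2:].lower(), dest.lower()
        return dest_ip is None and (host == domain or host.endswith("." + domain))
    try:
        pattern_ip = ipaddress.ip_address(pattern)
    except ValueError:  # plain host name
        return dest_ip is None and dest.lower() == pattern.lower()
    return dest_ip == pattern_ip


def proxy_allows(config: str, dest: str) -> bool:
    """Return True if the proxy would allow a connection to dest."""
    for line in (raw.strip() for raw in config.splitlines()):
        if not line:
            continue
        deny = line.startswith("!")
        if _matches(line[1:] if deny else line, dest):
            return not deny
    return False  # nothing matched, so the request is denied
```

With the configuration from the example pod, www.example.com and 192.168.1.20 are denied. Any other destination matches the final * line and is allowed.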

Deploying an Egress Router DNS Proxy Pod

In DNS proxy mode, the egress router runs as a DNS proxy for TCP-based services from its own IP address to one or more destination IP addresses. Client pods that want to use the reserved source IP address must be modified to connect to the egress router rather than directly to the destination IP. This ensures that external destinations treat traffic as though it were coming from a known source.

  1. Create the pod using the following as an example:

    apiVersion: v1
    kind: Pod
    metadata:
      name: egress-dns-proxy
      labels:
        name: egress-dns-proxy
      annotations:
        pod.network.openshift.io/assign-macvlan: "true" (1)
    spec:
      initContainers:
      - name: egress-router-setup
        image: openshift/origin-egress-router
        securityContext:
          privileged: true
        env:
        - name: EGRESS_SOURCE (2)
          value: 192.168.12.99/24
        - name: EGRESS_GATEWAY (3)
          value: 192.168.12.1
        - name: EGRESS_ROUTER_MODE (4)
          value: dns-proxy
      containers:
      - name: egress-dns-proxy
        image: openshift/origin-egress-dns-proxy
        env:
        - name: EGRESS_DNS_PROXY_DEBUG (5)
          value: "1"
        - name: EGRESS_DNS_PROXY_DESTINATION (6)
          value: |
            # Egress routes for Project "Foo", version 5
    
            80  203.0.113.25
    
            100 example.com
    
            8080 203.0.113.26 80
    
            8443 foobar.com 443
    1 Using the pod.network.openshift.io/assign-macvlan annotation creates a Macvlan network interface on the primary network interface, then moves it into the pod’s network namespace before starting the egress-router-setup container. Preserve the quotation marks around "true". Omitting them results in errors.
    2 IP address from the physical network that the node is on and is reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
    3 Same value as the default gateway used by the node itself.
    4 This tells the egress router image that it is being deployed as part of a DNS proxy, and so it should not set up iptables redirecting rules.
    5 Optional. Setting this variable will display DNS proxy log output on stdout.
    6 This uses the YAML syntax for a multi-line string. See below for details.

    Each line of EGRESS_DNS_PROXY_DESTINATION can be set in one of two ways:

    • <port> <remote_address> - This says that incoming connections to the given <port> should be proxied to the same TCP port on the given <remote_address>. <remote_address> can be an IP address or DNS name. In case of DNS name, DNS resolution is done at runtime. In the example above, the first line proxies TCP traffic from local port 80 to port 80 on 203.0.113.25. The second line proxies TCP traffic from local port 100 to port 100 on example.com.

    • <port> <remote_address> <remote_port> - As above, except that the connection is proxied to a different <remote_port> on <remote_address>. In the example above, the third line proxies local port 8080 to remote port 80 on 203.0.113.26 and the fourth line proxies local port 8443 to remote port 443 on foobar.com.

  2. Ensure other pods can find the pod’s IP address by creating a service to point to the egress router:

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-dns-svc
    spec:
      ports:
      - name: con1
        protocol: TCP
        port: 80
        targetPort: 80
      - name: con2
        protocol: TCP
        port: 100
        targetPort: 100
      - name: con3
        protocol: TCP
        port: 8080
        targetPort: 8080
      - name: con4
        protocol: TCP
        port: 8443
        targetPort: 8443
      type: ClusterIP
      selector:
        name: egress-dns-proxy

    Pods can now connect to this service. Their connections are proxied to the corresponding ports on the external server, using the reserved egress IP address.

You can also specify the EGRESS_DNS_PROXY_DESTINATION using a ConfigMap, similarly to the redirecting egress router example above.
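The Service for a DNS proxy egress router needs one TCP port for each line of EGRESS_DNS_PROXY_DESTINATION, so you can generate it instead of writing it by hand. The following Python sketch builds the egress-dns-svc Service from the example destination list. The function name and its defaults are hypothetical, and the port names follow the con1, con2, ... pattern used above.

```python
# Build the Service for a DNS proxy egress router from its
# EGRESS_DNS_PROXY_DESTINATION value: one TCP port per destination line.
# Port names (con1, con2, ...) follow the example Service above; the
# function name and default arguments are hypothetical.
import json

EXAMPLE_DESTINATION = """\
# Egress routes for Project "Foo", version 5

80  203.0.113.25

100 example.com

8080 203.0.113.26 80

8443 foobar.com 443
"""


def dns_proxy_service(destination: str, name: str = "egress-dns-svc",
                      pod_label: str = "egress-dns-proxy") -> dict:
    """Return a Service manifest (as a dict) exposing each local port."""
    ports = []
    for raw in destination.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(f"unrecognized line: {line!r}")
        port = int(fields[0])  # local port the router listens on
        ports.append({"name": f"con{len(ports) + 1}", "protocol": "TCP",
                      "port": port, "targetPort": port})
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"ports": ports, "type": "ClusterIP",
                 "selector": {"name": pod_label}},
    }


if __name__ == "__main__":
    print(json.dumps(dns_proxy_service(EXAMPLE_DESTINATION), indent=2))
```

You can save the printed JSON to a file and pass it to oc create -f, like the other definitions in this topic.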

Enabling Failover for Egress Router Pods

Using a replication controller, you can ensure that there is always one copy of the egress router pod in order to prevent downtime.

  1. Create a replication controller configuration file using the following:

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: egress-demo-controller
    spec:
      replicas: 1 (1)
      selector:
        name: egress-demo
      template:
        metadata:
          name: egress-demo
          labels:
            name: egress-demo
          annotations:
            pod.network.openshift.io/assign-macvlan: "true"
        spec:
          initContainers:
          - name: egress-demo-init
            image: openshift/origin-egress-router
            env:
            - name: EGRESS_SOURCE (2)
              value: 192.168.12.99/24
            - name: EGRESS_GATEWAY
              value: 192.168.12.1
            - name: EGRESS_DESTINATION
              value: 203.0.113.25
            - name: EGRESS_ROUTER_MODE
              value: init
            securityContext:
              privileged: true
          containers:
          - name: egress-demo-wait
            image: openshift/origin-pod
          nodeSelector:
            site: springfield-1
    1 Ensure replicas is set to 1, because only one pod can use a given EGRESS_SOURCE value at any time. This means that only a single copy of the router runs, on a node with the label site=springfield-1.
    2 An IP address from the physical network that the node is on, reserved by the cluster administrator for use by this pod. Optionally, you can include the subnet length, the /24 suffix, so that a proper route to the local subnet can be set up. If you do not specify a subnet length, the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
  2. Create the replication controller from the definition:

    $ oc create -f <replication_controller>.yaml
  3. To verify, check that the replication controller has been created:

    $ oc describe rc <replication_controller>
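Callout 2 in the replication controller example explains how the optional subnet length on EGRESS_SOURCE changes which hosts on the node's physical subnet the egress router can talk to. The following Python sketch models that rule with the standard ipaddress module. The helper name is hypothetical, and it models only the behavior stated in callout 2; traffic to destinations off the local subnet still goes through EGRESS_GATEWAY.

```python
import ipaddress


def can_reach_local_host(egress_source: str, egress_gateway: str, host: str) -> bool:
    """Model of callout 2 for a host on the node's physical subnet.

    With a subnet length (for example 192.168.12.99/24), a route to the local
    subnet is set up, so any host on that subnet is reachable. Without one,
    only the EGRESS_GATEWAY host is reachable.
    """
    target = ipaddress.ip_address(host)
    if target == ipaddress.ip_address(egress_gateway):
        return True  # the gateway is always reachable
    if "/" not in egress_source:
        return False  # no subnet length: no route to the rest of the subnet
    local_subnet = ipaddress.ip_interface(egress_source).network
    return target in local_subnet


if __name__ == "__main__":
    print(can_reach_local_host("192.168.12.99/24", "192.168.12.1", "192.168.12.50"))  # True
    print(can_reach_local_host("192.168.12.99", "192.168.12.1", "192.168.12.50"))     # False
    print(can_reach_local_host("192.168.12.99", "192.168.12.1", "192.168.12.1"))      # True
```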

Using iptables Rules to Limit Access to External Resources

Some cluster administrators may want to perform actions on outgoing traffic that do not fit within the model of EgressNetworkPolicy or the egress router. In some cases, this can be done by creating iptables rules directly.

For example, you could create rules that log traffic to particular destinations, or rules that prevent more than a certain number of outgoing connections per second.

OKD does not provide a way to add custom iptables rules automatically, but it does provide a place where the administrator can add such rules manually. On startup, each node creates an empty chain called OPENSHIFT-ADMIN-OUTPUT-RULES in the filter table, if the chain does not already exist. Any rules that an administrator adds to that chain are applied to all traffic going from a pod to a destination outside the cluster, and to no other traffic.

There are a few things to watch out for when using this functionality:

  1. It is up to you to ensure that rules get created on each node; OKD does not provide any way to make that happen automatically.

  2. The rules are not applied to traffic that exits the cluster via an egress router, and they run after EgressNetworkPolicy rules are applied (and so will not see traffic that is denied by an EgressNetworkPolicy).

  3. The handling of connections from pods to nodes or pods to the master is complicated, because nodes have both "external" IP addresses and "internal" SDN IP addresses. Thus, some pod-to-node/master traffic may pass through this chain, but other pod-to-node/master traffic may bypass it.
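As an illustration of the two example uses mentioned earlier, the following Python sketch builds iptables command lines that append rules to OPENSHIFT-ADMIN-OUTPUT-RULES: one rule that logs traffic to a destination range, and a pair that lets new outgoing TCP connections through up to a rate limit and drops the excess. It only prints the commands; per item 1 above, you would still need to run them yourself (as root) on every node. The destination 203.0.113.0/24, the log prefix, and the limit of 20 connections per second are placeholders, and using the iptables limit match is one possible approach, not something OKD prescribes.

```python
import shlex
from typing import List

CHAIN = "OPENSHIFT-ADMIN-OUTPUT-RULES"  # created empty in the filter table by each node


def log_rule(dest_cidr: str, prefix: str) -> List[str]:
    """Log (but do not block) pod traffic leaving the cluster for dest_cidr."""
    return ["iptables", "-t", "filter", "-A", CHAIN,
            "-d", dest_cidr, "-j", "LOG", "--log-prefix", prefix]


def rate_limit_rules(per_second: int) -> List[List[str]]:
    """Allow up to per_second new outgoing TCP connections per second; drop the rest."""
    under_limit = ["iptables", "-t", "filter", "-A", CHAIN,
                   "-p", "tcp", "--syn",
                   "-m", "limit", "--limit", f"{per_second}/second",
                   "--limit-burst", str(per_second),
                   "-j", "RETURN"]  # within the limit: hand back to the calling chain
    over_limit = ["iptables", "-t", "filter", "-A", CHAIN,
                  "-p", "tcp", "--syn", "-j", "DROP"]  # everything past the limit
    return [under_limit, over_limit]


if __name__ == "__main__":
    commands = [log_rule("203.0.113.0/24", "pod-egress: ")] + rate_limit_rules(20)
    for argv in commands:
        print(shlex.join(argv))  # shlex.join needs Python 3.8 or later
```

Note that the limit match counts connections for everything that reaches the chain on that node, not per pod; a per-source limit would need a different match, such as hashlimit.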

Enabling Static IPs for External Project Traffic

As a cluster administrator, you can assign specific, static IP addresses to projects, so that their traffic is easily recognizable externally. This is different from the default egress router, which is used to send traffic to specific destinations.

Recognizable IP traffic increases cluster security by ensuring the origin is visible. Once enabled, all outgoing external connections from the specified project share the same fixed source IP, so external resources can recognize the traffic.

Unlike the egress router, this is subject to EgressNetworkPolicy firewall rules.

To enable static source IPs:

  1. Update the NetNamespace with the desired IP:

    $ oc patch netnamespace <project_name> -p '{"egressIPs": ["<IP_address>"]}'

    For example, to assign the myproject project to an IP address of 192.168.1.100:

    $ oc patch netnamespace myproject -p '{"egressIPs": ["192.168.1.100"]}'

    The egressIPs field is an array. While in earlier releases it could only contain a single IP address, as of OKD version 3.10 egressIPs can be set to two or more IP addresses on different nodes to provide high availability. If multiple egress IP addresses are set, pods use the first IP in the list for egress, but if the node hosting that IP address fails, pods will switch to using the next IP in the list after a short delay.

  2. Manually assign the egress IP to the desired node hosts. Set the egressIPs field on the HostSubnet object on the node host. Include as many IPs as you want to assign to that node host:

    $ oc patch hostsubnet <node_name> -p \
      '{"egressIPs": ["<IP_address_1>", "<IP_address_2>"]}'

    For example, to specify that node1 should host the egress IPs 192.168.1.100, 192.168.1.101, and 192.168.1.102:

    $ oc patch hostsubnet node1 -p \
      '{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'

    Egress IPs are implemented as additional IP addresses on the primary network interface, and must be in the same subnet as the node’s primary IP. Additionally, egress IPs should not be configured in any Linux network configuration files, such as ifcfg-eth0.

    Allowing additional IP addresses on the primary network interface might require extra configuration when using some cloud or VM solutions.

When egress IPs are enabled for a project, all egress traffic from that project is routed to the node hosting the egress IP, and then connected (using NAT) to that IP address. If egressIPs is set on a NetNamespace but no node hosts that egress IP, egress traffic from the namespace is dropped.
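The failover and drop behavior described in this procedure can be summarized as a small selection function. In this Python sketch, which is a model and not the SDN's implementation, a project's egressIPs list is checked in order, each IP is looked up in the per-node HostSubnet egressIPs assignments, and the first IP whose hosting node is healthy wins. The function name and the healthy-node input are hypothetical; the sketch ignores the "short delay" before failover, and treating "every hosting node is down" the same as "not hosted" (traffic dropped) is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple


def active_egress_ip(
    netnamespace_egress_ips: List[str],
    hostsubnet_egress_ips: Dict[str, List[str]],
    healthy_nodes: Set[str],
) -> Optional[Tuple[str, str]]:
    """Return (egress_ip, node) that pods in the project would use, or None if dropped."""
    # Invert the HostSubnet assignments: egress IP -> node that hosts it.
    host_of = {ip: node for node, ips in hostsubnet_egress_ips.items() for ip in ips}
    for ip in netnamespace_egress_ips:  # list order is the failover order
        node = host_of.get(ip)
        if node is not None and node in healthy_nodes:
            return ip, node
    return None  # no healthy node hosts any of the IPs: egress traffic is dropped


if __name__ == "__main__":
    hostsubnets = {"node1": ["192.168.1.100"], "node2": ["192.168.1.101"]}
    project_ips = ["192.168.1.100", "192.168.1.101"]
    print(active_egress_ip(project_ips, hostsubnets, {"node1", "node2"}))  # ('192.168.1.100', 'node1')
    print(active_egress_ip(project_ips, hostsubnets, {"node2"}))           # ('192.168.1.101', 'node2')
    print(active_egress_ip(["192.168.1.200"], hostsubnets, {"node1"}))     # None
```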

Enabling Multicast

At this time, multicast is best suited for low-bandwidth coordination or service discovery, not as a high-bandwidth solution.

Multicast traffic between OKD pods is disabled by default. If you are using the ovs-multitenant or ovs-networkpolicy plug-in, you can enable multicast on a per-project basis by setting an annotation on the project’s corresponding netnamespace object:

$ oc annotate netnamespace <namespace> \
    netnamespace.network.openshift.io/multicast-enabled=true

Disable multicast by removing the annotation:

$ oc annotate netnamespace <namespace> \
    netnamespace.network.openshift.io/multicast-enabled-

When using the ovs-multitenant plug-in:

  1. In an isolated project, multicast packets sent by a pod will be delivered to all other pods in the project.

  2. If you have joined networks together, you will need to enable multicast in each project’s netnamespace in order for it to take effect in any of the projects. Multicast packets sent by a pod in a joined network will be delivered to all pods in all of the joined-together networks.

  3. To enable multicast in the default project, you must also enable it in the kube-service-catalog project and all other projects that have been made global. Global projects are not "global" for purposes of multicast; multicast packets sent by a pod in a global project will only be delivered to pods in other global projects, not to all pods in all projects. Likewise, pods in global projects will only receive multicast packets sent from pods in other global projects, not from all pods in all projects.

When using the ovs-networkpolicy plug-in:

  1. Multicast packets sent by a pod will be delivered to all other pods in the project, regardless of NetworkPolicy objects. (Pods may be able to communicate over multicast even when they can’t communicate over unicast.)

  2. Multicast packets sent by a pod in one project will never be delivered to pods in any other project, even if there are NetworkPolicy objects allowing communication between the two projects.
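The delivery rules above can be condensed into one function. This Python sketch is a simplified model, not the plug-in's code. It assumes that joined projects share a NetID (as shown by oc get netnamespaces) and that all global projects share NetID 0, so "global projects only reach other global projects" follows from the same grouping rule. It returns the projects whose pods receive a multicast packet, or an empty set when multicast is not enabled in every project of the sender's group. The function name and the NetID values are hypothetical.

```python
from typing import Dict, Set


def multicast_receivers(
    sender_project: str,
    netids: Dict[str, int],
    multicast_enabled: Set[str],
    plugin: str,
) -> Set[str]:
    """Projects whose pods receive multicast sent from a pod in sender_project (model only)."""
    if plugin == "ovs-networkpolicy":
        # Never crosses project boundaries, regardless of NetworkPolicy objects.
        group = {sender_project}
    elif plugin == "ovs-multitenant":
        # Joined projects share a NetID; global projects are assumed to share NetID 0.
        netid = netids[sender_project]
        group = {project for project, n in netids.items() if n == netid}
    else:
        raise ValueError(f"unsupported plug-in: {plugin}")
    # Multicast takes effect only if every project in the group has it enabled.
    return group if group <= multicast_enabled else set()


if __name__ == "__main__":
    netids = {"frontend": 12, "backend": 12, "batch": 13,
              "default": 0, "kube-service-catalog": 0}
    enabled = {"frontend", "backend", "batch", "default"}
    print(sorted(multicast_receivers("frontend", netids, enabled, "ovs-multitenant")))
    # ['backend', 'frontend']
    print(multicast_receivers("default", netids, enabled, "ovs-multitenant"))
    # set(): kube-service-catalog does not have multicast enabled
    print(multicast_receivers("frontend", netids, enabled, "ovs-networkpolicy"))
    # {'frontend'}
```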

Enabling NetworkPolicy

The ovs-subnet and ovs-multitenant plug-ins have their own legacy models of network isolation and do not support Kubernetes NetworkPolicy. However, NetworkPolicy support is available by using the ovs-networkpolicy plug-in.

Only v1 NetworkPolicy features are available in OKD. This means that egress policy types, IPBlock, and combining podSelector and namespaceSelector are not available in OKD.

Do not apply NetworkPolicy features to default OKD projects, because they can disrupt communication with the cluster.

In a cluster configured to use the ovs-networkpolicy plug-in, network isolation is controlled entirely by NetworkPolicy objects. By default, all pods in a project are accessible from other pods and network endpoints. To isolate one or more pods in a project, you can create NetworkPolicy objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy objects within their own project.

Pods that do not have NetworkPolicy objects pointing to them are fully accessible, whereas pods that have one or more NetworkPolicy objects pointing to them are isolated. These isolated pods only accept connections that are accepted by at least one of their NetworkPolicy objects.

The following are a few sample NetworkPolicy object definitions supporting different scenarios:

  • Deny All Traffic

    To make a project "deny by default", add a NetworkPolicy object that matches all pods but accepts no traffic.

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: deny-by-default
    spec:
      podSelector: {}
      ingress: []
  • Only accept connections from pods within the project

    To make pods accept connections from other pods in the same project, but reject all other connections from pods in other projects:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-same-namespace
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector: {}
  • Only allow HTTP and HTTPS traffic based on pod labels

    To enable only HTTP and HTTPS access to the pods with a specific label (role=frontend in the following example), add a NetworkPolicy object similar to:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-http-and-https
    spec:
      podSelector:
        matchLabels:
          role: frontend
      ingress:
      - ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443

NetworkPolicy objects are additive, which means you can combine multiple NetworkPolicy objects together to satisfy complex network requirements.

For example, using the NetworkPolicy objects defined in the previous samples, you can define both the allow-same-namespace and allow-http-and-https policies within the same project. This allows pods with the label role=frontend to accept any connection allowed by either policy: connections on any port from pods in the same namespace, and connections on ports 80 and 443 from pods in any namespace.
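The additive semantics can be expressed as a short evaluation function. This Python sketch models only what this section describes: a pod selected by no policy accepts everything, and a selected pod accepts a connection if any ingress rule of any selecting policy allows it. It ignores protocols and namespace selectors, and the class and function names are hypothetical, not the Kubernetes API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class IngressRule:
    same_namespace_only: bool = False       # True models "from: - podSelector: {}"
    ports: Optional[FrozenSet[int]] = None  # None means any port


@dataclass
class Policy:
    name: str
    pod_selector: Dict[str, str]            # empty dict selects every pod in the namespace
    ingress: List[IngressRule] = field(default_factory=list)


def selects(policy: Policy, labels: Dict[str, str]) -> bool:
    return all(labels.get(key) == value for key, value in policy.pod_selector.items())


def rule_allows(rule: IngressRule, src_ns: str, dst_ns: str, port: int) -> bool:
    if rule.same_namespace_only and src_ns != dst_ns:
        return False
    return rule.ports is None or port in rule.ports


def is_allowed(policies: List[Policy], dst_labels: Dict[str, str], dst_ns: str,
               src_ns: str, port: int) -> bool:
    """policies are those in dst_ns. A pod selected by none of them is not isolated."""
    selecting = [p for p in policies if selects(p, dst_labels)]
    if not selecting:
        return True
    # Additive: any rule in any selecting policy is enough.
    return any(rule_allows(r, src_ns, dst_ns, port) for p in selecting for r in p.ingress)


if __name__ == "__main__":
    policies = [
        Policy("allow-same-namespace", {}, [IngressRule(same_namespace_only=True)]),
        Policy("allow-http-and-https", {"role": "frontend"},
               [IngressRule(ports=frozenset({80, 443}))]),
    ]
    frontend = {"role": "frontend"}
    print(is_allowed(policies, frontend, "shop", "other", 443))   # True: HTTPS from any namespace
    print(is_allowed(policies, frontend, "shop", "other", 5432))  # False
    print(is_allowed(policies, frontend, "shop", "shop", 5432))   # True: same namespace, any port
```

In this model, the deny-by-default sample corresponds to Policy("deny-by-default", {}) with no ingress rules: it selects every pod and allows nothing, so only other policies can open traffic.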

Using NetworkPolicy Efficiently

NetworkPolicy objects allow you to isolate pods that are differentiated from one another by labels, within a namespace.

It is inefficient to apply NetworkPolicy objects to large numbers of individual pods in a single namespace. Pod labels do not exist at the IP level, so NetworkPolicy objects generate a separate OVS flow rule for every possible link between the pods selected with podSelector.

For example, if the spec podSelector and the ingress podSelector within a NetworkPolicy object each match 200 pods, then 40000 (200*200) OVS flow rules are generated. This can slow down the node.

To reduce the number of OVS flow rules, use namespaces to contain groups of pods that need to be isolated.

NetworkPolicy objects that select a whole namespace, by using namespaceSelectors or empty podSelectors, only generate a single OVS flow rule that matches the VXLAN VNID of the namespace.

Keep the pods that do not need to be isolated in their original namespace, and move the pods that require isolation into one or more different namespaces.

Create additional targeted cross-namespace policies to allow the specific traffic that you do want to allow from the isolated pods.
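The arithmetic in this section can be written down as a rough estimate. This Python sketch counts flows the way the paragraphs above describe them: one per (selected pod, peer pod) pair, or a single flow when the peer is a whole namespace. It is a back-of-the-envelope model of those statements, not of the exact flows the SDN programs, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def estimated_flow_rules(spec_matches: int, ingress_matches: int,
                         ingress_selects_whole_namespace: bool) -> int:
    """Approximate OVS flows generated by one NetworkPolicy object."""
    if ingress_selects_whole_namespace:
        # namespaceSelector or empty podSelector: a single flow on the namespace's VXLAN VNID
        return 1
    # Pod labels do not exist at the IP level, so every (selected pod, peer pod) pair needs a flow
    return spec_matches * ingress_matches


def total_flow_rules(policies: Iterable[Tuple[int, int, bool]]) -> int:
    """Sum the estimate over several NetworkPolicy objects."""
    return sum(estimated_flow_rules(*policy) for policy in policies)


if __name__ == "__main__":
    # 200 pods selected on both sides, as in the example above
    print(estimated_flow_rules(200, 200, False))  # 40000
    # The same pods moved into their own namespace and admitted by a namespace-wide policy
    print(estimated_flow_rules(200, 200, True))   # 1
```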

NetworkPolicy and Routers

When using the ovs-multitenant plug-in, traffic from the routers is automatically allowed into all namespaces. This is because the routers are usually in the default namespace, and all namespaces allow connections from pods in that namespace. With the ovs-networkpolicy plug-in, this does not happen automatically. Therefore, if you have a policy that isolates a namespace by default, you need to take additional steps to allow routers to access it.

One option is to create a policy for each service that allows access from all sources. For example:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-to-database-service
spec:
  podSelector:
    matchLabels:
      role: database
  ingress:
  - ports:
    - protocol: TCP
      port: 5432

This allows routers to access the service, but will also allow pods in other users' namespaces to access it as well. This should not cause any issues, as those pods can normally access the service by using the public router.

Alternatively, you can create a policy allowing full access from the default namespace, as in the ovs-multitenant plug-in:

  1. Add a label to the default namespace.

    If you labeled the default project with the default label in a previous procedure, then skip this step. The cluster administrator role is required to add labels to namespaces.

    $ oc label namespace default name=default
  2. Create policies allowing connections from that namespace.

    Perform this step for each namespace you want to allow connections into. Users with the Project Administrator role can create policies.

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-from-default-namespace
    spec:
      podSelector: {}
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              name: default

Setting a Default NetworkPolicy for New Projects

Cluster administrators can modify the default project template to enable automatic creation of one or more default NetworkPolicy objects whenever a new project is created. To do this:

  1. Create a custom project template and configure the master to use it, as described in Modifying the Template for New Projects.

  2. Label the default project with the default label:

    If you labeled the default project with the default label in a previous procedure, then skip this step. The cluster administrator role is required to add labels to namespaces.

    $ oc label namespace default name=default
  3. Edit the template to include the desired NetworkPolicy objects:

    $ oc edit template project-request -n default

    Use the oc edit command to add NetworkPolicy objects to the existing template. Currently, it is not possible to use oc patch to add objects to a Template resource.

    1. Add each default policy as an element in the objects array:

      objects:
      ...
      - apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        metadata:
          name: allow-from-same-namespace
        spec:
          podSelector: {}
          ingress:
          - from:
            - podSelector: {}
      - apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        metadata:
          name: allow-from-default-namespace
        spec:
          podSelector: {}
          ingress:
          - from:
            - namespaceSelector:
                matchLabels:
                  name: default
      ...
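The edit in step 3 amounts to appending policy objects to the template's objects array without duplicating any that are already there. The following Python sketch shows that merge on the template as a plain dictionary, for example one parsed from the JSON output of oc get template project-request -n default -o json. The function names are hypothetical, and the sketch does not write anything back to the cluster; the documented way to apply the change remains oc edit.

```python
import copy
from typing import Any, Dict, List

Obj = Dict[str, Any]


def network_policy(name: str, peer: Obj) -> Obj:
    """A NetworkPolicy that selects every pod and admits traffic from one peer selector."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {"podSelector": {}, "ingress": [{"from": [peer]}]},
    }


DEFAULT_POLICIES = [
    network_policy("allow-from-same-namespace", {"podSelector": {}}),
    network_policy("allow-from-default-namespace",
                   {"namespaceSelector": {"matchLabels": {"name": "default"}}}),
]


def add_default_policies(template: Obj, policies: List[Obj]) -> List[str]:
    """Append policies missing from template['objects']; return the names added."""
    objects = template.setdefault("objects", [])
    present = {(o.get("kind"), o.get("metadata", {}).get("name")) for o in objects}
    added = []
    for policy in policies:
        key = (policy["kind"], policy["metadata"]["name"])
        if key not in present:
            objects.append(copy.deepcopy(policy))  # do not share dicts between templates
            present.add(key)
            added.append(policy["metadata"]["name"])
    return added


if __name__ == "__main__":
    template = {"kind": "Template", "objects": [
        {"kind": "Project", "metadata": {"name": "${PROJECT_NAME}"}}]}
    print(add_default_policies(template, DEFAULT_POLICIES))
    # ['allow-from-same-namespace', 'allow-from-default-namespace']
    print(add_default_policies(template, DEFAULT_POLICIES))  # [] on a second run
```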

Enabling HTTP Strict Transport Security

HTTP Strict Transport Security (HSTS) policy is a security enhancement that ensures only HTTPS traffic is allowed on the host. Any HTTP requests are dropped by default. This is useful for ensuring secure interactions with websites, or to offer a secure application for the user’s benefit.

When HSTS is enabled, HSTS adds a Strict-Transport-Security header to HTTPS responses from the site. You can use the insecureEdgeTerminationPolicy value in a route to redirect HTTP to HTTPS. However, when HSTS is enabled, the client changes all requests from the HTTP URL to HTTPS before the request is sent, eliminating the need for a redirect. Clients are not required to support HSTS, and it can be disabled by setting max-age=0.

HSTS works only with secure routes (either edge terminated or re-encrypt). The configuration is ineffective on HTTP or passthrough routes.

To enable HSTS on a route, add the haproxy.router.openshift.io/hsts_header annotation to the edge-terminated or re-encrypt route:

apiVersion: v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains;preload

Ensure there are no spaces and no other values in the parameters in the haproxy.router.openshift.io/hsts_header value. Only max-age is required.

The required max-age parameter indicates the length of time, in seconds, for which the HSTS policy is in effect. The client updates max-age whenever a response with an HSTS header is received from the host. When max-age times out, the client discards the policy.

The optional includeSubDomains parameter tells the client that all subdomains of the host are to be treated the same as the host.

If max-age is greater than 0, the optional preload parameter allows external services to include this site in their HSTS preload lists. For example, sites such as Google can construct a list of sites that have preload set. Browsers can then use these lists to determine which sites to only talk to over HTTPS, even before they have interacted with the site. Without preload set, browsers must first talk to the site over HTTPS to get the header.
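Because the annotation value has to follow these rules exactly, it can be worth checking it before applying the route. The following Python sketch validates a value against the constraints stated in this section: no spaces, a required numeric max-age, and only the optional includeSubDomains and preload flags. The function name is hypothetical, and matching directive names case-insensitively is an assumption borrowed from the HSTS specification (RFC 6797), not something this section states.

```python
import re
from typing import Dict, Union

# Directive names are matched case-insensitively (an assumption taken from RFC 6797).
_CANONICAL = {"max-age": "max-age", "includesubdomains": "includeSubDomains", "preload": "preload"}


def parse_hsts_header(value: str) -> Dict[str, Union[int, bool]]:
    """Validate an hsts_header annotation value and return its directives."""
    if any(ch.isspace() for ch in value):
        raise ValueError("the value must not contain spaces")
    directives: Dict[str, Union[int, bool]] = {}
    for part in value.split(";"):
        name, sep, arg = part.partition("=")
        key = _CANONICAL.get(name.lower())
        if key is None:
            raise ValueError(f"unexpected parameter: {part!r}")
        if key == "max-age":
            if not sep or re.fullmatch(r"[0-9]+", arg) is None:
                raise ValueError("max-age must be a whole number of seconds")
            directives[key] = int(arg)
        elif sep:
            raise ValueError(f"{key} does not take a value")
        else:
            directives[key] = True  # flag directive present
    if "max-age" not in directives:
        raise ValueError("max-age is required")
    return directives


if __name__ == "__main__":
    print(parse_hsts_header("max-age=31536000;includeSubDomains;preload"))
    # {'max-age': 31536000, 'includeSubDomains': True, 'preload': True}
```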

Troubleshooting Throughput Issues

Sometimes applications deployed through OKD can cause network throughput issues such as unusually high latency between specific services.

Use the following methods to analyze performance issues if pod logs do not reveal any cause of the problem:

  • Use a network tool such as ping, or a packet analyzer such as tcpdump, to analyze traffic between a pod and its node.

    For example, run the tcpdump tool on each pod while reproducing the behavior that led to the issue. Review the captures on both sides to compare send and receive timestamps to analyze the latency of traffic to/from a pod. Latency can occur in OKD if a node interface is overloaded with traffic from other pods, storage devices, or the data plane.

    $ tcpdump -s 0 -i any -w /tmp/dump.pcap 'host <podip 1> and host <podip 2>' (1)
    1 podip is the IP address for the pod. Run the following command to get the IP address of the pods:
    # oc get pod <podname> -o wide

    tcpdump generates a file at /tmp/dump.pcap containing all traffic between these two pods. Ideally, run the analyzer shortly before the issue is reproduced and stop the analyzer shortly after the issue is finished reproducing to minimize the size of the file. You can also run a packet analyzer between the nodes (eliminating the SDN from the equation) with:

    # tcpdump -s 0 -i any -w /tmp/dump.pcap port 4789
  • Use a bandwidth measuring tool, such as iperf, to measure streaming throughput and UDP throughput. Run the tool from the pods first, then from the nodes to attempt to locate any bottlenecks. The iperf3 tool is included as part of RHEL 7.
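Once you have matched the same packets in the captures taken on both sides, comparing send and receive timestamps is simple arithmetic. This Python sketch takes two hypothetical mappings from a packet key (for example, a TCP sequence number you extracted from each capture) to a capture timestamp in seconds, and reports one-way delays and packets that never arrived. It assumes the two hosts' clocks are synchronized, which one-way delay measurements depend on; extracting the keys from the .pcap files is left out, and the function names are hypothetical.

```python
from statistics import median
from typing import Dict, Hashable, List


def one_way_delays(sent: Dict[Hashable, float], received: Dict[Hashable, float]) -> List[float]:
    """Receive time minus send time for every packet seen on both sides."""
    return [received[k] - sent[k] for k in sent if k in received]


def missing_packets(sent: Dict[Hashable, float], received: Dict[Hashable, float]) -> List[Hashable]:
    """Packets captured on the sending side but never on the receiving side."""
    return [k for k in sent if k not in received]


def summarize(delays: List[float]) -> Dict[str, float]:
    if not delays:
        raise ValueError("no packets matched between the two captures")
    return {"count": len(delays), "median": median(delays), "max": max(delays)}


if __name__ == "__main__":
    sent = {1001: 10.000, 1002: 10.010, 1003: 10.020}
    received = {1001: 10.0004, 1002: 10.0250}
    delays = one_way_delays(sent, received)
    print(summarize(delays))
    print(missing_packets(sent, received))  # [1003]
```

A median that stays low while the maximum spikes points at intermittent congestion, such as an overloaded node interface, rather than a uniformly slow path.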