
Overview

The OpenShift SDN enables communication between pods across the OKD cluster, establishing a pod network. Three SDN plug-ins are currently available (ovs-subnet, ovs-multitenant, and ovs-networkpolicy), which provide different methods for configuring the pod network.

Available SDN Providers

The upstream Kubernetes project does not come with a default network solution. Instead, Kubernetes has developed a Container Network Interface (CNI) that allows network providers to integrate their own SDN solutions.

There are several OpenShift SDN plug-ins available out of the box from Red Hat, as well as third-party plug-ins.

Red Hat has worked with a number of SDN providers to certify their SDN network solution on OKD via the Kubernetes CNI interface, including a support process for their SDN plug-in through their product’s entitlement process. Should you open a support case with OpenShift, Red Hat can facilitate an exchange process so that both companies are involved in meeting your needs.

The following SDN solutions are validated and supported on OKD directly by the third-party vendor:

  • Cisco ACI (™)

  • Juniper Contrail (™)

  • Nokia Nuage (™)

  • Tigera Calico (™)

  • VMware NSX-T (™)

Installing VMware NSX-T (™) on OKD

VMware NSX-T (™) provides an SDN and security infrastructure to build cloud-native application environments. In addition to vSphere hypervisors (ESX), these environments include KVM and native public clouds.

The current integration requires a new installation of both NSX-T and OKD. NSX-T version 2.4 is currently supported, and only the ESXi and KVM hypervisors can be used.

Configuring the Pod Network with Ansible

For initial cluster installations, the ovs-subnet plug-in is installed and configured by default, though it can be overridden during installation using the os_sdn_network_plugin_name parameter, which is configurable in the Ansible inventory file.

For example, to override the standard ovs-subnet plug-in and use the ovs-multitenant plug-in instead:

# Configure the multi-tenant SDN plugin (default is 'redhat/openshift-ovs-subnet')
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'

See Configuring Cluster Variables for descriptions of networking-related Ansible variables that can be set in your inventory file.
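
For reference, a minimal sketch of how this variable might appear in an inventory file; the surrounding variables shown here (osm_cluster_network_cidr and openshift_portal_net) are optional and reflect a typical openshift-ansible inventory layout:

[OSEv3:vars]
# Override the default SDN plug-in (assumed inventory layout)
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
# Optionally set the cluster and service networks at install time
osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16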

Configuring the Pod Network on Masters

Cluster administrators can control pod network settings on master hosts by modifying parameters in the networkConfig section of the master configuration file (located at /etc/origin/master/master-config.yaml by default):

Configuring a pod network for a single CIDR
networkConfig:
  clusterNetworks:
  - cidr: 10.128.0.0/14 (1)
    hostSubnetLength: 9 (2)
  networkPluginName: "redhat/openshift-ovs-subnet" (3)
  serviceNetworkCIDR: 172.30.0.0/16 (4)
1 Cluster network for node IP allocation
2 Number of bits for pod IP allocation within a node
3 Set to redhat/openshift-ovs-subnet for the ovs-subnet plug-in, redhat/openshift-ovs-multitenant for the ovs-multitenant plug-in, or redhat/openshift-ovs-networkpolicy for the ovs-networkpolicy plug-in
4 Service IP allocation for the cluster

Alternatively, you can create a pod network with multiple CIDR ranges by adding additional entries to the clusterNetworks field, each with its own cidr range and hostSubnetLength.

Multiple ranges can be used at once, and the range can be expanded or contracted. Nodes can be moved from one range to another by evacuating a node, then deleting and re-creating the node. See the Managing Nodes section for more information. Node allocations occur in the order listed; when a range is full, allocation moves to the next range in the list.
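
A sketch of one possible flow for moving a node into a newly added range, assuming that deleting the node object and restarting the node service causes the node to re-register and receive a subnet from the available ranges (node names are placeholders):

# oc adm manage-node <node1> --schedulable=false
# oc adm drain <node1>
# oc delete node <node1>
# systemctl restart atomic-openshift-node.service

Run the last command on the node itself so that it re-registers with the cluster.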

Configuring a pod network for multiple CIDRs
networkConfig:
  clusterNetworks:
  - cidr: 10.128.0.0/14 (1)
    hostSubnetLength: 9 (2)
  - cidr: 10.132.0.0/14
    hostSubnetLength: 9
  externalIPNetworkCIDRs: null
  hostSubnetLength: 9
  ingressIPNetworkCIDR: 172.29.0.0/16
  networkPluginName: redhat/openshift-ovs-multitenant (3)
  serviceNetworkCIDR: 172.30.0.0/16
1 Cluster network for node IP allocation.
2 Number of bits for pod IP allocation within a node.
3 Set to redhat/openshift-ovs-subnet for the ovs-subnet plug-in, redhat/openshift-ovs-multitenant for the ovs-multitenant plug-in, or redhat/openshift-ovs-networkpolicy for the ovs-networkpolicy plug-in.

You can add elements to the clusterNetworks value, or remove them if no node is using that CIDR range.

The hostSubnetLength value cannot be changed after the cluster is first created. A cidr field can only be changed to a larger network that still contains the original network if nodes are allocated within its range, and serviceNetworkCIDR can only be expanded. For example, given the typical value of 10.128.0.0/14, you could change cidr to 10.128.0.0/9 (that is, the entire upper half of net 10) but not to 10.64.0.0/16, because that does not overlap the original value.

You can change serviceNetworkCIDR from 172.30.0.0/16 to 172.30.0.0/15, but not to 172.28.0.0/14, because even though the original range is entirely inside the new range, the original range must be at the start of the CIDR. See Expanding the service network for more information.
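
For example, a minimal before-and-after sketch of this change in the master configuration file:

# Before
serviceNetworkCIDR: 172.30.0.0/16
# After: a valid expansion, because the original range stays at the start
serviceNetworkCIDR: 172.30.0.0/15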

Ensure that you restart the API and master services for any changes to take effect:

$ master-restart api
$ master-restart controllers

The pod network settings on the nodes must match the pod network settings configured by the networkConfig.clusterNetworks parameter on the masters. This can be done by modifying the cluster-cidr value in the proxyArguments section of the appropriate node configuration map:

proxyArguments:
  cluster-cidr:
  - 10.128.0.0/12 (1)
1 The CIDR value must encompass all the cluster network CIDR ranges defined at the master level but not conflict with other IP ranges, such as for nodes and services.
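
For example, assuming the default node group ConfigMaps created by the installer (the name node-config-compute and the openshift-node namespace are assumptions based on a default installation), you can edit the appropriate map directly:

$ oc edit cm node-config-compute -n openshift-node

Repeat for every node configuration map used by your cluster.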

After the master services have been restarted, the configuration must be propagated to the nodes. On each node, the atomic-openshift-node service and the ovs pod must be restarted. To avoid downtime, follow the steps defined in Managing Nodes and described in the following procedure, one node or group of nodes at a time:

  1. Mark the node as unschedulable:

    # oc adm manage-node <node1> <node2> --schedulable=false
  2. Drain the node:

    # oc adm drain <node1> <node2>
  3. Restart the node:

    # reboot
  4. Mark the node as schedulable again:

    #  oc adm manage-node <node1> <node2> --schedulable

Changing the VXLAN port for the cluster network

As a cluster administrator, you can change the VXLAN port the system uses.

Because you cannot change the VXLAN port of a running clusternetwork object, you must delete any existing network configurations and create a new configuration by editing the vxlanPort variable in the master configuration file.

  1. Delete the existing clusternetwork:

    # oc delete clusternetwork default
  2. Edit the master configuration file (located at /etc/origin/master/master-config.yaml by default) to define the new clusternetwork:

    networkConfig:
      clusterNetworks:
      - cidr: 10.128.0.0/14
        hostSubnetLength: 9
      - cidr: 10.132.0.0/14
        hostSubnetLength: 9
      externalIPNetworkCIDRs: null
      hostSubnetLength: 9
      ingressIPNetworkCIDR: 172.29.0.0/16
      networkPluginName: redhat/openshift-ovs-multitenant
      serviceNetworkCIDR: 172.30.0.0/16
      vxlanPort: 4889 (1)
    1 Set to the value used by the nodes for the VXLAN port. It must be an integer between 1 and 65535. The default value is 4789.
  3. Add the new port to the iptables rule on each cluster node:

    # iptables -A OS_FIREWALL_ALLOW -p udp -m state --state NEW -m udp --dport 4889 -j ACCEPT (1)
    1 4889 is the vxlanPort value that you set in the master configuration file.
  4. Restart the master services:

    # master-restart api
    # master-restart controllers
  5. Delete the old SDN pods so that new pods are created with the new configuration:

    # oc delete pod -l app=sdn -n openshift-sdn
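
You can then verify that the re-created clusternetwork object uses the new port by inspecting it and checking the vxlanPort field:

# oc get clusternetwork default -o yaml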

Configuring the Pod Network on Nodes

Cluster administrators can control pod network settings on nodes by modifying parameters in the networkConfig section of the appropriate node configuration map:

networkConfig:
  mtu: 1450 (1)
  networkPluginName: "redhat/openshift-ovs-subnet" (2)
1 Maximum transmission unit (MTU) for the pod overlay network
2 Set to redhat/openshift-ovs-subnet for the ovs-subnet plug-in, redhat/openshift-ovs-multitenant for the ovs-multitenant plug-in, or redhat/openshift-ovs-networkpolicy for the ovs-networkpolicy plug-in

You must change the MTU size on all masters and nodes that are part of the OKD SDN. Also, the MTU size of the tun0 interface must be the same across all nodes that are part of the cluster.
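
To confirm that the configured value is in effect, you can inspect the tun0 interface on each node and compare the reported MTU with the value in the node configuration map, for example:

# ip link show tun0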

Expanding the service network

If you are running low on addresses in your service network, you can expand the range as long as you ensure that the current range is at the beginning of the new range.

The service network can only be expanded; it cannot be changed or contracted.

  1. Change the serviceNetworkCIDR and servicesSubnet parameters in the configuration files for all masters (/etc/origin/master/master-config.yaml by default). Change only the number following the / to a smaller number.

  2. Delete the clusterNetwork default object:

    $ oc delete clusternetwork default
  3. Restart the controllers component on all masters:

    # master-restart controllers
  4. Update the value of the openshift_portal_net variable in the Ansible inventory file to the new CIDR:

    # Configure SDN cluster network and kubernetes service CIDR blocks. These
    # network blocks should be private and should not conflict with network blocks
    # in your infrastructure that pods may require access to. Can not be changed
    # after deployment.
    openshift_portal_net=172.30.0.0/<new_CIDR_range>

For each node in the cluster, mark the node as unschedulable, drain it, restart it, and mark it as schedulable again, following the same node procedure described earlier in Configuring the Pod Network on Masters.

Migrating Between SDN Plug-ins

If you are already using one SDN plug-in and want to switch to another:

  1. Change the networkPluginName parameter on all masters and nodes in their configuration files.

  2. Restart the API and master services on all masters:

    # master-restart api
    # master-restart controllers
  3. Stop the node service on all masters and nodes:

    # systemctl stop atomic-openshift-node.service
  4. If you are switching between OpenShift SDN plug-ins, restart OpenShift SDN on all masters and nodes:

    $ oc delete pod --all -n openshift-sdn
  5. Restart the node service on all masters and nodes:

    # systemctl restart atomic-openshift-node.service
  6. If you are switching from an OpenShift SDN plug-in to a third-party plug-in, then clean up OpenShift SDN-specific artifacts:

    $ oc delete clusternetwork --all
    $ oc delete hostsubnets --all
    $ oc delete netnamespaces --all

Additionally, after switching to ovs-multitenant, users can no longer provision services using the Service Catalog. The same applies to openshift-monitoring. To correct this, make these projects global:

$ oc adm pod-network make-projects-global kube-service-catalog
$ oc adm pod-network make-projects-global openshift-monitoring

This problem does not appear if the cluster was initially installed with ovs-multitenant, because these commands were executed as part of the Ansible playbooks.

When switching from the ovs-subnet to the ovs-multitenant OpenShift SDN plug-in, all the existing projects in the cluster will be fully isolated (assigned unique VNIDs). The cluster administrators can choose to modify the project networks using the administrator CLI.

Check VNIDs by running:

$ oc get netnamespace
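
For example, to join, isolate, or make project networks global after the migration (project names are placeholders):

$ oc adm pod-network join-projects --to=<project1> <project2>
$ oc adm pod-network isolate-projects <project1>
$ oc adm pod-network make-projects-global <project1>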

Migrating from ovs-multitenant to ovs-networkpolicy

Only the v1 NetworkPolicy features are available in OKD. This means that egress policy types, IPBlock, and combining podSelector and namespaceSelector are not available in OKD.

Do not apply NetworkPolicy features on default OKD projects, because they can disrupt communication with the cluster.

In addition to the generic plug-in migration steps above in the Migrating between SDN plug-ins section, there is one additional step when migrating from the ovs-multitenant plug-in to the ovs-networkpolicy plug-in; you must ensure that every namespace has a unique NetID. This means that if you have previously joined projects together or made projects global, you will need to undo that before switching to the ovs-networkpolicy plug-in, or the NetworkPolicy objects may not function correctly.

A helper script is available that fixes NetIDs, creates NetworkPolicy objects to isolate previously-isolated namespaces, and enables connections between previously-joined namespaces.

Use the following steps to migrate to the ovs-networkpolicy plug-in, by using this helper script, while still running the ovs-multitenant plug-in:

  1. Download the script and make it executable:

    $ curl -O https://raw.githubusercontent.com/openshift/origin/release-3.11/contrib/migration/migrate-network-policy.sh
    $ chmod a+x migrate-network-policy.sh
  2. Run the script (requires the cluster administrator role).

    $ ./migrate-network-policy.sh

After running this script, every namespace is fully isolated from every other namespace, so connection attempts between pods in different namespaces will fail until you complete the migration to the ovs-networkpolicy plug-in.

If you want newly-created namespaces to also have the same policies by default, you can set default NetworkPolicy objects to be created matching the default-deny and allow-from-global-namespaces policies created by the migration script.
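
For reference, a minimal sketch of a default-deny policy of the kind described above; the exact objects created by the migration script may differ:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
spec:
  podSelector: {}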

In case of script failures or other errors, or if you later decide you want to revert back to the ovs-multitenant plug-in, you can use the un-migration script. This script undoes the changes made by the migration script and re-joins previously-joined namespaces.

External Access to the Cluster Network

If a host that is external to OKD requires access to the cluster network, you have two options:

  1. Configure the host as an OKD node but mark it unschedulable so that the master does not schedule containers on it.

  2. Create a tunnel between your host and a host that is on the cluster network.

Both options are presented as part of a practical use-case in the documentation for configuring routing from an edge load-balancer to containers within OpenShift SDN.

Using Flannel

As an alternative to the default SDN, OKD also provides Ansible playbooks for installing flannel-based networking. This is useful if you are running OKD within a cloud provider platform that also relies on SDN, such as Red Hat OpenStack Platform, and you want to avoid encapsulating packets twice through both platforms.

Flannel uses a single IP network space for all of the containers, allocating a contiguous subset of the space to each instance. Consequently, nothing prevents a container from attempting to contact any IP address in the same network space. This hinders multi-tenancy because the network cannot be used to isolate containers in one application from another.

When choosing between OpenShift SDN and flannel for internal networks, decide whether multi-tenancy isolation or performance matters more: OpenShift SDN provides multi-tenancy isolation, while flannel favors performance.

The current version of Neutron enforces port security on ports by default. This prevents the port from sending or receiving packets with a MAC address different from that on the port itself. Flannel creates virtual MACs and IP addresses and must send and receive packets on the port, so port security must be disabled on the ports that carry flannel traffic.

To enable flannel within your OKD cluster:

  1. Neutron port security controls must be configured to be compatible with Flannel. The default configuration of Red Hat OpenStack Platform disables user control of port_security. Configure Neutron to allow users to control the port_security setting on individual ports.

    1. On the Neutron servers, add the following to the /etc/neutron/plugins/ml2/ml2_conf.ini file:

      [ml2]
      ...
      extension_drivers = port_security
    2. Then, restart the Neutron services:

      service neutron-dhcp-agent restart
      service neutron-ovs-cleanup restart
      service neutron-metadata-agent restart
      service neutron-l3-agent restart
      service neutron-plugin-openvswitch-agent restart
      service neutron-vpn-agent restart
      service neutron-server restart
  2. When creating the OKD instances on Red Hat OpenStack Platform, disable both port security and security groups in the ports where the container network flannel interface will be:

    neutron port-update $port --no-security-groups --port-security-enabled=False

    Flannel gathers information from etcd to configure and assign the subnets to the nodes. Therefore, the security group attached to the etcd hosts should allow access from the nodes to port 2379/tcp, and the nodes' security group should allow egress communication to that port on the etcd hosts (see the example security group rule after this procedure).

    1. Set the following variables in your Ansible inventory file before running the installation:

      openshift_use_openshift_sdn=false (1)
      openshift_use_flannel=true (2)
      flannel_interface=eth0
      1 Set openshift_use_openshift_sdn to false to disable the default SDN.
      2 Set openshift_use_flannel to true to enable flannel in its place.
    2. Optionally, you can specify the interface to use for inter-host communication using the flannel_interface variable. Without this variable, the OKD installation uses the default interface.

      Custom networking CIDR for pods and services using flannel will be supported in a future release. BZ#1473858

  3. After the OKD installation, add a set of iptables rules on every OKD node:

    iptables -A DOCKER -p all -j ACCEPT
    iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

    To persist those changes in /etc/sysconfig/iptables, use the following commands on every node:

    cp /etc/sysconfig/iptables{,.orig}
    sh -c "tac /etc/sysconfig/iptables.orig | sed -e '0,/:DOCKER -/ s/:DOCKER -/:DOCKER ACCEPT/' | awk '"\!"p && /POSTROUTING/{print \"-A POSTROUTING -o eth1 -j MASQUERADE\"; p=1} 1' | tac > /etc/sysconfig/iptables"

    The iptables-save command saves all the current in-memory iptables rules. However, because Docker, Kubernetes, and OKD create a high number of iptables rules (services, and so on) that are not designed to be persisted, saving these rules can become problematic.
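
As noted in step 2, the etcd security group must allow the nodes to reach port 2379/tcp. A minimal sketch of such a rule using the neutron CLI (security group names are placeholders):

neutron security-group-rule-create --protocol tcp \
  --port-range-min 2379 --port-range-max 2379 \
  --remote-group-id <nodes-security-group> <etcd-security-group>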

To isolate container traffic from the rest of the OKD traffic, Red Hat recommends creating an isolated tenant network and attaching all the nodes to it. If you are using a different network interface (eth1), remember to configure the interface to start at boot time through the /etc/sysconfig/network-scripts/ifcfg-eth1 file:

DEVICE=eth1
TYPE=Ethernet
BOOTPROTO=dhcp
ONBOOT=yes
DEFROUTE=no
PEERDNS=no