Standalone Networking Configuration

https://bugs.launchpad.net/ironic/+bug/2113769

This document proposes a mechanism for configuring switch port attributes during the management of baremetal nodes. The aim is to enable Ironic to support this functionality without delegating networking tasks to Neutron. This new mechanism, initiated by the existing network driver interface in the Conductor, would delegate actual switch port configuration to a standalone service. This service could be co-located with existing Ironic services or managed independently, on a separate server, by an operator group dedicated to network equipment.

Problem description

Operators seek to automate network configuration to enable node deployment without manual pre-configuration of switch ports. Nodes connect to switches via one or more network interfaces, each requiring specific neighboring switch configurations for their intended use. These interfaces may require access port or trunk port configurations. Additionally, link aggregation might be required to enhance a node’s networking capacity or redundancy. Normally, the configuration operations must be completed manually before the nodes can be deployed by Ironic.

Managing nodes and networking equipment often falls to separate organizational groups. This division of responsibility introduces security concerns regarding who can modify switch configurations and to what extent. Since fine-grained authorization varies across switch vendors, this service should offer basic controls to address operator security concerns; thereby limiting what operations could be attempted by Ironic.

Proposed change

The implementation proposed is to introduce a new standalone service to execute the interactions with the network switch equipment. This new service includes a JSON RPC front end API to be accessed by a network driver running within the context of the Conductor. During the normal management lifecycle of baremetal nodes, any ports defined on a node will be processed by the network driver as is already the case in the current implementation.

It is the responsibility of the driver to ensure that related switch configurations are updated according to the switchport and portchannel information contained within the extra property of the port. Existing network drivers can interface with the Neutron subsystem to manage switch configurations, but the solution proposed in this design eliminates the need to include other OpenStack components and instead provides an end-to-end Ironic only solution.

Note

The usage of the extra property is an interim approach to enable the new functionality. The long term goal is to replace these properties with a new set of API models and endpoints that are specific to the new functionality. The intent is to explore the use of this new functionality using the extra property as a means to validate the new functionality. Once the usecases are validated and better understood we can promote the new contents of the extra property to a new set of API models and endpoints.

Note

The standalone service may need to consider the implications of issuing commands concurrently to the same switch. This could lead to unexpected results under certain circumstances. The standalone service should be designed to handle this scenario gracefully.

┌──────────────────────┐
│                      │
│    Ironic API        │
│                      │
└──────────┬───────────┘
           │
           │                 ┌────────────────────────┐
┌──────────▼───────────┐     │                        │
│                      │     │   Ironic Standalone    │
│   Ironic Conductor   │     │   Networking Service   │     ┌───────────────┐
│          ┌───────────┤     │           ┌────────────┤     │               │
│          │  Network  ┼─────►           │  Switch    ┼─────►  TOR Switch   │
│          │  Driver   │     │           │  Driver    │     │               │
└──────────└───────────┘     └───────────└────────────┘     └───────────────┘

Alternatives

If we could assume that all network switches implemented some form of fine grained access controls that could be applied to specific user credentials then we could forgo building an access control mechanism and instead rely on the access rights granted to the user credentials assigned to the Ironic service. This would eliminate the need to implement this mechanism as a standalone service and instead we could simply embed the functionality directly into the Conductor by way of a custom network driver that could directly interact with the network switches.

┌──────────────────────┐
│                      │
│    Ironic API        │
│                      │
└──────────┬───────────┘
           │
           │
┌──────────▼───────────┐
│                      │
│   Ironic Conductor   │        ┌───────────────────┐
│          ┌───────────┤        │                   │
│          │   Network ┼────────►─  TOR Switch      │
│          │   Driver  │        │                   │
└──────────└───────────┘        └───────────────────┘

Of course, the access control requirement is only one reason for having a separate service and we would need to assess whether other requirements would also push us towards a standalone service (e.g., would it be beneficial to have the actual switch interactions performed on a separate service for performance/scalability reasons? Assuming the driver operations are synchronous to this other service presumably performance/scalability would not be improved by having a separate service).

Data model impact

There are no planned changes to existing data models. Switch port configuration details are to be stored in the extra property of existing port objects; therefore, existing port related data model schemas can be used without modification.

State Machine Impact

None

REST API impact

There are no planned changes to existing API schemas or endpoints. Switch port configuration details are stored in the extra property of a port or portgroup; therefore, existing port and portgroup related API endpoints and schemas can be used without modification. The extra property should be populated with a dictionary conforming to this schema when related to a port object.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Switchport Configuration",
  "description": "Schema for defining switchport configurations based on
    mode (trunk or access)",
  "type": "object",
  "properties": {
    "switchport": {
      "type": "object",
      "properties": {
        "mode": {
          "type": "string",
          "enum": ["trunk", "access"]
        },
        "native_vlan": {
          "type": "integer",
          "minimum": 1,
          "description": "The native VLAN ID for the switchport.  If not
            supplied then the switch global default VLAN ID is used."
        },
        "allowed_vlans": {
          "type": "array",
          "items": {
            "type": "integer",
            "minimum": 1
          },
          "minItems": 1,
          "description": "List of allowed VLANs for trunk mode.  Only
            applicable, and is mandatory, if mode=trunk."
        }
      },
      "required": [
        "mode",
      ]
    }
  },
  "required": [
    "switchport",
  ],
}

For example, the following data is stored in the extra property of a port to specify that its switch port must be configured as a trunk port having a specific default VLAN and a set of allowed VLANs.

{'switchport':
  {'mode': 'trunk',
   'native_vlan': 1,
   'allowed_vlans': [2, 3, 4]
  }
}

The following data is stored in the extra property of a port to specify that its switch port must be configured as an access port having a specific default VLAN.

{'switchport':
  {'mode': 'access',
   'native_vlan': 1,
  }
}

If related to a portgroup object then a similar schema is supported but with differences applicable to switch port channels only.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Switchport Configuration",
  "description": "Schema for defining switch port channel configurations
    based on mode (e.g., trunk or access) and aggregation mode (e.g., LACP
    or static)",
  "type": "object",
  "properties": {
    "portchannel": {
      "type": "object",
      "properties": {
        "mode": {
          "type": "string",
          "enum": ["trunk", "access"]
        },
        "native_vlan": {
          "type": "integer",
          "minimum": 1,
          "description": "The native VLAN ID for the switchport.  If not
            supplied then the switch global default VLAN ID is used."
        },
        "allowed_vlans": {
          "type": "array",
          "items": {
            "type": "integer",
            "minimum": 1
          },
          "minItems": 1,
          "description": "List of allowed VLANs for trunk mode.  Only
            applicable, and is mandatory, if mode=trunk."
        },
        "aggregation_mode": {
          "type": "string",
          "enum": ["lacp", "static"],
        },
      },
      "required": [
        "mode",
        "aggregation_mode",
      ]
    }
  },
  "required": [
    "portchannel",
  ],
}

For example, the following data is stored in the extra property of a portgroup to specify that a corresponding portchannel must be created and managed on the switch. The portchannel should be configured as a trunk port having a specific default VLAN and a set of allowed VLANs, and operate in the LACP link aggregation mode.

{'portchannel':
  {'mode': 'trunk',
   'native_vlan': 1,
   'allowed_vlans': [2, 3, 4],
   'aggregation_mode': 'lacp'
  }
}

The following data is stored in the extra property of a portgroup to specify that a corresponding portchannel must be created and managed on the switch. The portchannel must be configured as an access port having a specific default VLAN, and operation in the static link aggregation mode.

{'portchannel':
  {'mode': 'access',
   'native_vlan': 1,
   'aggregation_mode': 'static'
  }
}

Client (CLI) impact

None

“openstack baremetal” CLI

None

“openstacksdk”

None

RPC API impact

  1. Conductor RPC API: None

  2. Standalone Service API: The RPC API used between the driver and the standalone networking service can be defined as follows

Method

Signature

update_port

{"switch_id": "xx:xx:xx:xx:xx:xx",
 "port_name": "xxx",
 "description": "xxx",
 "mode": "[access|trunk]",
 "default_vlan": n,
 "allowed_vlans": [x, y, z],
 "portchannel_name": "xxx"}

reset_port

{"switch_id": "xx:xx:xx:xx:xx:xx",
 "port_name": "xxx"}

disable_port

{"switch_id": "xx:xx:xx:xx:xx:xx",
 "port_name": "xxx"}

create_portchannel

{"switch_ids": ["xx:xx:xx:xx:xx:xx",...]
 "portchannel_name": "xxx",
 "description": "xxx",
 "mode": "[access|trunk]",
 "default_vlan": {1 to 4094},
 "allowed_vlans": [x, y, z],
 "aggregation_mode": "[lacp|static]"}

delete_portchannel

{"switch_ids": ["xx:xx:xx:xx:xx:xx",...],
 "port_name": "xxx"}

Driver API impact

No changes to the existing driver interfaces. It is expected that this functionality can be implemented within the existing definition of the Networking Driver API. This means that the existing Conductor workflow does need to be modified. The new networking driver must implement the defined abstract interfaces of the base network interface in a way that ensures that switch port configurations are updated correctly as the baremetal node transitions through the full management life cycle state machine.

Proper operation of the driver depends on the port being populated with LLDP information in the form of the link_local_connection property. This ensures that the driver can associate the configuration to the correct port on the switch.

Interface

Actions

port_changed

Update switch port configuration to match if necessary.

portgroup_changed

Update switch port configuration to match if necessary.

vif_attach

N/A. VIF attachments not expected or supported by this driver.

vif_detach

N/A. VIF attachments not expected or supported by this driver.

vif_list

N/A. VIF attachments not expected or supported by this driver.

get_current_vif

N/A. VIF attachments not expected or supported by this driver.

add_provisioning_network

Configure each ports according to the provisioning network configured in the config file; otherwise, configure according to its defined switchport extra property.

remove_provisioning_network

Reset port back to switch port defaults

configure_tenant_networks

Configure each ports according to the switchport configuration defined in its extra property.

unconfigure_tenant_networks

Reset port back to switch port defaults

add_cleaning_network

Configure each ports according to the cleaning network configured in the config file; otherwise, configure according to its defined switchport extra property.

remove_cleaning_network

Reset port back to switch port defaults

validate_rescue

N/A

add_rescuing_network

Configure each ports according to the rescuing network configured in the config file; otherwise, configure according to its defined switchport extra property.

remove_rescuing_network

Reset port back to switch port defaults

validate_inspection

N/A

add_inspection_network

Configure each ports according to the inspection network configured in the config file; otherwise, configure according to its defined switchport extra property.

remove_inspection_network

Reset port back to switch port defaults

need_power_on

False

get_node_network_data

Build network data from configured ports

add_servicing_network

Configure each ports according to the servicing network configured in the config file; otherwise, configure according to its defined switchport extra property.

remove_servicing_network

Reset port back to switch port defaults

Nova driver impact

None

Ramdisk impact

None

Security impact

  • RPC API Authentication/Authorization: The intent of this design is to create a standalone service that exists as part of the Ironic subsystem but that could be run and managed separately by a group dedicated to managing networking equipment. In this capacity the process must have sufficient security controls such that only authorized users have access to its configuration file and RPC API. A discussion is needed to settle on the approach to be used.

  • Access control over switch resources: Ideally, an ACL mechanism native to the switch operating system would be used to restrict access to switch resources for the user entity assigned to the standalone networking service, but since an objective of this design is to not assume that all switch vendors support fine grained control over access to resources it may be beneficial to add the ability to control access to resources directly from configuration data stored in the configuration file for the standalone service.

    Resources to be controlled could include:

    1. Allowed/Denied VLAN IDs

    2. Allowed/Denied port list

    3. Allowed port channel management

Other end user impact

None

Scalability impact

Needing to perform switch operations as part of the management of baremetal nodes will no doubt increase the time required to complete node operations. Exactly how much additional time will depend entirely on the responsiveness of the management software running on the networking switch. Having switch ports configured before booting the node is a definite requirement; therefore, it is likely not feasible to decouple the Conductor process from the configuration of the switch by making it an asynchronous operation.

This will have to be evaluated as the implementation progresses.

Performance Impact

See “Scalability impact” above.

Other deployer impact

To define the switch port configuration details to be used for provider networks the following config file attributes must be used. For each of the network classes if no value is provided in the config file then the port’s extra.switchport or extra.portchannel property will be used, if the port or portgroup does not contain any such attribute then it will be ignored by the driver. The following attributes can be added to the main ironic.conf file.

[standalone_networking]
enabled = <[true|false]>
provisioning_network = <[access|trunk]/vlan-id={1 to 4094}>
cleaning_network = <[access|trunk]/vlan-id={1 to 4094}>
servicing_network = <[access|trunk]/vlan-id={1 to 4094}>
# Optionally, for inspection if a separate network is used
# inspection_network = <[access|trunk]/vlan-id={1 to 4094}>

Example:

[standalone_networking]
enabled = true
provisioning_network = access/vlan-id=10
cleaning_network = access/vlan-id=11
servicing_network = trunk/vlan-id=12
# Optionally, for inspection if a separate network is used
# inspection_network = access/vlan-id=13

To propagate switch port configuration details to switches, the new networking service must be provided with details needed to access and configure the switches. This includes user credentials, network address, and any other attributes required to access the switch.

The following example is a hybrid based on the configuration requirements of two Neutron ML2 mechanism drivers ([1], [2]) combined with some additional access control information to limit access to switch resources. It contains provisions for eventually allowing different types of driver implementations to interact with switches. These attributes can be added to the config file supplied to the standalone networking service.

[DEFAULT]
enabled_devices = <list of switch names>
allowed_vlans = <list of allowed vlans, takes precedence over denied_vlans
  if provided>
denied_vlans = <list of denied vlans, superseded by allowed_vlans if
  provided>
allowed_ports = <list of allowed port names, takes precedence over
  denied_ports if provided>
denied_ports = <list of denied port names, superseded by allowed_ports if
  provided>
portchannels_allowed = [true/false]

[<switch name>]
driver = <driver name>
switch_id = <switch mac address>
host = <switch management ip address>
username = <username>
password = <password, if driver support basic auth>
key_filename = <ssh private key file absolute path, if driver supports ssh>
hostkey_verify = <[true|false]>
allowed_vlans = <list of allowed vlans, takes precedence over denied_vlans
  if provided. Overrides global value if supplied>
denied_vlans = <list of denied vlans, superseded by allowed_vlans if
  provided. Overrides global value if supplied>
allowed_ports = <list of allowed port names, takes precedence over
  denied_ports if provided. Overrides global value if supplied>
denied_ports = <list of denied port names, superseded by allowed_ports if
  provided, Overrides global value if supplied>
portchannels_allowed = [true/false, Overrides global value if supplied]

Example:

[DEFAULT]
enabled_devices = netconf-based-device.example.net,cli-based-device.example.net
allowed_vlans = 3,4,5
denied_vlans = 6,7,8

[netconf-based-device.example.net]
driver = netconf-openconfig
switch_id = <switch mac address>
host = <switch management ip address>
username = user
key_filename = /etc/ironic/ssh_keys/device_a_sshkey
hostkey_verify = false

[cli-based-device.example.net]
driver = netmiko_cisco_ios
switch_id = <switch mac address>
host = <switch management ip address>
username = user
password = secret

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

alegacy

Other contributors:

n/a

Work Items

  • Define RPC API for standalone service

  • Split networking-generic-switch into a reusable library without any Neutron entanglements.

  • Implement standalone service

  • Implement network driver to interface with standalone service

  • Implement switch driver (using refactored parts of the networking-generic-switch package, see above) to interface with networking equipment

  • Revisit the use of the extra properties and replace with new API models and endpoints.

Dependencies

  • The network-generic-switch library contains code to implement interactions with network switches. Unfortunately, that code is entangled with Neutron API dependencies. It is desirable to reuse as much of that library as possible to implement part of this service, but the Neutron specific aspects of its API and implementation must be removed. A possible path forward is to refactor that library so that the Neutron specific parts remain in the actual networking-generic-switch library while the switch specific parts get moved into a new library that can be referenced from the networking-generic-switch project and this new standalone service separately.

Testing

  • Unit testing

  • Possibly a Devstack setup with a stubbed-out driver to simulate switch configurations.

Upgrades and Backwards Compatibility

None

Documentation Impact

TBD

References