Perform one of the following actions:
- Visually select the time range by clicking and dragging on the plot horizontally.
- Use the menu to select the time range.
OKD Virtualization provides metrics that you can use to monitor the consumption of cluster infrastructure resources, including vCPU, network, storage, and guest memory swapping. You can also use metrics to query live migration status.
To use the vCPU metric, the schedstats=enable kernel argument must be applied to the MachineConfig object. This kernel argument enables scheduler statistics used for debugging and performance tuning and adds a minor additional load to the scheduler. For more information, see Adding kernel arguments to nodes.
For guest memory swapping queries to return data, memory swapping must be enabled on the virtual guests.
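As a rough illustration of the vCPU prerequisite above, the following Python sketch builds a MachineConfig manifest that carries the schedstats=enable kernel argument and prints it as JSON. The object name and the worker role are assumptions chosen for illustration; the procedure in Adding kernel arguments to nodes remains the reference.

```python
import json


def schedstats_machine_config(role="worker"):
    """Build a MachineConfig manifest that adds the schedstats=enable kernel argument."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            # Hypothetical object name; choose one that fits your own naming scheme.
            "name": f"05-{role}-kernelarg-schedstats",
            # The role label selects the machine config pool that receives the change.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"kernelArguments": ["schedstats=enable"]},
    }


# Print the manifest so that it can be reviewed or saved to a file.
print(json.dumps(schedstats_machine_config(), indent=2))
```

The printed JSON can be saved to a file and reviewed before it is applied with the oc CLI.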
You can use the OKD metrics query browser to run Prometheus Query Language (PromQL) queries to examine metrics visualized on a plot. This functionality provides information about the state of a cluster and any user-defined workloads that you are monitoring.
As a cluster administrator or as a user with view permissions for all projects, you can access metrics for all default OKD and user-defined projects in the Metrics UI.
The Metrics UI includes predefined queries, for example, CPU, memory, bandwidth, or network packet for all projects. You can also run custom Prometheus Query Language (PromQL) queries.
You have access to the cluster as a user with the cluster-admin cluster role or with view permissions for all projects.
You have installed the OpenShift CLI (oc).
In the OKD web console, click Observe → Metrics.
To add one or more queries, perform any of the following actions:
| Option | Description | 
|---|---|
| Select an existing query. | From the Select query drop-down list, select an existing query. | 
| Create a custom query. | Add your Prometheus Query Language (PromQL) query to the Expression field. As you type a PromQL expression, autocomplete suggestions appear in a drop-down list. These suggestions include functions, metrics, labels, and time tokens. Use the keyboard arrows to select one of these suggested items and then press Enter to add the item to your expression. Move your mouse pointer over a suggested item to view a brief description of that item. | 
| Add multiple queries. | Click Add query. | 
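| Duplicate an existing query. | Click the options menu next to the query, then select the option to duplicate the query. | 
| Disable a query from being run. | Click the options menu next to the query, then select the option to disable the query. | 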
To run queries that you created, click Run queries. The metrics from the queries are visualized on the plot. If a query is invalid, the UI shows an error message.
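The same PromQL expressions can also be sent to a Prometheus-compatible HTTP API instead of the web console. The following sketch only builds the request URL for the standard /api/v1/query endpoint. The base URL is a placeholder, and authentication (for example, a bearer token) is outside the scope of this example.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql, time=None):
    """Return the URL for an instant query against a Prometheus-compatible HTTP API."""
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (UNIX seconds or RFC 3339 string).
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Placeholder host; replace it with the query endpoint of your own cluster.
url = instant_query_url(
    "https://prometheus.example.com",
    "rate(kubevirt_vmi_vcpu_wait_seconds_total[6m])",
)
print(url)
```

Encoding the expression with urlencode keeps characters such as brackets and spaces from breaking the request.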
Optional: Save the page URL to use this set of queries again in the future.
Explore the visualized metrics. Initially, all metrics from all enabled queries are shown on the plot. Select which metrics are shown by performing any of the following actions:
| Option | Description | 
|---|---|
| Hide all metrics from a query. | Click the options menu for the query, then select the option to hide all of its metrics. | 
| Hide a specific metric. | Go to the query table and click the colored square near the metric name. | 
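| Zoom into the plot and change the time range. | Perform one of the following actions: visually select the time range by clicking and dragging on the plot horizontally, or use the menu to select the time range. | 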
| Reset the time range. | Click Reset zoom. | 
| Display outputs for all queries at a specific point in time. | Hover over the plot at the point you are interested in. The query outputs appear in a pop-up box. | 
| Hide the plot. | Click Hide graph. | 
You can use the OKD metrics query browser to run Prometheus Query Language (PromQL) queries to examine metrics visualized on a plot. This functionality provides information about any user-defined workloads that you are monitoring.
As a developer, you must specify a project name when querying metrics. You must have the required privileges to view metrics for the selected project.
The Metrics UI includes predefined queries, for example, CPU, memory, bandwidth, or network packet. These queries are restricted to the selected project. You can also run custom Prometheus Query Language (PromQL) queries for the project.
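When writing a custom query for a single project, one common pattern is to restrict the metric with a namespace label matcher. The sketch below assembles such a selector as a string. It assumes that the metric carries a namespace label, as the example queries in this document do; the helper name and the project name are hypothetical.

```python
def scope_to_project(metric, project, extra_labels=None):
    """Return a PromQL selector for `metric` restricted to one project (namespace)."""
    labels = dict(extra_labels or {})
    # The project always wins over any namespace passed in extra_labels.
    labels["namespace"] = project

    def escape(value):
        # Escape backslashes and double quotes so that the label value stays valid.
        return str(value).replace("\\", "\\\\").replace('"', '\\"')

    body = ", ".join(f'{key}="{escape(value)}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{body}}}"


# Hypothetical project name used for illustration only.
print(scope_to_project("kubevirt_vmi_network_receive_bytes_total", "my-project"))
```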
You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.
You have enabled monitoring for user-defined projects.
You have deployed a service in a user-defined project.
You have created a ServiceMonitor custom resource (CR) for the service to define how the service is monitored.
In the OKD web console, click Observe → Metrics.
To add one or more queries, perform any of the following actions:
| Option | Description | 
|---|---|
| Select an existing query. | From the Select query drop-down list, select an existing query. | 
| Create a custom query. | Add your Prometheus Query Language (PromQL) query to the Expression field. As you type a PromQL expression, autocomplete suggestions appear in a drop-down list. These suggestions include functions, metrics, labels, and time tokens. Use the keyboard arrows to select one of these suggested items and then press Enter to add the item to your expression. Move your mouse pointer over a suggested item to view a brief description of that item. | 
| Add multiple queries. | Click Add query. | 
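| Duplicate an existing query. | Click the options menu next to the query, then select the option to duplicate the query. | 
| Disable a query from being run. | Click the options menu next to the query, then select the option to disable the query. | 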
To run queries that you created, click Run queries. The metrics from the queries are visualized on the plot. If a query is invalid, the UI shows an error message.
Optional: Save the page URL to use this set of queries again in the future.
Explore the visualized metrics. Initially, all metrics from all enabled queries are shown on the plot. Select which metrics are shown by performing any of the following actions:
| Option | Description | 
|---|---|
| Hide all metrics from a query. | Click the options menu for the query, then select the option to hide all of its metrics. | 
| Hide a specific metric. | Go to the query table and click the colored square near the metric name. | 
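| Zoom into the plot and change the time range. | Perform one of the following actions: visually select the time range by clicking and dragging on the plot horizontally, or use the menu to select the time range. | 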
| Reset the time range. | Click Reset zoom. | 
| Display outputs for all queries at a specific point in time. | Hover over the plot at the point you are interested in. The query outputs appear in a pop-up box. | 
| Hide the plot. | Click Hide graph. | 
The following metric descriptions include example Prometheus Query Language (PromQL) queries. These metrics are not an API and might change between versions. For a complete list of virtualization metrics, see KubeVirt components metrics.
| The following examples use topk queries that specify a time period. | 
The following query can identify virtual machines that are waiting for Input/Output (I/O):
kubevirt_vmi_vcpu_wait_seconds_total
Returns the wait time (in seconds) on I/O for vCPUs of a virtual machine. Type: Counter.
A value above '0' means that the vCPU wants to run, but the host scheduler cannot run it yet. This inability to run indicates that there is an issue with I/O.
| To query the vCPU metric, the schedstats=enable kernel argument must first be applied to the MachineConfig object. | 
kubevirt_vmi_vcpu_delay_seconds_total
Returns the cumulative time, in seconds, that a vCPU was enqueued by the host scheduler but could not run immediately. This delay appears to the virtual machine as steal time, which is CPU time lost when the host runs other workloads. Steal time can impact performance and often indicates CPU overcommitment or contention on the host. Type: Counter.
Example vCPU delay query
irate(kubevirt_vmi_vcpu_delay_seconds_total[5m]) > 0.05 (1)
| 1 | This query returns the per-second delay calculated from the two most recent samples within a 5-minute window, and keeps only results above 0.05. A high value may indicate CPU overcommitment or contention on the node. | 
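To make the behavior of the delay query more concrete, the following sketch contrasts an instant rate, which uses only the last two samples as irate() does, with a rate averaged over the whole window. Both functions are simplified models and are not the Prometheus implementation: in particular, the averaged version does not extrapolate to the window boundaries the way rate() does. The sample values are invented for illustration.

```python
def irate(samples):
    """Per-second rate from the last two samples, similar in spirit to PromQL irate()."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    # A counter that went down has been reset; count the new value as the increase.
    increase = v_last - v_prev if v_last >= v_prev else v_last
    return increase / (t_last - t_prev)


def average_rate(samples):
    """Per-second rate averaged over all samples (no extrapolation, unlike rate())."""
    if len(samples) < 2:
        return None
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / duration


# Hypothetical samples: (timestamp in seconds, cumulative vCPU delay in seconds).
samples = [(0, 10.0), (30, 10.5), (60, 14.5)]
THRESHOLD = 0.05  # same threshold as the example query

print("instant rate:", irate(samples), "above threshold:", irate(samples) > THRESHOLD)
print("average rate:", average_rate(samples))
```

Because the instant rate reacts only to the most recent pair of samples, it highlights short spikes that an average over the window would smooth out.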
Example vCPU wait time query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_vcpu_wait_seconds_total[6m]))) > 0 (1)
| 1 | This query returns the top 3 VMs waiting for I/O at every given moment over a six-minute time period. | 
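The topk examples in this document share one shape: per-series rates are summed by (name, namespace), the largest three sums are kept, and results that are not above 0 are dropped. The following sketch imitates those steps on invented per-vCPU rates. The id label and all values are assumptions for illustration and do not come from a real cluster.

```python
from collections import defaultdict


def sum_by(series, keys):
    """Sum sample values per combination of the given label keys."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[tuple(labels[key] for key in keys)] += value
    return dict(totals)


def topk(k, totals, threshold=0.0):
    """Keep the k largest totals, then drop entries that are not above the threshold."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]
    return [(key, value) for key, value in ranked if value > threshold]


# Hypothetical per-vCPU wait rates (seconds of wait per second), one entry per series.
rates = [
    ({"name": "vm-a", "namespace": "p1", "id": "0"}, 0.25),
    ({"name": "vm-a", "namespace": "p1", "id": "1"}, 0.125),
    ({"name": "vm-b", "namespace": "p1", "id": "0"}, 0.0),
    ({"name": "vm-c", "namespace": "p2", "id": "0"}, 0.5),
    ({"name": "vm-d", "namespace": "p2", "id": "0"}, 0.0625),
]

for (name, namespace), value in topk(3, sum_by(rates, ("name", "namespace"))):
    print(f"{namespace}/{name}: {value}")
```

Note that the comparison is applied after the top three are chosen, so fewer than three results can be returned when some of the top sums are 0.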
The following queries can identify virtual machines that are saturating the network:
kubevirt_vmi_network_receive_bytes_total
Returns the total amount of traffic received (in bytes) on the virtual machine’s network. Type: Counter.
kubevirt_vmi_network_transmit_bytes_total
Returns the total amount of traffic transmitted (in bytes) on the virtual machine’s network. Type: Counter.
Example network traffic query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_network_receive_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_network_transmit_bytes_total[6m]))) > 0 (1)
| 1 | This query returns the top 3 VMs transmitting the most network traffic at every given moment over a six-minute time period. | 
The following queries can identify VMs that are writing large amounts of data:
kubevirt_vmi_storage_read_traffic_bytes_total
Returns the total amount of storage reads (in bytes) of the virtual machine’s storage-related traffic. Type: Counter.
kubevirt_vmi_storage_write_traffic_bytes_total
Returns the total amount of storage writes (in bytes) of the virtual machine’s storage-related traffic. Type: Counter.
Example storage-related traffic query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_read_traffic_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_write_traffic_bytes_total[6m]))) > 0 (1)
| 1 | This query returns the top 3 VMs performing the most storage traffic at every given moment over a six-minute time period. | 
kubevirt_vmsnapshot_disks_restored_from_source
Returns the total number of virtual machine disks restored from the source virtual machine. Type: Gauge.
kubevirt_vmsnapshot_disks_restored_from_source_bytes
Returns the amount of space in bytes restored from the source virtual machine. Type: Gauge.
Examples of storage snapshot data queries
kubevirt_vmsnapshot_disks_restored_from_source{vm_name="simple-vm", vm_namespace="default"} (1)
| 1 | This query returns the total number of virtual machine disks restored from the source virtual machine. | 
kubevirt_vmsnapshot_disks_restored_from_source_bytes{vm_name="simple-vm", vm_namespace="default"} (1)
| 1 | This query returns the amount of space in bytes restored from the source virtual machine. | 
The following queries can determine the I/O performance of storage devices:
kubevirt_vmi_storage_iops_read_total
Returns the number of read I/O operations the virtual machine is performing per second. Type: Counter.
kubevirt_vmi_storage_iops_write_total
Returns the number of write I/O operations the virtual machine is performing per second. Type: Counter.
Example I/O performance query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_read_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_write_total[6m]))) > 0 (1)
| 1 | This query returns the top 3 VMs performing the most I/O operations per second at every given moment over a six-minute time period. | 
The following queries can identify which swap-enabled guests are performing the most memory swapping:
kubevirt_vmi_memory_swap_in_traffic_bytes
Returns the total amount (in bytes) of memory the virtual guest is swapping in. Type: Gauge.
kubevirt_vmi_memory_swap_out_traffic_bytes
Returns the total amount (in bytes) of memory the virtual guest is swapping out. Type: Gauge.
Example memory swapping query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_in_traffic_bytes[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_out_traffic_bytes[6m]))) > 0 (1)
| 1 | This query returns the top 3 VMs where the guest is performing the most memory swapping at every given moment over a six-minute time period. | 
| Memory swapping indicates that the virtual machine is under memory pressure. Increasing the memory allocation of the virtual machine can mitigate this issue. | 
The following metrics are exposed by the Application Aware Quota (AAQ) controller for monitoring resource quotas:
kube_application_aware_resourcequota
Returns the current quota usage and the CPU and memory limits enforced by the AAQ Operator resources. Type: Gauge.
kube_application_aware_resourcequota_creation_timestamp
Returns the time, in UNIX timestamp format, when the AAQ Operator resource is created. Type: Gauge.
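Because the creation timestamp metric is reported in UNIX timestamp format, its value usually needs converting before it is readable. A minimal sketch follows, assuming the sample value arrives as a number or a numeric string; the helper name and the example value are hypothetical.

```python
from datetime import datetime, timezone


def creation_time(sample_value):
    """Convert a UNIX timestamp sample (number or numeric string) to a UTC datetime."""
    return datetime.fromtimestamp(float(sample_value), tz=timezone.utc)


# Hypothetical sample value used for illustration only.
print(creation_time("1700000000").isoformat())
```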
The following metrics can be queried to show live migration status:
kubevirt_vmi_migration_data_processed_bytes
The amount of guest operating system data that has migrated to the new virtual machine (VM). Type: Gauge.
kubevirt_vmi_migration_data_remaining_bytes
The amount of guest operating system data that remains to be migrated. Type: Gauge.
kubevirt_vmi_migration_memory_transfer_rate_bytes
The rate at which memory is becoming dirty in the guest operating system. Dirty memory is data that has been changed but not yet written to disk. Type: Gauge.
kubevirt_vmi_migrations_in_pending_phase
The number of pending migrations. Type: Gauge.
kubevirt_vmi_migrations_in_scheduling_phase
The number of scheduling migrations. Type: Gauge.
kubevirt_vmi_migrations_in_running_phase
The number of running migrations. Type: Gauge.
kubevirt_vmi_migration_succeeded
The number of successfully completed migrations. Type: Gauge.
kubevirt_vmi_migration_failed
The number of failed migrations. Type: Gauge.
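One way to read the processed and remaining gauges together is as a rough progress estimate. The sketch below divides the processed bytes by the sum of processed and remaining bytes. This is only an approximation under the assumption that the sum reflects the total to transfer: a running guest keeps changing memory, so the estimate can move backwards between samples.

```python
def migration_progress(processed_bytes, remaining_bytes):
    """Estimate migration progress as a fraction between 0 and 1, or None if unknown."""
    total = processed_bytes + remaining_bytes
    if total <= 0:
        return None
    return processed_bytes / total


# Hypothetical gauge values: 3 GiB processed, 1 GiB remaining.
progress = migration_progress(3 * 1024**3, 1 * 1024**3)
print(f"{progress:.0%} migrated")
```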