level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete"
Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce errors, retries, or failures.
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
The following OADP timeouts show instructions of how and when to implement these parameters:
The spec.configuration.nodeAgent.timeout
parameter defines the Restic timeout. The default value is 1h
.
Use the Restic timeout
parameter in the nodeAgent
section for the following scenarios:
For Restic backups with total PV data usage that is greater than 500GB.
If backups are timing out with the following error:
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete"
Edit the values in the spec.configuration.nodeAgent.timeout
block of the DataProtectionApplication
custom resource (CR) manifest, as shown in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
name: <dpa_name>
spec:
configuration:
nodeAgent:
enable: true
uploaderType: restic
timeout: 1h
# ...
resourceTimeout
defines how long to wait for several Velero resources before timeout occurs, such as Velero custom resource definition (CRD) availability, volumeSnapshot
deletion, and repository availability. The default is 10m
.
Use the resourceTimeout
for the following scenarios:
For backups with total PV data usage that is greater than 1TB. This parameter is used as a timeout value when Velero tries to clean up or delete the Container Storage Interface (CSI) snapshots, before marking the backup as complete.
A sub-task of this cleanup tries to patch VSC and this timeout can be used for that task.
To create or ensure a backup repository is ready for filesystem based backups for Restic or Kopia.
To check if the Velero CRD is available in the cluster before restoring the custom resource (CR) or resource from the backup.
Edit the values in the spec.configuration.velero.resourceTimeout
block of the DataProtectionApplication
CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
name: <dpa_name>
spec:
configuration:
velero:
resourceTimeout: 10m
# ...
defaultItemOperationTimeout
defines how long to wait on asynchronous BackupItemActions
and RestoreItemActions
to complete before timing out. The default value is 1h
.
Use the defaultItemOperationTimeout
for the following scenarios:
Only with Data Mover 1.2.x.
To specify the amount of time a particular backup or restore should wait for the Asynchronous actions to complete. In the context of OADP features, this value is used for the Asynchronous actions involved in the Container Storage Interface (CSI) Data Mover feature.
When defaultItemOperationTimeout
is defined in the Data Protection Application (DPA) using the defaultItemOperationTimeout
, it applies to both backup and restore operations. You can use itemOperationTimeout
to define only the backup or only the restore of those CRs, as described in the following "Item operation timeout - restore", and "Item operation timeout - backup" sections.
Edit the values in the spec.configuration.velero.defaultItemOperationTimeout
block of the DataProtectionApplication
CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
name: <dpa_name>
spec:
configuration:
velero:
defaultItemOperationTimeout: 1h
# ...
timeout
is a user-supplied timeout to complete VolumeSnapshotBackup
and VolumeSnapshotRestore
. The default value is 10m
.
Use the Data Mover timeout
for the following scenarios:
If creation of VolumeSnapshotBackups
(VSBs) and VolumeSnapshotRestores
(VSRs), times out after 10 minutes.
For large scale environments with total PV data usage that is greater than 500GB. Set the timeout for 1h
.
With the VolumeSnapshotMover
(VSM) plugin.
Only with OADP 1.1.x.
Edit the values in the spec.features.dataMover.timeout
block of the DataProtectionApplication
CR manifest, as in the following example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
name: <dpa_name>
spec:
features:
dataMover:
timeout: 10m
# ...
CSISnapshotTimeout
specifies the time during creation to wait until the CSI VolumeSnapshot
status becomes ReadyToUse
, before returning error as timeout. The default value is 10m
.
Use the CSISnapshotTimeout
for the following scenarios:
With the CSI plugin.
For very large storage volumes that may take longer than 10 minutes to snapshot. Adjust this timeout if timeouts are found in the logs.
Typically, the default value for |
Edit the values in the spec.csiSnapshotTimeout
block of the Backup
CR manifest, as in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
name: <backup_name>
spec:
csiSnapshotTimeout: 10m
# ...
ItemOperationTimeout
specifies the time that is used to wait for RestoreItemAction
operations. The default value is 1h
.
Use the restore ItemOperationTimeout
for the following scenarios:
Only with Data Mover 1.2.x.
For Data Mover uploads and downloads to or from the BackupStorageLocation
. If the restore action is not completed when the timeout is reached, it will be marked as failed. If Data Mover operations are failing due to timeout issues, because of large storage volume sizes, then this timeout setting may need to be increased.
Edit the values in the Restore.spec.itemOperationTimeout
block of the Restore
CR manifest, as in the following example:
apiVersion: velero.io/v1
kind: Restore
metadata:
name: <restore_name>
spec:
itemOperationTimeout: 1h
# ...
ItemOperationTimeout
specifies the time used to wait for asynchronous
BackupItemAction
operations. The default value is 1h
.
Use the backup ItemOperationTimeout
for the following scenarios:
Only with Data Mover 1.2.x.
For Data Mover uploads and downloads to or from the BackupStorageLocation
. If the backup action is not completed when the timeout is reached, it will be marked as failed. If Data Mover operations are failing due to timeout issues, because of large storage volume sizes, then this timeout setting may need to be increased.
Edit the values in the Backup.spec.itemOperationTimeout
block of the Backup
CR manifest, as in the following example:
apiVersion: velero.io/v1
kind: Backup
metadata:
name: <backup_name>
spec:
itemOperationTimeout: 1h
# ...