Terminology:
- Source: A host accessing the production data from one or more LUNs on the storage array is called a production host, and these LUNs are known as source LUNs (devices/volumes), production LUNs, or simply the source.
- Target: A LUN (or LUNs) on which the production data is replicated is called the target LUN, or simply the target or replica.
- Point-in-Time (PIT) and continuous replica: Replicas can be either a PIT or a continuous copy. The PIT replica is an identical image of the source at some specific timestamp. For example, if a replica of a file system is created at 4:00 p.m. on Monday, this replica is the Monday 4:00 p.m. PIT copy. On the other hand, the continuous replica is in-sync with the production data at all times.
- Recoverability and restartability: Recoverability enables restoration of data from the replicas to the source if data loss or corruption occurs. Restartability enables restarting business operations using the replicas. The replica must be consistent with the source so that it is usable for both recovery and restart operations.
Local Replica Uses:
One or more local replicas of the source data may be created for various purposes, including the following:
- Alternative source for backup: Under normal backup operations, data is read from the production volumes (LUNs) and written to the backup device. This places an additional burden on the production infrastructure because production LUNs are simultaneously involved in production operations and servicing data for backup operations. The local replica contains an exact point-in-time (PIT) copy of the source data, and therefore can be used as a source to perform backup operations. This alleviates the backup I/O workload on the production volumes. Another benefit of using local replicas for backup is that it reduces the backup window to zero.
- Fast recovery: If data loss or data corruption occurs on the source, a local replica might be used to recover the lost or corrupted data. If a complete failure of the source occurs, some replication solutions enable a replica to be used to restore data onto a different set of source devices, or production can be restarted on the replica. In either case, this method provides faster recovery and minimal RTO compared to traditional recovery from tape backups. In many instances, business operations can be started using the source device before the data is completely copied from the replica.
- Decision-support activities, such as reporting or data warehousing: Running reports against the data on the replicas greatly reduces the I/O burden placed on the production device. Local replicas are also used for data-warehousing applications. The data warehouse may be populated from the data on the replica, thus avoiding the impact on the production environment.
- Testing platform: Local replicas are also used for testing new applications or upgrades. For example, an organization may use the replica to test the production application upgrade; if the test is successful, the upgrade may be implemented on the production environment.
- Data migration: Another use for a local replica is data migration. Data migrations are performed for various reasons, such as moving data from a smaller-capacity LUN to a larger-capacity LUN, or migrating data when upgrading to a newer version of an application.
LVM-Based Replication
In LVM-based replication, the logical volume manager is responsible for creating and controlling the host-level logical volumes. An LVM has three components: physical volumes (physical disks), volume groups, and logical volumes. A volume group is created by grouping one or more physical volumes. Logical volumes are created within a given volume group. A volume group can have multiple logical volumes.
In LVM-based replication, each logical block in a logical volume is mapped to two physical blocks on two different physical volumes, as shown in Figure 11.5. An application write to a logical volume is written to the two physical volumes by the LVM device driver. This is also known as LVM mirroring. Mirrors can be split, and the data contained therein can be independently accessed.
Advantage: Not dependent on vendor-specific storage hardware or replication software.
Disadvantage: Every application write results in two physical writes, adding I/O overhead on the host.
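As a rough illustration of the mapping described above, the following minimal sketch (Python; class and method names are illustrative, not a real LVM API) shows each logical block backed by two physical volumes, with every write applied to both copies and the mirror splittable into an independently accessible copy.

class MirroredLogicalVolume:
    """Toy model of an LVM mirror: one logical address space, two physical copies."""

    def __init__(self, num_blocks):
        self.pv1 = [None] * num_blocks  # first physical volume
        self.pv2 = [None] * num_blocks  # second physical volume

    def write(self, logical_block, data):
        # One application write becomes two physical writes (the cost of mirroring).
        self.pv1[logical_block] = data
        self.pv2[logical_block] = data

    def read(self, logical_block):
        # Reads can be served from either mirror copy.
        return self.pv1[logical_block]

    def split_mirror(self):
        # Splitting the mirror yields a copy that can be accessed independently.
        return list(self.pv2)

Splitting the mirror in this way is what makes the copy usable as a local replica, at the price of doubling every write.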
A file system (FS) snapshot is a pointer-based replica that requires a fraction of the space used by the production FS. This snapshot can be implemented by either FS or by LVM. It uses the Copy on First Write (CoFW) principle to create snapshots.
When a snapshot is
created, a bitmap and blockmap are created in the metadata of the Snap
FS. The bitmap is used to keep track of blocks that are changed on the
production FS after the snap creation. The blockmap is used to indicate
the exact address from which the data is to be read when the data is
accessed from the Snap FS. Immediately after the creation of the FS
snapshot, all reads from the snapshot are actually served by reading the
production FS. In a CoFW mechanism, if a write I/O is issued to the
production FS for the first time after the creation of a snapshot, the
I/O is held, and the original data of the production FS corresponding to that
location is moved to the Snap FS. Then, the write is allowed to the
production FS. The bitmap and blockmap are updated accordingly.
Subsequent writes to the same location do not initiate the CoFW
activity. To read from the Snap FS, the bitmap is consulted. If the bit
is 0, then the read is directed to the production FS. If the bit is 1,
then the block address is obtained from the blockmap, and the data is
read from that address on the Snap FS. Read requests from the production
FS work as normal.
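The bitmap/blockmap logic described above can be sketched as follows (Python; illustrative only, not a real file-system implementation). A bit of 0 means the block is unchanged and snapshot reads go to the production FS; a bit of 1 means the original data has been moved to the Snap FS and the blockmap points at it.

class CoFWSnapshot:
    def __init__(self, production):
        self.production = production           # production FS blocks
        self.bitmap = [0] * len(production)    # 1 = original data moved to the Snap FS
        self.blockmap = {}                     # production block -> address in snap store
        self.snap_store = []                   # save area holding original data

    def write_production(self, block, data):
        # Copy on First Write: the first write to a block after snap creation
        # moves the original data to the Snap FS before the write proceeds.
        if self.bitmap[block] == 0:
            self.snap_store.append(self.production[block])
            self.blockmap[block] = len(self.snap_store) - 1
            self.bitmap[block] = 1
        self.production[block] = data          # subsequent writes to this block skip CoFW

    def read_snapshot(self, block):
        # Reads from the Snap FS consult the bitmap first.
        if self.bitmap[block] == 0:
            return self.production[block]      # unchanged: served from the production FS
        return self.snap_store[self.blockmap[block]]

For example, snapping ["a", "b", "c"], then writing "B" to block 1 on the production FS, and then reading block 1 from the snapshot still returns the original "b".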
Copy on First Access (CoFA)
Another method of array-based local replication is pointer-based full-volume replication. Similar to full-volume mirroring, this technology can provide full copies of the source data on the targets. Unlike full-volume mirroring, the target is immediately accessible by the BC host after the replication session is activated. Therefore, data synchronization and detachment of the target are not required to access it. Here, the time of replication session activation defines the PIT copy of the source.
Pointer-based, full-volume replication can be activated in either Copy on First Access (CoFA)
mode or Full Copy mode. In either case, at the time of activation, a
protection bitmap is created for all data on the source devices. The
protection bitmap keeps track of the changes at the source device. The
pointers on the target are initialized to map the corresponding data
blocks on the source. The data is then copied from the source to the
target based on the mode of activation.
In CoFA mode, after the replication session is initiated, data is copied from the source to the target only when either of the following occurs (see the sketch after this list):
- A write I/O is issued to a specific address on the source for the first time.
- A read or write I/O is issued to a specific address on the target for the first time.
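The following minimal sketch (Python; illustrative only) models CoFA behavior: a protection bitmap marks blocks not yet copied, and the original point-in-time data is copied to the target the first time the block is written on the source or accessed on the target.

class CoFAReplica:
    def __init__(self, source):
        self.source = source
        self.target = [None] * len(source)   # full-size target volume
        self.protection = [1] * len(source)  # 1 = original data not yet copied to target

    def _copy_if_needed(self, block):
        # Copy the original (point-in-time) data to the target exactly once.
        if self.protection[block]:
            self.target[block] = self.source[block]
            self.protection[block] = 0

    def write_source(self, block, data):
        self._copy_if_needed(block)          # first write to the source triggers the copy
        self.source[block] = data

    def read_target(self, block):
        self._copy_if_needed(block)          # first read on the target triggers the copy
        return self.target[block]

    def write_target(self, block, data):
        self._copy_if_needed(block)          # first write on the target triggers the copy
        self.target[block] = data

In Full Copy mode, by contrast, the entire contents of the source are copied to the target in the background, regardless of access.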
A stripe depth of 32 KB has been assigned to a five-disk RAID 5 set. What is the stripe size?
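Worked answer (assuming the common convention that stripe size = stripe depth × number of data disks): a five-disk RAID 5 set has four data strips per stripe, because one strip in each stripe holds parity, so the stripe size is 32 KB × 4 = 128 KB.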
An application uses ten 15 GB devices. A pointer-based full-volume replica of the application is required. The replica will be kept for 24 hours, and the data changes by 10 percent every 24 hours. How much storage should be allocated for the replica?
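Worked answer (assuming the usual sizing rule that a pointer-based full-volume replica requires target devices of the same capacity as the source, regardless of the rate of change): the replica needs 10 × 15 GB = 150 GB. The 10 percent daily change rate would matter only for a pointer-based virtual (snapshot) replica, which would need roughly 10 percent of 150 GB, or about 15 GB.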
Remote Replication:
The two basic modes of remote replication are synchronous and asynchronous. In synchronous remote replication, writes must be committed to the source and remote replica (or target), prior to acknowledging “write complete” to the host (see Figure 12.1). Additional writes on the source cannot occur until each preceding write has been completed and acknowledged. This ensures that data is identical on the source and replica at all times. Further, writes are transmitted to the remote site exactly in the order in which they are received at the source. Therefore, write ordering is maintained. If a source-site failure occurs, synchronous remote replication provides zero or near-zero recovery-point objective (RPO).
In asynchronous remote replication, a write is committed to the source and immediately acknowledged to the host. In this mode, data is buffered at the source and transmitted to the remote site later. The remote site is therefore behind the source by at least the amount of buffered data, so asynchronous replication provides a nonzero RPO.
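The difference between the two modes can be sketched as follows (Python; illustrative only): in synchronous mode the remote commit happens before the host sees "write complete", while in asynchronous mode the write is acknowledged immediately and sits in a buffer until it is transmitted.

source, replica, buffer = [], [], []

def sync_write(data):
    source.append(data)
    replica.append(data)      # remote replica is committed first...
    return "write complete"   # ...so source and replica stay identical (near-zero RPO)

def async_write(data):
    source.append(data)
    buffer.append(data)       # buffered at the source for later transmission
    return "write complete"   # acknowledged immediately; replica may lag (nonzero RPO)

def drain_buffer():
    # Transmission to the remote site happens later; until the buffer drains,
    # the replica is behind the source by the buffered writes.
    while buffer:
        replica.append(buffer.pop(0))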
In the cascade/multihop three-site replication, data flows from the source
to the intermediate storage array, known as a bunker, in the first hop, and
then from a bunker to a storage array at a remote site in the second hop.
Replication between the source and the remote sites can be performed in two
ways: synchronous + asynchronous or synchronous + disk buffered. Replication
between the source and the bunker occurs synchronously, but replication between
the bunker and the remote site can be performed in either asynchronous or
disk-buffered mode.
Synchronous + asynchronous: RPO at the remote site is on the order of minutes;
a minimum of three storage devices is required (source, bunker, and remote).
In either variant, if a disaster occurs at the source, production operations
can be failed over to the bunker (middle) site with near-zero data loss,
because the source-to-bunker leg is synchronous.
Synchronous + disk buffered: combines local and remote replication
technologies. Replication from the source to the bunker is synchronous; a
local PIT replica is then created at the bunker and replicated to the remote
site. A minimum of four storage devices is required.
Three-site triangle/multitarget: data at the source is replicated to two
remote sites, providing the ability to fail over to either of the two remote sites.
In array-based synchronous remote replication, writes must be committed to
the source and the target prior to acknowledging “write complete” to the
production host. Additional writes on that source cannot occur until each
preceding write has been completed and acknowledged.
In array-based asynchronous remote replication mode, a write is committed to
the source and immediately acknowledged to the host.
Disk-buffered replication is a combination of local and remote replication
technologies. A consistent PIT local replica of the source device is first
created. This is then replicated to a remote replica on the target array.
In normal operation, CDP remote replication provides any-point-in-time
recovery capability, which enables the target LUNs to be rolled back to any
previous point in time. Similar to CDP local replication, CDP remote
replication typically uses a journal volume, CDP appliance, or CDP software
installed on a separate host (host-based CDP), and a write splitter to perform
replication between sites. The CDP appliance is maintained at both source and
remote sites.
In the asynchronous mode, the local CDP appliance instantly acknowledges a
write as soon as it is received. In the synchronous replication mode, the host
application waits for an acknowledgment from the CDP appliance at the remote
site before initiating the next write. The synchronous replication mode impacts
the application's performance under heavy write loads.
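A journal-based rollback of the kind described above might be sketched as follows (Python; illustrative only, not a real CDP product API): the write splitter records the pre-write image of every block in a journal, and rolling back replays those images, newest first, until the requested point in time is reached.

import time

class CDPJournal:
    def __init__(self, volume):
        self.volume = volume      # replica volume (list of blocks)
        self.journal = []         # entries of (timestamp, block, pre-write image)

    def split_write(self, block, data):
        # The write splitter saves the old data to the journal before applying the write.
        self.journal.append((time.time(), block, self.volume[block]))
        self.volume[block] = data

    def rollback_to(self, timestamp):
        # Undo every write newer than the requested point in time, newest first.
        while self.journal and self.journal[-1][0] > timestamp:
            _, block, old_data = self.journal.pop()
            self.volume[block] = old_data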