The process of storing objects in an OSD system is as follows:
1. The application server presents the file to be stored to the OSD node.
2. The OSD node divides the file into two parts: user data and metadata.
3. The OSD node generates the object ID using a specialized algorithm. The algorithm is executed against the contents of the user data to derive an ID unique to this data.
4. For future access, the OSD node stores the metadata and object ID using the metadata service.
5. The OSD node stores the user data (objects) in the storage device using the storage service.
6. An acknowledgment is sent to the application server stating that the object is stored.
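The storage steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MetadataService` and `StorageService` classes are hypothetical in-memory stand-ins, `store_file` is an invented helper name, and SHA-256 is assumed as the "specialized algorithm", which the text does not name.

```python
import hashlib


class MetadataService:
    """Hypothetical in-memory stand-in for the OSD metadata service."""

    def __init__(self):
        self.by_filename = {}  # filename -> (object ID, metadata)

    def put(self, filename, object_id, metadata):
        self.by_filename[filename] = (object_id, metadata)


class StorageService:
    """Hypothetical in-memory stand-in for the OSD storage service."""

    def __init__(self):
        self.objects = {}  # object ID -> user data

    def put(self, object_id, user_data):
        self.objects[object_id] = user_data


def store_file(filename, user_data, attributes, metadata_service, storage_service):
    # Step 2: separate the metadata from the user data.
    metadata = {"filename": filename, "size": len(user_data), **attributes}
    # Step 3: derive the object ID from the contents of the user data
    # (SHA-256 is an assumption, not taken from the text).
    object_id = hashlib.sha256(user_data).hexdigest()
    # Step 4: record the metadata and object ID for future access.
    metadata_service.put(filename, object_id, metadata)
    # Step 5: store the user data itself, keyed by object ID.
    storage_service.put(object_id, user_data)
    # Step 6: acknowledge the write to the application server.
    return {"status": "stored", "object_id": object_id}


metadata_service = MetadataService()
storage_service = StorageService()
ack = store_file(
    "report.txt",
    b"quarterly results",
    {"owner": "app01"},
    metadata_service,
    storage_service,
)
print(ack["status"], ack["object_id"][:12])
```

Because the ID is computed from the user data alone, storing the same bytes under a different filename would yield the same object ID in this sketch.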
After an object is
stored successfully, it is available for retrieval. A user accesses the
data stored on OSD by the same filename under which it was stored. The
application server retrieves the stored content using the object ID. This
process is transparent to the user.
The process of retrieving objects from an OSD system is as follows:
1. The application server sends a read request to the OSD system.
2. The metadata service retrieves the object ID for the requested file.
3. The metadata service sends the object ID to the application server.
4. The application server sends the object ID to the OSD storage service for object retrieval.
5. The OSD storage service retrieves the object from the storage device.
6. The OSD storage service sends the file to the application server.
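The retrieval steps above can be sketched as two lookups: filename to object ID in the metadata service, then object ID to data in the storage service. The dictionaries and function names below are hypothetical stand-ins for the two services, and SHA-256 is assumed as the object ID scheme.

```python
import hashlib

# Hypothetical in-memory state standing in for the two OSD services.
user_data = b"quarterly results"
object_id = hashlib.sha256(user_data).hexdigest()  # assumed ID scheme
metadata_index = {"report.txt": object_id}  # metadata service: filename -> object ID
object_store = {object_id: user_data}       # storage service: object ID -> data


def lookup_object_id(filename):
    """Steps 2-3: the metadata service resolves the filename to an object ID."""
    return metadata_index[filename]


def fetch_object(oid):
    """Steps 5-6: the storage service returns the object for a given ID."""
    return object_store[oid]


def read_file(filename):
    """Application-server view of a read request."""
    oid = lookup_object_id(filename)  # steps 1-3: obtain the object ID
    return fetch_object(oid)          # steps 4-6: retrieve the object by ID


print(read_file("report.txt"))
```

The user only ever supplies the filename; the object ID stays internal to the exchange between the application server and the OSD services.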
A data archival solution is a promising use case for OSD. Data integrity and protection are the primary requirements for any data archiving solution. Traditional archival solutions, such as CD and DVD-ROM, do not provide scalability and performance. OSD stores data in the form of objects, associates them with a unique object ID, and ensures high data integrity. Along with integrity, it provides scalability and data protection. These capabilities make OSD a viable option for long-term archiving of fixed content. Content addressed storage (CAS) is a special type of object-based storage device purpose-built for storing fixed content. CAS is covered in the following section.
Another use case for
OSD is cloud-based storage. OSD uses a web interface to access storage
resources. OSD provides inherent security, scalability, and automated
data management. It also enables data sharing across heterogeneous
platforms or tenants while ensuring integrity of data. These
capabilities make OSD a strong option for cloud-based storage. Cloud
service providers can leverage OSD to offer storage-as-a-service.
OSD supports web service access via representational state transfer (REST) and simple object access protocol (SOAP). REST and SOAP APIs can be easily integrated with business applications that access OSD over the web.
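As a rough illustration of REST-style access, the sketch below builds (but does not send) HTTP requests for writing and reading an object. The endpoint `https://osd.example.com/objects` and the path layout are hypothetical; real OSD products define their own URLs, headers, and authentication.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical endpoint; not taken from any real OSD product.
BASE_URL = "https://osd.example.com/objects"


def build_put_request(name, data):
    """Build a PUT request that would upload `data` under `name`."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{quote(name)}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def build_get_request(name):
    """Build a GET request that would download the object stored under `name`."""
    return urllib.request.Request(url=f"{BASE_URL}/{quote(name)}", method="GET")


put_req = build_put_request("report.txt", b"quarterly results")
get_req = build_get_request("report.txt")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call against a live service; that step is omitted here because the endpoint is illustrative.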
Content Addressed Storage (CAS) is an object-based storage device
designed for secure online storage and retrieval of fixed content. CAS
stores user data and its attributes as an object. The stored object is
assigned a globally unique address, known as a content address
(CA). This address is derived from the object's binary representation.
CAS provides an optimized and centrally managed storage solution. Data
access in CAS differs from other OSD devices. In CAS, the application
server accesses the CAS device only via the CAS API running on the
application server. However, the way CAS stores data is similar to the
other OSD systems.
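The idea of deriving an address from an object's binary representation can be shown with a cryptographic hash. The function name `content_address` is hypothetical, and SHA-256 is an assumption; the text does not say which algorithm a given CAS product uses.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the object's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


original = b"scanned invoice #1"
same_again = b"scanned invoice #1"
altered = b"scanned invoice #2"

# Identical content always maps to the same address.
print(content_address(original) == content_address(same_again))
# Changing even one byte produces a different address.
print(content_address(original) == content_address(altered))
```

The address depends only on the content, not on a filename or storage location.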
CAS provides all the features required for storing fixed content. The key features of CAS are as follows:
- Content authenticity: It assures the genuineness of stored content. This is achieved by generating a unique content address for each object and validating the content address for stored objects at regular intervals. Content authenticity is assured because the address assigned to each object is as unique as a fingerprint. Every time an object is read, CAS uses a hashing algorithm to recalculate the object's content address as a validation step and compares the result to its original content address. If the object fails validation, CAS rebuilds the object using a mirror or parity protection scheme.
- Content integrity: It provides assurance that the stored content has not been altered. CAS uses a hashing algorithm for content authenticity and integrity. If the fixed content is altered, CAS generates a new address for the altered content rather than overwriting the original fixed content.
- Location independence: CAS uses a unique content address, rather than directory path names or URLs, to retrieve data. This makes the physical location of the stored data irrelevant to the application that requests the data.
- Single-instance storage (SIS): CAS uses a unique content address to guarantee the storage of only a single instance of an object. When a new object is written, the CAS system is polled to see whether an object is already available with the same content address. If the object is available in the system, it is not stored; instead, only a pointer to that object is created.
- Retention enforcement: Protecting and retaining objects is a core requirement of an archive storage system. After an object is stored in the CAS system and the retention policy is defined, CAS does not make the object available for deletion until the policy expires.
- Data protection: CAS ensures that the content stored on the CAS system is available even if a disk or a node fails. CAS provides both local and remote protection to the data objects stored on it. In the local protection option, data objects are either mirrored or parity protected. In mirror protection, two copies of the data object are stored on two different nodes in the same cluster. This decreases the total available capacity by 50 percent. In parity protection, the data object is split into multiple parts and parity is generated from them. Each part of the data and its parity are stored on a different node. This method consumes less capacity to protect the stored data, but takes slightly longer to regenerate the data if data corruption occurs. In the remote replication option, data objects are copied to a secondary CAS at the remote location. In this case, the objects remain accessible from the secondary CAS if the primary CAS system fails.
- Fast record retrieval: CAS stores all objects on disks, which provides faster access to the objects compared to tapes and optical discs.
- Load balancing: CAS distributes objects across multiple nodes to provide maximum throughput and availability.
- Scalability: CAS allows the addition of more nodes to the cluster without any interruption to data access and with minimum administrative overhead.
- Event notification: CAS continuously monitors the state of the system and raises an alert for any event that requires the administrator's attention. The event notification is communicated to the administrator through SNMP, SMTP, or e-mail.
- Self-diagnosis and repair: CAS automatically detects and repairs corrupted objects and alerts the administrator about the potential problem. CAS systems can be configured to alert remote support teams who can diagnose and repair the system remotely.
- Audit trails: CAS keeps track of management activities and any access or disposition of data. Audit trails are mandated by compliance requirements.
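Three of the features listed above (single-instance storage, validation on read, and retention enforcement) can be sketched in a toy in-memory store. Everything here is hypothetical: the `CasStore` class, its method names, the exception types, and the use of SHA-256. A real CAS would also rebuild a failed object from its mirror or parity copy, whereas this sketch only raises an error.

```python
import hashlib
import time


class RetentionError(Exception):
    """Raised when a delete is attempted before the retention period ends."""


class IntegrityError(Exception):
    """Raised when stored bytes no longer match their content address."""


class CasStore:
    def __init__(self):
        self.objects = {}       # content address -> stored bytes
        self.references = {}    # content address -> number of pointers to it
        self.retain_until = {}  # content address -> time before which delete is refused

    @staticmethod
    def address_of(data):
        return hashlib.sha256(data).hexdigest()  # assumed hashing algorithm

    def write(self, data, retention_seconds=0, now=None):
        now = time.time() if now is None else now
        address = self.address_of(data)
        # Single-instance storage: keep the bytes only if they are new.
        if address not in self.objects:
            self.objects[address] = data
        # Every write, new or duplicate, adds a pointer to the one instance.
        self.references[address] = self.references.get(address, 0) + 1
        # Keep the longest retention requested for this object.
        expiry = now + retention_seconds
        self.retain_until[address] = max(self.retain_until.get(address, 0), expiry)
        return address

    def read(self, address):
        data = self.objects[address]
        # Validation step: recompute the address and compare with the original.
        if self.address_of(data) != address:
            raise IntegrityError(address)
        return data

    def delete(self, address, now=None):
        now = time.time() if now is None else now
        # Retention enforcement: refuse deletion until the policy expires.
        if now < self.retain_until[address]:
            raise RetentionError(address)
        del self.objects[address]
        del self.references[address]
        del self.retain_until[address]


cas = CasStore()
first = cas.write(b"contract v1", retention_seconds=3600, now=0)
second = cas.write(b"contract v1", now=10)  # duplicate content
print(first == second, len(cas.objects), cas.references[first])
```

In this sketch the duplicate write stores no new bytes; it only increments the pointer count for the existing object.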
The storage controller
provides block-level access to application servers through iSCSI, FC,
or FCoE protocols. It contains iSCSI, FC, and FCoE front-end ports for
direct block access. The storage controller is also responsible for
managing the back-end storage pool in the storage system. The controller
configures LUNs and presents them to application servers, NAS heads,
and OSD nodes. The LUNs presented to the application server appear as
local physical disks. A file system is configured on these LUNs and is
made available to applications for storing data.
A NAS head is a
dedicated file server that provides file access to NAS clients. The NAS
head is connected to the storage via the storage controller, typically
using an FC or FCoE connection. The system typically has two or more NAS
heads for redundancy. The LUNs presented to the NAS head appear as
physical disks. The NAS head configures the file systems on these disks,
creates an NFS, CIFS, or mixed share, and exports the share to the NAS
clients.
The OSD node
accesses the storage through the storage controller using an FC or FCoE
connection. The LUNs assigned to the OSD node appear as physical disks.
These disks are configured by the OSD nodes, enabling them to store the
data from the web application servers.