The recovery time objective (RTO) is the maximum time period within which a business process must be restored to a designated service level after a disaster to avoid unacceptable consequences associated with a break in business continuity.
The work recovery time (WRT) is the remainder of the overall maximum tolerable downtime (MTD) after the RTO has passed.
The recovery point objective (RPO) is the acceptable amount of data loss measured in time.
The RTO, RPO, and WRT values are critical to understand because they are the foundational metrics used to determine the type of recovery solutions a company must put into place.
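The relationship among these metrics can be checked with simple arithmetic. The sketch below is illustrative only: the hour values describe a made-up order-processing system, and the names `RecoveryTargets`, `work_recovery_time`, and `backup_interval_meets_rpo` are hypothetical helpers, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTargets:
    """Recovery metrics for one business process, in hours (illustrative values)."""
    mtd_hours: float  # maximum tolerable downtime
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective (acceptable data loss, in time)

    def work_recovery_time(self) -> float:
        """WRT is what remains of the MTD once the RTO has been used up."""
        if self.rto_hours > self.mtd_hours:
            raise ValueError("RTO cannot exceed MTD")
        return self.mtd_hours - self.rto_hours

    def backup_interval_meets_rpo(self, backup_interval_hours: float) -> bool:
        """Worst-case data loss equals the time since the last backup."""
        return backup_interval_hours <= self.rpo_hours


# Hypothetical process: 48 h MTD, 36 h RTO, 4 h RPO.
orders = RecoveryTargets(mtd_hours=48, rto_hours=36, rpo_hours=4)
print(orders.work_recovery_time())           # 12
print(orders.backup_interval_meets_rpo(24))  # False: nightly backups risk 24 h of loss
print(orders.backup_interval_meets_rpo(1))   # True
```

In this example a nightly backup would violate the 4-hour RPO, so the process would need a more frequent backup or a continuous method such as journaling or replication.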
The BCP team has to determine what the company must do to recover the processes and services it has identified as critical to the organization.
The team needs to define the recovery processes, which are sets of predefined activities that will be carried out in response to a disaster. More importantly, these processes must be re-evaluated regularly and updated as necessary to ensure that the organization stays within its MTDs.
Business Process Recovery
A business process is a set of interrelated steps linked through specific decision activities to accomplish a specific task.
The BCP team must understand the following about critical business processes:
- Required roles
- Required resources
- Input and output mechanisms
- Workflow steps
- Required time for completion
- Interfaces with other processes
Recovery Site Strategies
Disruptions, in BCP terms, are of three main types: nondisasters, disasters, and catastrophes. A nondisaster is a disruption in service that has significant but limited impact on the conduct of business processes at a facility. A disaster is an event that causes the entire facility to be unusable for a day or longer. A catastrophe is a major disruption that destroys the facility altogether.
The organization has three basic options: select a dedicated site that the organization operates itself; lease a commercial facility, such as a “hot site” that contains all the equipment and data needed to quickly restore operations; or enter into a formal agreement with another facility, such as a service bureau, to restore its operations.
Companies can choose from three main types of leased or rented offsite facilities:
- Hot site: A facility that is leased or rented and is fully configured and ready to operate within a few hours. These sites are a good choice for a company that needs to ensure a site will be available for it as soon as possible.
- Warm site: A leased or rented facility that is usually partially configured with some equipment, such as HVAC, and foundational infrastructure components, but not the actual computers.
- Cold site: A leased or rented facility that supplies the basic environment, electrical wiring, air conditioning, plumbing, and flooring, but none of the equipment or additional services.
A service bureau is a company that has additional space and capacity to provide applications and services such as call centers.
Hot Site Advantages:
- Ready within hours for operation
- Highly available
- Usually used for short-term solutions, but available for longer stays
- Annual testing available
Hot Site Disadvantages:
- Very expensive
- Limited on hardware and software choices
Warm and Cold Site Advantages:
- Less expensive
- Available for longer timeframes because of the reduced costs
- Practical for proprietary hardware or software use
Warm and Cold Site Disadvantages:
- Operational testing not usually available
- Resources for operations not immediately available
Reciprocal Agreements
Another approach to alternate offsite facilities is to establish a reciprocal agreement with another company, usually one in a similar field or that has similar technological infrastructure. This means that company A agrees to allow company B to use its facilities if company B is hit by a disaster, and vice versa.
A variation on a reciprocal agreement is a consortium, or mutual aid agreement. In this case, more than two organizations agree to help one another in case of an emergency.
Redundant Sites
Some companies choose to have redundant sites, or mirrored sites, meaning a second site is equipped and configured exactly like the primary site and serves as a redundant environment.
A hot site is a subscription service. A redundant site, in contrast, is a site owned and maintained by the company, meaning the company does not pay anyone else for the site.
Another type of facility-backup option is a rolling hot site, or mobile hot site, where the back of a large truck or a trailer is turned into a data processing or working area.
Another, similar solution is a prefabricated building that can be easily and quickly put together.
Another option for organizations is to have multiple processing sites.
Supply and Technology Recovery
The BCP should also include backup solutions for the following:
- Network and computer equipment
- Voice and data communications resources
- Human resources
- Transportation of equipment and personnel
- Environment issues (HVAC)
- Data and personnel security issues
- Supplies (paper, forms, cabling, and so on)
- Documentation
When the organization outsources functions to another company, it must make sure that the outsourced company is financially viable and has a solid record in BCP.
The organization can take the following steps to better ensure the continuity of its outsourcing:
- Make the ability of such companies to reliably assure continuity of products and services part of any work proposals.
- Make sure that business continuity planning is included in contracts with such companies, and that their responsibilities and levels of service are clearly spelled out.
- Draw up realistic and reasonable service levels that the outsourced firm will meet during an incident.
- If possible, have the outsourcing companies take part in BCP awareness programs, training, and testing.
The organization’s current technical environment must be understood. This means the planners have to know the intimate details of the network, communications technologies, computers, network equipment, and software requirements that are necessary to get the critical functions up and running.
Hardware Backups
The BCP needs to identify the equipment required to keep the critical functions up and running.
The BCP also needs to be based on accurate estimates of how long it will take for new equipment to arrive.
Software Backups
The BCP team should make sure there are at least two copies of the company’s operating system software and critical applications.
A company that relies on software developed by an outside vendor can protect itself with software escrow, in which a third party holds the source code, backups of the compiled code, manuals, and other supporting materials.
Backup Storage Strategies
The BCP team’s responsibility is to provide solutions to protect the company’s data and identify ways to restore it after a disaster.
The first step is to do a full backup, which is just what it sounds like—all data is backed up and saved to some type of storage media.
A differential process backs up the files that have been modified since the last full backup.
An incremental process backs up all the files that have changed since the last full or incremental backup and sets the archive bit to 0.
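The difference between the three backup types comes down to how each treats the archive bit. The following is a minimal simulation, not real backup software: the file names are made up, the "file system" is just a dictionary mapping a name to its archive bit, and it assumes the conventional behavior in which a full backup clears every archive bit and a differential backup leaves the bits unchanged.

```python
def full_backup(files):
    """Copy every file and clear every archive bit."""
    saved = sorted(files)
    for name in files:
        files[name] = 0
    return saved


def differential_backup(files):
    """Copy files whose archive bit is set; leave the bits untouched."""
    return sorted(name for name, bit in files.items() if bit == 1)


def incremental_backup(files):
    """Copy files whose archive bit is set, then reset those bits to 0."""
    saved = sorted(name for name, bit in files.items() if bit == 1)
    for name in saved:
        files[name] = 0
    return saved


def modify(files, name):
    """Changing a file sets its archive bit to 1."""
    files[name] = 1


# Strategy A: one full backup followed by daily differentials.
a = {"ledger.db": 1, "hr.xlsx": 1, "site.cfg": 1}
full_backup(a)
modify(a, "ledger.db")
diff_day1 = differential_backup(a)  # ['ledger.db']
modify(a, "hr.xlsx")
diff_day2 = differential_backup(a)  # ['hr.xlsx', 'ledger.db']

# Strategy B: one full backup followed by daily incrementals.
b = {"ledger.db": 1, "hr.xlsx": 1, "site.cfg": 1}
full_backup(b)
modify(b, "ledger.db")
inc_day1 = incremental_backup(b)    # ['ledger.db']
modify(b, "hr.xlsx")
inc_day2 = incremental_backup(b)    # ['hr.xlsx']
```

Each differential grows because it repeats everything changed since the full backup, so a restore needs only the full backup plus the latest differential. Each incremental stays small, but a restore needs the full backup plus every incremental in order.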
Critical data should be backed up and stored at an onsite area and an offsite area.
The onsite backup information should be stored in a fire-resistant, heat-resistant, and waterproof safe.
A backup strategy must take into account that failure can take place at any step of the process, so if there is a problem during the backup or restoration process that could corrupt the data, there should be a graceful way of backing out or reconstructing the data from the beginning.
Electronic Backup Solutions
Disk duplexing means there is more than one disk controller.
Disk shadowing is used to ensure the availability of data and to provide a fault-tolerant solution by duplicating hardware and maintaining more than one copy of the information.
If only disk mirroring is used, then each disk would have a corresponding mirrored disk that contains the exact same information.
Disk shadowing provides online backup storage, which can either reduce or replace the need for periodic offline manual backup operations.
Electronic vaulting makes copies of files as they are modified and periodically transmits them to an offsite backup site.
Remote journaling is another method of transmitting data offsite, but this usually only includes moving the journal or transaction logs to the offsite facility, not the actual files.
Remote journaling takes place in real time and transmits only the file deltas.
Electronic vaulting takes place in batches and moves the entire file that has been updated.
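The contrast between the two methods can be sketched in a few lines. This is a toy model under stated assumptions: a "file" is a dictionary of records, the offsite facility is just an in-memory variable, and every name here (`apply_transaction`, `run_vault_batch`, `accounts.db`) is hypothetical.

```python
primary = {"accounts.db": {"alice": 100, "bob": 50}}

vault_pending = set()  # files modified since the last vaulting batch
vault_offsite = {}     # whole-file copies held at the offsite facility
journal_offsite = []   # transaction log entries held at the offsite facility


def apply_transaction(file_name, key, value):
    """Update the primary copy and feed both offsite mechanisms."""
    primary[file_name][key] = value
    # Electronic vaulting: only note that the file changed; ship it later.
    vault_pending.add(file_name)
    # Remote journaling: ship just the delta, immediately.
    journal_offsite.append((file_name, key, value))


def run_vault_batch():
    """Periodic batch job: transmit every modified file in its entirety."""
    for file_name in sorted(vault_pending):
        vault_offsite[file_name] = dict(primary[file_name])
    sent = len(vault_pending)
    vault_pending.clear()
    return sent


apply_transaction("accounts.db", "alice", 120)
apply_transaction("accounts.db", "bob", 75)

offsite_before_batch = dict(vault_offsite)  # still empty: the batch has not run
files_sent = run_vault_batch()              # one whole file transmitted
```

After the two transactions the journal already holds both deltas offsite, while the vault holds nothing until the batch runs and then receives the complete file.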
With automatic tape vaulting, the data is sent over a serial line to a backup tape system at the offsite facility.
Data repositories commonly have replication capabilities, so that when changes take place to one repository, they are replicated to all of the other repositories within the organization. Replication can be asynchronous or synchronous. Asynchronous replication means changes reach the secondary volume after a delay, so the primary and secondary data volumes can be out of sync for a time. With synchronous replication, the primary and secondary repositories are always in sync, which provides true real-time duplication.
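A small sketch can make the two replication modes concrete. It is a simplified in-memory model, assuming a single secondary and a hypothetical `ReplicatedStore` class; real products handle networking, acknowledgments, and failures that are omitted here.

```python
from collections import deque


class ReplicatedStore:
    """Primary store with one secondary, replicated synchronously or not."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self._queue = deque()  # changes not yet applied to the secondary

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # The write completes only after the secondary has it too.
            self.secondary[key] = value
        else:
            # The write completes at once; the secondary catches up later.
            self._queue.append((key, value))

    def drain(self):
        """Apply queued changes to the secondary (asynchronous mode)."""
        while self._queue:
            key, value = self._queue.popleft()
            self.secondary[key] = value

    def in_sync(self):
        return self.primary == self.secondary


sync_store = ReplicatedStore(synchronous=True)
sync_store.write("balance", 500)
sync_state = sync_store.in_sync()      # True immediately after the write

async_store = ReplicatedStore(synchronous=False)
async_store.write("balance", 500)
async_before = async_store.in_sync()   # False until the queue drains
async_store.drain()
async_after = async_store.in_sync()    # True once the secondary catches up
```

The window in which `in_sync()` returns `False` is the data that would be lost if the primary failed at that moment, which is why the choice of mode ties back to the RPO.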
Choosing a Software Backup Facility
A company needs to address several issues and ask specific questions when it is deciding upon a storage facility for its backup materials. The following provides a list of just some of the issues that need to be thought through before committing to a specific vendor for this service:
- Can the media be accessed in the necessary timeframe?
- Is the facility closed on weekends and holidays, and does it only operate during specific hours of the day?
- Are the access control mechanisms tied to an alarm and/or the police station?
- Does the facility have the capability to protect the media from a variety of threats?
- What is the availability of a bonded transport service?
- Are there any geographical environmental hazards such as floods, earthquakes, tornadoes, and so on that might affect the facility?
- Does the facility have a fire detection and suppression system?
- Does the facility provide temperature and humidity monitoring and control?
- What types of physical, administrative, and logical access controls are used?
Documentation
The dreaded task of documentation may be the saving grace one day. It is an essential piece of business, and therefore an essential piece in disaster recovery and business continuity.
It is important to make one or more roles responsible for proper documentation.
Human Resources
The area of human resources is a critical component to any recovery and continuity process, and it needs to be fully thought out and integrated into the plan.
The BCP project expands job responsibilities, descriptions, hours, and even workplaces.
Multiple people should be trained in executing the duties and procedures spelled out in the plan so that one person can fill another’s shoes in an emergency. Clear documentation is vital in such cross-training. The HR department normally manages the availability of personnel for the continuity process.
Organizations should already have executive succession planning in place. This means that if someone in a senior executive position retires, leaves the company, or is killed, the organization has predetermined steps to carry out to protect the company.
Often, larger organizations also have a policy indicating that two or more of the senior staff cannot be exposed to a particular risk at the same time.
End-User Environment
The first issue pertaining to users is how they will be notified of the disaster and who will tell them where to go and when.
The BCP committee identified the most critical functions of the company during the analysis stage, and the employees who carry out those functions must be put back to work first.
The BCP team needs to identify user requirements, such as whether users can work on standalone PCs or need to be connected in a network to fulfill specific tasks.
The BCP team also needs to identify how current automated tasks can be carried out manually if that becomes necessary.
Availability
High availability (HA) is a combination of technologies and processes that work together to ensure that a specific system or service is always up and running.
A service level agreement (SLA) may also include one or more specifications for quality of service (QoS).
Redundancy is commonly built into the network at a routing protocol level.
If a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is “switched over” to a working system.
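As a rough illustration of failover, the sketch below tries a list of systems in priority order and switches to the next one when a call fails. The function `call_with_failover` and the two handlers are hypothetical, and a real implementation would also need health checks, timeouts, and state synchronization.

```python
def call_with_failover(systems, request):
    """Try each system in priority order; switch over when one fails."""
    errors = []
    for name, handler in systems:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            # This system cannot serve the request; move on to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all systems failed: " + "; ".join(errors))


def primary(request):
    """Simulated primary that is currently unavailable."""
    raise ConnectionError("primary is down")


def standby(request):
    """Simulated standby that is healthy."""
    return f"processed {request}"


served_by, result = call_with_failover(
    [("primary", primary), ("standby", standby)], "order-17"
)
print(served_by, result)  # standby processed order-17
```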
Fault tolerance is the capability of a technology to continue to operate as expected even if something unexpected takes place (a fault).
Fault tolerance means that when a fault happens, there’s a system in place to ensure services remain uninterrupted. Resiliency means that the system continues to function, albeit in a degraded fashion, when a fault is encountered.
Reliability is the probability that a system performs the necessary function for a specified period under defined conditions.
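Reliability and availability can be estimated numerically. The sketch below rests on assumptions that are not in these notes: a constant failure rate (the exponential model), steady-state availability computed as MTBF / (MTBF + MTTR), independent failures for redundant copies, and made-up MTBF and MTTR figures.

```python
import math


def reliability(mission_hours, mtbf_hours):
    """Probability of running the whole period without failure,
    assuming a constant failure rate of 1 / MTBF."""
    return math.exp(-mission_hours / mtbf_hours)


def availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single, copies):
    """Availability of independent redundant copies where any one suffices."""
    return 1 - (1 - single) ** copies


# Hypothetical server: fails on average every 10,000 h and takes 8 h to repair.
r_month = reliability(mission_hours=720, mtbf_hours=10_000)
a_single = availability(mtbf_hours=10_000, mttr_hours=8)
a_pair = redundant_availability(a_single, copies=2)

print(round(r_month, 3))   # 0.931
print(round(a_single, 4))  # 0.9992
print(a_pair > a_single)   # True
```

The example shows why the two terms differ: a system can have a noticeable chance of failing within a month yet still be available more than 99.9 percent of the time if it is repaired quickly, and redundancy raises availability further.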
Availability of each of the following items must be thought through and planned:
- Facility (cold, warm, hot, redundant, rolling, reciprocal sites)
- Infrastructure (redundancy, fault tolerance)
- Storage (RAID, SAN, mirroring, disk shadowing, cloud)
- Server (clustering, load balancing)
- Data (tapes, backups, vaulting, online replication)
- Business processes
- People
Originally published on the WeChat official account debugeeker: CISSP Exam Guide Notes: 7.9 Disaster Recovery