Don’t Be Fooled by Misleading Data Egress Announcements
Part 1 of this blog looked at Data Movement charges in the Public Cloud and how recent announcements didn’t really have much impact on what customers are charged by different Cloud Service Providers (CSPs) when they move their data around in the Cloud. Oracle Cloud has always taken a much more flexible approach when it comes to allowing customers to move their data without incurring excessive data transfer costs.
In today’s Part 2, we take a closer look at the Disaster Recovery (DR) scenario we mentioned previously, i.e. the impact of Data Movement charges on the cost of maintaining a second copy of a critical database for DR purposes. This is an area of particular interest from a database perspective, and one where data movement charges frequently have a significant, and often unexpected, impact. Maintaining additional copies of business-critical databases in close synchronization across multiple data centers has long been a common requirement when any potential downtime must be minimized.
When such databases are moved to the Cloud, this typically translates to maintaining copies across multiple Availability Domains, Regions, or even across multiple Clouds. To assess the economic benefit of moving to the Cloud it’s critical to consider the resulting data movement costs of keeping these copies in sync within the Cloud(s) to which you’ve chosen to migrate. And remember – the more frequently you update such a database, the more frequently those updates must be propagated to the other copies, and the higher your resulting data movement costs may be. As we’ll see, Oracle Cloud Infrastructure (OCI) offers compelling advantages to help minimize your data transfer costs, and therefore your cost for DR.
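As a rough way to see how update frequency drives replicated volume, the sketch below estimates monthly data movement from a daily change rate. The function name, the 30-day month, and the sample figures are illustrative assumptions, not measurements from any provider.

```python
def monthly_replication_gb(daily_change_gb, dr_copies=1, days=30):
    """Estimate GB sent per month to keep `dr_copies` replicas in sync.

    Simplifying assumption: every changed GB is shipped exactly once to each
    replica, with no compression and no protocol overhead. Real replication
    traffic will differ.
    """
    if daily_change_gb < 0 or dr_copies < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    return daily_change_gb * dr_copies * days


# Hypothetical workload: 200 GB of changes per day, replicated to 2 DR copies.
print(monthly_replication_gb(200, dr_copies=2))  # 12000 (GB), roughly 12 TB
```

Doubling either the change rate or the number of copies doubles the volume, which is why a busy database with several replicas is the case most exposed to transfer charges.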
Disaster Recovery: Just How Paranoid Do I Need to Be… And What’s It Going to Cost Me?
When it comes to DR planning, every organization needs to consider the business impact of downtime and/or data loss and weigh that impact against the cost of reaching its goals. Two common objectives in DR terms are to minimize Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
RPO is an expression of how far behind you can afford to be when a disaster forces you to put your DR plan into operation. If you adopt an RPO of 1 hour you’re saying it’s OK for your DR environment to be as much as an hour behind where your Production environment was at the time of failure. In effect, you are willing to lose the last hour of changes to your database. An RPO of 0 means you need the DR copy to be up to date all the time, so you don’t lose any data if you need to fail over.
RTO is an expression of how long you can afford to take to switch over to your DR copy and restore normal operations in the event of a failure. The ideal solution would be to minimize both RPO and RTO for your truly business-critical systems, but doing so comes with associated costs. In practice, this means most businesses have different RPO and RTO requirements for different systems or databases, with the most critical ones having objectives near zero. The lower you’re able to keep the cost of maintaining your databases in close synchronization, the more you can afford to minimize the impact of potential downtime (RTO) and/or lost data (RPO) to your business if and when disaster does strike. In the Cloud, minimizing the Data Transfer costs you incur allows you to better protect your business.
Is It Enough to Have Redundancy Across Availability Zones / Domains?
Cloud Regions often consist of more than one datacenter located in close proximity. Each of these datacenters is commonly referred to as an Availability Zone (AZ) or Availability Domain (AD). In concert, they provide resiliency in the event of an interruption of service that impacts an entire datacenter.
One approach to database DR is to keep a copy of the database in multiple AZ/ADs within the same Region. This may be the least expensive option as data transfer between AZ/ADs in the same Region is not charged at all by some CSPs and is generally charged at lower rates by those who do. Oracle historically has not charged for data movement between Availability Domains within the same Region. On May 21, 2024, Microsoft announced that they would begin following a similar policy. GCP might or might not charge for data transfer between zones in the same region, depending on the services and networking utilized. While there are some AWS services for which there is no data transfer charge between AZs to keep copies of a database in sync (for example, RDS Multi-AZ Instances or Multi-AZ Clusters), there are many other examples where AWS data transfer charges do apply.
One thing that is common across all clouds, however, is that maintaining a copy of your database in another AZ/AD within the same Region provides no protection against a Region-level failure or service interruption. As we noted in Part 1 of this blog, while such occurrences are rare, they do periodically happen. Therefore, in most cases where the database in question is truly business-critical, you’ll want to go at least one step further and maintain a DR copy in another Region.
What about Region to Region?
In addition to providing redundancy in the event of a failure or service interruption at the Region level, maintaining such an additional copy in another geography can also have the benefit of reducing latency for users who are closer to that physical location, and are therefore able to access that copy more efficiently. This presumes that the copy in the additional Region is available for, at a minimum, read access, which may be dependent on the mechanism used for data replication.
It’s not unusual to maintain copies in more than one additional region because of this benefit. For example, a primary database that resides in a cloud region in North America might be replicated to secondary regions in Europe and APAC. This yields reduced latency for users on a global basis vs. everyone accessing the primary copy, and also provides the benefit of protecting against multiple simultaneous region failures – e.g. if 2 regions become unavailable at the same time, it may still be possible to fail over such that all users access the 3rd copy until the issues have been resolved.
Costs are typically based on egress from the source Region, and rates often vary from one region to another. When considering the cost of data transfer for DR, it’s therefore important to factor in what happens to those costs if and when you fail over and your secondary region becomes your primary for some period of time. You’d start being charged for transfer at the rates applicable to that region, which could be much higher, as illustrated in Table 2. This can make a big difference, as reflected in Figure 2 as compared to Figure 1.
Figure 1 illustrates the monthly data transfer cost across regions from the least expensive source region. Figure 2 shows the same costs, but from the most expensive region. Note the difference between the two charts, which reflects the range of rates as shown in Table 2. One thing that remains consistent across all these scenarios is the much lower cost of data transfer with OCI.
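To make the failover effect concrete, here is a minimal sketch of a blended monthly cost when part of the month is spent running from the secondary region. The function and the per-GB rates in the example are hypothetical placeholders; they are not the values from Table 2 and should be replaced with the published rates for your regions.

```python
def blended_transfer_cost(monthly_gb, primary_rate, secondary_rate, failover_fraction):
    """Monthly inter-region transfer cost when part of the month is spent failed over.

    Egress is billed at the rate of whichever region is currently the source,
    so time spent failed over is charged at the secondary region's rate.
    Rates are in currency units per GB; failover_fraction is between 0 and 1.
    """
    if not 0.0 <= failover_fraction <= 1.0:
        raise ValueError("failover_fraction must be between 0 and 1")
    normal_cost = monthly_gb * (1.0 - failover_fraction) * primary_rate
    failover_cost = monthly_gb * failover_fraction * secondary_rate
    return normal_cost + failover_cost


# Hypothetical placeholder rates: 0.02 per GB from the primary region,
# 0.08 per GB from the secondary, with a quarter of the month failed over.
cost = blended_transfer_cost(5000, primary_rate=0.02, secondary_rate=0.08,
                             failover_fraction=0.25)
print(f"{cost:.2f}")
```

With these placeholder numbers, the quarter of the month spent failed over accounts for more of the bill than the three quarters spent in normal operation, which is the kind of gap the two figures are meant to show.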
What If a Failure Impacts an Entire Cloud (or your ability to use that Cloud)?
While events that impact the ability to utilize an entire Cloud are rare, they have happened and tend to attract a lot of attention when they do. Earlier this year an apparent misconfiguration resulted in Google Cloud inadvertently removing the subscription of an Australian financial services customer. The customer in question “had duplication in two geographies as a protection against outages and loss.” However, when the deletion of the subscription occurred, “…it caused deletion across both of these geographies.” Even more recently, on July 30, 2024, both AWS and Azure experienced service interruptions with widespread impact.
To defend against this type of scenario, you’d need a copy of your data maintained entirely outside the cloud where the primary database resides. The article linked above regarding the Google incident states that luckily, the customer had backups at another cloud provider from which they were ultimately able to restore, though after a multi-day outage for their end users. Backup and Restore is one of the most basic forms of DR but, as this case illustrates, potentially has a longer than ideal RTO. If you want faster recovery for DR purposes, you’d need to keep a copy of your data available in closer synchronization. Whether that copy is kept in your own on-premises datacenter or in another cloud, every update to the primary database still needs to be propagated to the DR copy. Either way, that replication constitutes Data Egress from the primary cloud.
Cloud to On-premises
If you want to keep the copy on-premises you could use a VPN to transfer data from the source cloud over the Internet, but doing so will incur data egress charges as shown in Table 3.
In this scenario, all the cloud hyperscalers provide a monthly quota of data that’s allowed to be transferred out from a Region for free, before costs begin to accrue, as shown in the table above. Two things that should immediately stand out from this table are the significantly higher amount of data Oracle allows to be transferred before any charges apply, and how much lower the rates are once they do begin.
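The free-allowance model described here reduces to a simple formula: only the volume above the monthly allowance is billed. The sketch below assumes a single flat rate above the allowance (real price lists are often tiered) and treats 1 TB as 1,000 GB; the per-GB rate is a hypothetical placeholder, not a figure from Table 3.

```python
def egress_cost(monthly_gb, free_allowance_gb, rate_per_gb):
    """Internet egress cost for one month under a free-allowance model.

    Only volume above the free allowance is billable. A single flat rate is
    assumed here for simplicity; actual provider pricing may be tiered.
    """
    billable_gb = max(0.0, monthly_gb - free_allowance_gb)
    return billable_gb * rate_per_gb


# 5 TB moved against a 10 TB monthly allowance: nothing is billable,
# whatever the rate (0.01 per GB is a placeholder).
print(egress_cost(5000, free_allowance_gb=10000, rate_per_gb=0.01))  # 0.0
```

A larger allowance reduces the billable volume and a lower rate reduces what each billable GB costs, so the two advantages combine.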
Each hyperscaler also offers the ability to put in place a dedicated circuit between customer premises and their cloud. These come with different names, such as FastConnect in OCI, ExpressRoute in Azure, Direct Connect in AWS, and Dedicated Interconnect in GCP. They’re typically priced with an hourly port charge based on the speed of the connection, and all of the hyperscalers other than OCI also charge data transfer fees.
This means that if you wanted to keep your primary database in the Cloud with a secondary on-prem copy for DR, with any of the hyperscalers other than OCI you’d pay data egress charges for each update. These charges can accumulate quickly for an active database with frequent updates. If you wanted to do the reverse and have a primary database on-prem with a DR copy in the Cloud, the impact would be less under normal operation, since none of the hyperscalers charge for data ingress. However, don’t forget to factor in what happens if you ever need to fail over and temporarily promote your DR copy to primary – while you operate in this mode, any updates that need to be propagated back down to on-prem would be subject to potential data egress charges unless the cloud in question is OCI.
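The direction of replication is what decides whether traffic is billable. As a small illustration, the sketch below counts only the traffic that leaves the cloud. The `Flow` type, the `"cloud"` and `"onprem"` labels, and the volumes are hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    source: str  # where the data originates: "cloud" or "onprem"
    gb: float    # volume replicated in the billing period


def cloud_egress_gb(flows):
    """Total GB leaving the cloud; data flowing into the cloud is ingress and is not counted."""
    return sum(f.gb for f in flows if f.source == "cloud")


# Normal operation with the primary on-prem: updates flow into the cloud (ingress).
normal = [Flow(source="onprem", gb=3000)]
# After failover the cloud copy is primary: updates flow back down to on-prem (egress).
failed_over = [Flow(source="cloud", gb=400)]

print(cloud_egress_gb(normal))       # 0
print(cloud_egress_gb(failed_over))  # 400
```

The same database generates no billable volume in one direction and a potentially billable volume in the other, which is why the failover period deserves its own line in a DR cost estimate.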
Multicloud
The final scenario we’ll examine is Multicloud, i.e., keeping a primary copy of your business-critical database in one cloud, with a DR copy in another such that you’re protected against any service interruptions, accidents, or failures that impact the operations of a global hyperscaler. This represents the current ideal in cloud-based DR, but the associated data transfer costs depend very much on the combination of clouds you select.
Last year Oracle and Microsoft established a Multicloud partnership that not only enables interconnecting OCI and Azure but also introduced a new service, Oracle Database@Azure, which creates an OCI child region containing Exadata infrastructure colocated within certain Azure datacenters. In June of this year, Oracle and Google announced a similar partnership, enabling interconnect between OCI and Google Cloud, as well as the forthcoming Oracle Database@Google Cloud. Both partnerships include provisions that either eliminate or waive data transfer costs for movement of data between the respective partner clouds.
With other combinations of hyperscaler clouds, data egress costs follow the familiar model of charging for data egress that exceeds a monthly free allowance. Rates for dedicated circuits between the clouds vary, but if no dedicated circuit is put in place between the clouds, the rates reflected in Table 3 above would apply, based on the source region from which data egress occurs. As shown in Figures 3 and 4, these charges can add up quickly!
As with our previous example of data transfer between regions, note the cost difference between the two charts. Also note that in both charts you once again don’t see the red bar representing OCI in the 5TB Data Transfer scenario. That’s because until you pass 10TB monthly with OCI, there are no charges.
Resiliency vs. Cost
For most organizations the choice of DR plan(s) is driven by the desire to balance the risk and anticipated business impact an outage would have against the cost of being able to overcome such an outage quickly and with minimal data loss. An organization may have different tiers of criticality for their data, and accordingly different plans to preserve their ability to access that data in the event of an outage.
If you’re able to spend less to implement DR, you can either save money outright or achieve greater resiliency for the same budget, whether that means achieving near-zero RPO and RTO for your most critical data or achieving a less ambitious goal, but for a broader subset of your data. In the Cloud, one of the most impactful ways to reduce your cost to implement DR is to minimize the data transfer costs required to keep copies of your databases in sync.
As we’ve seen in this post, with much higher free allowances before data transfer charges begin, much lower rates when they do apply, and partnerships with other Cloud providers that eliminate data transfer charges between clouds, OCI provides compelling cost advantages that enable you to get much greater value for your DR spend.