In June 2019, Oracle and Microsoft announced a cloud interoperability partnership that enables workloads to run across Microsoft Azure and Oracle Cloud. By creating a first joint multi-cloud solution, the software giants can each continue to provide the best of their services. At the same time, customers do not need to decide which vendor to drop when moving their on-premises environments to the cloud.
As an Oracle Database Administrator, I wrote this article to check the impact of distributing resources across multiple clouds, with the databases remaining on Oracle Cloud Infrastructure (OCI).
It is not the aim of this article to discuss the cost of resources on either cloud.
A quick overview of the architecture is shown below. On the Microsoft side, I built a VNet with two subnets; one VM with a public IP served as the bastion. On the Oracle side, there were three subnets: one public subnet with the bastion VM, one private subnet with the application, and another private subnet with the databases. When the ExpressRoute was created, a new GatewaySubnet was automatically created on Azure.
Differences between Oracle Cloud and Azure
The Azure portal is amazingly fast and comfortable to navigate. For instance, the most recently used elements are always shown, and there is an easy way to start and stop all instances in one go. There is no concept of a private subnet in Azure. The public IP is an element in its own right that you can assign to a compute node at any moment. Spot VMs, which use spare capacity at a heavily discounted price, are excellent for running tests. These resources can be stopped by Azure at any time with very short warning. The troubleshooting tools, such as the ability to see the “effective routes” or “effective policies” of a VM, help a lot.
On OCI, the VM image available with Oracle Database comes with the latest RU already applied and one database already created. This is very useful for quick tests. The VCN wizard makes the basic creation of a virtual network very easy. The boot disks of a VM perform very well.
The lowest bandwidth available for a FastConnect configuration is 1 Gbps. On the Azure side, ExpressRoute starts at 50 Mbps! This difference in standards is hard to understand when promoting a consolidated solution. The setup was initially done with this configuration (1 Gbps FastConnect and 50 Mbps ExpressRoute). Later tests were also performed using a 100 Mbps ExpressRoute, but this did not change the results.
On Azure, VMs of type D4as_v4 were used. On Oracle, shape VM.Standard2.2 was used. The table below compares the characteristics of these VMs.
|Characteristic||Azure D4as_v4||OCI VM.Standard2.2|
|CPU||4 vCPUs||2 oCPU|
|Memory||16 GB||30 GB|
|NIC||2 Gbps||2 Gbps|
|OS Disk||120 IOPS / 25 MB/s||15000 IOPS / 150 MB/s|
|Extra disk (paid extra)||5000 IOPS / 200 MB/s||(none)|
The Oracle OS disk is about 8 times bigger (250 GB vs 30 GB) and much faster. Because the OS disk on Azure was affecting the results, I added an extra 1 TB disk there, ran the tests from it, and placed the datafiles on it for the DB tests.
The most recent Azure VM image included Oracle Database software version 19.3. I patched it with the 19.7 RU for the tests. On top of that, I manually created a database using default values, except for filesystemio_options (setall) and processes (500). Memory was set to exactly the same values as on the Oracle VMs (4.5G SGA + 1.7G PGA).
The OCI image already comes with the latest RU and with a database created, which seems to be optimized parameter-wise for the Oracle Cloud environment.
Ping tests were performed with a count of 10 probes, using the IP of the destination VM. The results of several test runs were consistent. The values presented below are those from the runs with the smallest standard deviation.
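The selection rule described above (keep the run with the smallest standard deviation) can be sketched in a few lines of Python. This is only an illustration: the run names and round-trip times below are hypothetical placeholders, not the measurements from these tests, and the helper names `summarize` and `most_stable_run` are made up for the example.

```python
from statistics import mean, stdev

def summarize(rtts_ms):
    """Return (min, avg, max, sample standard deviation) for one ping run."""
    return min(rtts_ms), mean(rtts_ms), max(rtts_ms), stdev(rtts_ms)

def most_stable_run(runs):
    """Return the name of the run whose round-trip times vary the least."""
    return min(runs, key=lambda name: stdev(runs[name]))

# Hypothetical round-trip times in ms, 10 probes per run (not measured data).
runs = {
    "run1": [2.6, 2.5, 2.7, 2.5, 2.6, 2.5, 3.4, 2.5, 2.6, 2.5],
    "run2": [2.5, 2.6, 2.5, 2.5, 2.6, 2.5, 2.6, 2.5, 2.5, 2.6],
}

best = most_stable_run(runs)
lo, avg, hi, sd = summarize(runs[best])
print(f"{best}: min={lo:.2f} avg={avg:.2f} max={hi:.2f} stdev={sd:.3f} ms")
```

Picking the most stable run keeps a single outlier probe (like the 3.4 ms value in the hypothetical first run) from distorting the reported latency.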
Application-to-DB assessments were done using Swingbench 2.6 and its JDBC Order Entry test. This is the test that makes the largest number of calls over the network. To make sure the impact did not come from CPU, disks or network bandwidth, the comparison is made between the results of tests with 4 and 8 clients. With more clients, the waits started to be either on the local disk or on the database side. Each test ran for a couple of minutes, and several tests were done on each configuration.
The first tests from Azure showed very bad performance. This turned out to be related to the performance of the OS disk. I decided to add a second disk with higher performance and install Swingbench on the new partition.
The tests against Autonomous Database (ATP) were done using 1 and 2 oCPUs, but this did not change the results. The service used was <db>_tp.
Only a summary of the main results is shown.
The latency between Oracle VMs is half of that between Azure VMs.
Between the clouds, the latency is a little bit higher than 2.5 milliseconds. This is 4 to 8 times more than intra-cloud pings.
Latency plays a big role in light Swingbench tests. Here too, the inter-cloud architecture is clearly slower than running within a single cloud: roughly 2 to 6 times, depending on which configurations and thread counts are compared. The results with Azure VMs use the extra, faster disk, except where noted.
|Configuration||4 threads||8 threads|
|OCI VM – OCI VM||440||603|
|Azure VM – Azure VM (1)||24||38|
|Azure VM – Azure VM||540||949|
|OCI VM – OCI ATP||263||378|
|Azure VM – OCI ATP||95||191|
Note: (1) using OS disks
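To make the comparison easier to read, the ratios can be computed directly from the table above. The short Python sketch below copies the table values and assumes that higher numbers mean better throughput (the OS-disk run, described as performing very badly, has the lowest values). The helper names `scaling` and `slowdown` are made up for this illustration.

```python
# Values copied from the results table: (4 threads, 8 threads).
# Assumption: higher is better.
results = {
    "OCI VM - OCI VM": (440, 603),
    "Azure VM - Azure VM (OS disk)": (24, 38),
    "Azure VM - Azure VM": (540, 949),
    "OCI VM - OCI ATP": (263, 378),
    "Azure VM - OCI ATP": (95, 191),
}

def scaling(name):
    """Ratio of the 8-thread result to the 4-thread result."""
    four, eight = results[name]
    return eight / four

def slowdown(baseline, candidate, column):
    """How many times lower the candidate is than the baseline (0 = 4 threads, 1 = 8 threads)."""
    return results[baseline][column] / results[candidate][column]

for name in results:
    print(f"{name}: x{scaling(name):.2f} going from 4 to 8 threads")

for column, label in enumerate(("4 threads", "8 threads")):
    ratio = slowdown("OCI VM - OCI ATP", "Azure VM - OCI ATP", column)
    print(f"ATP from Azure vs ATP from OCI, {label}: {ratio:.2f}x lower")
```

With these values, the Azure to ATP result roughly doubles between 4 and 8 threads, which is consistent with a test that spends most of its time waiting on network round trips rather than on CPU or disk.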
It is with great satisfaction that I see Oracle and Microsoft cooperating.
There is clearly a trade-off to consider when going for a multi-cloud solution. While the latency per se is very small, at around 2.5 ms, this value is 4 to 8 times larger than intra-cloud latency.
The Swingbench tests also show the performance difference when we try to abstract from all other elements. It cannot be ignored, however, that the VMs on each cloud have different specifications, and that the resource management limits, seen mainly in SSD performance, have an impact on the results.
I was surprised by the results of the ATP tests, where I expected better performance. We should not forget, however, that ATP brings many other advantages that are a big plus for several applications.
In any case, I still believe that for most applications the current latency between Oracle Cloud and Azure is good enough, and for well-behaved applications (forget fetching one row at a time!) it should not be a problem.
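The remark about fetching one row at a time can be illustrated with simple arithmetic. The Python sketch below estimates the minimum time an application spends waiting on the network when reading a result set, using the inter-cloud latency of about 2.5 ms measured above. The row count and fetch sizes are hypothetical, and the model deliberately ignores database and transfer time: it only counts round trips.

```python
import math

def network_wait_seconds(rows, fetch_size, rtt_ms):
    """Lower bound on network wait: one round trip per fetched batch."""
    round_trips = math.ceil(rows / fetch_size)
    return round_trips * rtt_ms / 1000.0

INTER_CLOUD_RTT_MS = 2.5  # approximate inter-cloud ping latency from this article
ROWS = 10_000             # hypothetical result set size

for fetch_size in (1, 100, 1000):  # hypothetical fetch (array) sizes
    wait = network_wait_seconds(ROWS, fetch_size, INTER_CLOUD_RTT_MS)
    print(f"fetch size {fetch_size:>4}: at least {wait:.3f} s waiting on the network")
```

Under these assumptions, reading 10,000 rows one at a time costs at least 25 seconds of pure network wait, while fetching them in batches of 100 costs about a quarter of a second. This is why the same latency is harmless for a well-behaved application and painful for a chatty one.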