DevOps Journal

XFS vs EXT4 – Comparing MongoDB Performance on AWS EC2

Which Linux file system delivers the best performance for your applications?

AWS is an extremely popular and trusted cloud platform for managing MongoDB deployments, but the question of XFS vs EXT4 has many developers wondering which Linux file system will give them the best performance for their applications. MongoDB's official guide on deploying to production recommends using the XFS file system on Linux, especially when deploying the WiredTiger storage engine. The recommendation, however, doesn't tell us why we should expect a performance boost or what kind of performance gains we'll experience. We decided to get to the bottom of it by quantitatively investigating MongoDB performance on XFS, so you can decide whether XFS or EXT4 is the better choice for your AWS EC2 instances.

XFS File System
XFS is a highly scalable, high-performance 64-bit journaling file system developed at SGI in 1993 and ported to Linux in 2002. It supports highly parallel I/O and file system sizes of up to 9 exabytes, and it journals only the file system metadata, not the user data. Some key performance-enhancing features of XFS are:

    • Parallelized access via allocation groups lets multiple threads perform I/O simultaneously on the same volume.
    • Extent-based allocation reduces fragmentation and metadata size, and improves I/O performance by allowing fewer, larger I/O operations.
    • Delayed allocation improves data contiguity and performance. Fragmentation is reduced by combining writes and allocating extents in large chunks, and even files written randomly (such as memory-mapped files) can be allocated contiguously.

There are many more XFS features to explore, and you can learn more on XFS's website and the XFS User Guide.
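Before benchmarking, it's worth confirming which file system your MongoDB dbPath actually sits on. A minimal sketch (Linux-only, since it reads /proc/mounts; checking the root path below is just for illustration):

```python
import os

def fs_type(path: str) -> str:
    """Return the file system type of the mount containing `path` (Linux)."""
    path = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _device, mount_point, fstype, *_rest = line.split()
            # Keep the longest mount point that is a prefix of `path`.
            if (path == mount_point
                    or path.startswith(mount_point.rstrip("/") + "/")):
                if len(mount_point) > len(best_mount):
                    best_mount, best_type = mount_point, fstype
    return best_type

print(fs_type("/"))  # e.g. "ext4" or "xfs", depending on the volume
```

Point it at your dbPath (for example /data/mongodb) to verify the volume really is XFS before attributing any performance difference to the file system.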

Running Performance Tests on MongoDB
As you may have learned in our previous posts, we've been using YCSB to benchmark MongoDB performance, including a detailed comparison of MMAP-backed MongoDB performance across various cloud providers. We used the same YCSB workload as before: Workload A (update heavy: 50% reads + 50% updates). The load (insert) phase measures the performance of a 100% write workload, while the run phase measures performance against the actual workload (50% reads / 50% updates). Our tests were run on the synchronous MongoDB driver, and the Linux distro was Amazon Linux (4.4.44-39.55.amzn1.x86_64). We picked MongoDB 3.2.10 running WiredTiger for our tests, since WiredTiger is where the larger gains were expected, and ran the tests on two different hardware rigs:

  • High-speed disks: AWS EC2 c3.large instance where MongoDB was using SSD disks in a RAID 0 configuration for storage (maps to ScaleGrid cluster size HighPerfLarge).
  • Medium-speed disks: AWS EC2 m3.medium instance where MongoDB was using an EBS (Elastic Block Store) provisioned-IOPS disk set at 300 IOPS (maps to ScaleGrid cluster size Medium).
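YCSB prints its summary as simple "[SECTION], Metric, value" lines, and the throughput and latency numbers charted below come from exactly that kind of output. A minimal sketch of the extraction (the sample figures here are illustrative, not our results):

```python
def parse_ycsb(report: str) -> dict:
    """Parse '[SECTION], Metric, value' lines from a YCSB summary."""
    metrics = {}
    for line in report.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and parts[0].startswith("["):
            section = parts[0].strip("[]")
            try:
                metrics[(section, parts[1])] = float(parts[2])
            except ValueError:
                pass  # skip non-numeric values
    return metrics

# Illustrative sample in YCSB's summary format (made-up numbers).
sample = """\
[OVERALL], RunTime(ms), 480000
[OVERALL], Throughput(ops/sec), 12500.0
[READ], AverageLatency(us), 310.7
[UPDATE], AverageLatency(us), 350.2
"""

m = parse_ycsb(sample)
print(m[("OVERALL", "Throughput(ops/sec)")])  # 12500.0
print(m[("UPDATE", "AverageLatency(us)")])    # 350.2
```

Run once per thread count for the load and run phases, and you have the throughput/latency series plotted in the charts that follow.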

Note: Any kind of performance testing in virtualized environments should be taken with a grain of salt. Our aim here isn't to benchmark performance numbers on these environments, but to provide some quantitative measurement of performance differences between EXT4 and XFS in the same virtualized environment.

High-Speed SSD Disk
We ran the following test on our high-performance rig:

  1. Inserted 6 million records at various server loads (by varying the number of YCSB client threads).
  2. Ran the workload with an operation count of 10 million at various server loads.

SSD Disk Performance Results
Throughput/latency characteristics for 6M record insertion on the high-performance configuration:

MongoDB Performance on AWS EC2 - XFS vs EXT4: Insert 6M High Perf - ScaleGrid

Throughput/latency characteristics for 10M write/update operations on the high-performance configuration:

MongoDB Performance on AWS EC2 - XFS vs EXT4: Workload 10M High Perf - ScaleGrid

SSD Disk Observations

  • XFS is spectacularly fast during both the insertion phase and the workload execution. At lower thread counts, it's as much as 50% faster than EXT4. As the load increased, both file systems were limited by the throughput of the underlying hardware, but XFS still maintained its lead.
  • Latency for XFS and EXT4 was comparable in both runs. Note that all numbers are in microseconds.

Slower EBS Provisioned IOPS Disk (300 IOPS)
The following test was executed on our medium-sized rig:

  1. Inserted 3 million records at various server loads (by varying the number of YCSB client threads).
  2. Ran the workload with an operation count of 5 million at various server loads.

Given our experience on the high-end configuration, we expected XFS to have a decent lead in this rig as well.

IOPS Disk Performance Results

Throughput/latency characteristics for 3M record insertion on the medium configuration:


MongoDB Performance on AWS EC2 - XFS vs EXT4: Insert 3M Medium - ScaleGrid

Throughput/latency characteristics for 5M write/update operations on the medium configuration:

MongoDB Performance on AWS EC2 - XFS vs EXT4: Workload 5M Medium - ScaleGrid

IOPS Disk Observations

  • XFS is comparable to, though slightly behind, EXT4 on the medium-sized configuration. It seems that at this level of system resources, the performance optimizations of XFS aren't really making a difference. This is an important observation if you're considering deploying XFS on smaller instances in the hope of improved performance.

XFS vs EXT4 on AWS EC2
In performance terms, XFS is indeed a force multiplier when paired with high-speed disks it can take real advantage of. On low- to mid-end systems, it doesn't seem to do much to improve your performance.

More Stories By Vaibhaw Pandey

Vaibhaw Pandey is a Software Developer with interests in Distributed Systems, Databases and Web-scale technologies.