In October 2014, Databricks participated in the Sort Benchmark and set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records. The team used Apache Spark on 207 EC2 virtual machines and sorted 100 TB of data in 23 minutes.
By comparison, the previous world record, set by Hadoop MapReduce, used 2100 machines in a private data center and took 72 minutes. The Databricks entry tied with that of a UCSD research team that builds high-performance systems, and the two teams jointly set the new world record.
Read the full article here by Tux Machines