Thursday, 15 January 2015

World record set for 100 TB sort by open source and public cloud team


In October 2014, Databricks participated in the Sort Benchmark and set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records. The team used Apache Spark on 207 EC2 virtual machines and sorted 100 TB of data in 23 minutes.


In comparison, the previous world record, set by Hadoop MapReduce, used 2100 machines in a private data center and took 72 minutes. This entry tied with that of a UCSD research team building high-performance systems, and the two teams jointly set a new world record.
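To put the two results side by side, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration using the figures quoted above. It assumes decimal units (1 TB = 10^12 bytes), and the function names are made up for this example. Machine-minutes is a crude measure of resources used, since it ignores differences in hardware between the two clusters.

```python
# Figures come from the text above. Decimal units (1 TB = 10**12 bytes)
# are an assumption; the helper names are hypothetical.
RECORDS = 10**12        # 1 trillion records
RECORD_BYTES = 100      # 100 bytes each
TB = 10**12


def machine_minutes(machines, minutes):
    """Total machine time consumed by a run."""
    return machines * minutes


def throughput_gb_per_s(total_bytes, minutes):
    """Aggregate sort rate in GB/s (decimal gigabytes)."""
    return total_bytes / (minutes * 60) / 10**9


total_bytes = RECORDS * RECORD_BYTES
spark_mm = machine_minutes(207, 23)      # Spark on EC2
hadoop_mm = machine_minutes(2100, 72)    # previous Hadoop MapReduce record

print(total_bytes / TB)                                   # 100.0 (TB)
print(round(throughput_gb_per_s(total_bytes, 23), 1))     # 72.5 (GB/s, whole cluster)
print(spark_mm, hadoop_mm, round(hadoop_mm / spark_mm, 1))  # 4761 151200 31.8
```

By this measure the Spark run used roughly one thirtieth of the machine time of the earlier record.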








Read the full article here by Tux Machines
