Amazon Web Services said it has released version 5.0.0 of its Elastic MapReduce (EMR) service. According to Amazon, EMR quickly processes very large amounts of data by distributing the computing load across multiple virtual servers managed by the open-source Hadoop framework, which is useful when analyzing scientific data or processing clickstream logs.
The new version contains updates to eight of the 16 Hadoop ecosystem projects encompassed by EMR, and they have such great names that one is compelled to list them: HBase, Hive and HCatalog, Hue, Pig, Presto, Spark, Tez and Zeppelin.
The upgraded version of Spark includes an API for structured streaming and improved SQL support, AWS said. Tez is now the default execution engine for the updated versions of Hive, a SQL-like interface, and Pig, a dataflow scripting language. It replaces Hadoop MapReduce in that role, resulting in improved performance.
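For readers who want to try the release, here is a minimal sketch of how a cluster request for it might be assembled in Python. It is not taken from the AWS announcement. The helper names `build_emr_request` and `launch`, the cluster name, the instance type and count, and the region are all illustrative assumptions. The release label `emr-5.0.0`, the default role names and the boto3 `run_job_flow` call are the parts that come from AWS.

```python
# Hypothetical helper (not from the AWS announcement): assembles the keyword
# arguments for boto3's EMR `run_job_flow` call without contacting AWS.
def build_emr_request(name, applications, instance_type="m4.large", instance_count=3):
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.0.0",  # selects the 5.0.0 release
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            # Instance type and count are assumptions chosen for illustration.
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # With no steps supplied, the cluster shuts down after starting.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Default EMR role names; an account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Submit the request. Needs boto3 and AWS credentials, and incurs charges."""
    import boto3  # imported here so the sketch runs without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


# Build (but do not submit) a request for a few of the updated projects.
request = build_emr_request("emr-5-demo", ["Spark", "Hive", "Pig", "Tez"])
print(request["ReleaseLabel"], [app["Name"] for app in request["Applications"]])
```

Nothing is sent to AWS unless `launch(request)` is called. Keeping the request as a plain dictionary makes it easy to inspect or adjust before a cluster is actually started.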
A webinar is scheduled for Aug. 23 to discuss the new release.