Tuesday 25 November 2014

Dealing with RDDs on Spark

I have uploaded to GitHub some sample code covering the most common Apache Spark operations on RDDs (transformations and actions). It also includes examples of reading and writing files (text and sequence files), a word count function and a simple PageRank implementation.
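To give an idea of what the word count example does, here is a minimal sketch in plain Java. It is not the code from the repository and it does not use Spark at all: it only mirrors the usual flatMap / mapToPair / reduceByKey steps with Java streams, so it runs without a cluster or any dependency. The class name, method name and sample sentences are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class, not part of the repository.
class WordCountSketch {

    // Counts how often each word occurs across the given lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Same idea as flatMap on an RDD: one line becomes many words.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Drop the empty token produced by blank lines.
                .filter(word -> !word.isEmpty())
                // Same idea as mapToPair + reduceByKey on an RDD:
                // pair each word with 1, then sum the values per key.
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // Made-up input; a real job would read the lines from a text file.
        List<String> lines = Arrays.asList("to be or not to be", "to see or not to see");
        System.out.println(wordCount(lines));
    }
}
```

In a Spark version the stream would be an RDD of lines and the final map would typically be written out with an action, but the per-word logic stays the same.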

Download Repository

The code is written in Java and built with Maven. I have been following Holden Karau's book "Learning Spark. Lightning-Fast Big Data Analytics", which gives very useful advice about the Spark engine.