Try Apache Spark's shell using Docker
Dec 18, 2014

Ever wanted to try out Apache Spark without actually having to install anything? Well, if you've got Docker, I've got a Christmas present for you: a Docker image you can pull to run Spark commands in the Spark shell REPL. The image has been pushed to the Docker Hub here and can easily be pulled using Docker.
So what exactly is this image, and how can I use it?
Well, all you need to do is execute this command: [code language="bash"] > docker pull ogirardot/spark-docker-shell [/code]
I'll try to keep this image up-to-date with future releases of Spark, so if you want to test against a specific version, all you have to do is pull (or directly run) the image with the corresponding tag, like this: [code language="bash"] > docker pull ogirardot/spark-docker-shell:1.1.1 [/code]
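In fact, the explicit pull is optional: docker run fetches a missing image automatically, so you can combine the tag with the run command directly. A minimal sketch:

```shell
# Run the Spark shell from a specific image tag; Docker pulls the
# image first if it is not already present locally.
docker run -t -i ogirardot/spark-docker-shell:1.1.1
```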
Then, once Docker has downloaded the full image, the run command will give you access to a stand-alone spark-shell that will let you try and learn Spark's API in a sandboxed environment. Here's what a correct launch looks like:
[code language="bash"]
> docker run -t -i ogirardot/spark-docker-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/12/11 20:33:14 INFO SecurityManager: Changing view acls to: root
14/12/11 20:33:14 INFO SecurityManager: Changing modify acls to: root
14/12/11 20:33:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/12/11 20:33:14 INFO HttpServer: Starting HTTP Server
14/12/11 20:33:14 INFO Utils: Successfully started service 'HTTP class server' on port 50535.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.1
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
[/code]
Once you reach the scala prompt, you're practically done, and you can try simple examples with the available SparkContext (the sc variable): [code language="scala"]
scala> sc.parallelize(1 until 1000).map(_ * 2).filter(_ < 10).reduce(_ + _)
res0: Int = 20
[/code]
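To see why the result is 20, you can run the very same pipeline on a plain Scala collection: sc.parallelize simply distributes this computation over the cluster, while on a local collection it runs in a single JVM. A sketch:

```scala
// Local Scala equivalent of the Spark pipeline above.
val doubled = (1 until 1000).map(_ * 2)  // 2, 4, ..., 1998
val small   = doubled.filter(_ < 10)     // 2, 4, 6, 8
val sum     = small.reduce(_ + _)        // 2 + 4 + 6 + 8 = 20
println(sum)
```

The only change between the two versions is the sc.parallelize call, which is what makes RDDs such a gentle entry point for Scala developers.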
If you've got this right, you're all set! Plus, as this is a Scala prompt, using <tab> will give you access to all the auto-completion magic a strong type system can bring you.
So enjoy, take your time and be bold.