Monday, 19 June 2017

Spark on Windows & Mac Installation

Spark on Windows - Installation Steps

Step 1: Download and install the JDK


1. Download the JDK version that matches the operating system installed on your machine
2. Install the JDK by following the installer wizard
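
To confirm that the JDK installed correctly, the version banner from "java -version" can be checked. The following is a minimal Python sketch, not part of the original steps; the function names are hypothetical, and it assumes the java executable is already reachable from the command line.

```python
import re
import subprocess


def parse_java_version(version_output):
    """Return the quoted version from `java -version` output, or None."""
    # Typical first line: java version "1.8.0_71"
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version`; return the version, or None if java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    # java writes its version banner to stderr rather than stdout
    return parse_java_version(result.stderr + result.stdout)
```

Calling installed_java_version() in a new command prompt should return something like the JDK version you installed; None suggests the JDK is missing or not yet on the PATH.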

Step 2: Download Spark

Choose the following options on the download page:
1. Spark release: choose a release (this guide uses 1.4.1)
2. Package type: Pre-built for Hadoop 2.4 or later
3. Download type: Direct Download
4. Click the spark-1.4.1-bin-hadoop2.4.tgz link


This is a gzip-compressed tar file. To extract it on Windows, you need a utility such as WinRAR.

Download and install WinRAR.

Step 3: Extract the tar file

Step 4: Copy the contents of the extracted tar file into the C:\spark\ folder
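
If WinRAR is not available, Steps 3 and 4 can also be done with Python's standard tarfile module. This is only a sketch under that assumption; the function name is hypothetical, and the archive and destination paths are whatever you pass in.

```python
import tarfile
from pathlib import Path


def extract_spark_archive(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir; return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar file, which is what a .tgz is
    with tarfile.open(str(archive_path), "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(str(dest))
    # Member names use "/" separators; keep only the first path component
    return sorted({name.split("/")[0] for name in names})
```

The returned list shows the top-level folder created by the archive, whose contents are what Step 4 copies into C:\spark\.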

Step 5: Update the log4j configuration file to set the message level to WARN


Set the property: log4j.rootCategory=WARN, console

Save the file as
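
The property change can also be made programmatically. The sketch below works on the text of the configuration file only, so it makes no assumption about the file's name or location; the function name is hypothetical. It replaces just the level and keeps whatever follows it on the line (for example an appender name such as console).

```python
import re


def set_root_log_level(properties_text, level="WARN"):
    """Return the text with the log4j.rootCategory level replaced by `level`."""
    # Group 1 is "log4j.rootCategory=", group 2 is the rest of the line
    # after the old level (such as ", console"), which is preserved.
    pattern = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+(.*)$", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + level + m.group(2), properties_text)
```

All other lines in the file are returned unchanged.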

Step 6: Download winutils.exe from the course resources folder

1. Create the folder C:\winutils\bin
2. Copy the winutils.exe file to C:\winutils\bin\winutils.exe

Step 7: Set the environment variables (tell Windows where Spark is installed)

SPARK_HOME = C:\spark
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_71
HADOOP_HOME = C:\winutils
PATH = %SPARK_HOME%\bin;%JAVA_HOME%\bin (append to the existing path)

In Windows

1) Right click on Start Menu
2) Click on Control Panel
3) Click on System and Security
4) Click on System
5) Click on Advanced System Settings
6) Click on the Environment Variables button, then:
    1. Click New to add an environment variable
    2. Variable Name: SPARK_HOME, Variable Value: C:\spark, then save the variable
    3. Variable Name: JAVA_HOME, Variable Value: C:\Program Files\Java\jdk1.8.0_71
    4. Variable Name: PATH, Variable Value (append to the end of the existing string): %SPARK_HOME%\bin;%JAVA_HOME%\bin
    5. Variable Name: HADOOP_HOME, Variable Value: C:\winutils
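
After saving the variables, a quick check can catch typos before starting Spark. The following is a hedged Python sketch, not part of the original steps: the function name is hypothetical, and it assumes Windows-style paths with PATH entries separated by ";".

```python
import ntpath

REQUIRED_VARS = ("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME")


def check_spark_environment(environ):
    """Return a list of problems found in an environment mapping."""
    problems = []
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append(name + " is not set")
    # Normalise PATH entries so case and trailing slashes do not matter
    path_entries = [
        ntpath.normcase(ntpath.normpath(entry))
        for entry in environ.get("PATH", "").split(";")
        if entry
    ]
    # Only SPARK_HOME\bin and JAVA_HOME\bin are expected on PATH
    for name in ("SPARK_HOME", "JAVA_HOME"):
        home = environ.get(name)
        if home:
            bin_dir = ntpath.join(home, "bin")
            if ntpath.normcase(ntpath.normpath(bin_dir)) not in path_entries:
                problems.append(bin_dir + " is not on PATH")
    return problems
```

Running check_spark_environment(os.environ) (after import os) in a newly opened command prompt should return an empty list when all four settings are in place. A prompt that was already open will not see the new values.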

In Mac

Step 8: Open a terminal and start the Spark shell

1. Open the Command Prompt app
2. cd C:\spark\bin
3. spark-shell --master "local[4]" (or)
4. spark-shell
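
The two launch commands above differ only in the --master option, where local[4] asks Spark to run locally with 4 worker threads. As an illustration, this small Python sketch builds either command line; the function name is hypothetical and it only constructs the arguments, it does not start Spark.

```python
def spark_shell_command(threads=None):
    """Build the spark-shell argument list; threads=None omits --master."""
    command = ["spark-shell"]
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be at least 1")
        # The master URL contains no spaces: local[4], not "local [4]"
        command += ["--master", "local[{}]".format(threads)]
    return command
```

For example, spark_shell_command(4) gives the arguments of the first command and spark_shell_command() gives the second.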