Spark on Windows - Installation Steps
=====================================
Step 1: Download and install JDK
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Note:
1. Download the version that matches the operating system installed on your machine
2. Install the JDK by following the installation wizard
Step 2: Download Spark
http://spark.apache.org/downloads.html
Choose the following options for the download:
1. Choose a Spark release: 1.4.1
2. Choose a package type: Pre-built for Hadoop 2.4 or later
3. Choose a download type: Direct Download
4. Click spark-1.4.1-bin-hadoop2.4.tgz to download the file
Note:
This is a gzip-compressed tar file (.tgz). To extract it, you need an archive utility such as WinRAR.
Download and install WinRAR from:
http://www.rarlab.com/download.htm
Step 3: Extract the tar file
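Step 4: Copy the contents of the extracted spark-1.4.1-bin-hadoop2.4 folder into the C:\spark\ folder
(so that you end up with C:\spark\bin, C:\spark\conf, and so on)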
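If Python 3 is already installed, Steps 3 and 4 can also be done without WinRAR. The sketch below is an alternative, not part of the original steps: it uses Python's standard tarfile module to extract the .tgz and drop the top-level spark-1.4.1-bin-hadoop2.4 folder, so bin, conf, and the rest land directly in the destination folder. The function name extract_strip_top is hypothetical, and so is the download location used in the usage line.

```python
import os
import tarfile


def extract_strip_top(archive_path, dest_dir):
    """Extract a .tgz into dest_dir, dropping the archive's top-level folder.

    Example: spark-1.4.1-bin-hadoop2.4/bin/spark-shell -> dest_dir/bin/spark-shell
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = []
        for member in tar.getmembers():
            name = member.name[2:] if member.name.startswith("./") else member.name
            _, _, rest = name.partition("/")
            if not rest:
                continue  # the top-level folder entry itself
            member.name = rest  # re-root the entry under dest_dir
            members.append(member)
        # Use the safe "data" extraction filter where this Python version provides it
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, members=members, **extra)
```

Usage (paths are assumptions): `extract_strip_top(r"C:\Users\you\Downloads\spark-1.4.1-bin-hadoop2.4.tgz", r"C:\spark")`.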
Step 5: Reduce the console log messages to WARN by creating log4j.properties
1. Open C:\spark\conf\log4j.properties.template
2. Change the line log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console
3. Save the file as C:\spark\conf\log4j.properties
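The same edit can be scripted. This is a minimal Python sketch (Python is an assumption; the guide does not otherwise need it), and set_root_log_level is a hypothetical helper name. It copies the template to log4j.properties and changes only the level on the log4j.rootCategory line, keeping the appender name after the comma (console in the Spark template) so that log output still goes to the console.

```python
import os


def set_root_log_level(conf_dir, level="WARN"):
    """Copy log4j.properties.template to log4j.properties with a new root level.

    Only the level on the log4j.rootCategory line changes; any appender names
    after the comma are kept. The template file itself is left untouched.
    """
    template = os.path.join(conf_dir, "log4j.properties.template")
    target = os.path.join(conf_dir, "log4j.properties")
    with open(template, encoding="utf-8") as f:
        lines = f.readlines()
    out = []
    for line in lines:
        if line.startswith("log4j.rootCategory="):
            value = line.split("=", 1)[1]
            # "INFO, console\n" keeps ", console\n"; a bare "INFO\n" keeps "\n"
            _, comma, appenders = value.partition(",")
            line = "log4j.rootCategory=" + level + (comma + appenders if comma else "\n")
        out.append(line)
    with open(target, "w", encoding="utf-8") as f:
        f.writelines(out)
    return target
```

Usage: `set_root_log_level(r"C:\spark\conf")`.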
Step 6: Download winutils.exe from the course resources folder
1. Create a folder C:\winutils\bin
2. Copy the winutils.exe file into C:\winutils\bin\winutils.exe
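Hadoop's Windows support looks for winutils.exe in the bin subfolder of HADOOP_HOME, which is why the file belongs in C:\winutils\bin when HADOOP_HOME is C:\winutils. A small Python sketch that computes and checks that location (the helper names are hypothetical, and Python is an assumption rather than a requirement of this guide):

```python
import ntpath
import os


def winutils_path(hadoop_home):
    # Hadoop expects %HADOOP_HOME%\bin\winutils.exe
    return ntpath.join(hadoop_home, "bin", "winutils.exe")


def find_winutils(environ=None, exists=os.path.isfile):
    """Return the winutils.exe path, or raise if HADOOP_HOME or the file is missing."""
    env = os.environ if environ is None else environ
    home = env.get("HADOOP_HOME")
    if not home:
        raise RuntimeError("HADOOP_HOME is not set")
    path = winutils_path(home)
    if not exists(path):
        raise FileNotFoundError(path)
    return path
```

Run `find_winutils()` from a new Command Prompt after Step 7; it raises an error naming the path it expected if the file is in the wrong place.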
Step 7: Set the environment variables (these tell Windows where Spark, Java, and winutils.exe are located)
SPARK_HOME = C:\spark
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_71 (use the folder name of the JDK version you installed)
HADOOP_HOME = C:\winutils
PATH = %SPARK_HOME%\bin;%JAVA_HOME%\bin (append to the existing path)
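Once the variables are set, a quick check from a new Command Prompt can catch typos. This is a minimal Python sketch under the same assumption (Python 3 installed; check_spark_env is a hypothetical name). It reports any of the three *_HOME variables that are missing and whether the bin folders of SPARK_HOME and JAVA_HOME appear on PATH, comparing entries case-insensitively as Windows does.

```python
import ntpath
import os

REQUIRED_VARS = ("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME")


def _norm(entry):
    # Windows paths are case-insensitive; also ignore a trailing backslash
    return ntpath.normcase(entry.strip()).rstrip("\\")


def check_spark_env(environ=None):
    """Return a list of problems found; an empty list means everything looks set."""
    env = os.environ if environ is None else environ
    problems = [name + " is not set" for name in REQUIRED_VARS if not env.get(name)]
    # PATH entries on Windows are separated by ';'
    on_path = {_norm(p) for p in env.get("PATH", "").split(";") if p.strip()}
    for name in ("SPARK_HOME", "JAVA_HOME"):
        home = env.get(name)
        if not home:
            continue  # already reported above
        expanded = _norm(ntpath.join(home, "bin"))
        # a PATH given as a plain dict may still hold the unexpanded %NAME%\bin form
        literal = _norm("%" + name + "%\\bin")
        if expanded not in on_path and literal not in on_path:
            problems.append(name + "\\bin is not on PATH")
    return problems
```

Usage: `print(check_spark_env() or "Environment looks OK")`.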
On Windows
==========
1) Right-click the Start button
2) Click Control Panel
3) Click System and Security
4) Click System
5) Click Advanced system settings
6) Click the Environment Variables button
1. Click New to add each new environment variable
2. Variable Name: SPARK_HOME, Variable Value: C:\spark, then click OK to save the variable
3. Variable Name: JAVA_HOME, Variable Value: C:\Program Files\Java\jdk1.8.0_71
4. Variable Name: PATH. If PATH already exists, select it and click Edit instead; append ;%SPARK_HOME%\bin;%JAVA_HOME%\bin to the end of its value (the leading semicolon separates the new entries from the existing ones)
5. Variable Name: HADOOP_HOME, Variable Value: C:\winutils
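The PATH edit in item 4 is where mistakes are most common: entries must be separated by semicolons, and adding an entry twice only clutters the variable. The hypothetical helper below is a Python sketch, not part of Windows or Spark; it shows the intended result, namely the existing value followed by %SPARK_HOME%\bin and %JAVA_HOME%\bin, each separated by ';', with entries that are already present skipped.

```python
def append_to_path(current, *entries):
    """Append entries to a Windows PATH value, separating them with ';'.

    Entries already present (compared case-insensitively) are not added again,
    and empty segments such as a trailing ';' are dropped.
    """
    parts = [p for p in current.split(";") if p]
    seen = {p.lower() for p in parts}
    for entry in entries:
        if entry.lower() not in seen:
            parts.append(entry)
            seen.add(entry.lower())
    return ";".join(parts)
```

For example, `append_to_path(r"C:\Windows\system32", r"%SPARK_HOME%\bin", r"%JAVA_HOME%\bin")` returns `C:\Windows\system32;%SPARK_HOME%\bin;%JAVA_HOME%\bin`.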
On Mac
======
To set the equivalent environment variables on a Mac, see:
http://hathaway.cc/post/69201163472/how-to-edit-your-path-environment-variables-on-mac
http://osxdaily.com/2014/08/14/add-new-path-to-path-command-line/
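Step 8: Open a Command Prompt and start the Spark shell
1. Open a new Command Prompt window (windows opened before Step 7 do not see the new variables)
2. cd C:\spark\bin
3. spark-shell --master local[4] (runs Spark locally with 4 threads), or
4. spark-shell (runs Spark in local mode with the default settings)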
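The value passed to --master is a Spark master URL: local[N] runs Spark locally with N worker threads, and local[*] uses one thread per CPU core. There is no space inside the brackets. As a Python sketch (hypothetical helper names; it assumes the Windows launcher C:\spark\bin\spark-shell.cmd that ships in the Spark download), the command line can be built like this:

```python
import ntpath


def local_master(threads=None):
    """Spark master URL for local mode: local[N] threads, or local[*] for all cores."""
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return "local[%d]" % threads


def spark_shell_command(spark_home, threads=None):
    # On Windows the launcher script is %SPARK_HOME%\bin\spark-shell.cmd
    launcher = ntpath.join(spark_home, "bin", "spark-shell.cmd")
    return [launcher, "--master", local_master(threads)]
```

On Windows, passing `spark_shell_command(r"C:\spark", 4)` to `subprocess.run` should start the same shell as item 3 above.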