Monday 19 June 2017

Spark on Windows & Mac installation

Spark on Windows - Installation Steps
=====================================

Step 1: Download and install the JDK

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Note:

1. Download the appropriate version based on the Operating System installed on your machine
2. Install the JDK, by following the wizard

Step 2: Download Spark

http://spark.apache.org/downloads.html

Choose the following options for the download
1. Choose a Spark release
2. Pre-built for Hadoop 2.4 or later
3. Direct Download
4. Click on the spark-1.4.1-bin-hadoop2.4.tgz link

Note:

This is a gzipped tar file. To extract it on Windows, you need a utility such as WinRAR.

Download and install WinRAR

http://www.rarlab.com/download.htm

Step 3: Extract the tar file

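Step 4: Copy the contents of the extracted folder into

C:\spark\

so that the bin and conf folders sit directly under C:\spark (for example C:\spark\bin and C:\spark\conf)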
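
If installing WinRAR is not an option, Steps 3 and 4 can also be done with Python's standard tarfile module. The following is only a minimal sketch, assuming Python 3 is available on the machine (this tutorial does not otherwise require it). The function name extract_spark and the download location shown in the comment are hypothetical.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_spark(archive, dest):
    """Extract a Spark .tgz into dest, dropping the archive's top-level folder."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # This sketch only handles regular files and folders.
            if not (member.isfile() or member.isdir()):
                continue
            # "spark-1.4.1-bin-hadoop2.4/bin/spark-shell" -> "bin/spark-shell"
            parts = PurePosixPath(member.name).parts[1:]
            if not parts:
                continue  # the top-level folder itself
            member.name = "/".join(parts)
            tar.extract(member, dest)
    return dest


# Example call (the download path is an assumption, adjust it to your machine):
# extract_spark(r"C:\Users\me\Downloads\spark-1.4.1-bin-hadoop2.4.tgz", r"C:\spark")
```

Stripping the first path component is what makes bin and conf land directly under the destination folder, which the later steps rely on.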

Step 5: Update log4j.properties to set the log level to WARN

Open C:\spark\conf\log4j.properties.template

Change the line log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console

Save the file as log4j.properties in the same folder (C:\spark\conf)
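
The same edit can be scripted. This is a rough sketch rather than an official Spark tool: it assumes Python 3 is installed, and the function name write_log4j_properties is made up for illustration. It copies the template to log4j.properties, changes only the level on the log4j.rootCategory line, and keeps whatever appender list follows it (such as ", console").

```python
from pathlib import Path


def write_log4j_properties(conf_dir, level="WARN"):
    """Copy log4j.properties.template to log4j.properties with a new root level."""
    conf_dir = Path(conf_dir)
    template = conf_dir / "log4j.properties.template"
    lines = []
    for line in template.read_text().splitlines():
        if line.startswith("log4j.rootCategory="):
            # Keep the appender list (e.g. ", console") and swap only the level.
            _, _, value = line.partition("=")
            appenders = value.split(",")[1:]
            line = "log4j.rootCategory=" + ",".join([level] + appenders)
        lines.append(line)
    target = conf_dir / "log4j.properties"
    target.write_text("\n".join(lines) + "\n")
    return target


# Example call, using the folder from Step 4:
# write_log4j_properties(r"C:\spark\conf")
```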

Step 6: Download winutils.exe from the course resources folder

1. Create a folder C:\winutils\bin
2. Copy the winutils.exe file to C:\winutils\bin\winutils.exe

Step 7: Set the environment variables (tell Windows where Spark is)

SPARK_HOME = C:\spark
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_71 (use the folder of the JDK version you actually installed)
HADOOP_HOME = C:\winutils
PATH = %SPARK_HOME%\bin;%JAVA_HOME%\bin (append to the existing path)

In Windows
==========

1) Right click on Start Menu
2) Click on Control Panel
3) Click on System and Security
4) Click on System
5) Click on Advanced System Settings
6) Click on the Environment Variables button
   1. Add a new environment variable for each of SPARK_HOME, JAVA_HOME and HADOOP_HOME
   2. Variable Name: SPARK_HOME, Variable Value: C:\spark, Save the variable
   3. Variable Name: JAVA_HOME, Variable Value: C:\Program Files\Java\jdk1.8.0_71
   4. Variable Name: HADOOP_HOME, Variable Value: C:\winutils
   5. Variable Name: PATH (edit the existing variable), Variable Value (append to the end of the string): ;%SPARK_HOME%\bin;%JAVA_HOME%\bin
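
Once the variables are saved, open a new Command Prompt so the changes are picked up. A quick sanity check could look like the sketch below. It assumes Python 3 is installed, and the names CHECKS and check_environment are hypothetical. It only tests that each variable is set and that one expected file exists underneath it. It does not validate PATH.

```python
import os
from pathlib import Path

# One file expected under each variable, based on the steps in this post.
CHECKS = {
    "SPARK_HOME": Path("bin") / "spark-shell.cmd",
    "JAVA_HOME": Path("bin") / "java.exe",
    "HADOOP_HOME": Path("bin") / "winutils.exe",
}


def check_environment(env=None):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    for name, relative in CHECKS.items():
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
        elif not (Path(home) / relative).is_file():
            problems.append(f"{name}={home} does not contain {relative}")
    return problems


if __name__ == "__main__":
    for message in check_environment() or ["All checks passed"]:
        print(message)
```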

In Mac
======

http://hathaway.cc/post/69201163472/how-to-edit-your-path-environment-variables-on-mac
http://osxdaily.com/2014/08/14/add-new-path-to-path-command-line/


Step 8: Open a terminal and start the Spark shell

1. Open the Command Prompt app
2. cd C:\spark\bin
3. spark-shell --master "local[4]" (runs Spark locally with 4 worker threads; note there is no space inside local[4]) (or)
4. spark-shell




