Spark/Hadoop Cluster
Revision as of 06:50, 29 January 2024
Getting Started
This assumes the Spark/Hadoop cluster was configured in a particular way. The general configuration is described on the Foreman page; in short, Spark was installed to the /usr/local/spark directory and Hadoop to /usr/local/hadoop.
This is a good guide for general setup of a single-node cluster
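Once everything is up and running, these URLs should be available:
* http://spark1.lab.bpopp.net:8080 (Spark master web UI)
* http://spark1.lab.bpopp.net:50070 (HDFS NameNode web UI)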
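A quick way to confirm both pages respond is to probe them with curl. The following is a minimal sketch, not part of Spark or Hadoop: check_ui is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Sketch: report whether each cluster web UI answers over HTTP.
# check_ui is a hypothetical helper, not a Spark or Hadoop command.

check_ui() {
    local url
    local failed=0
    for url in "$@"; do
        # -f: treat HTTP errors as failure, -s: quiet, -o /dev/null: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            echo "OK   $url"
        else
            echo "DOWN $url"
            failed=1
        fi
    done
    # Non-zero exit status if any URL did not answer
    return "$failed"
}

# Usage, with the hostnames and ports used on this page:
# check_ui http://spark1.lab.bpopp.net:8080 http://spark1.lab.bpopp.net:50070
```

The function returns a non-zero status if any URL fails, so it can be chained after the start scripts.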
Passwordless SSH from Master
The spark user on the master needs passwordless SSH both to the master itself (for a local worker) and to each worker. To set this up, log in as the spark user on the master server and generate a key pair:
ssh-keygen -t rsa -P ""
Once the key pair has been generated, the private key will be in /home/spark/.ssh/id_rsa and the public key in /home/spark/.ssh/id_rsa.pub (by default). Append the public key to the authorized_keys file (to allow spark to ssh to itself):
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Or, for each worker, do something like:
ssh-copy-id -i ~/.ssh/id_rsa.pub spark@localhost
ssh-copy-id -i ~/.ssh/id_rsa.pub spark@spark2.lab.bpopp.net
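To verify that the keys were installed, one option is to attempt a non-interactive login to each host. This is a sketch under assumptions: check_passwordless_ssh is a hypothetical helper name, the remote user is assumed to be spark, and the 5-second connect timeout is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: confirm the spark user can log in to each host without a password.
# check_passwordless_ssh is a hypothetical helper, not a standard command.

check_passwordless_ssh() {
    local host
    local failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "spark@$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host refused the key
    return "$failed"
}

# Usage, run as the spark user on the master:
# check_passwordless_ssh localhost spark2.lab.bpopp.net
```

Because batch mode cannot answer prompts, a host whose key has not yet been accepted into known_hosts will likely also show as FAIL; connecting once interactively (which ssh-copy-id already does) should resolve that.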
Starting Spark
su spark
cd /usr/local/spark/sbin
./start-all.sh
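After the start script returns, the JDK's jps tool can show whether the daemons actually came up. The sketch below assumes jps is on the PATH and that the standalone daemons are listed as Master and Worker; daemons_running is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that the named JVM daemons appear in the jps listing.
# daemons_running is a hypothetical helper; jps ships with the JDK.

daemons_running() {
    local listing name
    local missing=0
    # jps prints one "<pid> <main class name>" pair per line
    listing=$(jps)
    for name in "$@"; do
        # Compare the second field exactly, so NameNode does not match SecondaryNameNode
        if printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "running: $name"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    # Non-zero exit status if any daemon was not listed
    return "$missing"
}

# Usage on the master, which also hosts a local worker:
# daemons_running Master Worker
```

The same helper should work for the Hadoop daemons by passing their names instead, for example NameNode and DataNode.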
Starting Hadoop
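Note that the namenode must be formatted before its first startup or HDFS will not start. Formatting erases any existing HDFS metadata, so do this only once, on initial setup.
(Assuming you are still logged in as the spark user.)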
hdfs namenode -format
cd /usr/local/hadoop/sbin
./start-all.sh
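Since reformatting discards existing HDFS metadata, it can help to guard the format step. The sketch below relies on a formatted NameNode directory containing a current/VERSION file; the helper names are hypothetical, and the directory passed in is an assumption that must match dfs.namenode.name.dir in your hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Sketch: format the NameNode only if its storage directory looks unformatted.
# Both function names are hypothetical helpers, not Hadoop commands.

namenode_is_formatted() {
    # A formatted NameNode directory contains current/VERSION
    [ -f "$1/current/VERSION" ]
}

format_if_needed() {
    local name_dir=$1
    if namenode_is_formatted "$name_dir"; then
        echo "already formatted: $name_dir"
    else
        hdfs namenode -format
    fi
}

# Usage (the path is a placeholder; substitute your configured name directory):
# format_if_needed /path/to/namenode/dir
```

With this guard, rerunning the setup steps should leave an existing filesystem untouched.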