Preface
The first attempt used this combination: jdk8, hadoop2.6.5, spark2.1, scala2.12.4, Anaconda3-5.1.0. It produced one error after another, and no amount of re-tuning the configuration fixed them. After losing a whole day it turned out to be a version incompatibility, which is a reminder to always pay attention to the versions of each component. The key issue is compatibility between Spark and Anaconda: to support Python 3, use as new a Spark as possible. The combination that finally worked is: jdk8, hadoop2.7.5, spark2.3.0, scala2.11.12, Anaconda3-5.1.0.
一、Install Ubuntu 16.04 virtual machines in VMware
sudo apt-get update
sudo apt-get install vim
sudo apt-get install openssh-server

# Configure passwordless SSH login
ssh localhost
ssh-keygen -t rsa        # press Enter at every prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Add every node's IP to /etc/hosts
sudo vi /etc/hosts
192.168.221.132 master
192.168.221.133 slave1
192.168.221.134 slave2

# Set this node's hostname
sudo vi /etc/hostname
master
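Once the keys and /etc/hosts entries are in place on every node, passwordless SSH between nodes should work. Below is a minimal Python sketch (not from the original post) for checking this from the master; the hostnames are the ones added to /etc/hosts above, and BatchMode makes ssh fail instead of prompting for a password.

import subprocess

# Each node should answer with its own hostname, without any password prompt
for host in ("master", "slave1", "slave2"):
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "hostname"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(host, "->", result.stdout.decode().strip() or result.stderr.decode().strip())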
二、Configure environment variables in ~/.profile
#Java
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jar
#Hadoop
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#Scala
export SCALA_HOME=/home/hadoop/scala
export PATH=$PATH:$SCALA_HOME/bin
#Anaconda
export PATH=/home/hadoop/anaconda3/bin:$PATH
export PYSPARK_DRIVER_PYTHON=/home/hadoop/anaconda3/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
export PYSPARK_PYTHON=/home/hadoop/anaconda3/bin/python
#Spark
export SPARK_HOME=/home/hadoop/spark
export PATH=$PATH:$SPARK_HOME/bin
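With these variables exported, pyspark launches inside Jupyter and both the driver and the executors use the Anaconda interpreter. A quick sanity check from the Anaconda Python that the variables are visible in a new shell, as a minimal sketch:

import os

# Print each variable the rest of this guide relies on
for var in ("JAVA_HOME", "HADOOP_HOME", "SCALA_HOME", "SPARK_HOME",
            "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"):
    print(var, "=", os.environ.get(var, "<not set>"))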
三、Hadoop: six configuration files
# hadoop-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_161

# core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

# hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop/tmp/dfs/data</value>
  </property>
</configuration>

# mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

# yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

# slaves
slave1
slave2
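A single missing tag or typo in these XML files can keep a daemon from starting, so it can help to print every property back before starting the cluster. Below is a minimal sketch (not from the original post) using only the Python standard library; the directory path matches the layout assumed in this guide.

import os
import xml.etree.ElementTree as ET

conf_dir = "/home/hadoop/hadoop/etc/hadoop"
for fname in ("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"):
    path = os.path.join(conf_dir, fname)
    print("==", fname)
    # Each Hadoop config file is <configuration> with <property> children
    for prop in ET.parse(path).getroot().findall("property"):
        print("  %s = %s" % (prop.find("name").text, prop.find("value").text))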
四、Spark: two configuration files
# spark-env.sh
#java
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
#scala
export SCALA_HOME=/home/hadoop/scala
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
#spark
export SPARK_HOME=/home/hadoop/spark
export SPARK_LOCAL_DIRS=/home/hadoop/spark
export SPARK_DIST_CLASSPATH=$(/home/hadoop/hadoop/bin/hadoop classpath)
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_MASTER_IP=master
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native

# slaves
slave1
slave2
五、Copy and unpack the installation files
scp jdk-8u161-linux-x64.tar hadoop@master:~
scp Anaconda3-5.1.0-Linux-x86_64.sh hadoop@master:~
scp -r hadoop/ hadoop@master:~
scp -r scala/ hadoop@master:~
scp -r spark/ hadoop@master:~

tar -xvf jdk-8u161-linux-x64.tar -C ./
source ~/.profile

Check the JDK, Hadoop, and Scala versions (java -version, hadoop version, scala -version).

# Start spark in cluster mode and check jps
spark-shell --master spark://master:7077 --executor-memory 512m --total-executor-cores 2
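The same check can be done from Python by submitting a small script to the standalone master. This is a minimal sketch, not from the original post; the file name smoke_test.py is illustrative. Note that if PYSPARK_DRIVER_PYTHON is set to jupyter as in the profile above, override it with PYSPARK_DRIVER_PYTHON=python when using spark-submit, otherwise the driver tries to launch a notebook instead of running the script.

# smoke_test.py
# Run with:
#   spark-submit --master spark://master:7077 --executor-memory 512m --total-executor-cores 2 smoke_test.py
from pyspark import SparkContext

sc = SparkContext(appName="smoke_test")
print("Spark version:", sc.version)
# A trivial distributed job: sum the integers 0..99 on the executors
print("Sum of 0..99 :", sc.parallelize(range(100)).sum())
sc.stop()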
六、Install Anaconda
bash Anaconda3-5.1.0-Linux-x86_64.sh -b

# Generate and edit jupyter_notebook_config.py
jupyter notebook --generate-config
vim ~/.jupyter/jupyter_notebook_config.py

c = get_config()
c.IPKernelApp.pylab = 'inline'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u''
c.NotebookApp.port = 8888
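The password field is left empty above, so Jupyter falls back to token authentication on first login. If a fixed password is preferred instead, a hashed value can be generated with the notebook package itself and pasted into c.NotebookApp.password; a minimal sketch:

from notebook.auth import passwd

# Prompts for a password twice and prints a hash such as 'sha1:...'
# Paste the printed string into c.NotebookApp.password = u'<hash>'
print(passwd())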
七、Shut down, clone two new nodes, and update their configuration
sudo vi /etc/hostname
sudo vi /etc/hosts
八、Test the pyspark cluster remotely
# Start the cluster on the server
start-all.sh
spark/sbin/start-all.sh

# Once the hadoop and spark processes all show up normally in jps, start pyspark
1、Local mode
pyspark

2、Standalone mode
MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 3 --executor-memory 512m

Then open 192.168.221.132:8888 in a browser on the remote client. The first time the page loads it asks for verification: enter the token string printed after "token=" in the pyspark console output together with a user password. Type sc in a notebook cell to test.
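Since the whole point of this setup was getting the Spark and Anaconda versions to agree, a useful first cell in the remote notebook is one that prints the Python version on the driver and on the executors. A minimal sketch, assuming sc is the SparkContext created automatically by the pyspark launcher:

import sys

def python_version(_):
    # Imported inside the function so it runs on the executor
    import sys
    return sys.version.split()[0]

print("Driver Python  :", sys.version.split()[0])
# One tiny task per partition so the executors report their interpreter
versions = sc.parallelize(range(2), 2).map(python_version).distinct().collect()
print("Executor Python:", versions)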
At this point the remote pyspark server built on Anaconda3-5.1.0 (Python 3.6.4) is deployed successfully.
Reference: http://ihoge.cn/2018/anacondaPyspark.html