I'm trying to write a Spark job in Python that opens a JDBC connection to Impala and loads data directly from Impala into a DataFrame. This question is very close, but in Scala: Calling JDBC to impala/hive from within a spark job and creating a table

How do I do this? There are plenty of examples for other data sources such as MySQL and PostgreSQL, but I haven't seen one for Impala with Python and Kerberos. An example would help a lot. Thanks!

I tried using information from the web, but it didn't work.

SPARK notebook

    #!/bin/bash
    export PYSPARK_PYTHON=/home/anave/anaconda2/bin/python
    export HADOOP_CONF_DIR=/etc/hive/conf
    export PYSPARK_DRIVER_PYTHON=/home/anave/anaconda2/bin/ipython
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip='*' --no-browser"

    # use Java8
    export JAVA_HOME=/usr/java/latest
    export PATH=$JAVA_HOME/bin:$PATH

    # JDBC Drivers for Impala
    export CLASSPATH=/home/anave/impala_jdbc_2.5.30.1049/Cloudera_ImpalaJDBC41_2.5.30/*.jar:$CLASSPATH
    export JDBC_PATH=/home/anave/impala_jdbc_2.5.30.1049/Cloudera_ImpalaJDBC41_2.5.30

    # --jars $SRCDIR/spark-csv-assembly-1.4.0-SNAPSHOT.jar \
    # --conf spark.sql.parquet.binaryAsString=true \
    # --conf spark.sql.hive.convertMetastoreParquet=false

    pyspark --master yarn-client \
    --driver-memory 4G \
    --executor-memory 2G \
    # --num-executors 10 \
    --jars /home/anave/spark-csv_2.11-1.4.0.jar $JDBC_PATH/*.jar
    --driver-class-path $JDBC_PATH/*.jar

Python code

    properties = {
        "driver": "com.cloudera.impala.jdbc41.Driver",
        "AuthMech": "1",
        # "KrbRealm": "EXAMPLE.COM",
        # "KrbHostFQDN": "impala.example.com",
        "KrbServiceName": "impala"
    }

    # imp_env is the hostname of the db, works with other impala queries ran inside python
    url = "jdbc:impala:imp_env;auth=noSasl"

    db_df = sqlContext.read.jdbc(url=url, table="summary", properties=properties)

I get this error message (full error log):

    Py4JJavaError: An error occurred while calling o42.jdbc.
    : java.lang.ClassNotFoundException: com.cloudera.impala.jdbc41.Driver

Best answer

You can use

    --jars $(echo /dir/of/jars/*.jar | tr ' ' ',')

instead of

    --jars /home/anave/spark-csv_2.11-1.4.0.jar $JDBC_PATH/*.jar

`--jars` expects a single comma-separated list, but the unquoted glob expands to space-separated paths. Or, for another approach, see my answer.
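If you build the launch command from Python instead of bash, you can do the same comma-joining as the `tr ' ' ','` trick there. The sketch below assumes the Impala driver jars sit in the `JDBC_PATH` directory from the question. `jar_options` and `list_jars` are hypothetical helpers, not Spark APIs. For `--jars` the helper joins every jar with commas. For `--driver-class-path`, which is an ordinary JVM classpath, it joins the driver jars with `os.pathsep`.

```python
import glob
import os


def list_jars(jar_dir):
    """Return the *.jar files in jar_dir, sorted so the order is stable."""
    return sorted(glob.glob(os.path.join(jar_dir, "*.jar")))


def jar_options(jdbc_path, extra_jars=()):
    """Return pyspark/spark-submit options that ship the Impala JDBC jars.

    Hypothetical helper (not part of Spark). --jars takes ONE comma-separated
    value, like $(echo /dir/*.jar | tr ' ' ','); --driver-class-path is a
    JVM classpath, so its entries are joined with os.pathsep (':' on Linux).
    """
    driver_jars = list_jars(jdbc_path)
    if not driver_jars:
        # Fail early instead of launching Spark and hitting ClassNotFoundException.
        raise FileNotFoundError("no .jar files found in " + jdbc_path)
    # Extra jars (e.g. spark-csv) go first, then every driver jar.
    all_jars = list(extra_jars) + driver_jars
    return [
        "--jars", ",".join(all_jars),
        "--driver-class-path", os.pathsep.join(driver_jars),
    ]
```

For example, `subprocess.call(["pyspark", "--master", "yarn-client"] + jar_options(os.environ["JDBC_PATH"], ["/home/anave/spark-csv_2.11-1.4.0.jar"]))` would start the shell with the spark-csv jar and the Impala driver jars. With the driver on the classpath, the `sqlContext.read.jdbc(...)` call from the question should then be able to load `com.cloudera.impala.jdbc41.Driver`.

The bash script from the question has a separate problem. The commented-out `# --num-executors 10 \` line ends the backslash-continued `pyspark` command, because a trailing backslash inside a comment does not continue the line. As a result, the `--jars` and `--driver-class-path` lines after it never reach pyspark. Keep such comment lines outside the continued command.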