Three ways to configure Spark properties in detail


As the Spark project has matured, more and more configurable parameters have been added to it. Spark provides three places for configuration:

1. Spark properties: these control most application parameters and can be set through a SparkConf object or through Java system properties;
2. Environment variables: these can be set separately for each machine (the IP address, for example) in the $SPARK_HOME/conf/spark-env.sh script on each node;
3. Logging: all logging-related properties can be set in log4j.properties.

These three configuration methods are described in detail below.

1. Spark properties

Spark properties control most application parameters and can be set separately for each application. These properties are set directly on a SparkConf object, which is then passed to the SparkContext. SparkConf lets you set common properties (the master URL, the application name, and so on) through dedicated setters, and arbitrary key-value pairs through the set() method. For example:


val conf = new SparkConf()
       .setMaster("local")
       .setAppName("CountingSheep")
       .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)

Loading Spark properties dynamically

In some scenarios you may want to avoid hard-coding properties on the SparkConf object. For example, you might want to run the same application against different masters or with different amounts of memory. Spark allows you to create an empty conf object, as follows:


val sc = new SparkContext(new SparkConf())

You can then supply the configuration properties from the command line at run time:


./bin/spark-submit --name "My app" \
              --master local[4] \
              --conf spark.shuffle.spill=false \
              --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
              myApp.jar

The Spark shell and the spark-submit tool support two ways to load configuration properties dynamically. The first is dedicated command-line flags, such as --master; in addition, spark-submit can receive any Spark property through the --conf flag. Running ./bin/spark-submit --help will show the full list of options.
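The same flags work when launching an interactive shell. A small illustrative example (the values here are placeholders, not recommendations):


./bin/spark-shell --master local[2] \
              --conf spark.eventLog.enabled=true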

The ./bin/spark-submit tool also reads configuration options from the conf/spark-defaults.conf file. In that file, each line is a key-value pair, where the key and value can be separated by whitespace or by an equals sign. For example:


spark.master      spark://iteblog.com:7077
spark.executor.memory  512m
spark.eventLog.enabled true
spark.serializer    org.apache.spark.serializer.KryoSerializer

Values specified as flags or in the properties file are passed on to the application and merged with those set on the SparkConf object. Properties set directly on the SparkConf take the highest precedence; next come flags passed to spark-submit or spark-shell; finally, options in the spark-defaults.conf file.
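If you want to confirm programmatically which value won, you can read the merged configuration back from the running context. A minimal sketch, reusing the sc from the earlier examples:


// getConf returns a copy of the merged configuration; pass a default
// so get() does not throw if the key was never set anywhere.
val effectiveMemory = sc.getConf.get("spark.executor.memory", "not set")
println(s"spark.executor.memory = $effectiveMemory")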

Where can I view the configured Spark properties

The application's web UI at http://<driver>:4040 lists all of the application's Spark configuration options under the Environment tab. This is useful when you want to make sure your configuration is correct. Note that only properties explicitly set through spark-defaults.conf or SparkConf appear on that page; for all other properties, you can assume the default values are in effect.
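You can also list every explicitly set property programmatically, which mirrors what the Environment tab shows. A short sketch using the standard SparkConf API:


// Prints all properties set via spark-defaults.conf, submit flags, or SparkConf.
sc.getConf.getAll.sortBy(_._1).foreach { case (key, value) =>
  println(s"$key = $value")
}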

2. Environment variables

Some Spark settings can be configured through environment variables, which are read from the conf/spark-env.sh script (conf/spark-env.cmd on Windows). In Standalone and Mesos modes, this file can supply machine-specific information such as hostnames.

Note that conf/spark-env.sh does not exist in a freshly installed Spark. You can create it by copying the conf/spark-env.sh.template file; make sure the copy is executable.
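Concretely, creating the file might look like this (assuming your working directory is $SPARK_HOME):


cp conf/spark-env.sh.template conf/spark-env.sh
chmod +x conf/spark-env.sh   # make sure the copy is executable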

The following variables can be set in the conf/spark-env.sh file (an illustrative example follows the list):


JAVA_HOME        Location where Java is installed.
PYSPARK_PYTHON   Python binary executable to use for PySpark.
SPARK_LOCAL_IP   IP address of the machine to bind to.
SPARK_PUBLIC_DNS Hostname your Spark program will advertise to other machines.
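A minimal conf/spark-env.sh might then look as follows; the paths and addresses are purely illustrative:


# conf/spark-env.sh -- illustrative values only
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk        # hypothetical install path
export SPARK_LOCAL_IP=192.168.1.10                  # address to bind to
export SPARK_PUBLIC_DNS=spark-master.example.com    # advertised hostname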

For a standalone-mode cluster there are many more configurable properties beyond those listed above; I won't cover them here, so please refer to the official documentation for details.

3. Log configuration

Spark logs through log4j. You can configure log4j.properties to set different log levels, storage locations, and so on. This file does not exist by default; you can create it by copying the log4j.properties.template file.
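A minimal conf/log4j.properties, modeled on the template that ships with Spark, might look like this:


# Log everything at INFO and above to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n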

Conclusion

That's all for this detailed look at the three ways of configuring Spark properties. I hope it is helpful to you. Interested readers can also refer to the related posts on this site: Spark K-Means algorithm code examples, seven common Hadoop and Spark project cases, and Spark broadcast variables and accumulators usage examples. If you have any questions, feel free to leave a message and this site will reply in a timely fashion.

