Spring Boot and Spark Cassandra system integration development example
- 2021-01-18 06:30:03
- OfStack
This article demonstrates an example that uses Spark as the analysis engine, Cassandra as the data store, and Spring Boot to develop the driver program.
1. Prerequisites
Install Spark (this article uses Spark 1.5.1 and assumes the installation directory is /opt/spark)
Install Cassandra (3.0+)
Create keyspace
CREATE KEYSPACE hfcb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
Create table (inside the hfcb keyspace, so switch to it first)
USE hfcb;
CREATE TABLE person (
    id text PRIMARY KEY,
    first_name text,
    last_name text
);
Insert test data
insert into person (id,first_name,last_name) values('1','wang','yunfei');
insert into person (id,first_name,last_name) values('2','peng','chao');
insert into person (id,first_name,last_name) values('3','li','jian');
insert into person (id,first_name,last_name) values('4','zhang','jie');
insert into person (id,first_name,last_name) values('5','liang','wei');
2. spark-cassandra-connector installation
To let Spark 1.5.1 use Cassandra as its data store, the following dependency jars must be added to Spark's classpath (this example places them in /opt/spark/managed-lib/; the directory itself is an arbitrary choice):
cassandra-clientutil-3.0.2.jar
cassandra-driver-core-3.1.4.jar
guava-16.0.1.jar
cassandra-thrift-3.0.2.jar
joda-convert-1.2.jar
joda-time-2.9.9.jar
libthrift-0.9.1.jar
spark-cassandra-connector_2.10-1.5.1.jar
In /opt/spark/conf, create a new file spark-env.sh and enter the following:
SPARK_CLASSPATH=/opt/spark/managed-lib/*
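A missing jar in this directory typically only shows up later as a class-loading error when a job runs, so it can help to check the directory up front. The following is a minimal sketch, not part of the original project: the class name ManagedLibCheck and its method are hypothetical, and the jar names are simply copied from the list above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original project): reports which of the
// connector jars listed in this section are absent from a directory.
final class ManagedLibCheck {

    // Jar names copied verbatim from the list in this section.
    static final List<String> REQUIRED_JARS = Arrays.asList(
            "cassandra-clientutil-3.0.2.jar",
            "cassandra-driver-core-3.1.4.jar",
            "guava-16.0.1.jar",
            "cassandra-thrift-3.0.2.jar",
            "joda-convert-1.2.jar",
            "joda-time-2.9.9.jar",
            "libthrift-0.9.1.jar",
            "spark-cassandra-connector_2.10-1.5.1.jar");

    // Returns the required jar names that are not regular files in libDir.
    static List<String> missingJars(Path libDir) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED_JARS) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // /opt/spark/managed-lib is the directory used in this article;
        // pass another path as the first argument to check a different one.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "/opt/spark/managed-lib");
        List<String> missing = missingJars(libDir);
        if (missing.isEmpty()) {
            System.out.println("All connector jars found in " + libDir);
        } else {
            System.out.println("Missing from " + libDir + ": " + missing);
        }
    }
}
```

The check only looks at file names, so it does not verify that the jar versions are mutually compatible.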
3. Spring Boot application development
Add spark-cassandra-connector and spark dependencies
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
The Spark master URL and the Cassandra connection settings are configured in application.yml
spark.master: spark://master:7077
cassandra.host: 192.168.1.140
cassandra.keyspace: hfcb
Note that spark://master:7077 uses a host name, not an IP address. You can edit the local hosts file to map master to the IP address of the Spark master node.
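If the host name in the master URL cannot be resolved on the machine running the Spring Boot application, the driver will not reach the cluster. The sketch below is one way to check this before starting the application. It is an illustration only: the class SparkMasterUrl is hypothetical and not part of the original project, and it relies solely on java.net.URI and java.net.InetAddress from the JDK.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not from the original project): splits a standalone
// Spark master URL of the form spark://host:port into its host and port.
final class SparkMasterUrl {
    final String host;
    final int port;

    private SparkMasterUrl(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Parses the URL and rejects anything that is not spark://host:port.
    static SparkMasterUrl parse(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
        if (!"spark".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("Expected spark://host:port but got: " + url);
        }
        return new SparkMasterUrl(uri.getHost(), uri.getPort());
    }

    public static void main(String[] args) throws Exception {
        SparkMasterUrl master = parse(args.length > 0 ? args[0] : "spark://master:7077");
        // Throws UnknownHostException if the name is neither in DNS nor in the hosts file.
        InetAddress address = InetAddress.getByName(master.host);
        System.out.println(master.host + ":" + master.port + " resolves to " + address.getHostAddress());
    }
}
```

Running the main method with the value of spark.master from application.yml either prints the resolved address or fails with an UnknownHostException, which points to a missing hosts entry.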
Configure SparkContext and CassandraSQLContext
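import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkCassandraConfig {

    // Values injected from application.yml
    @Value("${spark.master}")
    String sparkMasterUrl;

    @Value("${cassandra.host}")
    String cassandraHost;

    @Value("${cassandra.keyspace}")
    String cassandraKeyspace;

    // The Spring Boot application acts as the Spark driver (client deploy mode)
    @Bean
    public JavaSparkContext javaSparkContext() {
        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", cassandraHost)
                // Uncomment if Cassandra authentication is enabled
                // .set("spark.cassandra.auth.username", "cassandra")
                // .set("spark.cassandra.auth.password", "cassandra")
                .set("spark.submit.deployMode", "client");
        return new JavaSparkContext(sparkMasterUrl, "SparkDemo", conf);
    }

    // SQL context bound to the configured keyspace, so queries can use bare table names
    @Bean
    public CassandraSQLContext sqlContext() {
        CassandraSQLContext cassandraSQLContext = new CassandraSQLContext(javaSparkContext().sc());
        cassandraSQLContext.setKeyspace(cassandraKeyspace);
        return cassandraSQLContext;
    }
}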
A simple call
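import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

@Repository
public class PersonRepository {

    @Autowired
    CassandraSQLContext cassandraSQLContext;

    // Runs a Spark SQL query against the person table and returns the row count
    public Long countPerson() {
        DataFrame people = cassandraSQLContext.sql("select * from person order by id");
        return people.count();
    }
}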
The application can be started like any normal Spring Boot program.
Source address: https://github.com/wiselyman/spring-spark-cassandra.git