Spark implementation of the K-Means algorithm: code example


K-Means is a distance-based clustering algorithm: it uses an iterative method to compute K cluster centers and groups the data points into K classes.
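To make the iteration concrete, here is a minimal sketch of one K-Means (Lloyd) step in plain Scala. This is an illustrative toy, not MLlib's implementation; all names here are hypothetical:


object KMeansSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // One Lloyd iteration: assign every point to its nearest center,
  // then recompute each center as the mean of its assigned points
  def lloydStep(points: Seq[Point], centers: IndexedSeq[Point]): IndexedSeq[Point] = {
    val assigned = points.groupBy { p =>
      centers.indices.minBy(i => squaredDistance(p, centers(i)))
    }
    centers.indices.map { i =>
      assigned.get(i) match {
        case Some(ps) => ps.transpose.map(_.sum / ps.size).toArray // component-wise mean
        case None     => centers(i) // cluster received no points; keep its old center
      }
    }
  }
}

Repeating lloydStep until the centers stop moving (or an iteration cap is hit) is the whole algorithm.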

MLlib implements K-Means by executing multiple K-Means instances in parallel, each called a run, and returning the cluster centers of the best run. The initial cluster centers can be chosen at random or obtained from k-means|| (a parallel variant of k-means++). The algorithm ends when a fixed number of iterations is reached, or when all runs have converged.
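In the MLlib API these options are exposed on the KMeans builder. As a sketch against the Spark 1.6 API used in this article (parsedData is the RDD[Vector] built further below; note that setRuns was deprecated in later Spark versions):


import org.apache.spark.mllib.clustering.KMeans

val kmeans = new KMeans()
  .setK(2)                                        // number of clusters
  .setMaxIterations(20)                           // cap on iterations
  .setRuns(5)                                     // independent runs; the best one is returned
  .setInitializationMode(KMeans.K_MEANS_PARALLEL) // k-means|| init; KMeans.RANDOM also works
val model = kmeans.run(parsedData)                // parsedData: RDD[Vector]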

To implement the K-Means algorithm with Spark, first modify the pom.xml file to introduce the MLlib machine learning package:


  <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-mllib_2.10</artifactId>
   <version>1.6.0</version>
  </dependency>
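For sbt-based projects, the equivalent dependency (assuming the same Spark and Scala versions) would be a single line in build.sbt:


libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.6.0"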

Code:


import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object Kmeans {
  def main(args: Array[String]): Unit = {
    // Suppress noisy logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.apache.jetty.server").setLevel(Level.OFF)
    // Set up the running environment
    val conf = new SparkConf().setAppName("K-Means").setMaster("spark://master:7077")
      .setJars(Seq("E:\\Intellij\\Projects\\SimpleGraphX\\SimpleGraphX.jar"))
    val sc = new SparkContext(conf)
    // Load the data set; parse each space-separated line into a dense vector
    val data = sc.textFile("hdfs://master:9000/kmeans_data.txt", 1)
    val parsedData = data.map(s => Vectors.dense(s.split(" ").map(_.toDouble)))
    // Cluster the data into 2 classes with 20 iterations, producing the model
    val numClusters = 2
    val numIterations = 20
    val model = KMeans.train(parsedData, numClusters, numIterations)
    // Print the cluster centers of the model
    println("Cluster centres:")
    for (c <- model.clusterCenters) {
      println(" " + c.toString)
    }
    // Evaluate the model with the within-set sum of squared errors
    val cost = model.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + cost)
    // Use the model to classify single data points
    println("Vector 7.3 1.5 10.9 belongs to cluster: " + model.predict(Vectors.dense("7.3 1.5 10.9".split(" ")
      .map(_.toDouble))))
    println("Vector 4.2 11.2 2.7 belongs to cluster: " + model.predict(Vectors.dense("4.2 11.2 2.7".split(" ")
      .map(_.toDouble))))
    println("Vector 1.0 14.5 73.8 belongs to cluster: " + model.predict(Vectors.dense("1.0 14.5 73.8".split(" ")
      .map(_.toDouble))))
    // Print each input line together with its predicted cluster
    data.map { line =>
      val lineVector = Vectors.dense(line.split(" ").map(_.toDouble))
      val prediction = model.predict(lineVector)
      line + " " + prediction
    }.collect().foreach(println)
    sc.stop()
  }
}

The textFile() method loads the data set and yields an RDD; KMeans.train() then builds a KMeansModel from that RDD, the K value, and the number of iterations. With the model in hand, you can determine which class a given data point belongs to: construct a Vector with Vectors.dense() and pass it to the model's predict() method, which returns the index of the cluster it belongs to.
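Judging from the results echoed below, kmeans_data.txt holds one space-separated three-dimensional point per line; the file used here looks like this:


0.0 0.0 5.0
0.1 10.1 0.1
1.2 5.2 13.5
9.5 9.0 9.0
9.1 9.1 9.1
19.2 9.4 29.2
5.8 3.0 18.0
3.5 12.2 60.0
3.6 7.9 8.1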

Run results:


Cluster centres:
 [6.062499999999999,6.7124999999999995,11.5]
 [3.5,12.2,60.0]
Within Set Sum of Squared Errors = 943.2074999999998
Vector 7.3 1.5 10.9 belongs to cluster: 0
Vector 4.2 11.2 2.7 belongs to cluster: 0
Vector 1.0 14.5 73.8 belongs to cluster: 1
0.0 0.0 5.0 0
0.1 10.1 0.1 0
1.2 5.2 13.5 0
9.5 9.0 9.0 0
9.1 9.1 9.1 0
19.2 9.4 29.2 0
5.8 3.0 18.0 0
3.5 12.2 60.0 1
3.6 7.9 8.1 0
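The WSSSE reported by computeCost() generally shrinks as K grows, so a common heuristic for choosing K (the "elbow" method) is to sweep several candidate values and look for the point of diminishing returns. A minimal sketch reusing the training call from above:


// Sweep candidate K values and print the WSSSE for each
for (k <- 2 to 6) {
  val m = KMeans.train(parsedData, k, numIterations)
  println(s"k=$k, WSSSE = ${m.computeCost(parsedData)}")
}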

Conclusion

This concludes the code example for implementing the K-Means algorithm with Spark; I hope it is helpful to you. Interested readers can also refer to related articles on this site, such as "Talk about 7 common Hadoop and Spark project cases", "Spark broadcast variable and accumulator use method code example", and "Spark introduction". If anything is missing, feel free to leave a message and it will be corrected promptly.

