Steps to improve lookup performance using HashMap in Java

2021-08-16 23:59:12
OfStack

In Java, a HashMap is a collection of key-value pairs: each key maps to one value. When writing, you specify a key and store its value; when reading, you look up the value by its key. It feels a lot like Redis.


// Create a HashMap object named Sites
HashMap<Integer, String> Sites = new HashMap<Integer, String>();
// Add key-value pairs
Sites.put(1, "Google");
Sites.put(2, "Runoob");
Sites.put(3, "Taobao");
Sites.put(4, "Zhihu");
// Read
String val = Sites.get(1); // returns "Google"

Why can HashMap improve performance? Not because it stores data in a way that is far superior to other data structures such as arrays and lists. What matters to me is that, given a key, the corresponding value can be found very quickly. With an array, the simplest approach is a loop; with a little more care, you sort the array and use binary search. Neither can compete with reading directly from a HashMap by key. The speed comes from the storage structure: HashMap is a hash table, so the key's hash code leads almost directly to the slot holding the value, and a lookup takes roughly constant time on average. The internals are a separate topic that we will look at another time. Yesterday I used exactly this property to turn a job that had run for hours without finishing into one that took only 10 seconds.
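To make the comparison concrete, here is a minimal sketch of the three lookup styles just mentioned: a linear scan, a binary search on sorted data, and a HashMap lookup. The class name LookupDemo and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class LookupDemo {
    // Linear scan: checks elements one by one, O(n) per lookup.
    static int findLinear(int[] ids, int target) {
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Binary search: requires a sorted array, O(log n) per lookup.
    static int findBinary(int[] sortedIds, int target) {
        int idx = Arrays.binarySearch(sortedIds, target);
        return idx >= 0 ? idx : -1;
    }

    // HashMap: the key's hash code leads to its bucket,
    // so a lookup takes roughly constant time on average.
    static String findInMap(Map<Integer, String> sites, int key) {
        return sites.get(key);
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4};
        Map<Integer, String> sites = new HashMap<>();
        sites.put(1, "Google");
        sites.put(2, "Runoob");
        sites.put(3, "Taobao");
        sites.put(4, "Zhihu");

        System.out.println(findLinear(ids, 3));   // index of 3 in the array
        System.out.println(findBinary(ids, 3));   // same index, found by halving
        System.out.println(findInMap(sites, 3));  // value stored under key 3
    }
}
```

The difference is hard to notice with four elements; it starts to matter when the same lookup is repeated hundreds of thousands of times against a large collection.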

The problem is as follows:
There are 250,000 records, each with a latitude and a longitude, and some distinct records share the same coordinates. I want to merge the records with identical coordinates into one.

If the data were stored in a database, grouping by coordinates in SQL would solve the problem. However, there is no database here; the data is read from a gdb file.

So, the simplest idea first: load the data into an array and create a new collection, then loop over the array and compare each record with the entries in the new collection one by one. If the coordinates match an existing entry, the record is merged into it; otherwise the record is inserted as a new entry. The result: two hours passed with no sign of it finishing.
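The slow approach can be sketched roughly as follows. This is a simplified reconstruction, not the original code: the class NaiveMerge, the Group type, and the use of plain x/y arrays in place of the gdb records are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the slow approach; names are illustrative.
class NaiveMerge {
    // One merged entry: a coordinate plus the IDs of all records located there.
    static class Group {
        final double x;
        final double y;
        final List<Integer> ids = new ArrayList<>();

        Group(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // xs[i] and ys[i] are the coordinates of record i.
    static List<Group> merge(double[] xs, double[] ys) {
        List<Group> groups = new ArrayList<>();
        for (int i = 0; i < xs.length; i++) {
            Group match = null;
            // Inner loop: scan the groups found so far. As the list grows,
            // every new record costs more comparisons than the last.
            for (Group g : groups) {
                if (g.x == xs[i] && g.y == ys[i]) {
                    match = g;
                    break;
                }
            }
            if (match == null) {
                match = new Group(xs[i], ys[i]);
                groups.add(match);
            }
            match.ids.add(i);
        }
        return groups;
    }
}
```

The inner loop is the part a HashMap lookup can replace.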

That makes sense when you think about it. The new collection keeps growing, so each record needs more comparisons than the one before. It is like the story of the rice on the chessboard, where each square holds twice as much rice as the previous one, and in the end even the entire national grain reserve cannot fill the board. (Here the growth is closer to quadratic than to doubling, but with 250,000 records it is still far too much work.)

Sort the 250,000 records before processing? The sorting alone would keep the machine busy, so no.

Load the 250,000 records into a database first, and then group them? That should work, but it feels clumsy, and the run time would probably still be measured in minutes.

In the end, I decided to use a HashMap for the new collection.
As described above, HashMap writes and reads values by key, so the crucial question is how to obtain that key. In the example above, the keys were simply integers assigned by hand. In our project, we can use the hash value of the sum of longitude and latitude as the key. If two hash values are equal, the coordinates are treated as equal. All that remains is to check whether the new collection already contains an element for that key; no loop of comparisons is needed at all.

Because two different coordinate pairs can add up to the same sum, the longitude is multiplied by 1000 before the latitude is added. This makes collisions much less likely, although it does not rule them out entirely, since distinct pairs can still produce the same value.
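The effect of the multiplication can be checked with a small experiment. The sketch below uses a hypothetical helper class GeoKey with the two key formulas side by side; the sample coordinates are made up, and x stands for longitude and y for latitude.

```java
// Hypothetical helper class comparing the two key formulas.
class GeoKey {
    // Key from the plain sum: (x, y) and (y, x) always collide.
    static long sumKey(double x, double y) {
        return Long.hashCode(Double.doubleToLongBits(x + y));
    }

    // Key from longitude * 1000 + latitude, as described in the text.
    static long weightedKey(double x, double y) {
        return Long.hashCode(Double.doubleToLongBits(x * 1000 + y));
    }

    public static void main(String[] args) {
        // Two different points whose coordinates add up to the same sum.
        System.out.println(sumKey(120.5, 30.25) == sumKey(30.25, 120.5));           // true
        System.out.println(weightedKey(120.5, 30.25) == weightedKey(30.25, 120.5)); // false

        // The weighted key can still collide: both points below give 564.0.
        System.out.println(weightedKey(0.5, 64.0) == weightedKey(0.5625, 1.5));     // true
    }
}
```

If silently merging two different points would be a problem for your data, it is worth comparing the stored coordinates after a key match, or using a key that contains both values.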

The code is as follows:


// Needs java.util.ArrayList, java.util.HashMap and java.util.List,
// plus Point, WKTReader and ParseException from JTS.
private HashMap<Long, SimpleItem> recGeo(HashMap<Long, SimpleItem> map, String geo, int j) {
  /*
    Merge records that share the same coordinates into one entry.
    map: the new collection
    geo: coordinate string
    j:   record ID
   */

  try {
    Point p = (Point) reader.read(geo);
    /*
      Compute a hash value to use as the key, because comparing records
      in a loop is far too slow for this much data.
      Different coordinates can have the same longitude + latitude sum,
      so the longitude is multiplied by 1000 before the latitude is added.
     */
    // Compute the key (same value as the boxed Long's hashCode())
    long k = Long.hashCode(Double.doubleToLongBits(p.getX() * 1000 + p.getY()));

    SimpleItem si = map.get(k);
    if (si != null) { // The key already exists in the new collection, so this should be a record with the same coordinates
      si.getPointers().add(j); // Merge
    } else { // Otherwise insert a new entry
      si = new SimpleItem();
      si.setGeo(geo);
      List<Integer> pointers = new ArrayList<>();
      pointers.add(j);
      si.setPointers(pointers);
      map.put(k, si);
    }
  } catch (ParseException e) {
    e.printStackTrace();
  }

  return map;
}

private static GeometryFactory geometryFactory = JTSFactoryFinder.getGeometryFactory( null );
private static WKTReader reader = new WKTReader( geometryFactory );
class SimpleItem{
  private Point geo;
  private List<Integer> pointers;

  public Point getGeo() {
    return geo;
  }

  public void setGeo(String geo) {
    try {
      this.geo = (Point)reader.read(geo);
    } catch (ParseException e) {
      e.printStackTrace();
    }
  }

  public List<Integer> getPointers() {
    return pointers;
  }

  public void setPointers(List<Integer> pointers) {
    this.pointers = pointers;
  }
}
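For readers who want to try the idea without JTS or GeoTools, here is a minimal self-contained sketch. It is not the code used in the project: it takes plain x/y arrays in place of coordinate strings, and it swaps the numeric hash key for a string key built from both coordinates, so that two different points do not end up in the same entry. The class name MapMerge and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free variant of the merge; not the project's code.
class MapMerge {
    // Returns a map from "x,y" to the IDs of all records at that coordinate.
    static Map<String, List<Integer>> merge(double[] xs, double[] ys) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < xs.length; i++) {
            // Assumption: a string key that contains both coordinates,
            // used here instead of the hash of x * 1000 + y.
            String key = xs[i] + "," + ys[i];
            // One hash lookup replaces the scan over all existing groups.
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        double[] xs = {120.5, 30.25, 120.5};
        double[] ys = {30.25, 120.5, 30.25};
        System.out.println(merge(xs, ys).size()); // 2
    }
}
```

A string key costs more memory than a long, so this is a trade-off between exactness and footprint; for 250,000 records either choice should stay well within the range of seconds.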

In just a few seconds, the run finished and the new collection held 50,000 elements.

Those are the steps for using HashMap in Java to improve lookup performance.