Fully resolve the hashtable in Java

  • 2020-04-01 02:13:08
  • OfStack

Hashtables provides a useful way to maximize application performance.

Hashtables are not a new concept in the computer world. They are designed to speed up computer processing, which is very slow by today's standards, and they allow you to quickly find a particular item when querying many data items. Although modern machines are thousands of times faster, hashtables is still a useful way to get the best performance out of your application.

Imagine that you have a data file containing about a thousand records. For example, a small business customer record and a program that reads the record into memory for processing. Each record contains a unique five-digit customer ID number, customer name, address, account balance, and so on. Assuming that records are not sorted by customer ID number, the only way to find a particular customer record if the program USES the customer number as a "key" is to search each record consecutively. Sometimes, it will find the record you need very quickly. But sometimes, before the program finds the record you need, it has almost reached the last record. To search through 1,000 records, finding any record requires the program to check an average of 500.5 ((1000 + 1)/2) records. If you often need to look up data, you should need a faster way to find a record.

One way to speed up your search is to break records into segments so you don't have to search a large list, but several short lists. For our digital customer ID number, you can create 10 lists?? ID Numbers starting with 0 make up a list, ID Numbers starting with 1 make up a list, and so on. To find the customer ID number 38016, all you need to do is search the list beginning with 3. If there are 1,000 records and the average length of each list is 100 (1,000 records are divided into 10 lists), the average number of comparisons to search for a record drops to about 50 (see figure 1).

Of course, if about a tenth of the customer Numbers start with zero, another tenth start with one, and so on, then this approach works well. If 90% of the customer Numbers start with zero, that list will have 900 records, requiring an average of 450 comparisons per lookup. In addition, 90 percent of the searches the program needs to perform are for Numbers starting with 0. As a result, the average comparison is well beyond the bounds of simple mathematics.

It would be better if we could allocate records in our list in such a way that each list has approximately the same entries of records, regardless of the distribution of Numbers in the key values. We needed a way to mix the customer Numbers together and distribute the results better. For example, we can take each number in a number, multiply it by some large number (depending on the position of the number), then add the results to produce a total, divide the number by 10, and use the remainder as the index value. When a record is read in, the program runs the hash function on the customer number to determine which list the record belongs to. When the user needs to query, use the same hash function as a "key" for the customer number so that the correct list can be searched. A data structure like this is called a hashtable.

Java The Hashtables
Java Contains two classes, Java . Util. Hashtable and Java .util.hashmap, which provides a multi-purpose hashtable mechanism. The two classes are similar and usually provide the same public interface. But they do have some important differences, which I'll talk about later.

The Hashtable and HashMap objects let you combine a key with a value and use them The put The () method enters the pair of keys/values into the table. Then you can get the value by calling the get() method with the key as an argument. The key and value can be any object as long as the two basic requirements are met. Note that because the key and value must be objects, primitive types must be converted to objects by using methods such as Integer(int).

To use an object of a particular class as a key, the class must provide two methods, equals() and hashCode(). These two methods are Java .lang.Object, so all classes can inherit from both methods; However, the implementation of these two methods in the Object class is generally not useful, so you usually need to override them yourself.

The Equals () method compares its object to another object and returns true if the two objects represent the same information. The method also looks at and ensures that the two objects belong to the same class. If two reference objects are exactly the same, object.equals () returns true, which explains why this method is generally not a good fit. In most cases, you need a method to compare field by field, so we assume that different objects representing the same data are equal.

The HashCode() method generates an int by executing a hash function on the contents of the object. Hashtable and HashMap use this value to figure out which bucket (hash) (or list) a pair of keys/values resides in.

As an example, we can look at the String class because it has its own method to implement both methods. String.equals() compares two String objects character by character and returns true if the String is the same:


String myName = "Einstein";
// The following test is 
// always true
if ( myName.equals("Einstein") )
{ ...

String.hashcode () runs the hash function on a String. The numeric code for each character in the string is multiplied by 31, depending on the position of the character in the string. Then add up the results of these calculations to get a total. This process may seem complicated, but it ensures a better distribution of values. It also demonstrates how far you can go in developing your own hashCode() method to be sure that the result is unique.

For example, let's say I'm going to use a hashtable to implement a directory for a book, with the ISBN number of the book as the search key. I can use the String class to host the details and have the equals() and hashCode() methods ready (see listing 1). We can use The put The () method adds a paired key/value to the hashtable (see listing 2).

The Put The () method takes two arguments, both of type Object. The first parameter is key; The second parameter is value. The Put The () method calls key's hashCode() method, dividing the result by the number of lists in the table. Use the remainder as the index value to determine to which list the record is added. Note that the key is unique in the table; If you call it with an existing key The put (), the matching entry is modified, so it refers to a new value and the old value is returned (when the key does not exist in the table, The put () returns a null value.

To read a value in the table, we use the search key for the get() method. It returns an Object reference that converts to the correct type:


BookRecord br =
(BookRecord)isbnTable.get(
"0-345-40946-9");
System.out.println(
"Author: " + br.author
+ " Title: " + br.title);

Another useful method is remove(), which is almost identical to get() in that it removes the entry from the table and returns it to the caller.

Your own class
If you want to use a primitive type as a key, you must create an object of the same type. For example, if you want to use an Integer key, you should use the constructor Integer(int) to generate an object from the Integer. All wrapper classes?? Integer, Float, and Boolean all treat primitive values as objects, overloading the equals() and hashCode() methods, so they can be used as keys. The same is true of many other classes provided in the JDK (even the Hashtable and HashMap classes implement their own equals () and hashCode () methods), but you should look at the file before using any of the class's objects as Hashtable keys. It is also necessary to look at the source of the class and see how equals() and hashCode() are implemented. For example, Byte, Character, Short, and Integer all return the represented Integer value as the hash code. This may or may not suit your needs.

in Java Use Hashtables
If you want to create a hashtable that USES the object of a class you define as the key, you should make sure that the equals() and hashCode() methods of the class provide useful values. First look at the class you've extended to see if its implementation meets your needs. If not, you should override the method.

The basic design constraint for any equals() method is that it should return true if the object passed to it belongs to the same class and its data fields are set to values representing the same data. You should also be sure that if you pass an empty argument to the method, your code returns


false : public boolean equals(Object o)
{
if ( (o == null)
|| !(o instanceof myClass))
{
return false;
}
// Now compare data fields...

Also, there are some rules to keep in mind when designing a hashCode() method. First, the method must return the same value for a particular object, no matter how many times the method is called (which, of course, should be avoided when using an object as the key of a hashtable, as long as the contents of the object do not change between calls). Second, if two objects defined by your equals() method are equal, they must also generate the same hash code. Third, this is more of a policy than a principle, and you should try to design methods that produce different results for different object content. It doesn't matter if occasionally different objects happen to generate the same hash code. However, if this method can only return values ranging from 1 to 10, then only 10 lists can be used, regardless of how many lists are in the hashtable.

Another factor to keep in mind when designing equals() and hashCode () is performance. Each call The put () or get(), both involve calling hashCode() to find the correct list, and when get() scans the list to find the key, it calls equals() for each element in the list. Implement these methods to make them run as quickly and efficiently as possible, especially if you intend to make your classes publicly available, because other users may want to use your classes in high-performance applications where execution speed is important.

Hashtable performance
The main factor affecting the efficacy of hashtable is the average length of the list in the table, because the average search time is directly related to this average length. Obviously, to reduce the average length, you have to increase the number of lists in the hashtable; If the number of lists is so large that most or all lists contain only one record, you will get the best search efficiency. However, this may be going too far. If your hashtable has far more lists than data entries, you don't need to do this, and in some cases, people won't accept it.

In our previous example, we knew in advance how many records we had 1,000. Knowing this, we can decide how many lists our hashtable should contain in order to achieve the best compromise between search speed and memory efficiency. However, in many cases, you do not know in advance how many records you will be dealing with; The number of files the data is read from may grow, or the number of records may change dramatically from day to day.

As the number of entries increases, the Hashtable and HashMap classes handle this problem by dynamically extending the table. Both classes have a constructor that accepts the initial number of lists in the table, and a load factor as a parameter:
Public Hashtable (
Int initialCapacity,
Float loadFactor)

Public HashMap (
Int initialCapacity,
Float loadFactor)

Multiply these two Numbers to calculate a critical value. Each time a new entry is added to the hash table, the count is updated, and when the count exceeds the threshold, the table is reset (rehash). The number of lists is increased to two times the previous number plus one, and all entries are moved to the correct list. The default constructor sets the initial capacity to 11 and the load factor to 0.75, so the threshold is 8. When the ninth record is added to the table, the hash table is readjured to 23 lists, and the new threshold will be 17 (the integer part of 23*0.75). As you can see, the load factor is the upper limit of the average number of lists in the hash table, which means that, by default, the hash table rarely has many lists containing more than one record. Compare our original example, in which we had 1,000 records spread over 10 column tables. If we had used the default values, the table would have expanded to more than 1,500 lists. But you can control that. If the number of lists multiplied by the load factor is greater than the number of items you are processing, the table will never be redone, so we can follow this example:


// Table will not rehash until it
// has 1,100 entries (10*110):
Hashtable myHashTable = 
new Hashtable(10, 110.0F);

You probably don't want to do this unless you don't save memory for an empty list and don't mind the extra search time, which might be the case in an embedded system. However, this approach can be useful because resetting can be computationally time consuming, and this approach ensures that resetting never occurs.

Note that although called The put () increases the table (the number of lists), and the call remove() does not do the opposite. So, if you have a large table and you delete most of the entries from it, you end up with a large but mostly empty table.

Hashtable, HashMap
There are three important differences between the Hashtable and HashMap classes. The first difference is mainly historical. Hashtable is based on the stale Dictionary class, and HashMap is Java 1.2 the introduction of The Map An implementation of the interface.

Perhaps the most important difference is that the Hashtable method is synchronous, while the HashMap method is not. This means that while you can use a Hashtable in a multi-threaded application without any special behavior, you must also provide external synchronization for a HashMap. A convenient method is to take advantage of the Collections class's static synchronizedMap() method, which creates a thread-safe one The Map Object and return it as a wrapped object. This object's method lets you access the underlying HashMap synchronously. The result is that you can't cut the synchronization in Hashtable when you don't need it (in a single-threaded application, for example), and synchronization adds a lot of processing overhead.

The third difference is that only HashMap lets you use null values as the key or value of an entry in a table. Only one record in a HashMap can be an empty key, but any number of entries can be an empty value. That is, if the search key is not found in the table, or if the search key is found, but it is an empty value, get() will return null. If necessary, use the containKey() method to distinguish between the two cases.

Some data suggest that when synchronization is needed, Hashtable is used, and HashMap is used instead. However, because HashMap can be synchronized when needed, HashMap has more functionality than Hashtable, and it is not based on an old class, it is argued that HashMap takes precedence over Hashtable in all cases.

On the Properties
Sometimes, you might want to use a hashtable to map the key string to the value string. There are some examples of environment strings in DOS, Windows, and Unix, such as the string PATH of key mapped to the string C: \ Windows; C: \ WINDOWS \ SYSTEM. Hashtables is an easy way to represent this, but Java An alternative approach is provided.

Java The.util.properties class is a subclass of Hashtable designed for String keys and values. The use of the Properties object is similar to the use of Hashtable, but the class adds two time-saving methods that you should know.

The Store() method saves the contents of a Properties object to a file in a readable form. The Load() method, in contrast, reads the file and sets the Properties object to contain keys and values.

Note that because Properties extends Hashtable, you can use the superclass's The put () method to add keys and values that are not String objects. This is not desirable. In addition, if you use store() for a Properties object that does not contain a String object, the store() will fail. As a The put Instead of () and get(), you should use setProperty() and getProperty(), which take strings.

Okay, I hope you now know how to use hashtables to speed up your processing


Related articles: