Method to remove duplicate elements from an array in Java

  • 2020-04-01 02:19:40
  • OfStack

Problem: suppose I have an array that starts out empty, and I want to add elements to it while disallowing duplicates.

Given a problem like this, the code almost writes itself; I'll use an ArrayList for the array.


private static void testListSet(){
        // anonymous ArrayList subclass that rejects an element if an equal one already exists
        List<String> arrays = new ArrayList<String>(){
            @Override
            public boolean add(String e) {
                for(String str : this){
                    if(str.equals(e)){
                        System.out.println("add failed !!!  duplicate element");
                        return false;
                    }
                }
                System.out.println("add succeeded !!!");
                return super.add(e);
            }
        };

        arrays.add("a");arrays.add("b");arrays.add("c");arrays.add("b");
        for(String e : arrays)
            System.out.print(e);
    }

Nothing fancy here: every time an element is added, I check whether an equal element already exists in the list, and only add it if it does not. This is easy to write, but it gets awkward with a large array: if the list already holds 100,000 elements, does every add really have to call equals() up to 100,000 times? That is the baseline we start from.
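
For reference, the hand-rolled loop above is equivalent to calling List.contains() before every add, which performs exactly that linear scan of equals() calls. A minimal sketch (the class and variable names here are illustrative, not from the original code):

import java.util.ArrayList;
import java.util.List;

public class NaiveUniqueAdd {
    public static void main(String[] args) {
        List<String> arrays = new ArrayList<String>();
        String[] input = {"a", "b", "c", "b"};
        for (String e : input) {
            // contains() walks the whole list and calls equals() on each element,
            // so every add costs O(n) comparisons
            if (!arrays.contains(e)) {
                arrays.add(e);
            }
        }
        System.out.println(arrays); // [a, b, c]
    }
}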

          Question: given an array that already contains some elements, how do you remove the duplicate elements from it?

As you know, collections in Java fall broadly into two categories: List and Set. In a List the elements are ordered and may repeat; in a Set the elements are unordered but cannot repeat. So we can lean on this property of Set to strip out the duplicates; after all, the algorithms already built into the JDK are usually better than anything we would write ourselves.
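
For a built-in type such as String this already works out of the box; a minimal sketch (not part of the original article) showing a HashSet silently dropping the repeated element:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StringDedup {
    public static void main(String[] args) {
        // "b" appears twice in the input, but the Set keeps only one copy
        Set<String> set = new HashSet<String>(Arrays.asList("a", "b", "c", "b"));
        System.out.println(set.size()); // 3
        System.out.println(set);        // e.g. [a, b, c] (iteration order is not guaranteed)
    }
}

The interesting question is whether the same trick works for a custom class such as People below.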


public static void removeDuplicate(List<People> list){
       HashSet<People> set = new HashSet<People>(list);
       list.clear();
       list.addAll(set);
    }

    private static People[] ObjData = new People[]{
        new People(0, "a"), new People(1, "b"), new People(0, "a"), new People(2, "a"), new People(3, "c"),
    };


public class People{
    private int id;
    private String name;

    public People(int id,String name){
        this.id = id;
        this.name = name;
    }

    @Override
    public String toString() {
        return ("id = "+id+" , name "+name);
    }    
}

With the custom People class above, however, adding objects that carry the same data and then calling the removeDuplicate method does not solve the problem: the duplicate objects are still there. So how does a HashSet decide whether two objects are the same? Opening the HashSet source shows that every element added goes through the add method:


@Override 
     public boolean add(E object) { 
         return backingMap.put(object, this) == null; 
     }

The backingMap here is the data structure the HashSet maintains. The trick is that every object added becomes a KEY of that HashMap, with the HashSet object itself stored as the VALUE. Because keys in a HashMap are unique, the HashSet's contents are automatically kept free of duplicates. Whether two entries actually count as duplicates, then, comes down to how the HashMap decides that two keys are the same:


@Override public V put(K key, V value) {
        if (key == null) {
            return putValueForNullKey(value);
        }
        int hash = secondaryHash(key.hashCode());
        HashMapEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                preModify(e);
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        // No entry for (non-null) key is present; create one
        modCount++;
        if (size++ > threshold) {
            tab = doubleCapacity();
            index = hash & (tab.length - 1);
        }
        addNewEntry(key, value, hash, index);
        return null;
    }

In short, put walks the entries in the matching bucket; if an existing entry has the same hash code (the hash code is actually run through a secondary hash first) and the key's equals method also returns true, the two keys are treated as the same element and only the value is replaced. So if the element type in our array is a custom class and we want to benefit from this Set mechanism, we have to override equals and hashCode ourselves.


public class People{
    private int id;
    private String name;

    // counts how many times equals() is called; used by the timing test below
    public static int count = 0;

    public People(int id,String name){
        this.id = id;
        this.name = name;
    }

    @Override
    public String toString() {
        return ("id = "+id+" , name "+name);
    }

    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        count++;
        if(!(obj instanceof People))
            return false;
        People o = (People)obj;
        return id == o.getId() && name.equals(o.getName());
    }

    @Override
    public int hashCode() {
        // objects that are equal must return the same hash code
        return id;
    }
}

After overriding equals and hashCode, calling the removeDuplicate(list) method really does leave no two identical People in the list.
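
A minimal usage sketch (the DedupDemo class name is my own, not from the article) tying the pieces together with the same sample data as ObjData above:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class DedupDemo {
    public static void removeDuplicate(List<People> list) {
        HashSet<People> set = new HashSet<People>(list);
        list.clear();
        list.addAll(set);
    }

    public static void main(String[] args) {
        People[] objData = new People[]{
            new People(0, "a"), new People(1, "b"), new People(0, "a"),
            new People(2, "a"), new People(3, "c"),
        };
        List<People> list = new ArrayList<People>(Arrays.asList(objData));
        removeDuplicate(list);
        // the two People(0, "a") entries collapse into one, leaving 4 elements
        System.out.println(list.size());
        System.out.println(list);
    }
}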

          Finally, here is how to test both approaches:


public class RemoveDeplicate {

    public static void main(String[] args) {
        //testListSet();
        //removeDuplicateWithOrder(Arrays.asList(data));
        //ArrayList<People> list = new ArrayList<People>(Arrays.asList(ObjData));

        //removeDuplicate(list);

        People[] data = createObjectArray(10000);
        ArrayList<People> list = new ArrayList<People>(Arrays.asList(data));

        long startTime1 = System.currentTimeMillis();
        System.out.println("set start time --> "+startTime1);
        removeDuplicate(list);
        long endTime1 = System.currentTimeMillis();
        System.out.println("set end time -->  "+endTime1);
        System.out.println("set total time -->  "+(endTime1-startTime1));
        System.out.println("count : " + People.count);
        People.count = 0;

        long startTime = System.currentTimeMillis();
        System.out.println("Efficient start time --> "+startTime);
        EfficientRemoveDup(data);
        long endTime = System.currentTimeMillis();
        System.out.println("Efficient end time -->  "+endTime);
        System.out.println("Efficient total time -->  "+(endTime-startTime));
        System.out.println("count : " + People.count);
        
        

    }
    public static void removeDuplicate(List<People> list)
    {
     HashSet<People> set = new HashSet<People>(list);
     list.clear();
     list.addAll(set);
    }
    public static void removeDuplicateWithOrder(List<String> arlList)
    {
       Set<String> set = new HashSet<String>();
       List<String> newList = new ArrayList<String>();
       for (Iterator<String> iter = arlList.iterator(); iter.hasNext();) {
          String element = iter.next();
          if (set.add( element))
             newList.add( element);
       }
       arlList.clear();
       arlList.addAll(newList);
    }

    
    @SuppressWarnings("serial")
    private static void testListSet(){
        // anonymous ArrayList subclass that rejects an element if an equal one already exists
        List<String> arrays = new ArrayList<String>(){
            @Override
            public boolean add(String e) {
                for(String str : this){
                    if(str.equals(e)){
                        System.out.println("add failed !!!  duplicate element");
                        return false;
                    }
                }
                System.out.println("add succeeded !!!");
                return super.add(e);
            }
        };

        arrays.add("a");arrays.add("b");arrays.add("c");arrays.add("b");
        for(String e : arrays)
            System.out.print(e);
    }

    private static void EfficientRemoveDup(People[] peoples){
        int count = 0;
        // temporary array to hold the non-duplicate elements
        People[] newArray = new People[peoples.length];
        // current index in the new array (also the number of non-dup elements)
        int currentIndex = 0;
        // loop through the original array...
        for (int i = 0; i < peoples.length; ++i) {
            // contains => true iff newArray already holds an element equal to peoples[i]
            boolean contains = false;

            // search through newArray for an element equal to peoples[i]
            for (int j = 0; j < currentIndex; ++j) {
                count++;
                if (peoples[i].equals(newArray[j])) {
                    // the same element was found, so don't add it to the new array
                    contains = true;
                    break;
                }
            }
            // if no duplicate was found, add the element to the new array
            if (!contains) {
                // note: you may want a copy constructor or clone() here
                // if the situation warrants more than a shallow copy
                newArray[currentIndex] = peoples[i];
                ++currentIndex;
            }
        }

        System.out.println("efficient method inner  count : " + count);
    }

    private static People[] createObjectArray(int length){
        int num = length;
        People[] data = new People[num];
        Random random = new Random();
        for(int i = 0;i<num;i++){
            int id = random.nextInt(10000);
            System.out.print(id + " ");
            data[i]=new People(id, "i am a man");
        }
        return data;
    }
 } 

Test results:


set end time -->  1326443326724
set total time -->  26
count : 3653
Efficient start time --> 1326443326729
efficient method inner  count : 28463252
Efficient end time -->  1326443327107
Efficient total time -->  378
count : 28463252
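
The numbers speak for themselves: building a HashSet hashes each element once and needed only a few thousand equals() calls (3,653 in this run), while the nested-loop version performed tens of millions of comparisons. One caveat is that a plain HashSet does not preserve the original order of the list; removeDuplicateWithOrder above handles that for Strings, and the same idea can be written generically with a LinkedHashSet. A minimal sketch (not from the original article; it assumes the element type overrides equals and hashCode):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderPreservingDedup {
    // removes duplicates while keeping the order of first occurrence;
    // relies on the element type overriding equals() and hashCode()
    public static <T> void removeDuplicateWithOrder(List<T> list) {
        LinkedHashSet<T> set = new LinkedHashSet<T>(list);
        list.clear();
        list.addAll(set);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList("a", "b", "c", "b"));
        removeDuplicateWithOrder(names);
        System.out.println(names); // [a, b, c]
    }
}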

