C# example: efficiently comparing a large number of images

  • 2020-05-19 05:41:49
  • OfStack

The traditional way to compare two images is to walk through every pixel and compare them one by one. For a small number of pictures this is slow but tolerable. With a large number of images, however, the long response time and the heavy load on the server are not acceptable.
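For reference, the pixel-by-pixel approach described above might look roughly like the following. This is a minimal sketch, not code from the original test: the class name PixelCompare and the method name SamePixels are made up for illustration, and it assumes System.Drawing is available.

```csharp
using System.Drawing;

public static class PixelCompare
{
    // Hypothetical helper: returns true only if both images have the same
    // dimensions and every pixel has the same color.
    public static bool SamePixels(string filePath1, string filePath2)
    {
        using (Bitmap a = new Bitmap(filePath1))
        using (Bitmap b = new Bitmap(filePath2))
        {
            if (a.Width != b.Width || a.Height != b.Height)
            {
                return false;
            }
            for (int y = 0; y < a.Height; y++)
            {
                for (int x = 0; x < a.Width; x++)
                {
                    // One GetPixel call per pixel and per image is what
                    // makes this approach slow on large inputs.
                    if (a.GetPixel(x, y) != b.GetPixel(x, y))
                    {
                        return false;
                    }
                }
            }
            return true;
        }
    }
}
```

The work grows with the number of pixels in every pair of images, which is why it does not scale to hundreds of pictures.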

Therefore, we can use another comparison method built from the .NET class library that Microsoft provides, which can efficiently check whether two images are the same. Here's how it works: save each image to a memory stream, use Convert.ToBase64String to convert the stream's bytes to a string, and then simply compare the two strings. Note that each image is re-encoded as JPEG first, so it is the re-encoded bytes that are compared. The code is as follows:


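// Requires: using System; using System.Drawing; using System.IO;
public bool CheckImg(string filePath1, string filePath2)
        {
            string img1;
            string img2;
            // Encode the first image to JPEG in memory and convert it to Base64.
            using (MemoryStream ms1 = new MemoryStream())
            using (Image image1 = Image.FromFile(filePath1))
            {
                image1.Save(ms1, System.Drawing.Imaging.ImageFormat.Jpeg);
                img1 = Convert.ToBase64String(ms1.ToArray());
            }
            // Use a separate stream for the second image. Saving into ms1 again
            // would append to the first image's bytes, so the strings could never match.
            using (MemoryStream ms2 = new MemoryStream())
            using (Image image2 = Image.FromFile(filePath2))
            {
                image2.Save(ms2, System.Drawing.Imaging.ImageFormat.Jpeg);
                img2 = Convert.ToBase64String(ms2.ToArray());
            }
            return img1.Equals(img2);
        }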

I timed this method in a console application: comparing one pair of pictures took about 0.015 s, which is efficient.
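A timing like this can be taken with System.Diagnostics.Stopwatch. The sketch below is only one way to do it and is not the original test harness; the class name Timing and the method name MeasureSeconds are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs the action once and returns the elapsed
    // wall-clock time in seconds.
    public static double MeasureSeconds(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }
}
```

For example, Timing.MeasureSeconds(() => CheckImg("a.jpg", "b.jpg")) would time one comparison, where both file names are placeholders. A single run is noisy, so averaging several runs gives a more reliable figure.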

Comparing lots of pictures
Comparing two pictures meets the basic need, but what about a large number? I ran the same kind of test: out of 450 images, find the duplicates and display them. It took about 16 s, which is acceptable for most requirements.

The approach is to first convert all 450 images to strings once and store them in a list. This avoids repeating the conversion on every pass through the loop, which would make the program much slower.


public static List<Dictionary<string, string>> chekImgss(string filePath)
        {

            List<Dictionary<string, string>> liststr = new List<Dictionary<string, string>>();
            DirectoryInfo dir = new DirectoryInfo(filePath);
            FileInfo[] files = dir.GetFiles();
            foreach (FileInfo fileInfo in files)
            {
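                Dictionary<string, string> dic = new Dictionary<string, string>();
                // Compare extensions case-insensitively so ".JPG" and ".PNG" are included.
                string ex = fileInfo.Extension.ToLowerInvariant();
                if (ex == ".jpg" || ex == ".png")
                {
                    // FullName avoids building the path by hand, which breaks
                    // when filePath has no trailing directory separator.
                    using (MemoryStream ms1 = new MemoryStream())
                    using (Image image2 = Image.FromFile(fileInfo.FullName))
                    {
                        image2.Save(ms1, System.Drawing.Imaging.ImageFormat.Jpeg);
                        string imgBase64 = Convert.ToBase64String(ms1.ToArray());
                        dic["base64"] = imgBase64;
                    }
                    dic["imgName"] = fileInfo.Name;
                    liststr.Add(dic);
                }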
            }
            return liststr;
        }

Each image's Base64 string and file name are stored in a dictionary, and the dictionaries are collected in a list. When comparing, we then only need to check whether two strings are equal.


/// <summary>
        ///  Make a deep copy of the list 
        /// </summary>
        /// <param name="files"></param>
        /// <returns></returns>
        public static List<Dictionary<string, string>> CopyList(List<Dictionary<string, string>> files)
        {
            // Keys and values are immutable strings, so copying each dictionary
            // gives an independent copy of the whole list. This replaces
            // BinaryFormatter, which is obsolete and insecure in current .NET.
            List<Dictionary<string, string>> array3 = files.Select(d => new Dictionary<string, string>(d)).ToList();
            return array3;
        }
        /// <summary>
        ///  Compare many pictures and group the identical ones 
        /// </summary>
        /// <param name="listDic"></param>
        /// <param name="filePath"></param>
        /// <returns></returns>
        public static List<Dictionary<object, string>> chekImg2(List<Dictionary<string, string>> listDic,string filePath)
        {
            List<Dictionary<object, string>> list = new List<Dictionary<object, string>>();
            // filePath is not needed here because listDic already holds every image;
            // the parameter is kept so existing callers still compile.
            for (int j = 0; j < listDic.Count; j++)
            {
                var file = listDic[j];
                
                var fileList = CopyList(listDic);
                var index = 0;
                var isFirst = false;
                Dictionary<object, string> dic = new Dictionary<object, string>();
                for (int i = 0; i < fileList.Count; i++)
                {
                    var fileInfo = fileList[i];
                    if (file["imgName"] == fileInfo["imgName"])
                    {
                        fileList.Remove(fileInfo);
                        i -= 1;
                        continue;
                    }
                    // Compare the Base64 strings. For strings, Equals and == both do an ordinal comparison.
                    if (file["base64"].Equals(fileInfo["base64"]))
                    {
                        if (!isFirst)
                        {
                            dic[++index] = file["imgName"];
                            isFirst = true;
                        }
                        dic[++index] = fileInfo["imgName"];
                        fileList.Remove(fileInfo);
                        listDic.Remove(listDic.Where(c => c.Values.Contains(fileInfo["imgName"])).FirstOrDefault());
                        i -= 1;
                    }
                    else
                    {
                        fileList.Remove(fileInfo);
                        i -= 1;
                    }
                }
                if (dic.Keys.Count > 0)
                {
                    list.Add(dic);
                    listDic.Remove(file);
                    j -= 1;
                }
                else
                {
                    listDic.Remove(file);
                    j -= 1;
                }
            }
            return list;
        }

In this way, each set of identical images ends up in its own dictionary, and those dictionaries are collected in a list. Just iterate over the list when you need the results.
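Walking through that result could look something like the sketch below. The class name DuplicateReport and the method name Describe are hypothetical; the only assumption taken from the code above is the List<Dictionary<object, string>> shape that chekImg2 returns.

```csharp
using System.Collections.Generic;

public static class DuplicateReport
{
    // Hypothetical helper: turns each group of identical images into one
    // line of text listing the index and file name of every member.
    public static List<string> Describe(List<Dictionary<object, string>> groups)
    {
        List<string> lines = new List<string>();
        foreach (Dictionary<object, string> group in groups)
        {
            List<string> parts = new List<string>();
            foreach (KeyValuePair<object, string> entry in group)
            {
                parts.Add(entry.Key + ": " + entry.Value);
            }
            lines.Add(string.Join(", ", parts));
        }
        return lines;
    }
}
```

Each returned line can then be written to the console or shown in the page, one line per set of duplicates.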

Conclusion

When making a lot of comparisons, it's best to convert the images to strings first. If the conversion were done inside the two nested for loops, the run time would be many times longer, because the same images would be converted over and over. Then, during the comparison, remove images from the list once they have been compared. The remaining list keeps shrinking, so the comparison gets faster as it goes.
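The same two ideas, converting once and never revisiting an image that has already been matched, can also be expressed by grouping the entries on their Base64 string. This is a different technique from the nested loops used above, shown only as a sketch: the class name DuplicateGroups and the method name Find are hypothetical, and the input is assumed to be the list produced by chekImgss.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateGroups
{
    // Hypothetical helper: groups entries by their "base64" value and
    // returns the file names of every group with two or more members.
    public static List<List<string>> Find(List<Dictionary<string, string>> listDic)
    {
        return listDic
            .GroupBy(d => d["base64"])
            .Where(g => g.Count() > 1)
            .Select(g => g.Select(d => d["imgName"]).ToList())
            .ToList();
    }
}
```

Because GroupBy buckets the strings by hash, each image is examined once instead of being compared against every other image.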

