Sometimes, when we open an Explorer window on our main computer, we see red bars on some disks, telling us that the disk is almost full and that we need to do some cleanup. We run the system cleanup, which frees some unused space, but this isn't enough to make things better.

[Figure: File Explorer showing a full disk]

So, we try to find the duplicate files on the disk to free some extra space, but we have a problem: where are the duplicate files? The first answer is to check for files with the same name and size, but that isn't enough: files can be renamed and still be duplicates.

So, the best thing to do is to find a way to find and list all duplicates on the disk. The naive approach is to get all files with the same size and compare them with one another. But this is really cumbersome, because if there are 100 files in a group, there will be 100!/(2!*98!) = 100*99/2 = 4950 comparisons, which is a complexity of O(n^2).

Another approach is to compute a checksum of each file and compare the checksums. That way, you still have O(n^2) complexity, but you have less data to compare (although you have to add the time needed to calculate the checksums). A third approach is to use a dictionary to group the files with the same hash. A dictionary lookup has O(1) complexity on average, so grouping n files this way has O(n) complexity.

Now, we only have to choose the checksum. Every checksum has a number of bits and, roughly, the larger the number of bits, the longer it takes to compute. But a larger number of bits makes wrong results less likely: with a CRC16 checksum (16 bits), there are only 65,536 possible values, and the probability that two different files have the same checksum is high. CRC32 allows 4,294,967,296 possible values, so a wrong result is much less likely. You can use other algorithms, like MD5 (128 bits), SHA1 (160 bits) or SHA256 (256 bits), but computing these takes much longer than computing a CRC32. As we are looking for speed rather than the highest accuracy, we'll use the CRC32 algorithm to compute the hashes. A fast implementation of this algorithm is available in the CRC32C.NET NuGet package.

From there, we can create our program to find and list the duplicates on the disk. In Visual Studio, create a new WPF application.
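As a minimal sketch of the dictionary approach described in this post, the code below groups files by size and CRC32 and keeps only the groups with more than one file. It is an illustration under some assumptions, not the final program: the names `DuplicateFinder`, `Crc32` and `FindDuplicates` are hypothetical, and to stay dependency-free it uses a small table-based CRC-32 routine as a stand-in for the CRC32C.NET package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class DuplicateFinder
{
    // Lookup table for the standard reflected CRC-32 polynomial (0xEDB88320).
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Computes the CRC-32 of a stream, reading it in chunks so that
    // large files are never loaded into memory at once.
    public static uint Crc32(Stream stream)
    {
        uint crc = 0xFFFFFFFFu;
        var buffer = new byte[81920];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;
    }

    // Groups files by (size, checksum) in a dictionary and returns
    // only the groups that contain more than one file.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        var groups = new Dictionary<(long Size, uint Crc), List<string>>();
        foreach (var path in paths)
        {
            long size = new FileInfo(path).Length;
            uint crc;
            using (var stream = File.OpenRead(path))
                crc = Crc32(stream);

            var key = (size, crc);
            if (!groups.TryGetValue(key, out var list))
                groups[key] = list = new List<string>();
            list.Add(path);
        }
        return groups.Values.Where(g => g.Count > 1).ToList();
    }
}
```

It could be called with something like `DuplicateFinder.FindDuplicates(Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))`, where `root` is the folder to scan. Each file is read once and inserted into the dictionary once, which is where the O(n) behavior comes from. Since two different files can still share a checksum, the groups are best treated as likely duplicates, and a byte-by-byte comparison inside each group could confirm them before anything is deleted.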