xxHash is an Extremely fast Hash algorithm, processing at RAM speed limits
Benchmark: 181 documents, each hashed 100 times (100 rounds).
MD5:
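import hashlib
import timeit

from tqdm import tqdm

# `paths`: the 181 document paths (pathlib.Path objects), built elsewhere
start = timeit.default_timer()
for _ in tqdm(range(100)):  # 100 rounds over the same documents
    total = 0
    for path in paths:
        if not path.is_file():
            continue
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        total += 1
print(total)  # files hashed per round
end = timeit.default_timer()
print(f"Time: {end - start}")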
181
Time: 10.780842124004266

xxhash:
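import timeit

import xxhash
from tqdm import tqdm

# `paths`: the same 181 document paths as in the MD5 run
start = timeit.default_timer()
for _ in tqdm(range(100)):  # 100 rounds over the same documents
    total = 0
    for path in paths:
        if not path.is_file():
            continue
        # python-xxhash exposes xxh64; there is no xx64 function
        digest = xxhash.xxh64(path.read_bytes()).hexdigest()
        total += 1
print(total)  # files hashed per round
end = timeit.default_timer()
print(f"Time: {end - start}")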
181
Time: 10.775027380004758
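The MD5 and xxhash loops differ only in the hash call, so they can be folded into one helper that takes the hash function as a parameter. This is a minimal sketch, not the code I actually ran; `benchmark` and `md5_hex` are hypothetical names, and xxhash is left as a comment because it is a third-party package.

```python
import hashlib
import timeit
from pathlib import Path
from typing import Callable, Iterable, Tuple

# A hash function here takes raw file bytes and returns a hex digest string.
HashFn = Callable[[bytes], str]


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def benchmark(paths: Iterable[Path], hash_fn: HashFn, rounds: int = 100) -> Tuple[int, float]:
    """Hash every regular file in `paths` `rounds` times.

    Returns (files hashed per round, total elapsed seconds).
    """
    paths = list(paths)  # materialize so a generator can be reused every round
    start = timeit.default_timer()
    count = 0
    for _ in range(rounds):
        count = 0
        for path in paths:
            if not path.is_file():  # skip directories, like the original loops
                continue
            hash_fn(path.read_bytes())
            count += 1
    elapsed = timeit.default_timer() - start
    return count, elapsed


# With the third-party xxhash package installed, the second contender would be:
#   import xxhash
#   def xxh64_hex(data: bytes) -> str:
#       return xxhash.xxh64(data).hexdigest()
```

Usage: `count, seconds = benchmark(paths, md5_hex)`, then the same call with the xxhash function, so both runs share identical loop overhead.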
Not significantly faster: 10.78 s for MD5 vs. 10.78 s for xxHash over 100 rounds. I guess most of the time is spent on I/O. I'm going with MD5 because it's more common, familiar to others and implemented everywhere.
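One way to check the I/O guess is to time reading and hashing separately: read the files without hashing, then hash bytes that are already in memory. This is a hedged sketch of that idea, not part of the original benchmark; `split_timing` is a hypothetical helper, and MD5 from `hashlib` stands in for any hash function.

```python
import hashlib
import timeit
from pathlib import Path
from typing import Callable, Dict, Iterable


def split_timing(
    paths: Iterable[Path], hash_fn: Callable[[bytes], object], rounds: int = 100
) -> Dict[str, float]:
    """Return seconds spent reading vs. hashing the same files over `rounds` passes."""
    files = [p for p in paths if p.is_file()]  # skip directories, as the original loops do

    # Phase 1: read every file each round, but hash nothing.
    start = timeit.default_timer()
    for _ in range(rounds):
        for p in files:
            p.read_bytes()
    read_seconds = timeit.default_timer() - start

    # Phase 2: hash bytes loaded once up front, so no file I/O is timed here.
    blobs = [p.read_bytes() for p in files]
    start = timeit.default_timer()
    for _ in range(rounds):
        for blob in blobs:
            hash_fn(blob)
    hash_seconds = timeit.default_timer() - start

    return {"read": read_seconds, "hash": hash_seconds}
```

For example, `split_timing(paths, lambda b: hashlib.md5(b).digest())`. If "read" dominates, the choice of hash barely matters for this workload. After the first round the files will likely sit in the OS page cache, so "read" mostly reflects system-call and copy overhead rather than raw disk speed.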