Notes on Programming: xxhash is not faster than md5 on my files

Tuesday, 16 April 2024

xxhash is not faster than md5 on my files

The xxHash project describes itself as an "Extremely fast Hash algorithm, processing at RAM speed limits". I wanted to see whether that shows up when hashing my own files.

Test setup: 100 passes over 181 documents.

MD5:

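    import hashlib
    import timeit
    from tqdm import tqdm
    # `paths` is assumed to be a list of pathlib.Path objects defined earlier.
    start = timeit.default_timer()
    for i in tqdm(range(100)):
        total = 0
        for path in paths:
            if not path.is_file():
                continue
            # Read the whole file into memory and hash it in one call.
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            total += 1
    print(total)
    end = timeit.default_timer()
    print(f"Time: {end - start}")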

181
Time: 10.780842124004266

xxhash:

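    import timeit
    import xxhash
    from tqdm import tqdm
    # `paths` is assumed to be a list of pathlib.Path objects defined earlier.
    start = timeit.default_timer()
    for i in tqdm(range(100)):
        total = 0
        for path in paths:
            if not path.is_file():
                continue
            # The 64-bit hasher in the xxhash package is named xxh64.
            digest = xxhash.xxh64(path.read_bytes()).hexdigest()
            total += 1
    print(total)
    end = timeit.default_timer()
    print(f"Time: {end - start}")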

181
Time: 10.775027380004758

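Not significantly faster: the two runs differ by about 0.006 s over 18,100 file reads, roughly 0.05%. I guess most of the time is spent in I/O rather than in hashing. I'll go with MD5 because it's more common, familiar to others, and implemented everywhere.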
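One way to check the I/O guess would be to time the read and the hash separately. The following is only a sketch, not what I ran for the numbers above: the function name `time_read_vs_hash` and its parameters are made up for illustration, and it assumes `paths` is a list of `pathlib.Path` objects as in the loops above.

```python
import hashlib
import timeit
from pathlib import Path


def time_read_vs_hash(paths, hash_fn=hashlib.md5, repeats=100):
    """Return (read_seconds, hash_seconds) summed over `repeats` passes.

    Hypothetical helper: `hash_fn` is any callable that takes bytes and
    returns an object with a hexdigest() method.
    """
    read_seconds = 0.0
    hash_seconds = 0.0
    for _ in range(repeats):
        for path in paths:
            # The is_file() check is not counted in either bucket.
            if not path.is_file():
                continue
            t0 = timeit.default_timer()
            data = path.read_bytes()       # file I/O only
            t1 = timeit.default_timer()
            hash_fn(data).hexdigest()      # hashing only
            t2 = timeit.default_timer()
            read_seconds += t1 - t0
            hash_seconds += t2 - t1
    return read_seconds, hash_seconds
```

Calling `time_read_vs_hash(paths)` and then `time_read_vs_hash(paths, hash_fn=xxhash.xxh64)` (with the xxhash package installed) should show how much of the total each part accounts for. If the read time dominates in both runs, swapping the hash function cannot change the total much. Note that after the first pass the operating system has probably cached the files, so the read time here may be lower than for a cold read from disk.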
