Hashing With Python


Hashing with Python is easier than you might think, and the fun part is that you can write a reusable function to verify the checksum of a download whenever one is provided.

Interesting, right? Oh, sorry! Maybe you are new to tech and have no idea what I mean by “hashing with Python”.

Have you ever wondered how people who buy and sell digital products verify the authenticity of those products? You have probably heard of file integrity, right?

Well, I’m not going to bore you with a long explanation of hashing and what it does. In a nutshell, a hash function takes an input (a string, a key or the contents of a file) and produces a fixed-size value from it, and the same input always gives the same value.

Think about passwords on a secure site, for example. They are not stored as plain text but as hash values. When you log in, the site hashes the password you typed and compares the result with the stored hash, so the plain-text password never has to sit in the database.
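To make the password idea concrete, here is a minimal sketch using only the standard library. Real sites usually go a step further than a bare hash and add a random salt plus a deliberately slow key-derivation function; hashlib.pbkdf2_hmac is one such function. The names hash_password and verify_password and the ITERATIONS value are my own assumptions for illustration, not a recommendation for production settings.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, for illustration only


def hash_password(password, salt=None):
    """Return (salt, digest) for a password; a fresh salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Hash the attempt with the stored salt and compare it with the stored digest."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# What would be stored in the database: the salt and the digest, never the password.
salt, stored = hash_password("correct horse")

print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Only the salt and the digest are kept, so a leaked database does not directly reveal anyone's password.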

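Hash functions like md5, sha256 and similar functions produce an entirely different value even if you change just one letter in the content being hashed. With a strong function like sha256, finding two different inputs that give the same value is practically impossible. md5 is considered broken in this respect, so avoid it for security checks.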
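You can see this one-letter effect for yourself with a few lines of Python. This is just a quick sketch; the two sample strings are made up for the demonstration.

```python
import hashlib

# Two inputs that differ by a single letter (hypothetical sample strings).
first = b"hashing with python"
second = b"hashing with pythom"

digest_first = hashlib.sha256(first).hexdigest()
digest_second = hashlib.sha256(second).hexdigest()

print(digest_first)
print(digest_second)

# Count how many hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_first, digest_second))
print(f"{differing} of {len(digest_first)} hex characters differ")
```

The two printed values should look completely unrelated, even though the inputs are almost identical.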

If the paragraphs above do not make much sense to you yet, don’t worry: the practical examples below should make things clearer.

Python is not the only way to hash a file or a string. We can also hash a file and confirm its integrity from the terminal/shell (command line).

We will make a quick demonstration in a Linux terminal. Let’s say you just bought some software online from a good tech company, and they published the hash value of the product in its description.

These hash values are usually strings of hexadecimal characters, like the following:

sha256sum: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

This simply means that the string was generated with the sha256 hash function, a popular function that produces a 256-bit (32-byte) value, written as 64 hexadecimal characters. You are expected to use the same hash function to confirm the integrity of the file.

There are other popular hash functions such as md5 and sha512. In this tutorial we will focus on sha256, one of the most widely used today. sha512 is also popular. It produces a 512-bit (64-byte) value.
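If you want to check the output sizes yourself, hashlib exposes them through the digest_size attribute (in bytes). A small sketch:

```python
import hashlib

for name in ("md5", "sha256", "sha512"):
    hasher = hashlib.new(name)
    size_bytes = hasher.digest_size
    hex_chars = len(hasher.hexdigest())
    print(f"{name}: {size_bytes * 8} bits, {size_bytes} bytes, {hex_chars} hex characters")

# md5: 128 bits, 16 bytes, 32 hex characters
# sha256: 256 bits, 32 bytes, 64 hex characters
# sha512: 512 bits, 64 bytes, 128 hex characters
```

So the length of a published checksum is already a good hint about which function produced it.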

Enough talking, let’s see how these things work. Open your Linux terminal (or whatever CLI you are using), cd to the directory containing the file and type sha256sum [your_file_name]. The hash value will be printed for you.

Let’s say your file name is test_file, here is what you will get:

$ sha256sum test_file 
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  test_file
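A side note on this particular value: it is the sha256 digest of empty input, so you will get exactly this output if your test_file is an empty file (for example one created with touch). Here is a small Python sketch that reproduces it. The temporary directory is only there so that the example does not touch your own files.

```python
import hashlib
import os
import tempfile

EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_file")
    open(path, "wb").close()  # create an empty file, like `touch test_file`

    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

# Same layout as the sha256sum output: hash, two spaces, file name.
print(f"{digest}  test_file")
print(digest == EMPTY_SHA256)  # True
```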

To keep monitoring the integrity of the file, redirect the output of the previous command into another file:

$ sha256sum test_file > check_test_file

The command does not return any output.

What we did was save the hash in check_test_file, which we will use to monitor the integrity of test_file. The name could be anything. Now let’s check whether something or someone has got access to the file and changed or damaged it.

$ sha256sum -c check_test_file 
test_file: OK

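The -c option tells sha256sum to read the hashes listed in check_test_file and verify them against the actual file (test_file). You can also use --check in place of -c. If the file has changed since the hash was saved, you will see test_file: FAILED instead of OK, along with a warning that a computed checksum did not match.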
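To get a feel for what the check mode is doing, here is a rough Python sketch of the same idea. The function names sha256_of_file and verify_checksum_file are my own, and there is one deliberate simplification: file names are resolved relative to the folder that holds the checksum file, whereas sha256sum resolves them relative to the current directory.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def verify_checksum_file(checksum_path):
    """Return {file name: True/False} for each line of a sha256sum-style file."""
    base_dir = os.path.dirname(os.path.abspath(checksum_path))
    results = {}
    with open(checksum_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Each line looks like: <hash><space><mode character><file name>
            expected, _, rest = line.partition(" ")
            name = rest[1:]  # drop the mode character (a space or '*')
            actual = sha256_of_file(os.path.join(base_dir, name))
            results[name] = (actual == expected.lower())
    return results


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test_file")
    with open(target, "wb") as f:
        f.write(b"hello\n")

    # Equivalent of: sha256sum test_file > check_test_file
    checksum_path = os.path.join(tmp, "check_test_file")
    with open(checksum_path, "w", encoding="utf-8") as f:
        f.write(f"{sha256_of_file(target)}  test_file\n")

    print(verify_checksum_file(checksum_path))  # {'test_file': True}

    # Simulate someone tampering with the file.
    with open(target, "ab") as f:
        f.write(b"tampered")

    print(verify_checksum_file(checksum_path))  # {'test_file': False}
```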

You can consider hashing as a way of doing an integrity check on documents and strings.

HASHING WITH PYTHON CODES/FUNCTIONS

Reusable code is one of the biggest benefits of programming. So now we are going to create a Python function to easily run our file integrity check.

To use sha256, sha512 and even md5, we will use a Python standard library module called hashlib.

So we begin by importing hashlib:

# checking against site provided checksum
import hashlib

BEFORE WE CONTINUE: I assume you already have basic experience with Python and are familiar with concepts like input handling, file manipulation, printing, control logic and loops.

LET’S CONTINUE: Now we will create a function for authenticating our file with a valid checksum provided for us.

Let’s say we were given a hash value like the one we used earlier, and we now want to confirm that no man in the middle has slipped malicious code into our file.

We will create a function and call it check_auth()

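import hashlib


def check_auth():
    # Ask for the file to verify and the checksum published for it
    file_path = input('Enter your file path:\n')
    p_chk = input('Enter the checksum provided:\n')

    # Read the file as raw bytes; the with block closes it for us
    with open(file_path, 'rb') as f:
        file1 = f.read()

    hash1 = hashlib.sha256(file1).hexdigest()
    # Ignore stray spaces and letter case in the pasted checksum
    hash2 = p_chk.strip().lower()

    # Print both values for a visual comparison
    print('\n' + hash1)
    print(p_chk)

    if hash1 == hash2:
        print('\nFile matched!')
    else:
        print('\nFile mismatched!')


check_auth()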

If everything goes well, you should get output like the following:

Enter your file path:
/home/you/your_file_path/test_file
Enter the checksum provided:
a717f6b95046953a2568dcbe97eea3edfd3d0a50280afa0f86c8854ce7e1124a

a717f6b95046953a2568dcbe97eea3edfd3d0a50280afa0f86c8854ce7e1124a
a717f6b95046953a2568dcbe97eea3edfd3d0a50280afa0f86c8854ce7e1124a

File matched!

So let’s explain the code:

In our check_auth function, we used the input function to get both our file path and the given checksum. Then we opened the file in binary mode, read it and assigned the content to the variable file1.

Then we used hashlib to hash the file content and assigned the resulting hex digest to the variable hash1. We then assigned the given checksum to hash2.

Next we printed both hash values for visual confirmation. Then we used an if statement to check whether our hash matched the given checksum.

Now you have seen how the hashlib module can be used to hash objects in Python. The same steps also work with md5, sha512 and the rest.
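As a sketch of how the same steps carry over to other algorithms, the function below takes the algorithm name as a parameter and passes it to hashlib.new. It also reads the file in chunks, so a large download does not have to fit in memory all at once. The names hash_file and matches_checksum and the chunk size are my own choices, not anything hashlib requires.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it piece by piece."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # f.read returns b"" at the end of the file, which stops the loop.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum(path, expected, algorithm="sha256"):
    """Compare a file's digest with a published checksum, ignoring case and spaces."""
    return hash_file(path, algorithm) == expected.strip().lower()


# Demo on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"hashing with python\n" * 10000)

    expected = hash_file(sample)
    print(matches_checksum(sample, expected))   # True
    print(matches_checksum(sample, "0" * 64))   # False
    print(len(hash_file(sample, "sha512")))     # 128
```

Because the function takes its inputs as arguments instead of calling input, you can reuse it from other scripts or loop over a whole folder of downloads.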

BONUS FUNCTION FOR HASHING WITH PYTHON:

Let’s create another function that checks whether two files have the same hash value. We will use almost the same pattern with just a few changes, so there is no need to explain the function as long as you understand the first one.

# checking file integrity
import hashlib


def check_file_auth():
    path1 = input('Enter file 1 path:\n')
    path2 = input('Enter file 2 path:\n')

    # Read both files as raw bytes; each with block closes its file
    with open(path1, 'rb') as f:
        file1 = f.read()

    with open(path2, 'rb') as f2:
        file2 = f2.read()

    hash1 = hashlib.sha256(file1).hexdigest()
    hash2 = hashlib.sha256(file2).hexdigest()

    # Print both values for a visual comparison
    print(hash1)
    print(hash2)

    if hash1 == hash2:
        print('File matched!')
    else:
        print('File mismatched!')


check_file_auth()

I believe the last function helped you understand the process even more. There are other hashing packages for Python, such as simhash and imagehash.

Simhash is mostly used to detect whether two or more files have similar or near-duplicate contents (plagiarism, for example). Unlike md5, sha256 and the like, it gives similar contents similar hash values, so the difference between two hashes reflects how different the contents are.
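To give a rough feel for the idea, here is a toy version written with only the standard library. To be clear, this is not the API of the simhash package, just my own simplified illustration of the technique: every word votes on each bit of a 64-bit fingerprint, and two fingerprints are compared by counting the bits that differ (the Hamming distance). The function names and sample sentences are made up.

```python
import hashlib

BITS = 64  # size of the toy fingerprint


def toy_simhash(text):
    """Toy SimHash: every word votes on each of the 64 fingerprint bits."""
    votes = [0] * BITS
    for word in text.lower().split():
        # Use the first 8 bytes of the word's sha256 digest as a 64-bit number.
        word_hash = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (word_hash >> i) & 1 else -1
    # A bit is set in the fingerprint when more words voted 1 than 0.
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming_distance(a, b):
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")


original = "the quick brown fox jumps over the lazy dog"
near_copy = "the quick brown fox jumped over the lazy dog"
unrelated = "hashing files with python is easier than you think"

print(hamming_distance(toy_simhash(original), toy_simhash(original)))   # 0
print(hamming_distance(toy_simhash(original), toy_simhash(near_copy)))
print(hamming_distance(toy_simhash(original), toy_simhash(unrelated)))
```

Identical texts always give a distance of 0. A near copy will usually land closer to the original than an unrelated text does, although with sentences this short the gap can be small.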

Imagehash is used in much the same way as simhash, but mostly to detect duplicate or near-duplicate images.

Both simhash and imagehash have to be installed with pip3 install, as they are not part of the Python standard library. We are not going to cover them in this tutorial, but perhaps will in another cryptography post or course.

If you have questions, please don’t hesitate to ask me in the comment section, chat me up or write to me directly. To see more code similar to hashing with Python, kindly check our tutorial post page.


Geoff

Geoff is a python software engineer, web content specialist, tech private trainer and an IT virtual assistant.
