Can ChatGPT Help My Non-Coder Partner Build Software?

An AI Experiment: Challenging a Non-Coder to Use ChatGPT for File Comparison - Here's What Happened.

Picture this: you're a developer, and your partner or a friend asks you for help with a task they want to automate. I found myself in exactly this situation recently, as I have many times before, only this time I had ChatGPT on my mind. I already had an idea of how to solve the task, but I wanted to use it as an excuse to test ChatGPT more thoroughly.

I considered using ChatGPT myself to see how well it could write the code for me. Ultimately, though, I asked my partner whether they'd be interested in solving it with ChatGPT's help. After all, having a non-coder use ChatGPT to write code seemed a much more interesting prospect.

So that is what we ended up doing. I sat my partner at my computer and opened ChatGPT (note that GPT-3.5 was used throughout unless otherwise noted). Then I instructed my partner to talk to it just as they would to me. Throughout this, I did my best to avoid influencing what my partner wrote and did.

So what happened? Before we jump into that, let me explain what the task looked like.

The Task: File Comparison

We have two files to work with, each containing a series of lines of screenplay dialogue for the same chapter of a story. The task is to find every line in the first file that also appears in the second, then calculate an overall similarity percentage for the two files and display it at the end. Matching lines don't need to be identical, only similar enough.

To illustrate, here is a simple example. First, the two files:

File 1:

Boris: Delighted to have you all here at this splendid gathering, my friends.
Boris: We've got a delightful spread awaiting you, with an assortment of sandwiches, delectable jam, flaky croissants, and an abundance of refreshing beverages, including an impressive selection of beer and wine.
Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses.

File 2:

Boris: Delighted to have you at this splendid gathering, my friends.
Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses.

For reference, here are the differences between the files:

  • "all here" from the first line has been removed.

  • The second line was removed completely.

The software should output something like the following:

File 1: "Boris: Delighted to have you all here at this splendid gathering, my friends." File 2: "Boris: Delighted to have you at this splendid gathering, my friends." [Similarity: 80%]
File 1: "Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses." File 2: "Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses." [Similarity: 100%]
Overall similarity: 66.7%
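The task doesn't spell out how the overall figure is calculated, and the per-line percentages above are only illustrative. One reading consistent with the 66.7% is the share of lines in the first file that found a similar line in the second:

$$
\text{overall similarity} = \frac{\text{matched lines in file 1}}{\text{total lines in file 1}} \times 100\% = \frac{2}{3} \times 100\% \approx 66.7\%
$$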

How did the non-coder attempt it?

Create me a piece of code that will compare two written documents and match similar lines between them. I'd also like it to provide a percentage of similarity between the content of the two documents.

👆 The first prompt

import difflib

def compare_documents(file1, file2):
    # Read the contents of the documents
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()

    # Find similar lines between the documents
    matcher = difflib.SequenceMatcher(None, lines1, lines2)
    matching_blocks = matcher.get_matching_blocks()
    similar_lines = []
    for block in matching_blocks:
        if block.size > 0:
            similar_lines.extend(lines1[block.a:block.a+block.size])

    # Calculate the percentage of similarity
    total_lines = max(len(lines1), len(lines2))
    similarity_percentage = len(similar_lines) / total_lines * 100

    # Print the similar lines and similarity percentage
    print("Similar lines:")
    for line in similar_lines:
        print(line.strip())

    print("Similarity percentage: {:.2f}%".format(similarity_percentage))

# Usage example
compare_documents("document1.txt", "document2.txt")

👆 ChatGPT's response

ChatGPT's response also included an explanation of the code, which I won't reproduce here. I found it overly verbose, but my partner read through all of it. Note that ChatGPT chose the programming language (Python) on its own, since the prompt didn't specify one.

My partner correctly surmised that they needed to copy the code and save it as a file. When they said so, I opened VS Code and showed them how to create a new file. They rightly pointed out that they would have been lost if I hadn't shown them this. From then on, I was more careful not to give hints.

Throughout the exercise, it was clear that using the tools themselves (VS Code, Python, the terminal) was a challenge that ChatGPT can't help with effectively. After all, how do you describe to ChatGPT what you are seeing on screen? A lot of programming is knowing how to deal with the tooling, and this was a recurring obstacle.

Even beyond the tooling, my partner wasn't sure where to change the filenames to point at their own files. They correctly replaced "document1.txt" and "document2.txt" in the usage example, but thought they also had to change file1 and file2 in the function's parameter list and body. As they started doing this, I told them they were heading down the wrong path.
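What tripped my partner up is the difference between a function's parameters and the arguments it is called with. Here is a stripped-down sketch with the same shape as ChatGPT's compare_documents; the body is a placeholder, not the real comparison:

```python
def compare_documents(file1, file2):
    # file1 and file2 are parameters: placeholder names that take on whatever
    # values the caller passes in, so they never need editing.
    return f"Comparing {file1} with {file2}"


# Only the arguments in the call need to change to point at different files.
print(compare_documents("document1.txt", "document2.txt"))
```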

Their next prompt was "How do I execute this code in Visual Studio Code software?" and ChatGPT responded with the following:

To execute the code in Visual Studio Code, follow these steps:

  1. Open Visual Studio Code and create a new Python file. You can do this by selecting "File" > "New File" or using the shortcut Ctrl+N (Windows) or Cmd+N (Mac).

  2. Copy the code provided into the new Python file.

  3. Save the file with a .py extension, for example, document_comparison.py.

  4. Make sure you have Python installed on your machine. You can check by opening a terminal window within Visual Studio Code and running the command python --version. If Python is not installed, you'll need to install it from the official Python website (https://www.python.org/downloads/).

  5. Once Python is installed, you need to install the difflib module, which is used in the code. Open the terminal window in Visual Studio Code and run the command pip install difflib.

  6. After installing the difflib module, you're ready to run the code. In Visual Studio Code, go to the terminal window and navigate to the directory where your Python file is located using the cd command. For example, if your file is in the "Documents" folder, you can use the command cd Documents to navigate to that folder.

  7. Run the code by typing python document_comparison.py in the terminal and press Enter. Replace document_comparison.py with the actual name of your Python file if different.

  8. The code will execute, and the output will be displayed in the terminal window.

Make sure you have the text documents you want to compare in the same directory as your Python file or provide the full paths to the documents in the compare_documents function.

That's it! You should now be able to execute the code in Visual Studio Code and see the results in the terminal.

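This explanation was fair (according to me) and overwhelming (according to my partner). Step 5 is actually unnecessary: difflib is part of Python's standard library, so there is nothing to install. More importantly for my partner, it didn't explain where to find the terminal, so their next prompt was "Where is the terminal in Visual Studio Code", which produced the following response: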

In Visual Studio Code, the terminal can be accessed through the integrated terminal feature. Here's how you can open the terminal:

  1. Open Visual Studio Code.

  2. From the top menu, go to "View" > "Terminal". Alternatively, you can use the keyboard shortcut Ctrl+` (backtick) on Windows/Linux or Cmd+` (backtick) on Mac.

  3. The terminal will appear at the bottom of the Visual Studio Code window.

Once the terminal is open, you can navigate to the directory where your Python file is located using the cd command. For example, if your Python file is in the "Documents" folder, you can use the command cd Documents to navigate to that folder. Then you can execute the Python file by running python filename.py in the terminal, where filename.py should be replaced with the actual name of your Python file.

You can also use the terminal to install any necessary dependencies or packages using the pip command. In the case of the provided code, you need to install the difflib module, which you can do by running pip install difflib in the terminal.

Remember to replace filename.py with the actual name of your Python file when executing it in the terminal.

With this, my partner opened a terminal in VS Code and pasted in the command. Editing text in a terminal is unfamiliar to most people, because on every operating system it works quite differently from ordinary text editing. My partner tried to edit the filename by selecting it and typing over it, as you would in a typical text editor, but this didn't work. It didn't hold them up for long, though: they figured out that they could delete the characters one by one with Backspace. After some back and forth over incorrect file extensions and the terminal, they managed to run the script. It failed with an error because the paths in the script pointed at the wrong directory: their files were in a sub-folder, but the paths pointed at the current working directory.
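The sub-folder error comes down to how Python resolves a relative path: it is looked up from the current working directory (where the terminal is when you run the script), not from wherever the files happen to be. A small sketch; documents is a hypothetical sub-folder name:

```python
import os

# A bare filename is resolved against the current working directory.
print(os.getcwd())
print(os.path.abspath("document1.txt"))

# If the files live in a sub-folder, the folder must be part of the path.
# "documents" is a hypothetical folder name used only for illustration.
path_in_subfolder = os.path.join("documents", "document1.txt")
print(os.path.abspath(path_in_subfolder))
```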

I'll spare you the details, but suffice it to say my partner ran into a couple of gotchas:

  • Backslashes (\), the directory separator on Windows, which Python was interpreting as escape sequences. This is not obvious to non-coders. ChatGPT correctly suggested using raw string literals.

  • Encoding issues with one of the files. ChatGPT was able to offer a solution.
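Both gotchas are easy to reproduce in a few lines. The Windows path and the sample text below are hypothetical; the explicit encoding="utf-8" mirrors what the later versions of ChatGPT's script pass to open():

```python
import os
import tempfile

# Hypothetical Windows path, used only to show how backslashes are read.
plain = "C:\new_folder\table.txt"  # "\n" becomes a newline and "\t" a tab
raw = r"C:\new_folder\table.txt"   # raw string: every backslash is kept
print(repr(plain))
print(repr(raw))

# Without the encoding argument, open() falls back to a platform-dependent
# default, which on Windows is often not UTF-8. The sample text is hypothetical.
text = "Rishi: It\u2019s truly a treat for the senses.\n"  # \u2019 is a curly apostrophe
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
os.remove(path)
print(read_back == text)
```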

One impressive interaction came when ChatGPT showed snippets of code and was then asked "How does that fit into the existing code?": it responded by reproducing the original code with the required changes applied.

The Code Executes

Eventually, my partner resolved all the issues and got a successful run. It became evident, though, that the code did not do what it was supposed to.

For the two example files I gave in the first section, it returned the following:

Similar lines:
Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses.
Similarity percentage: 33.33%
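Those numbers follow from how the first script uses difflib. SequenceMatcher is given two lists of lines, so each whole line is a single item that either matches exactly or not at all, and the percentage is 1 matching line out of the 3 lines in the longer file. A minimal sketch reproducing this, with the example lines written out in the code rather than read from files:

```python
import difflib

# The example files from the first section, one list item per line.
boris_long = "Boris: Delighted to have you all here at this splendid gathering, my friends."
boris_short = "Boris: Delighted to have you at this splendid gathering, my friends."
spread = ("Boris: We've got a delightful spread awaiting you, with an assortment of sandwiches, "
          "delectable jam, flaky croissants, and an abundance of refreshing beverages, "
          "including an impressive selection of beer and wine.")
rishi = ("Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast "
         "you've arranged. It's truly a treat for the senses.")
lines1 = [boris_long, spread, rishi]
lines2 = [boris_short, rishi]

# As in the first script, whole lines are the items SequenceMatcher compares,
# so a line counts only if it is identical in both files.
matcher = difflib.SequenceMatcher(None, lines1, lines2)
matched = [lines1[block.a + i]
           for block in matcher.get_matching_blocks()
           for i in range(block.size)]
percentage = len(matched) / max(len(lines1), len(lines2)) * 100
print(matched)  # only the Rishi line
print("Similarity percentage: {:.2f}%".format(percentage))  # 33.33%

# Comparing the two Boris lines character by character instead gives a high ratio.
char_ratio = difflib.SequenceMatcher(None, boris_long, boris_short).ratio()
print(char_ratio)
```

Comparing the contents of a pair of lines, as in the last two lines, is one way to let near-identical lines count as similar; the developer-prompted script in the addendum does something comparable with words instead of characters.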

It only found the lines that match exactly. In response, my partner asked ChatGPT about the incorrect result; it gave a lengthy explanation listing many possible causes, none of them particularly helpful. Then they asked ChatGPT the following: "Can you edit the code to find matches in sequences of three words or more together?"

ChatGPT obliged and generated the following code:

import difflib
from nltk import ngrams
from nltk.tokenize import word_tokenize

def compare_documents(file1, file2):
    # Read the contents of the documents
    with open(file1, 'r', encoding='utf-8') as f1, open(file2, 'r', encoding='utf-8') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()

    # Combine lines into single strings
    text1 = ' '.join(lines1)
    text2 = ' '.join(lines2)

    # Tokenize the texts into words
    words1 = word_tokenize(text1)
    words2 = word_tokenize(text2)

    # Generate n-grams of length 3 or more
    ngrams1 = set(ngrams(words1, 3))
    ngrams2 = set(ngrams(words2, 3))

    # Find matching n-grams
    matching_ngrams = ngrams1.intersection(ngrams2)

    # Convert n-grams back to original lines
    similar_lines = [' '.join(ngram) for ngram in matching_ngrams]

    # Calculate the percentage of similarity
    total_lines = max(len(lines1), len(lines2))
    similarity_percentage = len(similar_lines) / total_lines * 100

    # Print the similar lines and similarity percentage
    print("Similar lines:")
    for line in similar_lines:
        print(line.strip())

    print("Similarity percentage: {:.2f}%".format(similarity_percentage))

# Usage example
compare_documents("document1.txt", "document2.txt")

This resulted in a lengthy troubleshooting session: first installing nltk, then fixing pip issues, then getting nltk to download its punkt tokenizer data (with nltk.download('punkt')). That last step was particularly frustrating to watch, because ChatGPT instructed my partner to run nltk.download in a Python REPL, which proved challenging since it required knowing how to open a Python REPL and then how to exit it.

Impressively, from the frustration my partner expressed, ChatGPT understood that this nltk call should go into the original script, and instructed my partner to put it there.

Unfortunately, the code ChatGPT generated never produced the desired result. It introduced many strange bugs, getting even something as simple as the percentage calculation wrong. The final code it produced was the following:

import difflib
import nltk
from nltk.tokenize import word_tokenize

# Download the "punkt" resource
nltk.download('punkt')

def compare_documents(file1, file2):
    # Read the contents of the documents
    with open(file1, 'r', encoding='utf-8') as f1, open(file2, 'r', encoding='utf-8') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()

    # Combine lines into single strings
    text1 = ' '.join(lines1)
    text2 = ' '.join(lines2)

    # Tokenize the texts into words
    words1 = word_tokenize(text1)
    words2 = word_tokenize(text2)

    # Generate n-grams of length 3 or more
    ngrams1 = [words1[i:i+3] for i in range(len(words1)-2)]
    ngrams2 = [words2[i:i+3] for i in range(len(words2)-2)]

    # Find matching n-grams
    matching_ngrams = set(tuple(ngram) for ngram in ngrams1 if ngram in ngrams2)

    # Convert n-grams back to original lines
    similar_lines = []
    for ngram in matching_ngrams:
        start_index1 = words1.index(ngram[0])
        start_index2 = words2.index(ngram[0])
        end_index1 = start_index1 + len(ngram)
        end_index2 = start_index2 + len(ngram)
        similar_lines.append(' '.join(words1[start_index1:end_index1]))

    # Calculate the percentage of similarity
    total_lines = max(len(lines1), len(lines2))
    similarity_percentage = len(similar_lines) / total_lines * 100

    # Print the similar lines and similarity percentage
    print("Similar lines:")
    for line in similar_lines:
        print(line.strip())

    print("Similarity percentage: {:.2f}%".format(similarity_percentage))

# Usage example
compare_documents("document1.txt", "document2.txt")

For the two example files I gave in the first section, it returned the following:

Similar lines:
truly a treat
Boris : Delighted
treat for the
to have you
must extend my
feast you 've
a delightful spread
gratitude , Boris
I must extend
Delighted to have
. Boris :
Rishi : I
. Boris :
gathering , my
's truly a
for this magnificent
splendid gathering ,
my friends .
arranged . It
this splendid gathering
magnificent feast you
Boris : Delighted
, my friends
extend my deepest
: Delighted to
deepest gratitude ,
at this splendid
It 's truly
for this magnificent
the senses .
, my friends
you all here
've got a
this splendid gathering
: Delighted to
, my friends
my friends .
Similarity percentage: 1233.33%

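Something has gone very wrong. The list contains duplicates (", my friends" appears three times) and fragments such as "you all here" that never appear in the second file, and the percentage calculation is wildly off.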
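Both symptoms can be traced in the final script. The percentage divides the number of matching word trigrams (the 37 entries printed above) by the number of lines in the longer file (3), so it can exceed 100% by a wide margin. And to rebuild each "similar line", the script looks up only the trigram's first word with words1.index(), which returns the first occurrence of that word anywhere in the text rather than the position where the trigram starts. That is how a shared trigram beginning with "you" gets printed as "you all here". A minimal sketch of both effects, using a hypothetical hand-tokenised word list instead of nltk:

```python
# Hypothetical, hand-tokenised fragment in the style of the script's word list.
words1 = ["to", "have", "you", "all", "here", "feast", "you", "'ve", "arranged"]
matching_trigram = ("you", "'ve", "arranged")  # a trigram assumed to be in both files

# What the final script does: locate only the trigram's FIRST word with
# list.index(), which finds the first "you" in the list, not the one in the trigram.
start = words1.index(matching_trigram[0])
rebuilt = " ".join(words1[start:start + len(matching_trigram)])
print(rebuilt)  # you all here

# The percentage compares a trigram count with a line count.
matching_trigram_count = 37  # entries printed in the output above
total_lines = 3              # max(len(lines1), len(lines2)) for the example files
report = "Similarity percentage: {:.2f}%".format(matching_trigram_count / total_lines * 100)
print(report)  # Similarity percentage: 1233.33%
```

Keeping the index i at which each trigram was sliced, rather than searching for its first word again, would avoid the lookup bug; the percentage needs a line-level count in the numerator to mean anything.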

Conclusion: Fail

In conclusion, this experiment was a failure. Not only did ChatGPT fail to generate working code for the prompts given, it dug its own grave even deeper as my partner tried to get it to fix the code.

In addition, the tooling challenges make it easy to fall into traps, which at best are a time sink and at worst make it very difficult to move forward. Maybe ChatGPT will eventually be advanced enough to see the user's development environment and understand exactly what's going wrong, but it's not there yet.

Addendum: How does a developer fare?

My partner still needed a working version, so I set out to write it. First, though, I wanted to see whether my more developer-optimised prompts would fare better with GPT-3.5.

This is what it produced for me:

import difflib
import re

def compare_files(file1_path, file2_path, similarity_threshold):
    with open(file1_path, 'r', encoding='utf-8') as file1, open(file2_path, 'r', encoding='utf-8') as file2:
        lines_file1 = file1.readlines()
        lines_file2 = file2.readlines()

        total_similarity = 0
        total_comparisons = 0

        for line1 in lines_file1:
            if not line1.strip():  # Skip empty lines
                continue

            words1 = re.findall(r'\w+', line1)  # Extract words from line1
            for line2 in lines_file2:
                if not line2.strip():  # Skip empty lines
                    continue

                words2 = re.findall(r'\w+', line2)  # Extract words from line2

                similarity = calculate_similarity(words1, words2)
                if similarity >= similarity_threshold:
                    print(f'File 1: "{line1.strip()}" File 2: "{line2.strip()}" Similarity: {similarity}%')

                total_similarity += similarity
                total_comparisons += 1

        overall_similarity = total_similarity / total_comparisons * 100 if total_comparisons > 0 else 0
        print(f'----\nOverall similarity: {overall_similarity}%')


def calculate_similarity(words1, words2):
    # Use SequenceMatcher from difflib to calculate similarity ratio
    matcher = difflib.SequenceMatcher(None, words1, words2)
    similarity = matcher.ratio() * 100
    return similarity


# Usage example:
file1_path = 'file1.txt'  # Replace with the path to your file 1
file2_path = 'file2.txt'  # Replace with the path to your file 2
similarity_threshold = 90  # Adjust the threshold as desired

compare_files(file1_path, file2_path, similarity_threshold)

Here is what this script produces for the two example files in the first section:

File 1: "Boris: Delighted to have you all here at this splendid gathering, my friends." File 2: "Boris: Delighted to have you at this splendid gathering, my friends." Similarity: 91.66666666666666%
File 1: "Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses." File 2: "Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast you've arranged. It's truly a treat for the senses." Similarity: 100.0%
----
Overall similarity: 3814.5257868240087%

Better, but the overall similarity percentage is wrong. When I asked ChatGPT to fix the issue, it hilariously just capped overall_similarity at 100 using min.

One redeeming aspect is that when I asked ChatGPT with GPT-4 to fix the above code, it seemed to do a pretty good job, though it managed to introduce other silly bugs in the process. At this point, I decided to fix the code myself.
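For the record, the 3814.53% comes from scaling twice: calculate_similarity already returns a percentage, and compare_files multiplies the average of those percentages by 100 again. In other words, the average similarity across all compared pairs of lines was about 38.1%, and even that is a poor measure, because most pairs of lines are unrelated. Below is a minimal sketch of one possible fix, for illustration only. It keeps the word-level SequenceMatcher comparison and the threshold of 90 from ChatGPT's version, takes each line's best match in the second file, and reports the share of file-1 lines that cleared the threshold, which is one reading consistent with the 66.7% in the task example. compare_lines and word_similarity are hypothetical names, and the function takes lists of lines rather than file paths so it's easy to try out.

```python
import difflib
import re


def word_similarity(line1, line2):
    """Similarity of two lines as a percentage, compared word by word."""
    words1 = re.findall(r"\w+", line1)
    words2 = re.findall(r"\w+", line2)
    return difflib.SequenceMatcher(None, words1, words2).ratio() * 100


def compare_lines(lines1, lines2, threshold=90):
    """Hypothetical helper: pair each file-1 line with its best file-2 match.

    Returns (matches, overall), where matches holds (line1, line2, score)
    tuples at or above the threshold, and overall is the percentage of
    non-empty file-1 lines that found such a match.
    """
    lines1 = [line.strip() for line in lines1 if line.strip()]  # skip empty lines
    lines2 = [line.strip() for line in lines2 if line.strip()]
    matches = []
    for line1 in lines1:
        # Score this line against every file-2 line and keep only the best.
        scored = [(word_similarity(line1, line2), line2) for line2 in lines2]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            matches.append((line1, best, score))
    # Scores are percentages already, so the overall figure is scaled only once.
    overall = len(matches) / len(lines1) * 100 if lines1 else 0.0
    return matches, overall


# The example files from the first section.
boris_long = "Boris: Delighted to have you all here at this splendid gathering, my friends."
boris_short = "Boris: Delighted to have you at this splendid gathering, my friends."
spread = ("Boris: We've got a delightful spread awaiting you, with an assortment of sandwiches, "
          "delectable jam, flaky croissants, and an abundance of refreshing beverages, "
          "including an impressive selection of beer and wine.")
rishi = ("Rishi: I must extend my deepest gratitude, Boris, for this magnificent feast "
         "you've arranged. It's truly a treat for the senses.")

matches, overall = compare_lines([boris_long, spread, rishi], [boris_short, rishi])
for line1, line2, score in matches:
    print(f'File 1: "{line1}" File 2: "{line2}" [Similarity: {score:.0f}%]')
print(f"Overall similarity: {overall:.1f}%")
```

With the example files this finds the same two pairs as the target output and reports an overall similarity of 66.7%. The per-line figure for the Boris lines (about 92%) differs from the illustrative 80% in the task description, since it depends on how similarity is measured.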

So even when used by a developer, ChatGPT leaves a lot to be desired. For non-coders the bar is even higher, and clearly ChatGPT is not there yet.
