38 Comments

I tried to answer your questions based on my experience and knowledge (as a retired civil engineer), and I'm pretty sure the AI did better than you would have graded me. On the other hand, I am sure that if I had sat through your classes, hearing how you expressed these concepts and getting feedback on my questions, I would have done much better.

Given that ChatGPT uses available sources - which include a hodge-podge of divergent opinions - it is not surprising that it failed to respond to your questions as you outlined. But if it had access to transcripts of all your classes, and knew to give priority to your input over what is generally available, I suspect that it would have returned something much closer to what you expected.

One of the chief drawbacks to ChatGPT, at least as I understand it, is that it simply looks at all the information - both correct and incorrect - and tries to provide an answer that weights all opinions. It does NOT yet have the ability to logically evaluate ideas against data and to put together a thesis that is based on facts but that runs contrary to widely established opinions.


I came here to say something very similar with regard to training data and education in general. Much of what's expected of students isn't the material; it's formulating a response to satisfy the grader.

I've had lots of discussions about AI's impact on education, mostly in the context of essay writing. I can't believe that ChatGPT would 1) succeed in evading existing plagiarism-detection methods and 2) produce relevant, high-quality work. If you were to rewrite an encyclopedia article, you'd essentially get the ChatGPT output. That essay would not score high in many academic settings; it might fool a lot of middle school teachers, but as you mention, it doesn't demonstrate comprehension.

I think most people would fail a final exam for a class they didn't take. Even professors in the subject matter would score low under double-blind grading if they were just handed a sheet of paper with questions on it. Teachers go about their lectures emphasizing certain ideas and not even touching on many others; getting good grades is an individualized human game. It's hard to compare apples to oranges here. At best this proves that someone with little understanding of economics would score poorly on a test they weren't prepared for.


I think what's most impressive about ChatGPT is not its current capabilities, but its momentum. Just a few years ago the idea of an AI taking an IQ test or an SAT was almost laughable. AI experts predicted that this level of capability wouldn't be achieved until the 2030s, and the general public considered even those predictions too optimistic.

Just a couple of years ago GPT-3 was mostly being compared to 7 year old kids. Today you are comparing ChatGPT to a college student.


Thank you @bryancaplan. I wonder where the really thoughtful, long-term consideration of the impacts (ethical/moral/academic/professional) this software brings to our world is... and no, I'm not a gray goo adherent. But I am curious.


There are many thoughtful, long term, considerations of the impacts this software brings to our world being offered up by academics, intellectuals, journalists, etc. And those considerations are having precisely zero impact on the accelerating pace of AI development and deployment. The djinn is coming out of the bottle whether we're ready or not.


Thank you for trying this -- that is a very useful contribution!

It does seem like there is a bigger-picture point, though: this is software available to the general public interpreting a free-format, natural-language economics exam and writing essay-style answers that are mostly coherent -- an earthshaking development compared to the state of the art just three years ago. It seems a bit like critiquing the ballet-dancing bear's pointe technique and docking 2 points for performance while grudgingly acknowledging that the choreography and presentation are passable, while other observers are going "Holy hot sauce, that bear is doing ballet!"


Since different ChatGPT prompts result in different answers, don't you need to tell us the exact inputs that produced these answers? Further, isn't it possible that some prompts could result in significantly better performance, such as telling it to respond like an economist or an economics student who is taking a test? Given what I have seen elsewhere with attempts to improve outputs, it's highly likely there is more optimization you could do to improve the test score.
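The prompt-sensitivity point can be made concrete with a minimal sketch. The templates and helper below are hypothetical illustrations, not the prompts used in the post: the same exam question is wrapped in several framings before it ever reaches a model, and only the framing differs.

```python
# Build several framings of one exam question. Language models can respond
# quite differently to each framing even though the question is identical.
# These templates are illustrative, not the ones used in the post.

EXAM_QUESTION = (
    "T, F, and Explain: Aggregate Labor Demand will clearly fall "
    "even though some workers will benefit."
)

PROMPT_TEMPLATES = {
    "bare": "{q}",
    "role": "You are an economics PhD student taking a final exam. "
            "Answer precisely and define the relevant concepts.\n\n{q}",
    "graded": "Answer the following exam question as if a strict grader "
              "will award points for definitions and reasoning:\n\n{q}",
}

def build_prompts(question: str) -> dict:
    """Return one fully formed prompt per framing."""
    return {name: tpl.format(q=question) for name, tpl in PROMPT_TEMPLATES.items()}

prompts = build_prompts(EXAM_QUESTION)
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```

Each of these would then be sent to the model separately; comparing the answers gives a rough sense of how much of the grade is the model and how much is the framing.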


I don't know if this is idiosyncratic to me or not, but I find the way Bryan writes questions confusing. Consider the below snippet:

"T, F, and Explain: Krugman argues that such employment loss is a market failure that justifies government regulation."

I take it from context that Krugman *does* in fact argue this and the question is not "can you recapitulate the content of Krugman's argument?" but rather "is the content of this argument true?". I think a capable student will get there, but given that testing is generally pretty stressful anyway, if I were a student, I would be a LOT happier if the question was: "Krugman argues {x}. Is {x} actually true?".

Jan 9, 2023·edited Jan 9, 2023

Not a Krugman fan, but I think we should all be able to agree it would be awfully arrogant of Caplan or any other econ prof to teach his students as FACT that the assertions of a living Nobel-prize-winning economist are FALSE, full stop. I don't think that's what he's doing. He's asking whether Krugman said that; if true, explain why, and if false, explain Krugman's actual opinion on the matter.


My mistake! Thanks for helping to clarify.


I had the same impression you did initially about the Krugman question, although I eventually figured it out. It's possible I wouldn't have been at all confused if I'd taken the class, but I agree that the question could have been worded better.


Years ago I watched Jeopardy! with the IBM AI Watson. It "won" in that it regurgitated answers faster than the human contestants and dominated the board. In Final Jeopardy, its answer for the category "U.S. Cities" was "Toronto." It was so far ahead that it didn't matter, but it exhibited a habit you occasionally see with AI: occasional gross and obvious errors no human would make.

The big application they wanted for it was the medical industry, but it didn't work out. You can't make errors like that in medicine.

At the same time, AI does appear to be good at producing mediocre work very cheaply. I think someone in the translation industry noted that a not-great translation at 90% less than the price is usually "good enough" for most customers. If what you want isn't sensitive to these kinds of big, dumbfounding errors from time to time, the trade-off can be worth it.

Early guns weren't as good as bows, but they were a cheaper weapons system.

Basically, AI can replace mediocre and fairly unimportant work, of which we still have a lot.


ChatGPT does this too. I asked it for the 10 largest companies headquartered in Silicon Valley and it gave me 9 correct ones plus Amazon. I told it Amazon was in Seattle. It said, yes that was a mistake and then made a new list but put Intel in place of Amazon thereby listing Intel twice.


I agree that ChatGPT is very bad about making lists with oversights that it will immediately agree are oversights, which seems like very low-hanging fruit. If it took a little more time to answer, it could easily do much better. But it's also free to play with right now, and I suppose OpenAI doesn't want to spend too much compute on it.
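The kind of check that would catch the Amazon/Intel errors described above really is low-hanging fruit: a model's list answer can be validated against known facts before it is shown. A toy sketch (the company/headquarters table is a small, hand-picked illustrative sample, not a real data source):

```python
# Validate a model-generated list against a small ground-truth table:
# flag duplicate entries and entries whose headquarters are outside
# the requested region. The data below is an illustrative sample only.

HQ_REGION = {
    "Apple": "Silicon Valley",
    "Alphabet": "Silicon Valley",
    "Meta": "Silicon Valley",
    "Nvidia": "Silicon Valley",
    "Intel": "Silicon Valley",
    "Amazon": "Seattle",
}

def validate_list(answer, region):
    """Return human-readable problems found in a model's list answer."""
    problems = []
    seen = set()
    for company in answer:
        if company in seen:
            problems.append(f"duplicate entry: {company}")
        seen.add(company)
        hq = HQ_REGION.get(company)
        if hq is not None and hq != region:
            problems.append(f"{company} is headquartered in {hq}, not {region}")
    return problems

# The two failure modes described above: Amazon in a Silicon Valley list,
# then Intel listed twice in the "corrected" answer.
print(validate_list(["Apple", "Intel", "Amazon"], "Silicon Valley"))
print(validate_list(["Apple", "Intel", "Intel"], "Silicon Valley"))
```

Scaling this up is just a matter of a bigger fact table, which is why this class of error feels so avoidable.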


A more important question may be: if ChatGPT had transcripts of your class lectures, and were told to refer to those in answering these questions, what grade would it get then? Based on my use of AI, I suspect it would do very well if given the same materials students are given.

I also suspect that most college students who had *not* taken your class, but were reliant on ChatGPT's database to answer these questions, would also score very poorly.

The reason that's important is because then it's just a matter of feeding the right info to ChatGPT -- its ability to use it well is already mostly there.


From a Turing Test perspective, these are good answers--much, much higher quality than you would get asking a random college graduate who had not taken your class. Additionally, simply feeding the questions on the test to ChatGPT is not a fair comparison to your students. Your students probably had a lot of additional context as to what level of detail to go into when answering questions and what sorts of things it is important to mention in answers in order to get a good grade. I expect that if you provided that sort of context in the prompt (and especially if you used a few different prompts and selected the best answer provided) that the bot would get a substantially better grade.
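The "use a few different prompts and select the best answer" idea is best-of-n selection. A minimal sketch, where `generate_answer` and `grade` are hypothetical stand-ins for a real model call and a real grading rubric:

```python
import random

# Best-of-n selection: draw several candidate answers, score each with a
# rubric, keep the highest-scoring one. Both functions below are
# placeholders for illustration, not real model or grading code.

def generate_answer(prompt, seed):
    """Placeholder for a stochastic model call (seeded for determinism)."""
    rng = random.Random(seed)
    fillers = ["briefly", "in detail", "with a counterexample"]
    return f"Answer to {prompt!r}, argued {rng.choice(fillers)}"

def grade(answer):
    """Placeholder rubric: here, longer answers score higher."""
    return len(answer)

def best_of_n(prompt, n=5):
    """Generate n candidates and return the best one under the rubric."""
    candidates = [generate_answer(prompt, seed) for seed in range(n)]
    return max(candidates, key=grade)

best = best_of_n("T, F, and Explain: blackouts reduce labor demand")
print(best)
```

With a real model the candidates come from repeated sampling (or varied prompts), and the grader can be anything from answer length to a second model acting as judge.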


Another piece of evidence.

I gave ChatGPT the final exam of my strategic management course and asked a colleague (who teaches the same course and gave his students the same exam) to grade it without mentioning that the answers were ChatGPT's.

The outcome: ChatGPT performed comfortably above average, both in the multiple choice questions and in the open ended ones.

Also, there was a lot of variability. Some answers were excellent while others were considerably subpar.


As people here comment, it is actually amazing. I have already used ChatGPT a lot and still had no idea it could even grasp these complicated economics questions.


I’m pretty much amazed that the answers were as good as they were. Getting a “D” on this test for a class it didn’t take with nothing to go on but the questions? Crazy.


I also feel like we might be underappreciating the strengths. For example, you didn't say how many extra points it gets for making the answers rhyme. (If you say zero, I won't believe anything else you say either.)

Re: 1. Suppose a country’s workplaces start to experience periodic blackouts.

T, F, and Explain: Aggregate Labor Demand will clearly fall even though some workers will benefit.

True, it's quite plain

If blackouts cause businesses to refrain

From hiring workers, then it's plain to see

That Aggregate Labor Demand will surely be

Affected by the blackouts, even though some workers may

Find themselves with more hours to play

But if businesses suffer, then overall demand will shrink

And fewer workers will be hired, it's not hard to think.

So the answer is true, blackouts can cause a fall

In Aggregate Labor Demand, after all.


Or what about acting out the answer with a scene from Friends?

INT. MONICA AND CHANDLER'S APARTMENT - DAY

Monica is pacing back and forth, looking worried. Chandler is sitting on the couch, flipping through channels on the TV.

MONICA

I can't believe it. All these blackouts at the restaurant are really hurting business.

CHANDLER

Yeah, it's been rough. But at least we still have our jobs.

MONICA

True, but what about all the other workers? If business keeps falling, they might have to let people go.

CHANDLER

(pauses TV) Wait, you mean Aggregate Labor Demand could fall because of these blackouts?

MONICA

Exactly. Even though some workers might benefit from the blackouts by getting more hours, overall demand for labor could drop if businesses can't operate normally.

CHANDLER

Wow, I never thought about it that way.

MONICA

Exactly. It's important to think about how economic events can affect the whole picture, not just one group.

CHANDLER

(nods) You're right. Good thing we have each other to help weather the storm.

MONICA

(smiles) Always.

They hug as the scene fades to black.


I was surprised that you gave it any points at all for question 2 (the kind of BS paraphrase of the question that you can usually do in a subject you know nothing about), and also that you didn't give more points for question 4 (nothing in the T/F statement itself suggested to me that you'd expect me to restate the transparent meaning of the final part of the Landsburg quote).


What this perhaps shows is that ChatGPT has been trained with material from textbooks and other sources that do not reflect GMU's economics department curriculum. Had it been trained with, say, transcripts of Caplan's lectures, it most likely would have achieved a higher score.


When I wrote the above comment, I hadn't looked at ChatGPT, but commented based on my general knowledge of neural networks. Now that I've seen Stephen Wolfram's explanation of it (https://www.youtube.com/watch?v=zLnhg9kir3Q), I think that even if it were trained with lecture transcripts, it would not have done much better. OTOH, something more along the lines of an IBM Watson, if given lectures, economics textbooks, etc., could possibly get a better grade on an exam.


Sure, but four years ago the best AI probably would have gotten a 0. And I'd be willing to bet even money that within 5 years, the best publicly available AI can get a B or better on tests of this sort (I'll let you adjudicate). Interested?


That AI has gotten better in five years does not imply that it will be enhanced in the future.

Perhaps this is the best it gets. As humans cobble together more coherent AI systems, these systems will appear to be better, but articulation is not a sign of rising intelligence, in the same way that today's bullet train doesn't imply that the steam locomotive was inferior.


It actually does imply it will get better. Past improvements just don't guarantee future improvements. But it's a good inference to think we'll continue to make progress based on past progress.

Bullet trains are objectively better than steam locomotives in pretty much every way you can slice it (perhaps they are equally bad at generating human text lol). You’ll need to expand on that point.


When I was young, they came out with the first portable answering machine, a clunky piece of electronics about the size of a cereal box. The device used two magnetic cassette tapes: one to record a greeting, the other to store messages. The machine efficiently answered the phone and allowed the caller to leave a message that the phone's owner could retrieve later.

Today we have digital voicemail. The underlying technology is unquestionably more sophisticated, and yet the functionality is basically the same: automating the answering of a call and letting the caller leave a message.

The locomotive and a bullet train both travel on rails. If your sole purpose is to get from point A to point B without regard to time, then either option is adequate.

My observation, particularly in regard to ChatGPT, is this is a new stage of technology or just another iteration, a faster train, a better answering machine. I'm skeptical of hype, even more so when it comes from high places.


Here's my heuristic, which has served me well:

There has only been one industrial revolution, and maybe there will be a "singularity" that is just as transformative or even more transformative, but probably not.

Technological progress is generally plateauing. There are still transformative technologies (like the original answering machine) and incremental improvements (like the addition of automatic transcription to voicemails). Transformative technologies are much more impactful than incremental ones, but even they are generally plateauing as the industrial revolution plays itself out.

I'm inclined to think AI applications of this sort will prove to be a transformative technology. Because of the plateau, they will still prove to be less impactful than, say, the Internet has been up to this point, but much more impactful than the incremental improvements in Internet search from 2000 - 2023 have been.


Even the voicemail is a false comparison. I get a voicemail and half the time don't even listen to it - I read an automated transcription. That's a huge leap, and it's bad faith to disregard a fundamentally different message experience. We could just as well say the letter hasn't changed much, even though the "underlying technology" of email is different from snail mail.

The train example also willfully disregards major relevant technological advances. Your argument is much like saying, "except for everything that makes my point problematic, my point stands." I'm not sure what conception of technological progress uses "adequate" as its cornerstone criterion.

What isn't clear with voicemail, trains, or mail is the logical next step. With AI, the next steps are clear: ChatGPT can produce more accurate, more convincing, better-thought-out responses relevant to the prompt. It literally can take this test better and achieve objective measures of progress.

Overall, we only have the past to make inferences about the future. While past progress isn't sufficient for future results, it's literally the only basis we have for predicting future events. Past progress, then, is a basis for assuming future progress. The basis for the opposite position is unclear.


It might have gotten a D, but combined with the kind of context your students had, it would definitely do better than a D.


AIs are only as good as their training material. Train one on A material and you will get A answers. Train one on Wikipedia and you will get garbage on anything that is even slightly political.
