AI Bet!

Jan 19, 2023

42 Comments

Jan 19, 2023

The real question is whether the AI’s exam performance means anything at all. Studies show very little overlap between what AIs do in school and the skills they actually need on the job.

forumposter123@protonmail.com

Jan 19, 2023

My very limited experience with ChatGPT is that it will give you a shallow summary of something with a lot of data on the internet without taking much of a side.

That probably passes the test in many tasks but not all.

Do you need mediocre but cheap answers to things without deep understanding? We've got a Voxsplainer writer in a box!

SolarxPvP

Jan 19, 2023

This is partially because the designers are terrified of it being offensive. They have explicitly said they've tried to make it as inoffensive as possible.

forumposter123@protonmail.com

Jan 20, 2023

Ok, but I asked it a question about my industry that doesn't touch on race or sex or anything and the output was just as mediocre.

Jason Crawford

Jan 19, 2023

If your goal is to actually identify breakthrough technologies even slightly ahead of the curve, then I don't think it's helpful to apply base rates, for this exact reason. You will always predict “no”, you will be right 95+% of the time, and you will miss every transformative technology until it's too obvious to ignore.

I think AI is on a strong trajectory to be extremely useful, but I'm not sure I would take this bet. “Passing exams” is not an economically useful function (except to students who want to cheat?) and it's not clear to me that AI will be engineered or optimized for this. If you picked something with a clear economic value, like generating marketing copy or writing scripts for TV and movies, I would be much more likely to take the bet.

Calion

Jan 19, 2023

https://astralcodexten.substack.com/p/heuristics-that-almost-always-work

Byrel Mitchell

Jan 19, 2023

If you interpret 'apply a 95% negative base rate' as 'just say no to all transformative techs', then of course you're right. But that's not really how one should apply a base rate. You just use Bayes rule, and allow the negative base rate to pre-weight your odds that a given tech will be transformative appropriately low.
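To make the arithmetic concrete, here is a minimal sketch of that update in odds form. The 5% prior is the base rate under discussion; the function name and the likelihood ratios are made up purely for illustration and are not anyone's actual estimate of the evidence for AI.

```python
from fractions import Fraction


def posterior_probability(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1 + posterior_odds)


# Assumed base rate: 5% of hyped technologies turn out to be transformative,
# i.e. 19:1 odds against.
BASE_RATE = Fraction(5, 100)

# Hypothetical likelihood ratios: how many times more likely the observed
# evidence is if the tech is transformative than if it is not.
for likelihood_ratio in (1, 5, 19, 100):
    posterior = posterior_probability(BASE_RATE, Fraction(likelihood_ratio))
    print(f"likelihood ratio {likelihood_ratio:>3}: posterior = {float(posterior):.3f}")
```

With no evidence either way (a ratio of 1) you stay at the base rate. Evidence has to be more than 19 times likelier under "transformative" than under "not transformative" before the posterior passes 50%.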

Jason Crawford

Jan 19, 2023

Good point, but if you're really seriously doing that then I don't see how you could dismiss everything that AI has just become capable of in the last couple of years. That is an extremely strong trajectory towards some very fundamental capabilities—far more than enough to overcome 19:1 odds.

Byrel Mitchell

Jan 19, 2023

This boils down to what we mean by transformative, at least in my view. I mean, my personal evaluation is that AI is 90+% likely to be very useful as a tool in many fields by 2030. It's FAR less likely to replace entire fields. I'm not clear exactly what Bryan is estimating here.

Dave Friedman

Jan 21, 2023

This seems like the correct interpretation to me. In any event, ChatGPT (or a similar tech) purportedly has passed assorted medical exams and bar exams. So I don't know what insight is gained by this bet. You can make a test arbitrarily difficult, such that ChatGPT or its future descendants can't pass it, but what does that prove other than arbitrary difficulty?

Calion

Jan 21, 2023

Since he expects his students to pass it, it can't be arbitrarily difficult.

Ferran Casarramona

Jan 19, 2023

A D is not bad for a guy who didn't attend your lectures.

NJS

Jan 19, 2023

Shouldn't you use a third-party grader or even a set of graders? Grading is inherently subjective. What you consider a D, another professor might consider a C depending on the rubric, their mood, student quality, etc. And even if we assume no progress in this technology, which seems unlikely, a beta version of a new tech scored a marginally passing grade in an advanced economics course - probably as good or better than a substantial percentage of all college students in the country. That seems pretty amazing to me.

SolarxPvP

Jan 19, 2023

Caplan's exams also seem hard. His grading seems particularly demanding (as his Rate My Professor reviews confirm).

Kurt

Jan 19, 2023

How about blinding the AI’s exam by including it with all the other students’ exams for grading? That way, Bryan won’t know whether he’s grading a human student or the AI.

SolarxPvP

Jan 19, 2023

Seems fun, but I don't think Caplan is that biased.

SolarxPvP

Jan 19, 2023

As in it would be fun to see Bryan's reaction to it being an AI.

Kenny Easwaran

Jan 19, 2023

Wouldn't the right way to do this be to include the AI test among the exams you actually grade during the semester, without identifying it as an AI test? Grading without knowing the identity of the student who wrote the test is probably good for a variety of reasons (though it can introduce complications if you're dealing with essays that students have worked on drafts of) and would make the test more fair.

Shasta

Jan 19, 2023

You should do this in a blinded way! You likely will grade the AI very differently because you know it is an AI. My old econ teacher used to do this to avoid bias – have students write their name on the back of the last page.

This is probably the first Bryan bet I've thought he was way off the mark on. Exciting!

Enrique Guerra-Pujol

Jan 20, 2023

I have a feeling that Caplan will either become an especially hard grader or that he will lose this bet!

William Connolley

Jan 19, 2023

Now we need a prediction market on this bet. I'd go for the AI's side, certainly at evens.

Danno28

Jan 20, 2023

Somebody posted this joke on Twitter (apologies, I forgot who), but I think it is highly relevant:

I was in the park the other day, and walked past a man playing chess against a dog. "Wow," I said, "that's a smart dog."

"Not that smart," the man replied. "I'm winning 3 games to 1."

Seriously, what % of the population could get a D or higher on a labor econ midterm? Maybe 10%?

For certain tasks ChatGPT is already outperforming humans (e.g. some coding tasks, organizing rough notes into a coherent structure). It's underperforming on internal consistency of answers and general knowledge. But I can't imagine those things won't be fixed in six years.

JSM

Jan 19, 2023

One thing I don't understand: if Matthew is right, why would he pick the 6 latest midterms from ~2028? If he's right, professors might be forced to change their assignments and midterms by that point. I think you should use the 6 latest midterms from today, not from 6 years from now.

Additionally, by allowing "any AI selected by Matthew", does that mean you'd allow Matthew to train an AI on your class lectures and midterms? Because if so, there's a chance ChatGPT could pass right now with the right training.

Infinita City

Jan 19, 2023

You miss 100% of the moonshots you don't take - that's the problem with the base rate argument

That said, I think you're correct when it comes to Generative AI

I think ChatGPT was a PR stunt for potentially more valuable but far less flashy use cases, such as B2B automation, data aggregation, the workplace etc.

There is a reason that Microsoft is the biggest investor in OpenAI

Maxim Lott

Jan 20, 2023 (Edited)

My prediction: By 2029, it will be common knowledge that AI aces college exams, in general.

However, Bryan's exams are idiosyncratic enough that the AI might not quite hit this high grading bar, due to being trained on conventional economics textbooks (Krugman, etc.). So I think Bryan will win the bet. The AI would need to be trained on his lecture transcripts to avoid this issue.

Andrea

Jan 20, 2023

"2. Bryan will then grade the AI's work, as if it were one of his students"

How will you know you'll be fair? Will you accept the 6 manuscripts shuffled in among your students' exams and grade them anonymously?

ipsherman

Jan 20, 2023

Great bet! I've posted this elsewhere, but you (and other commenters) may be interested in seeing a working data scientist's opinion about ChatGPT that I wrote about a month ago, wherein I more-or-less agree with the sentiment of "grossly overpromising and underdelivering": https://ipsherman.substack.com/p/an-opinion-about-ai-chatgpt-and-more

Unrelatedly, in 2021 I did a post on how much to worry about COVID for kids. I wouldn't usually comment at all, let alone about something unrelated, but in this post I refer to Kahneman’s maxim as well (using the same terminology!): https://ipsherman.wordpress.com/2021/09/11/why-i-dont-make-my-kids-wear-masks/ <- I was (at least partially) inspired to write this and a previous post by your questions about how much worse COVID was than the normal flu.

Thank you Professor Caplan for your years of insightful, prolific, social-desirability-bias-eschewing blogging!

Bet On It
