Philosopher AI

It reads like a satire of technical discussion from someone trying to sound smart

No, it's that the standard for publicly available language-model applications has been to create a GitHub repo for the code and either explain the training there or submit a paper for publication and put the preprint on arXiv

Lmao sure, this is literally my job. I'm currently swapping out our BERT-large models for GPT-3 and experimenting with zero-shot training

@theGreatWingdingi how much of a background do you have in deep learning? How much in NLP? How much in deep learning for NLP? None? Then how can you evaluate if anything I say is true or not?

It's true, and if you've done any work in the field you'd know this

This is like when he argued with me a year ago that having the word "junior" in his job title didn't have any implication for his level of expertise - it was just a hard requirement of any person with less than 2 years of experience (actual words he said)

@theGreatWingdingi explain to me, in your own words, how a transformer DNN works. If you can, I might take your opinion somewhat seriously. Otherwise, your bullshit is worthless

Yeah, that's what junior meant at my job. Associate was 2+ or a Master's. Senior was 5/7+ (depending on what their job was) or a PhD. It's pretty standard, although most jobs I've been looking at put associate at 3+

Though that may just be government contracting, because RFPs, BAAs, and such use terms like "junior," "associate," and "senior."

The fact that it always has the same response for certain types of questions means it's either hard-coded or trained with thousands of examples of bad prompts.

this is not how GPT-3 works. GPT-3 is a pretrained model.

You use few-shot to tell GPT-3 how to reject bad prompts.

This is a conversation between a human and a brilliant AI, which is an expert on animal anatomy, biology, zoology, and all things animal. A small child is asking basic questions about animals. If the question is sensible, the AI answers it correctly; if the question is ‘nonsense’, the AI says ‘yo be real’.

Q. How much hay does a unicorn eat per day?
A. yo be real

Q. How many feet does a dog have?
A. Four.

Q. How many eyes does a horse have?
A. Two.

Q. How do you fargle a snaggle?
A. yo be real

Q. How many eyes does a foot have?
A.
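
and for the record, here's roughly what feeding that prompt to the API looks like (a minimal sketch assuming the OpenAI Python client; the engine name, sampling parameters, and stop sequence are illustrative guesses, not whatever Philosopher AI actually uses):

# Sketch: few-shot prompting through the OpenAI completions API.
# Engine name and parameters are illustrative assumptions, not Philosopher AI's setup.
import openai

openai.api_key = "sk-..."  # your API key

few_shot_prompt = """This is a conversation between a human and a brilliant AI, which is an expert on animal anatomy, biology, zoology, and all things animal. A small child is asking basic questions about animals. If the question is sensible, the AI answers it correctly; if the question is 'nonsense', the AI says 'yo be real'.

Q. How much hay does a unicorn eat per day?
A. yo be real

Q. How many feet does a dog have?
A. Four.

Q. How many eyes does a foot have?
A."""

response = openai.Completion.create(
    engine="davinci",     # base GPT-3; no gradient updates happen here
    prompt=few_shot_prompt,
    max_tokens=16,
    temperature=0.0,      # keep the short answers deterministic-ish
    stop=["\n\nQ."],      # stop before the model invents its own next question
)
print(response.choices[0].text.strip())  # typically: yo be real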

do you think GPT-3 is too dumb to figure out that you want it to reject bad prompts in a standardized way?

i mean sure, technically it's seen thousands of bad prompts, among other things, not that they were labeled as such. i think you're the one misunderstanding how it works, not me.

pretty sure asoul is right here and ewiz wanting "training mechanisms" on github indicates he does not actually understand what GPT-3 is. the whole point is that GPT-3 is powerful enough by itself to do lots of things without "training": you just feed it few-shot examples so it knows what you actually want.


Language models may use fine-tuning for specific tasks, and you can also train a model from scratch on domain-specific document sets with different structures, like medical papers and reports. Fine-tuning may be distinguished from few-shot, or few-shot may be lumped in as a type of fine-tuning, depending on the paper.
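
For concreteness, domain fine-tuning of a causal language model looks roughly like this (a minimal sketch using Hugging Face Transformers with GPT-2 as a stand-in; the corpus path, output directory, and hyperparameters are made up):

# Sketch: fine-tuning a causal language model on a domain corpus with
# Hugging Face Transformers. GPT-2 stands in for a large LM; the file path,
# output directory, and hyperparameters are illustrative.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Domain-specific documents (e.g. medical reports) concatenated into one text file.
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="medical_reports.txt",
                            block_size=512)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-medical",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()  # gradient updates to the weights, which few-shot prompting never does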

The response to nonsense is hard-coded. How it determines what is nonsense depends on zero, one, few, or many-shot fine-tuning methods.

During fine-tuning, or now that it's online? If the latter, there's also the question of whether it's doing any online learning based on the prompts it's received.

Lol, no. You don't understand how language models work. GPT-3 can be fine-tuned for a task, though the original paper doesn't evaluate it in a fine-tuned setting (they distinguish fine-tuning from few-shot). See page 6 of the original paper.

This is why the developer should include whether it was zero, one, few, or many-shot fine-tuned for the task. I doubt it was trained from scratch (that requires massive compute resources), though philosophy is such a jargon-heavy technical discipline that it would achieve vastly improved performance with from-scratch training for this task.

you're not getting it. GPT-3 is already trained using massive compute resources on a huge corpus of things, including philosophy texts. it already "knows" how to write philosophy texts and how to distinguish nonsense prompts from valid prompts. i don't think the weights are even public, so you can't even "fine-tune" it.

this quote indicates you don't understand GPT-3, methinks.

GPT-3 is scary because you tell it what to do in natural language, and it does it. you tell it to "write a response in the form of a philosophy essay to the following prompt, outputting a standardized reject if the prompt is nonsensical, in accordance with the following examples", and it will just do it.
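
concretely, the "programming" is just a prompt string. something like this (a sketch; the instruction wording, examples, and reject token are mine, not from any actual demo):

# Sketch: an instruction-plus-examples prompt for a philosophy-essay demo.
# The instruction wording, examples, and "REJECTED" token are illustrative,
# not taken from Philosopher AI.
user_prompt = "Why do mirrors reverse left and right but not up and down?"

prompt = (
    "Write a response in the form of a short philosophy essay to the prompt below.\n"
    "If the prompt is nonsensical, output exactly: REJECTED.\n\n"
    "Prompt: How do you fargle a snaggle?\n"
    "Essay: REJECTED\n\n"
    "Prompt: Is a hot dog a sandwich?\n"
    "Essay: To ask whether a hot dog is a sandwich is really to ask what a category is for...\n\n"
    f"Prompt: {user_prompt}\n"
    "Essay:"
)
# Send `prompt` through the same completion call as above; no weights change,
# the behavior is specified entirely in text.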

Read page 6 of the GPT-3 paper, dude

For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.

this is literally in the abstract.

and you're asking how some tech demo is fine-tuning the model

I fucking said this. But you can still train from scratch over a corpus of domain-specific documents, you dumb shit. This is super common with medical documents, and NVIDIA regularly does this to show off new optimizations for transformer training

ok but this is not how GPT-3 tech demos work. the weights aren't even public. i'm telling you it's a few-shot prompt, which doesn't even constitute fine-tuning

Read page 6, moron. And yes, tech demos in this domain almost always include the fine-tuning method. "Is this zero, one, few, or many-shot, and if it isn't zero, what prompts and responses did you use?" is important information for evaluating the tech demo itself