In this work we do not fine-tune GPT-3 because our focus is on task-agnostic performance, but GPT-3 can be fine-tuned in principle and this is a promising direction for future work.
considering the OpenAI people haven't even done this, somehow i don't think tech demo people are.
yes yes sure it would be helpful for the author to have put the prompt structure on github. you asked a question, so i gave an answer. it's few-shot.
it's like you're a middle school report writer who opens the GPT-3 paper, flips to page 6, and sees a list of the various settings "that we will be evaluating GPT-3 on or could in principle evaluate GPT-3 on" (a set of four bullet points: "fine-tuning, few-shot, one-shot, zero-shot"), and from that makes a generic declaration
except you're missing the point, because the tech demo is few-shot, like basically all the tech demos. do note the "in principle" clause covering the fact that GPT-3 is not fine-tuned in practice.
If the guy doesn't say if it's zero, one, few, or many-shot then we don't know.
And what? GPT-3 tech demos almost always state how fine-tuning was done and with what datasets, or what prompts and responses were used.
You realize that two of the GPT-3 pretrained models are smaller than GPT-2 and can be fine-tuned with a moderate amount of cloud compute, right? Or can be completely trained from scratch with DGX-2s.
We don't fucking know if it's few-shot, that's the fucking point. It could be one-shot for all we know. That information almost always accompanies the demos.
okay very cool. i conjecture it is few-shot because you minimally need one example for "reject" and one example for "accept". maybe i am wrong, but i can't think of a different way to do it
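the conjectured prompt would look something like this. a minimal sketch, assuming a plain-text labeled-example format; the example submissions and the "Submission:"/"Verdict:" layout are invented, since the demo author never published the actual prompt:

```python
def build_few_shot_prompt(new_submission: str) -> str:
    # One "accept" and one "reject" exemplar -- the minimal few-shot setup
    # conjectured above. The texts are hypothetical placeholders.
    examples = [
        ("What is the nature of consciousness?", "accept"),
        ("lol nice meme", "reject"),
    ]
    lines = []
    for text, label in examples:
        lines.append(f"Submission: {text}\nVerdict: {label}\n")
    # The model is asked to complete the verdict for the new submission.
    lines.append(f"Submission: {new_submission}\nVerdict:")
    return "\n".join(lines)

print(build_few_shot_prompt("Does free will exist?"))
```

whatever text the model generates after the trailing "Verdict:" is then read off as the classification.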
indeed the author should have put it on github, but i will provide you an answer to your question. i think that is more productive than a useless criticism, considering neither you nor i are the tech demo author
Nobody's training GPT-3 "from scratch" by any traditional definition of from scratch.
The suggestion that a guy who made a gimmick philosophy bot for Twitter may have spent 5-10 million dollars to train a GPT-3 philosophy model, and that you can't wait to see the paper on his methods, is bad-faith pseudointellectual masturbation.
You wanted to say "few-shot" and "github" a couple of times so someone like epokkk or iaafr would come in here and say "wow, this guy really knows what he's talking about! He knows about github!"
we start this section by explicitly defining and contrasting the different settings that we will be evaluating GPT-3 on or could in principle evaluate GPT-3 on ... Fine Tuning (FT) ... Few-Shot (FS) ... One-Shot (1S) ... Zero-Shot (0S)
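in concrete terms, those settings differ only in how many demonstrations get packed into the prompt at inference time (fine-tuning is the odd one out: it updates weights instead of the prompt). a rough sketch using the paper's English-to-French example; the `=>` layout and the specific word pairs are illustrative, not a claim about any demo's actual prompt:

```python
def make_prompt(task_description, demonstrations, query, k):
    # k = 0 -> zero-shot, k = 1 -> one-shot, k > 1 -> few-shot.
    parts = [task_description]
    for source, target in demonstrations[:k]:
        parts.append(f"{source} => {target}")
    # The model completes the target for the final, unanswered query.
    parts.append(f"{query} =>")
    return "\n".join(parts)

demos = [("cheese", "fromage"), ("house", "maison")]
zero_shot = make_prompt("Translate English to French.", demos, "cat", k=0)
few_shot = make_prompt("Translate English to French.", demos, "cat", k=2)
```

the same model weights serve all three settings; only `k` changes.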
You're right, the API for fine-tuning hasn't been released: it comes out in a few weeks. I've been working with zero and one-shot, so I hadn't checked if it was out yet. It's still important to know if it's zero, one, or few-shot, and what prompt/response pairs were used.
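for reference, fine-tuning data is just a set of prompt/response pairs; a sketch of one plausible JSONL layout (the "prompt"/"completion" field names follow OpenAI's announced fine-tuning format, but the records themselves are invented):

```python
import json

def to_jsonl(pairs):
    # One JSON object per line: {"prompt": ..., "completion": ...}
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}) for p, c in pairs
    )

# Hypothetical training pairs -- no such dataset was published for the demo.
records = to_jsonl([
    ("What is justice?", "Justice is rendering to each his due."),
    ("lol nice meme", "REJECT"),
])
```

publishing even a small file like this alongside a demo would settle the fine-tuned-vs-few-shot question immediately.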
Some papers before GPT-3 referred to few-shot as a fine-tuning mechanism. As I already said, it depends on the paper: some classify few-shot as fine-tuning outright, while others require the dataset to reach a certain scale before it qualifies as fine-tuning.
It is very likely that NVIDIA will, as they have with every other extremely large language model, present an optimized training mechanism specific to their hardware. Come the fuck on.
I need to reread the paper, but I believe it can learn to reject from zero-shot if it concludes that the prompt given isn't similar enough to the zero-shot prompt
Yes, none of that shit is contradictory. Saying that there would be vastly improved performance with from-scratch training on a philosophy corpus doesn't contradict shit when THE FUCKING SENTENCE BEFORE THAT says it's very unlikely any of that is true.