[D] GPT-3, The $4,600,000 Language Model (self.MachineLearning)
submitted 5 years, 7 months ago by mippie_moe to /r/MachineLearning


While it would likely be enormously cost-prohibitive, AWS does offer some "private" tiers.
For example, the u-12tb1.metal instance type has 12 TB of RAM and 448 CPU cores. While this one is aimed at in-memory DBs, they do have some other huge cluster offerings.
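A rough back-of-the-envelope check (my own arithmetic, not from the thread) on whether a 175B-parameter model's weights would even fit in that much RAM:

```python
# Sketch: do 175B parameters fit in a u-12tb1.metal's 12 TB of RAM?
# Assumes plain fp32/fp16 weights and ignores activations and optimizer state.
params = 175e9

for name, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
    weights_gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{weights_gb:,.0f} GB of weights vs ~12,000 GB of RAM")

# fp32: ~700 GB, fp16: ~350 GB -- capacity isn't the problem on a box like that,
# CPU inference speed would be.
```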
I don't think many will be running the 175B parameter model anywhere; even OpenAI is probably hurting a bit after training it. They also published smaller models, which I think would be enough — the 13B-parameter one is still about 10x the size of the largest GPT-2 model. Humans were only 52% accurate at identifying fake articles written by the 175B model, which is pretty much just guessing 50/50, but even for the 13B model people were only 55% accurate.
The 13B model you can probably run reasonably well on a single Tesla A100 with 40 GB of VRAM.
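A quick sketch of that memory argument (my numbers, assuming fp16 weights at 2 bytes per parameter):

```python
# Sketch: 13B parameters in fp16 vs a 40 GB A100.
# Assumes 2 bytes per weight; the remainder is headroom for activations etc.
params = 13e9
weights_gb = params * 2 / 1e9      # ~26 GB of fp16 weights
headroom_gb = 40 - weights_gb      # ~14 GB left over
print(f"weights: ~{weights_gb:.0f} GB, headroom on a 40 GB card: ~{headroom_gb:.0f} GB")
```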
But technology advancements will make these things more accessible as well. Nvidia's NVSwitch solution is incredibly niche and expensive because it requires a board that wires every GPU to every other GPU in the server.
AMD, with 3rd-gen Infinity Fabric, will try to build that into the CPU + GPU. Nvidia was limited to PCIe 3.0, and it wasn't fast enough. With Zen 3 or 4, AMD is moving to PCIe 5.0, which can do about 63 GB/s compared to roughly 16 GB/s for gen 3. They will be using this to interconnect 8 GPUs and an EPYC processor in the El Capitan 2-exaflop supercomputer with full GPU resource sharing. NVSwitch has a port bandwidth of 50 GB/s, so in a few years an off-the-shelf server will be able to do this stuff instead of needing a super-niche product.
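A quick sanity check on those bandwidth figures (my own arithmetic, assuming x16 links and the 128b/130b line code used from gen 3 onward):

```python
# Rough per-direction bandwidth of an x16 PCIe link.
# PCIe 3.0 runs 8 GT/s per lane, PCIe 5.0 runs 32 GT/s per lane, both with 128b/130b encoding.
def pcie_x16_gbps(transfer_rate_gt_s: float) -> float:
    lanes = 16
    encoding = 128 / 130                               # line-code efficiency
    return transfer_rate_gt_s * lanes * encoding / 8   # GT/s -> GB/s

print(f"PCIe 3.0 x16: ~{pcie_x16_gbps(8):.1f} GB/s")    # ~15.8 GB/s
print(f"PCIe 5.0 x16: ~{pcie_x16_gbps(32):.1f} GB/s")   # ~63.0 GB/s
```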
https://en.wikichip.org/wiki/nvidia/nvswitch
This thing is absolutely ridiculous; it's essentially a 100 W chip that exists just to link GPUs together.
In 2022, AMD servers should be able to do this without specialized hardware:
https://www.anandtech.com/show/15596/amd-moves-from-infinity-fabric-to-infinity-architecture-connecting-everything-to-everything
That's when models of this size can start to become common.
Thanks for sharing the specifics on this. Very exciting stuff!