For about a year and a half, the world's strongest language artificial intelligence was arguably GPT-3, created by OpenAI, the research lab co-founded by Elon Musk. That title now belongs to the Megatron-Turing Natural Language Generation model (MT-NLG), developed jointly by Microsoft and Nvidia, currently the world's largest and most powerful language model. At 530 billion parameters, Megatron-Turing handles three times as many as GPT-3.
The number of parameters characterizes the capacity of a model, and with it the quality of the text it generates. GPT-3's 175 billion parameters were already an enormous jump over its predecessor, GPT-2, which handled only one and a half billion. The abundance of parameters paid off: GPT-3 showed capabilities no one expected, such as writing program code, translating, or filling in missing parts of images.
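To put those figures in perspective, a few lines of back-of-the-envelope arithmetic help; this is a sketch using only the parameter counts quoted above, and the two-bytes-per-weight storage figure is an assumption (corresponding to 16-bit precision), not something stated in the article:

    # Parameter counts as reported above.
    gpt2 = 1.5e9     # GPT-2: 1.5 billion parameters
    gpt3 = 175e9     # GPT-3: 175 billion parameters
    mt_nlg = 530e9   # Megatron-Turing NLG: 530 billion parameters

    print(f"GPT-3 vs. GPT-2:  {gpt3 / gpt2:.0f}x")    # ~117x
    print(f"MT-NLG vs. GPT-3: {mt_nlg / gpt3:.1f}x")  # ~3.0x

    # Assumed 16-bit (2-byte) weights: the raw weights alone
    # would occupy roughly a terabyte.
    print(f"MT-NLG weights: ~{mt_nlg * 2 / 1e12:.2f} TB")  # ~1.06 TB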
Megatron-Turing triples that figure. For training, Nvidia provided 560 servers, each containing eight GPUs with 80 gigabytes of memory apiece. Training used a dataset called the Pile, which includes, among other things, the whole of Wikipedia, the PubMed database of medical articles, and source code from the GitHub code-hosting service. The 825-gigabyte pile of text was filtered for higher quality and supplemented with data from Common Crawl, a nonprofit organization that collects billions of web pages in a format ready for data mining.
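The scale of that hardware is easier to grasp as a quick calculation, using only the server and GPU figures quoted above:

    # Hardware figures from the article: 560 servers,
    # 8 GPUs per server, 80 GB of memory per GPU.
    servers = 560
    gpus_per_server = 8
    gb_per_gpu = 80

    total_gpus = servers * gpus_per_server
    total_memory_tb = total_gpus * gb_per_gpu / 1000

    print(f"Total GPUs:           {total_gpus}")               # 4480
    print(f"Aggregate GPU memory: {total_memory_tb:.0f} TB")   # ~358 TB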
The end result of the $85 million training is a language model capable of completing sentences, reading comprehension, commonsense reasoning, natural language inference, and word sense disambiguation. As with GPT-3, unexpected capabilities may only emerge once Megatron-Turing comes into wider use. That, however, is still to come: it has not yet been announced when corporate developers will be able to try it out.
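Megatron-Turing itself cannot be tried yet, but the kind of sentence-completion task it excels at can be sketched with the openly released GPT-2 through Hugging Face's transformers library. This is a minimal stand-in example, not MT-NLG's own interface:

    # Minimal sketch of language-model sentence completion.
    # GPT-2 stands in for MT-NLG, which has no public API yet.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "The world's largest language model was trained on"
    result = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    print(result[0]["generated_text"])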