A paper describing MineDojo, Nvidia’s generalist AI agent that can perform actions from written prompts in Minecraft, won an Outstanding Datasets and Benchmarks Paper Award at the 2022 NeurIPS (Neural Information Processing Systems) conference, Nvidia revealed on Monday.
To train the MineDojo framework to play Minecraft, researchers fed it 730,000 Minecraft YouTube videos (with more than 2.2 billion words transcribed), 7,000 scraped webpages from the Minecraft wiki, and 340,000 Reddit posts and 6.6 million Reddit comments describing Minecraft gameplay.
From this data, the researchers created a custom transformer model called MineCLIP that associates video clips with specific in-game Minecraft activities. As a result, someone can tell a MineDojo agent what to do in the game using high-level natural language, such as “find a desert pyramid” or “build a nether portal and enter it,” and MineDojo will execute the series of steps necessary to make it happen in the game.