On Wednesday, Databricks released Dolly 2.0, reportedly the first open source, instruction-following large language model (LLM) for commercial use that’s been fine-tuned on a human-generated data set. It could serve as a compelling starting point for homebrew ChatGPT competitors.
Databricks is an American enterprise software company founded in 2013 by the creators of Apache Spark. It provides a web-based platform for running Spark on big data and machine learning workloads. By releasing Dolly, Databricks hopes to let organizations create and customize LLMs "without paying for API access or sharing data with third parties," according to the Dolly launch blog post.
Dolly 2.0, the company's new 12-billion-parameter model, is based on EleutherAI's Pythia model family and fine-tuned exclusively on training data (called "databricks-dolly-15k") crowdsourced from Databricks employees. That tuning gives it abilities more in line with OpenAI's ChatGPT: like ChatGPT, it is better at answering questions and holding a chatbot-style dialogue than a raw LLM that has not been fine-tuned.
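To make "fine-tuned on instruction data" a little more concrete, here is a minimal sketch of how a single human-written record might be turned into a training prompt. The field names (`instruction`, `context`, `response`, `category`) follow those published with databricks-dolly-15k; the prompt template, the `format_example` function, and the sample record are hypothetical illustrations, not Databricks' actual training code.

```python
from typing import TypedDict


class DollyRecord(TypedDict):
    # Field names mirror the published databricks-dolly-15k records.
    instruction: str
    context: str  # optional reference passage; may be an empty string
    response: str
    category: str


# Hypothetical prompt template, loosely modeled on common instruction-tuning formats.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_example(rec: DollyRecord) -> str:
    """Flatten one record into a single prompt/response training string (illustrative only)."""
    parts = [INTRO, "", "### Instruction:", rec["instruction"]]
    if rec["context"]:
        # Only include the input section when the record supplies context.
        parts += ["", "Input:", rec["context"]]
    parts += ["", "### Response:", rec["response"], "", "### End"]
    return "\n".join(parts)


if __name__ == "__main__":
    sample: DollyRecord = {
        "instruction": "Name a primary color.",
        "context": "",
        "response": "Red.",
        "category": "open_qa",
    }
    print(format_example(sample))
```

During fine-tuning, thousands of strings like this teach the base model to continue an instruction with a helpful answer rather than simply predicting the next chunk of generic text, which is the behavioral shift described above.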