The race to launch open source generative AI models is heating up. Salesforce has joined the fray by releasing XGen-7B, a large language model that supports longer context windows than most available open source LLMs.
The 7B in XGen-7B refers to its 7 billion parameters. The larger the number of parameters, the bigger the model. Models with more parameters, such as 13 billion, require high-end CPUs, GPUs, RAM, and storage. But a larger model tends to produce more accurate responses, since it is trained on larger data corpora. So there is a tradeoff between size and accuracy.
One of the key differentiators of XGen-7B is its 8K context window. A larger context window means more room for both the prompt and the output the model generates: it is possible to send prompts with more context to the model and receive longer responses. The 8K context window is the cumulative size, in tokens, of the input and output text.
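Because input and output share one budget, a long prompt leaves fewer tokens for the response. A minimal sketch of that arithmetic (the 8192-token figure corresponds to the 8K window; the helper function is illustrative, not part of any XGen API):

```python
# The 8K context window is a shared budget between the prompt
# (input tokens) and the completion (output tokens).
CONTEXT_WINDOW = 8192  # total tokens available to XGen-7B-8K


def max_new_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for the model's response."""
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone exceeds the context window")
    return context_window - prompt_tokens


# A 5,000-token prompt leaves 3,192 tokens for the response.
print(max_new_tokens(5000))  # → 3192
```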
Let’s look at what a token is. Since machine learning models operate on numbers rather than characters, each word, or part of one, is converted into a token. A token is a way to encode text, much like ASCII or Unicode. To turn words into tokens, XGen-7B uses the OpenAI tokenizer employed by its popular models, such as GPT-3 and GPT-4.
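A toy sketch of the idea: real tokenizers (such as OpenAI's tiktoken, which XGen-7B reuses) learn large subword vocabularies from data, but at bottom they map pieces of text to integer IDs. The tiny vocabulary below is entirely hypothetical and exists only to show that mapping:

```python
# Hypothetical toy vocabulary: real tokenizers have tens of
# thousands of learned subword entries, not six.
TOY_VOCAB = {"token": 1, "iz": 2, "ation": 3, " ": 4, "is": 5, "fun": 6}


def toy_encode(text: str) -> list[int]:
    """Greedily match the longest known piece at each position."""
    ids, i = [], 0
    pieces = sorted(TOY_VOCAB, key=len, reverse=True)
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                ids.append(TOY_VOCAB[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no token covers text at position {i}")
    return ids


# "tokenization" splits into three subword tokens.
print(toy_encode("tokenization is fun"))  # → [1, 2, 3, 4, 5, 4, 6]
```

Note how "tokenization" is not in the vocabulary as a whole word, so it is encoded as three subword tokens; this is why a word and a token are not the same thing, and why context limits are stated in tokens.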
XGen-7B becomes an alternative to open source LLMs such as MPT, Falcon, and LLaMA. Salesforce claims that its LLM achieves comparable or better results than the current state-of-the-art language models of similar size.
Salesforce has released three variants of XGen-7B. The first, XGen-7B-4K-base, supports a 4K context window, while the second, XGen-7B-8K-base, is trained on additional data with support for an 8K context length. Both of these variants are released under the Apache 2.0 open source license, which permits commercial use.
The third variant, XGen-7B-{4K,8K}-inst, is trained on instructional data including databricks-dolly-15k, oasst1, Baize, and GPT-related datasets, and is available for research purposes only. The "inst" suffix indicates that the model can follow instructions, having been trained with techniques based on reinforcement learning from human feedback (RLHF). An instruction-tuned language model can be used to build chatbots similar to ChatGPT.
Salesforce used several datasets, such as RedPajama and Wikipedia, along with Salesforce's own Starcoder dataset, to train the XGen-7B LLM. Based on Google Cloud pricing for TPU-v4, the cost of training the model on 1T tokens is about $150K. The model is trained on 22 different languages to make it multilingual.
Salesforce's XGen-7B supports Massive Multitask Language Understanding (MMLU), the ability to answer multiple-choice questions from various branches of knowledge, such as the humanities, STEM, the social sciences, and other domains. XGen-7B scores better than comparable models in this category.
The XGen-7B model also does well in other categories, such as conversation, long-form Q&A, and summarization.
Salesforce also added a disclaimer stating that its LLM is subject to the same limitations as other LLMs, such as bias, toxicity, and hallucinations.
With a larger context window and a comprehensive set of training datasets, the XGen-7B LLM from Salesforce looks promising.