Introduction
In the world of large language models (LLMs), the cost of computation can be a significant barrier, especially for large-scale projects. I recently embarked on a project that required running 4,000,000 prompts with an average input length of 1,000 tokens and an average output length of 200 tokens. That's nearly 5 billion tokens! The standard approach of paying per token, as is common with models like GPT-3.5 and GPT-4, would have resulted in a hefty bill. However, I discovered that by leveraging open source LLMs, I could shift the pricing model to paying per hour of compute time, leading to substantial savings. This article will detail the approaches I took and compare and contrast each of them. Please note that while I share my experience with pricing, prices are subject to change and may differ depending on your region and specific circumstances. The key takeaway here is the potential cost savings of leveraging open source LLMs and renting a GPU per hour, rather than the exact prices quoted. If you plan on using my recommended solutions for your project, I've left a couple of affiliate links at the end of this article.
ChatGPT API
I conducted an initial test using GPT-3.5 and GPT-4 on a small subset of my prompt input data. Both models demonstrated commendable performance, but GPT-4 consistently outperformed GPT-3.5 in the majority of cases. To give you a sense of the cost, running all 4 million prompts through the OpenAI API would look something like this:
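Here is a minimal sketch of the arithmetic. The per-1K-token rates are my assumption, based on OpenAI's published pricing at the time (roughly $0.0015 input / $0.002 output for GPT-3.5 Turbo and $0.03 / $0.06 for GPT-4), and they have likely changed since:

# Back-of-the-envelope estimate of the total API cost for the full job.
# NOTE: the per-1K-token rates below are assumed from OpenAI's pricing
# at the time of writing and may no longer be current.
PROMPTS = 4_000_000
AVG_INPUT_TOKENS = 1_000   # average input length per prompt
AVG_OUTPUT_TOKENS = 200    # average output length per prompt

# model: (input $/1K tokens, output $/1K tokens) -- assumed rates
PRICING = {
    "gpt-3.5-turbo": (0.0015, 0.002),
    "gpt-4": (0.03, 0.06),
}

for model, (in_rate, out_rate) in PRICING.items():
    cost = PROMPTS * (AVG_INPUT_TOKENS / 1000 * in_rate
                      + AVG_OUTPUT_TOKENS / 1000 * out_rate)
    print(f"{model}: ${cost:,.0f}")

# gpt-3.5-turbo: $7,600   (the figure cited below)
# gpt-4: $168,000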
While GPT-4 did offer some performance benefits, its cost was disproportionately high compared to the incremental quality it added to my outputs. Conversely, GPT-3.5 Turbo, although more affordable, fell short in terms of performance, making noticeable errors on 2–3% of my prompt inputs. Given these factors, I wasn't prepared to invest $7,600 in a project that was…