Managing prompts and dealing with OpenAI request failures can be a challenging process. Fortunately, spaCy released spacy-llm, a robust tool that simplifies prompt management and eliminates the need to create a custom solution from scratch.
In this article, you'll learn how to leverage spacy-llm to create a task that extracts data from text using a prompt. We'll dive into the basics of spaCy and explore some of the features of spacy-llm.
spaCy is a library for advanced NLP in Python and Cython. When working with text data, several processing steps are typically required, such as tokenization and POS tagging. To execute these steps, spaCy provides the nlp object, which invokes a processing pipeline.
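As a minimal illustration of that pipeline (assuming the small English model has been installed with python -m spacy download en_core_web_sm):
import spacy

# Loading a trained pipeline; calling nlp() runs tokenization, tagging, etc.
nlp = spacy.load("en_core_web_sm")
doc = nlp("spaCy turns raw text into a Doc object.")
for token in doc:
    print(token.text, token.pos_)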
spaCy v3.0 introduces config.cfg, a file where we can include detailed settings for these pipelines.
config.cfg uses confection, a config system that allows the creation of arbitrary object trees. For instance, confection parses the following config.cfg:
[training]
patience = 10
dropout = 0.2
use_vectors = false

[training.logging]
level = "INFO"

[nlp]
# This uses the value of training.use_vectors
use_vectors = ${training.use_vectors}
lang = "en"
into:
{
    "training": {
        "patience": 10,
        "dropout": 0.2,
        "use_vectors": false,
        "logging": {
            "level": "INFO"
        }
    },
    "nlp": {
        "use_vectors": false,
        "lang": "en"
    }
}
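For example, here is a minimal sketch of loading and resolving such a config with confection (the file name config.cfg is just an assumption):
from confection import Config

# from_disk parses the file and resolves references such as ${training.use_vectors}
config = Config().from_disk("config.cfg")
print(config["nlp"]["use_vectors"])  # False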
Each pipeline uses components, and spacy-llm stores the pipeline components in registries using catalogue. This library, also from Explosion, introduces function registries that allow for efficient management of the components. An llm component is defined by two main settings:
- A task, defining the prompt to send to the LLM as well as the functionality to parse the resulting response
- A model, defining the model and how to connect to it
To include a component that uses an LLM in our pipeline, we need to follow a few steps. First, we need to create a task and register it in the registry. Next, we can use a model to execute the prompt and retrieve the responses. Now it's time to do all that so we can run the pipeline.
We will use quotes from https://dummyjson.com/ and create a task to extract the context from each quote. We will create the prompt, register the task, and finally create the config file.
1. The prompt
spacy-llm uses Jinja templates to define the instructions and examples. The {{ text }} variable will be replaced by the quote we provide. This is our prompt:
You are an expert at extracting context from text.
Your task is to accept a quote as input and provide the context of the quote.
This context will be used to group the quotes together.
Do not put any other text in your answer and provide the context in 3 words max.
{# whitespace #}
{# whitespace #}
Here is the quote that needs classification
{# whitespace #}
{# whitespace #}
Quote:
'''
{{ text }}
'''
Context
2. The task class
Now let's create the class for the task. The class should implement two functions:
- generate_prompts(docs: Iterable[Doc]) -> Iterable[str]: a function that takes in a list of spaCy Doc objects and transforms them into a list of prompts
- parse_responses(docs: Iterable[Doc], responses: Iterable[str]) -> Iterable[Doc]: a function for parsing the LLM's outputs into spaCy Doc objects
generate_prompts will use our Jinja template and parse_responses will add the context attribute to our Doc. This is the QuoteContextExtractTask class:
from pathlib import Path
from typing import Iterable

import jinja2
from spacy.tokens import Doc
from spacy_llm.registry import registry

TEMPLATE_DIR = Path("templates")

def read_template(name: str) -> str:
    """Read a template from the templates directory"""
    path = TEMPLATE_DIR / f"{name}.jinja"
    if not path.exists():
        raise ValueError(f"{name} is not a valid template.")
    return path.read_text()

class QuoteContextExtractTask:
    def __init__(self, template: str = "quotecontextextract", field: str = "context"):
        self._template = read_template(template)
        self._field = field

    def _check_doc_extension(self):
        """Add extension if need be."""
        if not Doc.has_extension(self._field):
            Doc.set_extension(self._field, default=None)

    def generate_prompts(self, docs: Iterable[Doc]) -> Iterable[str]:
        environment = jinja2.Environment()
        _template = environment.from_string(self._template)
        for doc in docs:
            prompt = _template.render(
                text=doc.text,
            )
            yield prompt

    def parse_responses(
        self, docs: Iterable[Doc], responses: Iterable[str]
    ) -> Iterable[Doc]:
        self._check_doc_extension()
        for doc, prompt_response in zip(docs, responses):
            try:
                setattr(
                    doc._,
                    self._field,
                    prompt_response.replace("Context:", "").strip(),
                )
            except ValueError:
                setattr(doc._, self._field, None)
            yield doc
Now we just need to add the task to the spacy-llm llm_tasks registry:
@registry.llm_tasks("my_namespace.QuoteContextExtractTask.v1")
def make_quote_extraction() -> "QuoteContextExtractTask":
    return QuoteContextExtractTask()
3. The config.cfg file
We'll use the GPT-3.5 model from OpenAI. spacy-llm has a model for that, so we just need to make sure the secret key is available as an environment variable:
export OPENAI_API_KEY="sk-..."
export OPENAI_API_ORG="org-..."
To build the nlp object that runs the pipeline, we'll use the assemble function from spacy-llm. This function reads from a .cfg file. The file should reference the GPT-3.5 model (it's already in the registry) and the task we've created:
[nlp]
lang = "en"
pipeline = ["llm"]
batch_size = 128

[components]

[components.llm]
factory = "llm"
[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.1}
[components.llm.task]
@llm_tasks = "my_namespace.QuoteContextExtractTask.v1"
4. Running the pipeline
Now we just need to put everything together and run the code:
import os
from pathlib import Path

import typer
from wasabi import msg

from spacy_llm.util import assemble
from quotecontextextract import QuoteContextExtractTask

Arg = typer.Argument
Opt = typer.Option

def run_pipeline(
    # fmt: off
    text: str = Arg("", help="Text to perform text categorization on."),
    config_path: Path = Arg(..., help="Path to the configuration file to use."),
    verbose: bool = Opt(False, "--verbose", "-v", help="Show extra information."),
    # fmt: on
):
    if not os.getenv("OPENAI_API_KEY", None):
        msg.fail(
            "OPENAI_API_KEY env variable was not found. "
            "Set it by running 'export OPENAI_API_KEY=...' and try again.",
            exits=1,
        )

    msg.text(f"Loading config from {config_path}", show=verbose)
    nlp = assemble(
        config_path
    )
    doc = nlp(text)

    msg.text(f"Quote: {doc.text}")
    msg.text(f"Context: {doc._.context}")

if __name__ == "__main__":
    typer.run(run_pipeline)
And run:
python3 run_pipeline.py "We must balance conspicuous consumption with conscious capitalism." ./config.cfg
>>>
Quote: We must balance conspicuous consumption with conscious capitalism.
Context: Business ethics.
If you want to change the prompt, just create another Jinja file and create a my_namespace.QuoteContextExtractTask.v2 task the same way we created the first one. If you want to change the temperature, just change the parameter in the config.cfg file, as sketched below. Nice, right?
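For instance, the only lines that would need to change in config.cfg might look like this (the v2 registry name and the new temperature value are purely illustrative):
[components.llm.task]
@llm_tasks = "my_namespace.QuoteContextExtractTask.v2"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.5}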
The ability to handle OpenAI REST requests and its straightforward approach to storing and versioning prompts are my favorite things about spacy-llm. Additionally, the library offers a cache for caching prompts and responses per document, a way to provide examples for few-shot prompts, and a logging feature, among other things.
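As a sketch of the caching feature, the config can include a cache section along these lines (the path and batch settings here are assumptions; check the spacy-llm docs for the current options):
[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-cache"
batch_size = 64
max_batches_in_mem = 4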
You can check out the entire code from today here: https://github.com/dmesquita/spacy-llm-elegant-prompt-versioning.
As always, thanks for reading!