Managing prompts and dealing with OpenAI request failures can be a challenging process. Fortunately, spaCy released spacy-llm, a robust tool that simplifies prompt management and eliminates the need to create a custom solution from scratch.
In this article, you'll learn how to leverage spacy-llm to create a task that extracts data from text using a prompt. We'll dive into the basics of spaCy and explore some of the features of spacy-llm.
spaCy is a library for advanced NLP in Python and Cython. When working with text data, several processing steps are typically required, such as tokenization and POS tagging. To execute these steps, spaCy provides the nlp object, which invokes a processing pipeline.
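As a minimal illustration of that pipeline (assuming the small English model has been installed with python -m spacy download en_core_web_sm):
import spacy

# Loading a trained pipeline; calling nlp() runs tokenization, tagging, etc.
nlp = spacy.load("en_core_web_sm")
doc = nlp("spaCy turns raw text into a Doc object.")
for token in doc:
    print(token.text, token.pos_)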
spaCy v3.0 introduces config.cfg, a file where we can include detailed settings for these pipelines.
config.cfg uses confection, a config system that allows the creation of arbitrary object trees. For instance, confection parses the following config.cfg:
[training]
patience = 10
dropout = 0.2
use_vectors = false

[training.logging]
level = "INFO"

[nlp]
# This uses the value of training.use_vectors
use_vectors = ${training.use_vectors}
lang = "en"
into:
{
    "training": {
        "patience": 10,
        "dropout": 0.2,
        "use_vectors": false,
        "logging": {
            "level": "INFO"
        }
    },
    "nlp": {
        "use_vectors": false,
        "lang": "en"
    }
}
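For example, here is a minimal sketch of loading and resolving such a config with confection (the file name config.cfg is just an assumption):
from confection import Config

# from_disk parses the file and resolves references such as ${training.use_vectors}
config = Config().from_disk("config.cfg")
print(config["nlp"]["use_vectors"])  # False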
Each pipeline uses components, and spacy-llm stores the pipeline components in registries using catalogue. This library, also from Explosion, introduces function registries that allow for efficient management of the components. An llm component is defined by two main settings:
- A task, defining the prompt to send to the LLM as well as the functionality to parse the resulting response
- A model, defining the model and how to connect to it
To include a component that uses an LLM in our pipeline, we need to follow a few steps. First, we need to create a task and register it in the registry. Next, we can use a model to execute the prompt and retrieve the responses. Now it's time to do all that so we can run the pipeline.
We will use quotes from https://dummyjson.com/ and create a task to extract the context from each quote. We will create the prompt, register the task, and finally create the config file.
1. The prompt
spacy-llm uses Jinja templates to define the instructions and examples. The {{ text }} variable will be replaced by the quote we provide. This is our prompt:
You are an expert at extracting context from text.
Your task is to accept a quote as input and provide the context of the quote.
This context will be used to group the quotes together.
Do not put any other text in your answer and provide the context in 3 words max.
{# whitespace #}
{# whitespace #}
Here is the quote that needs classification
{# whitespace #}
{# whitespace #}
Quote:
'''
{{ text }}
'''
Context
2. The task class
Now let's create the class for the task. The class should implement two functions:
- generate_prompts(docs: Iterable[Doc]) -> Iterable[str]: a function that takes in a list of spaCy Doc objects and transforms them into a list of prompts
- parse_responses(docs: Iterable[Doc], responses: Iterable[str]) -> Iterable[Doc]: a function for parsing the LLM's outputs into spaCy Doc objects
generate_prompts will use our Jinja template and parse_responses will add the context attribute to our Doc. This is the QuoteContextExtractTask class:
from pathlib import Path
from typing import Iterable

import jinja2
from spacy.tokens import Doc
from spacy_llm.registry import registry

TEMPLATE_DIR = Path("templates")

def read_template(name: str) -> str:
    """Read a template from the templates directory"""
    path = TEMPLATE_DIR / f"{name}.jinja"
    if not path.exists():
        raise ValueError(f"{name} is not a valid template.")
    return path.read_text()

class QuoteContextExtractTask:
    def __init__(self, template: str = "quotecontextextract", field: str = "context"):
        self._template = read_template(template)
        self._field = field

    def _check_doc_extension(self):
        """Add extension if need be."""
        if not Doc.has_extension(self._field):
            Doc.set_extension(self._field, default=None)

    def generate_prompts(self, docs: Iterable[Doc]) -> Iterable[str]:
        environment = jinja2.Environment()
        _template = environment.from_string(self._template)
        for doc in docs:
            prompt = _template.render(
                text=doc.text,
            )
            yield prompt

    def parse_responses(
        self, docs: Iterable[Doc], responses: Iterable[str]
    ) -> Iterable[Doc]:
        self._check_doc_extension()
        for doc, prompt_response in zip(docs, responses):
            try:
                setattr(
                    doc._,
                    self._field,
                    prompt_response.replace("Context:", "").strip(),
                )
            except ValueError:
                setattr(doc._, self._field, None)
            yield doc
Now we just need to add the task to the spacy-llm llm_tasks registry:
@registry.llm_tasks("my_namespace.QuoteContextExtractTask.v1")
def make_quote_extraction() -> "QuoteContextExtractTask":
    return QuoteContextExtractTask()
3. The config.cfg file
We'll use the GPT-3.5 model from OpenAI. spacy-llm has a model for that, so we just need to make sure the secret key is available as an environment variable:
export OPENAI_API_KEY="sk-..."
export OPENAI_API_ORG="org-..."
To build the nlp object that runs the pipeline, we'll use the assemble function from spacy-llm. This function reads from a .cfg file. The file should reference the GPT-3.5 model (it's already in the registry) and the task we've created:
[nlp]
lang = "en"
pipeline = ["llm"]
batch_size = 128

[components]

[components.llm]
factory = "llm"
[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.1}
[components.llm.task]
@llm_tasks = "my_namespace.QuoteContextExtractTask.v1"
4. Running the pipeline
Now we just need to put everything together and run the code:
import os
from pathlib import Path

import typer
from wasabi import msg

from spacy_llm.util import assemble
from quotecontextextract import QuoteContextExtractTask

Arg = typer.Argument
Opt = typer.Option

def run_pipeline(
    # fmt: off
    text: str = Arg("", help="Text to perform text categorization on."),
    config_path: Path = Arg(..., help="Path to the configuration file to use."),
    verbose: bool = Opt(False, "--verbose", "-v", help="Show extra information."),
    # fmt: on
):
    if not os.getenv("OPENAI_API_KEY", None):
        msg.fail(
            "OPENAI_API_KEY env variable was not found. "
            "Set it by running 'export OPENAI_API_KEY=...' and try again.",
            exits=1,
        )

    msg.text(f"Loading config from {config_path}", show=verbose)
    nlp = assemble(
        config_path
    )
    doc = nlp(text)

    msg.text(f"Quote: {doc.text}")
    msg.text(f"Context: {doc._.context}")

if __name__ == "__main__":
    typer.run(run_pipeline)
And run:
python3 run_pipeline.py "We must balance conspicuous consumption with conscious capitalism." ./config.cfg
>>>
Quote: We must balance conspicuous consumption with conscious capitalism.
Context: Business ethics.
If you want to change the prompt, just create another Jinja file and create a my_namespace.QuoteContextExtractTask.v2 task the same way we created the first one. If you want to change the temperature, just change the parameter in the config.cfg file, as sketched below. Nice, right?
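For instance, the only lines that would need to change in config.cfg might look like this (the v2 registry name and the new temperature value are purely illustrative):
[components.llm.task]
@llm_tasks = "my_namespace.QuoteContextExtractTask.v2"

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
config = {"temperature": 0.5}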
The ability to handle OpenAI REST requests and its straightforward approach to storing and versioning prompts are my favorite things about spacy-llm. Additionally, the library offers a cache for caching prompts and responses per document, a way to provide examples for few-shot prompts, and a logging feature, among other things.
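As a sketch of the caching feature, the config can include a cache section along these lines (the path and batch settings here are assumptions; check the spacy-llm docs for the current options):
[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-cache"
batch_size = 64
max_batches_in_mem = 4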
You can check out the entire code from today here: https://github.com/dmesquita/spacy-llm-elegant-prompt-versioning.
As always, thanks for reading!