Finding Spelling Bee Pangrams with GPT-4 and SpaCy | by Sean Zhai

The quest to solve the New York Times puzzle

Photograph by Nemichandra Hombannavar on Unsplash

Solving the New York Times Spelling Bee can be a rewarding experience that balances challenge with the pleasure of word exploration. While it is not always a walk in the park, the satisfaction gained from finding each word is well worth the effort. Among the various linguistic achievements in the puzzle, uncovering the pangram is like finding a hidden treasure. This special word, which uses all of the given letters, highlights the player's skill in navigating the rich complexities of the English lexicon.

Finding the pangram is an exhilarating activity for many people, and it also serves as a compelling case for natural language processing (NLP) exercises. SpaCy (Honnibal & Montani, 2017) is my favorite tool for such tasks. It is open source under the MIT license. You can write a program for SpaCy manually, but I'd like to show you how to develop such a solution using GPT-4.

Spelling Bee

The New York Times Spelling Bee is a popular word puzzle game found in the New York Times newspaper and online on the New York Times website. In the game, players are given a set of seven letters, with one of the letters designated as the "center" letter. The objective of the game is to create as many words as possible using the given letters while adhering to the following rules:

Each word must be at least four letters long.
The "center" letter must appear in every word.
Words must be in the English dictionary.
Proper nouns and obscure or offensive words are not allowed.

The game assigns a point value to each word based on its length. Players receive one point for a four-letter word, and the point value increases with each additional letter. A pangram is a word that uses all seven given letters at least once, and it awards bonus points.
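
The letter rules and the scoring described above can be sketched in a few lines of Python. This is a minimal sketch, not the game's actual implementation: the function names `is_valid_word` and `score_word` are hypothetical, the dictionary and obscurity checks are left out, and the exact point values (one point per letter for words longer than four letters, plus a seven-point pangram bonus) are an assumption based on how the scoring is commonly described, not something stated in this article.

```python
def is_valid_word(word, center, letters):
    """Check the letter-based rules only (no dictionary lookup)."""
    word = word.lower()
    allowed = set(letters)
    return (
        len(word) >= 4            # at least four letters long
        and center in word        # the center letter must appear
        and set(word) <= allowed  # no letters outside the given seven
    )


def score_word(word, letters):
    """Assumed scoring: 1 point for 4 letters, else 1 per letter, +7 for a pangram."""
    word = word.lower()
    points = 1 if len(word) == 4 else len(word)
    if set(word) == set(letters):  # uses every given letter at least once
        points += 7
    return points


letters = "ademtyi"
print(is_valid_word("daytime", "i", letters))  # True
print(score_word("daytime", letters))          # 14
print(score_word("time", letters))             # 1
```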

See how a human solves the puzzle: William Jackson Harper Solves the NYT Spelling Bee | Source: YouTube.com

GPT-4

GPT, or Generative Pre-trained Transformer, is a cutting-edge AI language model developed by OpenAI that leverages deep learning techniques to understand and generate human-like text. With its powerful transformer architecture and pre-training on vast amounts of textual data, GPT is capable of impressive performance across a wide range of natural language processing tasks, including text completion, translation, summarization, and more.

SpaCy

SpaCy is a high-performance, open-source Python library designed for advanced natural language processing (NLP) tasks. Developed by Explosion AI, SpaCy offers efficient, production-ready tools for text processing, tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and more. Built with a focus on speed and ease of use, SpaCy enables developers to quickly build custom NLP applications.

Get Ready

Let's get the toolchain ready to develop with SpaCy. You need to have Python and pip installed, as SpaCy is a Python library.

To write a program in SpaCy to find pangrams in the NYT Spelling Bee, we'll need to get a list of words from a dictionary or a corpus. For this example, I'll use the NLTK (Natural Language Toolkit) (Bird et al., 2009) library to get a list of words. Install NLTK if you haven't already:

pip install nltk

Then, download the 'words' corpus from NLTK:

import nltk
nltk.download('words')

Note: NLTK's words corpus is a list of words used in the English language. It is sourced from the "Words" dataset, which was originally derived from the Unix "words" file. This file is a list of English words that were commonly used in Unix systems for various purposes, such as spell-checking and generating random text. The Unix words file is not an exhaustive list of English words, and it may contain some archaic or uncommon words. It also includes capitalized entries such as taxonomic names (for example, 'Polystomidae') and may not include newer additions to the English language. Nevertheless, it provides a useful starting point for various NLP tasks, such as word segmentation, tokenization, and spelling correction.
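
Since Spelling Bee rejects proper nouns and words shorter than four letters, one optional preprocessing step is to drop capitalized and too-short entries from a word list before searching it. The snippet below is only a sketch of that idea: `filter_candidates` is a hypothetical helper, and the small `sample_words` list is a stand-in for a real corpus such as the list returned by NLTK's `words.words()`.

```python
def filter_candidates(word_list, min_length=4):
    """Keep lowercase, purely alphabetic entries of at least min_length letters."""
    return [
        w for w in word_list
        if len(w) >= min_length and w.isalpha() and w.islower()
    ]


# Stand-in for a real corpus; these entries are chosen for illustration only.
sample_words = ['daytime', 'Maytide', 'Polystomidae', 'tie', "can't", 'edit']

print(filter_candidates(sample_words))  # ['daytime', 'edit']
```

Treating capitalization as a proxy for "proper noun" is a heuristic. It cannot tell whether a lowercase word is obscure, so the puzzle's editorial choices are still not captured.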

First Attempt

Let's make a straightforward request to GPT-4.

Can you write a program in SpaCy to find pangrams in the New York Times's Spelling Bee?

The answer provided by GPT-4 looks promising. It includes the steps to set up SpaCy and the following Python program.

import spacy
import nltk
from nltk.corpus import words

# Load the English language model
nlp = spacy.load("en_core_web_sm")

def is_pangram(word, required_letter, given_letters):
    word = word.lower()
    if required_letter not in word:
        return False
    return all(l in word for l in given_letters)

def find_pangrams(word_list, required_letter, given_letters):
    pangrams = set()
    for word in word_list:
        if is_pangram(word, required_letter, given_letters):
            pangrams.add(word)
    return pangrams

def main():
    # Get the list of words from nltk
    nltk_words = words.words()
    # Define the required letter and other given letters
    required_letter = 'i'
    given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']
    # Find pangrams
    pangrams = find_pangrams(nltk_words, required_letter, given_letters)
    print("Pangrams found:", pangrams)

if __name__ == "__main__":
    main()

This is the output.

>>> pangrams
{'diathermotherapy', 'pentamethylenediamine', 'absentmindedly', 
'undemonstratively', 'hypodermatically', 'bisdimethylamino', 
'endolymphatic', 'Polystomidae', 'metasedimentary', 'Methodistically', 
'thermodynamicist', 'vasoepididymostomy', 'animatedly', 'radiochemistry', 
'hydrometeorological', 'demisability', 'hydrometallurgically',
'Mylodontidae', 'ektodynamorphic', 'detrimentally', 'Tetradynamia',
'immoderately', 'Merycopotamidae', 'Hypsiprymnodontinae', 'lymphadenitis', 
'diaphanometry', 'Cyclostomatidae', 'Platystomidae', 'smithydander', 
'hydroatmospheric', 'hydropneumatic', 'condimentary', 'hydrometamorphism', 'pentadactylism', 'diathermacy', 'ventripyramid', 'acidometry', 'thermodynamist', 'improvidentially', 'methodically', 'diethylenediamine', 'aldermanity', 'hemidactylous', 'myxadenitis', 'Myliobatidae', 'demountability', 'diathermy', 'lymphangioendothelioma', 'unmitigatedly', 'Basidiomycetes', 'Hydrometridae', 'ectodactylism', 'thermoradiotherapy', 'predaytime', 'indemnificatory', 'Hemibasidiomycetes', 'dilemmatically', 'cardiometry', 'lymphadenectasia', 'misunderstandingly', 'diastematomyelia',

Taking a closer look, we find that the results include words containing letters that are not in the given letters list.
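
One way to confirm this is to list the letters of a result that fall outside the given set. The snippet below is a small illustrative check, not part of GPT-4's answer; the helper name `extra_letters` is hypothetical.

```python
def extra_letters(word, given_letters):
    """Return the letters of `word` that are not among the given letters."""
    return sorted(set(word.lower()) - set(given_letters))


given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']

print(extra_letters('animatedly', given_letters))  # ['l', 'n']
print(extra_letters('daytime', given_letters))     # []
```

The first program only verifies that every given letter occurs in the word. It never rejects words that bring in additional letters.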

Second Attempt

With this observation, we can ask GPT-4 to improve the program.

The program doesn't work properly. It gives words containing the "given_letters", but what we need is words ONLY contain the given_letters. It cannot have any other letters.

GPT-4 rewrites the program.

import nltk
from nltk.corpus import words

def is_pangram(word, required_letter, given_letters):
    word = word.lower()
    if required_letter not in word:
        return False
    for letter in word:
        if letter not in given_letters:
            return False
    return all(l in word for l in given_letters)

def find_pangrams(word_list, required_letter, given_letters):
    pangrams = set()
    for word in word_list:
        if is_pangram(word, required_letter, given_letters):
            pangrams.add(word)
    return pangrams

def main():
    # Get the list of words from nltk
    nltk_words = words.words()
    # Define the required letter and other given letters
    required_letter = 'i'
    given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']
    # Find pangrams
    pangrams = find_pangrams(nltk_words, required_letter, given_letters)
    print("Pangrams found:", pangrams)

if __name__ == "__main__":
    main()

The output is three words this time. The word that the NYTimes Spelling Bee is looking for is 'daytime'.

>>> pangrams
{'didymate', 'daytime', 'Maytide'}
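
The same check can be written more compactly with Python sets. This is an alternative sketch, not the program GPT-4 produced: a word is a pangram exactly when its set of letters equals the set of given letters. Because the center letter is one of the given letters, the equality also guarantees that it is present. The name `is_pangram_set` is hypothetical.

```python
def is_pangram_set(word, given_letters):
    """True when the word uses every given letter and no other characters."""
    return set(word.lower()) == set(given_letters)


given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']
candidates = ['daytime', 'didymate', 'Maytide', 'animatedly', 'diathermy']

print([w for w in candidates if is_pangram_set(w, given_letters)])
# ['daytime', 'didymate', 'Maytide']
```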

Can we call this a win?

Screenshots of NYTimes Spelling Bee: 1) May 16, 2023 2) May 17, 2023 3) May 18, 2023 | Screen capture by Sean Zhai

An Exception and the Solution

The Spelling Bee of May 17, 2023, has the following given letters: C D E N L O W, but our program returned an empty output. It cannot find a pangram.

This is when we need to help GPT-4. I tried asking GPT-4 to analyze the error, but it didn't work well, so I stepped in. The algorithm looked correct, so the next thing I inspected was the NLTK corpus. I ran a quick search in Python for the pangram of that day, 'clowned', and it was not in the corpus. So we found the problem.
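
A lookup along these lines shows whether a word list covers the expected answers. This is a sketch of the kind of check described, not the author's actual session: `missing_from` is a hypothetical helper, and the small `sample_words` list stands in for a real corpus such as the one returned by NLTK's `words.words()`.

```python
def missing_from(word_list, expected):
    """Return the expected words that do not appear in word_list (case-insensitive)."""
    known = {w.lower() for w in word_list}
    return [w for w in expected if w.lower() not in known]


# Stand-in for a real corpus; with NLTK this could be words.words().
sample_words = ['clown', 'Clowder', 'daytime', 'encode']

print(missing_from(sample_words, ['clowned', 'daytime']))  # ['clowned']
```

Building a set once keeps each lookup cheap, which matters when the corpus holds hundreds of thousands of entries.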

I stated the issue and asked GPT-4 for a new solution.

In the Python program for finding pangrams, the 'words' in nltk.corpus didn't cover all the words the NYTimes use. For instance, 'clowned' is not included. Is there a better place to retrieve the 'words'?

GPT-4 acknowledged the issue and provided a new source for English words, along with the updated Python program.

A more comprehensive source for English words is the words.txt file, which is derived from the SCOWL (Spell Checker Oriented Word Lists) project. You can find it in the GitHub repository here: https://github.com/dwyl/english-words

def read_words_from_file(filename):
    with open(filename, 'r') as f:
        return [word.strip() for word in f.readlines()]

def is_pangram(word, required_letter, given_letters):
    word = word.lower()
    if len(word) < 4 or required_letter not in word:
        return False
    for letter in word:
        if letter not in given_letters:
            return False
    return all(l in word for l in given_letters)

def find_pangrams(word_list, required_letter, given_letters):
    pangrams = set()
    for word in word_list:
        if is_pangram(word, required_letter, given_letters):
            pangrams.add(word)
    return pangrams

def main():
    # Get the list of words from the words.txt file
    filename = 'words.txt'
    word_list = read_words_from_file(filename)
    # Define the required letter and other given letters
    required_letter = 'c'
    given_letters = ['c', 'd', 'e', 'n', 'l', 'o', 'w']
    # Find pangrams
    pangrams = find_pangrams(word_list, required_letter, given_letters)
    print("Pangrams found:", pangrams)

if __name__ == "__main__":
    main()

Save the new program as pangram.py. Now the result is as follows:

% python3 pangram.py
Pangrams found: {'clowned'}

It's satisfactory!

Finding Spelling Bee Pangrams with GPT-4 and SpaCy | by Sean Zhai | Mar, 2023