The New York Times Spelling Bee is a game where players try to spell valid words using 7 letters. The 7 letters can be repeated any number of times. In each game, one letter is required and must be included in every word. Each game has at least 1 pangram, which is a word that includes all 7 letters at least once. Players get points based on the length of each word they can spell. Pangrams are worth extra points and, for me, are the most exciting part of the game. However, the game does not tell you how many pangrams exist in any given puzzle.

The goal of this analysis is to see if I can build a model that predicts the number of pangrams from the information I have at the start of each game. Part of that information is the game board itself, including the letters and the required letter used in the puzzle. The other piece of information is the genius score.

Because some puzzles are not as conducive to creating words, it is not possible to compare one game to another based solely on point totals. Instead, the game makers apply a normalizing function that translates scores into categories based on the maximum possible score. For example, in the spelling bee on, the rankings and associated minimum required points were:

My goal is to see if I can determine the number of pangrams, \(p\), based on the point total required for the genius category, \(g\), plus information about the letters in the game, \(l\). In other words:

\[ p = f(g, l) \]

A generous spelling bee enthusiast maintains a website that contains helpful information about each daily bee. Looking at the site before starting the game is one way to determine the number of pangrams in play… however, I believe looking at this site before playing a puzzle is akin to cheating! That said, the daily information is invaluable for creating my model. Historical data is available at links formatted like:

My first task was to create a tidy dataset by parsing these web pages. This turned out to be quite an involved task:

- Used Beautiful Soup in Python to parse each URL to get the metadata.
- Created an OCR workflow, also in Python, to parse the letters from the static image of the game board.

The details of this workflow are described in the bottom section of this post. For now, let's take a quick look at the result of this workflow:

    # what are A and S correlated with?
    # result: they could be confused with L, N, |, T, C, E, R
    # NOTE: the source data frame name was lost in extraction; `ocr_letters` is a placeholder
    a_s_days <- ocr_letters %>%
      filter(letters %in% c("S", "A")) %>%
      pull(date)

Out of 365 days, it produced 142 accurate datasets with 7 letters, less than 40% of the days attempted! Of the inaccuracies, the biggest is missing letters: it did not identify any Os, Qs, or Xs. These letters have a major impact on game scoring, so any model of the game would have significant bias without days that included these letters. The days with too many letters are equally troubling.

In addition to the wrong letters, my determination of the required letter relied on the OCR analysis identifying letters in a consistent order. If a letter can't be found, this disrupts the order and prevents the data about the required letter from being accurate. In other words, where total_letters_found != 7, it is safe to assume the required letter is off as well.

Safe analysis despite data prep failures

Before completely calling it, we can use the data we successfully parsed off the HTML pages (not the image data) to see if there is a simple relationship between the genius score and the number of pangrams:

    ggplot(data, aes(max_words, min_genius)) +
      geom_point() +
      geom_smooth() +
      theme_minimal() +
      labs(
        title = "The possible score is related to the number of words you can make",
        subtitle = "This relationship would be great for pangram models, but we don't know the # of words ahead of playing"
      )

As expected, given how the scoring works, the score (proxied by the minimum genius score) is closely correlated with the maximum number of words, but there is some variance.
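
The rules and scoring described in this post can be made concrete with a short sketch in Python, the language used here for the data prep. This is an illustration rather than the code behind the analysis. The post only says that points depend on word length and that pangrams earn extra, so the specific values below (a 4-letter minimum, 1 point for 4-letter words, one point per letter for longer words, and a 7-point pangram bonus) are assumptions based on the commonly published Spelling Bee rules. The function names and the example board are hypothetical.

```python
def is_valid(word, letters, required):
    """True if the word is long enough, uses only board letters,
    and contains the required letter."""
    word = word.lower()
    # Assumption: words must have at least 4 letters.
    return len(word) >= 4 and required in word and set(word) <= set(letters)


def is_pangram(word, letters):
    """True if the word uses every board letter at least once."""
    return set(word.lower()) == set(letters)


def score_word(word, letters):
    """Score one valid word under the assumed scoring rules."""
    # Assumption: 4-letter words score 1 point, longer words score their length.
    points = 1 if len(word) == 4 else len(word)
    # Assumption: pangrams earn a 7-point bonus.
    if is_pangram(word, letters):
        points += 7
    return points


def summarize(words, letters, required):
    """Count valid words, total points, and pangrams for a candidate word list."""
    valid = [w for w in words if is_valid(w, letters, required)]
    return {
        "max_words": len(valid),
        "max_score": sum(score_word(w, letters) for w in valid),
        "pangrams": sum(is_pangram(w, letters) for w in valid),
    }


# Hypothetical board: seven letters with "t" as the required letter.
candidates = ["certain", "train", "rate", "race", "cat", "tent"]
print(summarize(candidates, "aceinrt", "t"))
# {'max_words': 4, 'max_score': 21, 'pangrams': 1}
```

Because every valid word adds at least one point, the maximum score grows with the number of valid words, while word lengths and pangram bonuses account for the spread around that trend.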