We want to find a million words that haven’t been included in major English dictionaries and give them each a home on the Internet.
At Wordnik we believe that every word of English deserves to be lookupable!
The Internet is, for all practical purposes, infinite. Wordnik can and should include every English word that’s ever been used.
Every word deserves a recorded place in our language’s history. We want to collect, preserve, and share every word of English, and provide a place where people can find, learn, annotate, comment on, and argue about every word.
If you want to know more about a word—any word!—we want to help you find the information you need. If you’re curious about a word, why should you have to wait until someone else decides that a word is worth knowing?
The reason so few words are added to traditional dictionaries is that writing definitions takes a long time. A very talented editor may write seven entries in a day, or she may need weeks to describe just one word. Dictionary definitions are very difficult to write.
Wordnik takes a different approach. Instead of writing traditional definitions, we search for casual definitions that have already been created. You see these casual definitions all the time in good writing! Here’s an excellent casual definition for the word bibliobesity from Terry Teachout, in the New York Times:
“The problem of bloated books—bibliobesity, as it were—has always been with us.”
And another, for cookprint, from Consumer Reports:
“And cut down on your cookprint—the energy you use to prepare the food you eat.”
And for misophonia, from The Huffington Post:
“The condition is called misophonia—literally ‘hatred of sound’—and occurs when a common noise, whether it’s something like a person chewing loudly, water dripping or someone ‘ahem’-ing, causes you to become anxious or angry, more so than a typical response, TODAY reported.”
Sometimes we call these self-defining, explanatory sentences “free-range definitions”, or “FRDs” (pronounced “freds”).
By finding these sentences (and we’ve already found a lot of them) we can help people understand new and rare words the way people normally learn new words: in context.
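To make the idea concrete, here is a toy heuristic in Python. It is not Wordnik’s actual tool, which is planned to be machine-learning-based. It assumes one common FRD shape: the word immediately followed by an explanation set off with an em-dash, as in the cookprint and misophonia examples above. The function name `dash_gloss` is hypothetical.

```python
import re
from typing import Optional

EM_DASH = "\u2014"


def dash_gloss(sentence: str, word: str) -> Optional[str]:
    """Return the text set off by an em-dash right after `word`, or None.

    Toy heuristic for one common shape of free-range definition:
    "word<em-dash>gloss<em-dash>..." or "word<em-dash>gloss." at sentence end.
    """
    pattern = re.compile(
        # The word, an em-dash, then the shortest run of non-dash text that is
        # followed by either a closing em-dash or the end of the sentence.
        rf"\b{re.escape(word)}{EM_DASH}(?P<gloss>[^{EM_DASH}]+?)(?:{EM_DASH}|[.!?]?$)",
        re.IGNORECASE,
    )
    match = pattern.search(sentence)
    return match.group("gloss").strip() if match else None


sentence = "And cut down on your cookprint\u2014the energy you use to prepare the food you eat."
print(dash_gloss(sentence, "cookprint"))
# the energy you use to prepare the food you eat
```

The bibliobesity sentence returns None here, because its explanation comes before the word rather than after it. Hand-written patterns miss cases like that, which is one reason to train a model on many real FRDs instead.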
Wordnik has been collecting and sharing words since 2009.
Our mission is to collect all the words of English and share them with everyone. We share words through our website at Wordnik.com, and through our API at developer.wordnik.com. Last year, we reincorporated as a non-profit, because we believe the English language belongs to everyone.
Every word at Wordnik gets its own full page, showing as much data as possible: a standard definition (if one already exists); example sentences; synonyms, antonyms, and other related words; space for community-added tags, lists, and comments; images from Flickr and tweets from Twitter; and usage statistics, including how many times a word has been favorited, listed, tagged, and commented upon, and, of course, whether it’s valid in Scrabble (and how many points it scores).
Wordnik also has an open API! If you develop web or native apps, you can use our API instead of licensing a less-complete dictionary or adapting public-domain resources, and show information from our site in search results or as full definitions.
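As a rough illustration of what calling the API involves, the Python sketch below only builds a request URL for a word’s definitions; it does not send anything. The base URL `https://api.wordnik.com/v4`, the `word.json/{word}/definitions` path, and the `limit` and `api_key` parameters are assumptions about the API’s REST layout, so check the documentation at developer.wordnik.com before relying on them. The helper `definitions_url` is hypothetical, and you would need your own API key.

```python
from urllib.parse import quote, urlencode

# Assumed REST layout; verify against developer.wordnik.com.
API_BASE = "https://api.wordnik.com/v4"


def definitions_url(word: str, api_key: str, limit: int = 5) -> str:
    """Build (without sending) a request URL for a word's definitions."""
    # Percent-encode the word so spaces or punctuation are safe in the path.
    path = f"{API_BASE}/word.json/{quote(word, safe='')}/definitions"
    query = urlencode({"limit": limit, "api_key": api_key})
    return f"{path}?{query}"


print(definitions_url("thoil", "YOUR_API_KEY"))
```

Sending the request and parsing the JSON response are left out on purpose; the exact response fields should come from the official documentation.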
At Wordnik, we don’t want to be a gatekeeper between you and the word you’re interested in; we just want to show you the data that exists and let you make up your own mind! And we want to do this for every word.
We’ll start by taking the thousands of free-range-definition examples (FRDs, or “Freds”) we’ve already found, and using those as a training set for a new machine-learning-based tool.
Then we’ll start looking for more great sentences, beginning with the hundreds of thousands of words that have already been looked up on Wordnik but for which we don’t yet have good data. (If a real human looked a word up, it’s more likely to be interesting.)
We’ll also update Wordnik so that whenever someone looks up a word we’ve never seen before, we kick off a search for more data about it.
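Here is a minimal sketch of that lookup trigger, assuming an in-memory set of words we’ve already seen and a simple queue that a separate search worker would drain. `LookupTracker` and `on_lookup` are hypothetical names for illustration, not part of Wordnik’s code.

```python
from collections import deque
from typing import Deque, Iterable, Set


class LookupTracker:
    """Hypothetical sketch: queue a data search the first time a word is looked up."""

    def __init__(self, known_words: Iterable[str]) -> None:
        # Words we've already seen, normalized to lowercase.
        self.known: Set[str] = {w.lower() for w in known_words}
        # Words waiting for a background search for FRDs and example sentences.
        self.pending: Deque[str] = deque()

    def on_lookup(self, word: str) -> bool:
        """Record a lookup; return True if it queued a new search."""
        key = word.lower()
        if key in self.known:
            return False
        # Mark as seen right away so repeat lookups don't queue duplicates.
        self.known.add(key)
        self.pending.append(key)
        return True


tracker = LookupTracker(["dictionary", "word"])
tracker.on_lookup("pandiculation")  # True: queued for a search
tracker.on_lookup("Pandiculation")  # False: already queued
tracker.on_lookup("word")           # False: already seen
print(list(tracker.pending))        # ['pandiculation']
```

A production system would persist both the seen-word set and the queue, but the core rule is the same: only a never-before-seen word triggers a new search.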
We won’t limit ourselves to words that occur more often than once in a billion words, either. If a word exists at all, we’ll show you what we can!
In 2010, Harvard researchers published findings in the journal Science that began to quantify the number of definition-less words in English. Using the Google Books Corpus (5 million books, 361 billion words) and comparing samples to major dictionaries (including the Oxford English Dictionary [OED] and the Merriam-Webster Unabridged Dictionary [MWD]), the researchers estimated that “52% of the English lexicon—the majority of the words used in English books—consists of lexical ‘dark matter’ undocumented in standard references.”
Jean-Baptiste Michel, Erez Lieberman Aiden, and the other authors of the Science paper defined a word as a meaningful string of letters that occurs more often than once every billion words. (One in a billion is the frequency of the least common words in the dictionaries they analyzed.) Using this definition, they estimated that the English language contains 1,022,000 words, each of which appeared at least 362 times in their 361-billion-word dataset.
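The 362 figure follows directly from the definition: a word seen exactly 361 times in 361 billion words occurs exactly once per billion, not more often, so it needs at least one more occurrence to count. A short Python check of that arithmetic (`min_occurrences` is a hypothetical helper):

```python
CORPUS_WORDS = 361_000_000_000  # size of the Google Books sample cited above
BILLION = 1_000_000_000


def min_occurrences(corpus_words: int, per: int = BILLION) -> int:
    """Smallest count that is strictly more frequent than once per `per` words."""
    # corpus_words / per occurrences would sit exactly at the cutoff,
    # so the first qualifying count is the next integer above it.
    return corpus_words // per + 1


print(min_occurrences(CORPUS_WORDS))  # 362
```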
The second edition of the OED has 615,100 “word forms defined and/or illustrated”; MWD has around 470,000 main entries (not including word forms).
Once you take into account words that appear less frequently than once every billion words, there may be a million or more missing words!
We already have all these words in English! They exist right now in articles, books, blog posts, and even tweets. But they’ve never all been recorded in one place where they can be discovered and loved.
Have you ever felt that the right word was out there, but you just couldn’t find it?
Have you ever learned a weird word that made your whole day? Perhaps a word like thoil, which means ‘to be able to justify the expense of a purchase’? Or pandiculation, which means ‘yawning and stretching (as when first waking up)’?
Many of these missing words aren’t words we use every day, but their existence enriches the whole language. We believe every word should have the right to be included in the dictionary.
At Wordnik, we want to gather data about every word—and then you can use that data to decide whether a word is worth adding to your vocabulary. We’ll just find the words: what happens next is up to you!
In addition to adding all these new words to Wordnik for everyone to enjoy, we will be releasing a report to backers in March of 2016 detailing our progress and giving an overview of the words we’ve found and added. Every backer will receive this (digital) report.
We have some other splendiferous rewards, too!
Wordnik is a non-profit effort. For US backers, your donation is tax-deductible and may also qualify for matching funds from your employer, if it has a matching program for charitable contributions. While we file for our own non-profit status, we have teamed up with a well-known 501(c)(3) fiscal sponsor, Planetwork (which was also the fiscal sponsor for Hypothes.is). For any contribution you make, you will receive a letter from Planetwork/Wordnik verifying that it qualifies as tax-deductible.
Our biggest expenses for this project are the extra servers and storage needed to find, process, and serve data for a million (or more!) new words. Although we believe we can be more efficient than our current setup, we’re basing our cost estimates on what we currently spend to run Wordnik’s servers.
We are also hiring some amazing machine learning consultants to make sure we do this The Right Way. (We expect to announce their names shortly after the campaign launches.)
We will also spend a little money making cool stuff to reward our backers, and, of course, there are Kickstarter fees.
This is a very good question! Linguists, lexicographers, and grammarians often struggle with what makes something “a word”. Here are some types of words that are A-OK with us and that we will include when we find them:
Here are some things we DON’T consider “words” for the purposes of this project:
If you want even more information about why MORE WORDS ARE BETTER, you will probably like these two TED videos:
Thanks so much for making it all the way to the end!