Sep 29, 2015 1:47 EST

Let’s Add a Million Missing Words to the Dictionary: Let’s give a million missing words their rightful places in the dictionary.

iCrowdNewswire - Sep 29, 2015

We want to find a million words that haven’t been included in major English dictionaries and give them each a home on the Internet.

At Wordnik we believe that every word of English deserves to be lookupable!

The internet is, for all practical purposes, infinite. Wordnik can and should include every English word that’s ever been used.

Why?

Every word deserves a recorded place in our language’s history. We want to collect, preserve, and share every word of English, and provide a place where people can find, learn, annotate, comment on, and argue about every word.

If you want to know more about a word—any word!—we want to help you find the information you need. If you’re curious about a word, why should you have to wait until someone else decides that a word is worth knowing?

How?

So few words are added to traditional dictionaries because dictionary definitions are very difficult to write, and writing them takes a long time. A very talented editor may write seven entries in a day, or she may need weeks to describe just one word.

Wordnik takes a different approach. Instead of writing traditional definitions, we search for casual definitions that have already been created. You see these casual definitions all the time in good writing! Here’s an excellent casual definition for the word bibliobesity from Terry Teachout, in the New York Times:

“The problem of bloated books—bibliobesity, as it were—has always been with us.”

And another, for cookprint, from Consumer Reports:

“And cut down on your cookprint—the energy you use to prepare the food you eat.”

And for misophonia, by The Huffington Post:

“The condition is called misophonia—literally ‘hatred of sound’—and occurs when a common noise, whether it’s something like a person chewing loudly, water dripping or someone ‘ahem’-ing, causes you to become anxious or angry, more so than a typical response, TODAY reported.”

Sometimes we call these self-defining, explanatory sentences “free-range definitions”, or “FRDs” (pronounced “freds”).

By finding these sentences (and we’ve already found a lot of them) we can help people understand new and rare words the way people normally learn new words: in context.
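To make the idea concrete, here is a minimal sketch of how a dash-delimited gloss could be pulled out of a sentence. This is an illustration only, not Wordnik's tool: the function name `find_gloss` and the regular expression are assumptions made for this example, and the pattern only catches the case where the explanation directly follows the word.

```python
import re

# Hypothetical pattern: an em dash, en dash, or double hyphen, with optional spaces.
DASH = r"\s*(?:[\u2014\u2013]|--)\s*"


def find_gloss(word, sentence):
    """Return the text set off by dashes right after `word`, or None."""
    pattern = re.compile(
        r"\b" + re.escape(word) + r"\b" + DASH   # the word, then a dash
        + r"(.+?)"                                # the candidate explanation
        + r"(?:" + DASH + r"|[.!?]?$)",           # ends at the next dash or the sentence end
        re.IGNORECASE,
    )
    match = pattern.search(sentence)
    return match.group(1).strip() if match else None


cookprint = "And cut down on your cookprint\u2014the energy you use to prepare the food you eat."
print(find_gloss("cookprint", cookprint))
```

The cookprint and misophonia sentences above fit this shape. The bibliobesity sentence does not, because its explanation comes before the word, so `find_gloss` returns None for it. Hand-written patterns like this one miss many good sentences.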

Why Wordnik?

Wordnik has been collecting and sharing words since 2009.

Our mission is to collect all the words of English and share them with everyone. We share words through our website at Wordnik.com, and through our API at developer.wordnik.com. Last year, we reincorporated as a non-profit, because we believe the English language belongs to everyone.

Every word at Wordnik gets its own full page, with as much data shown as possible: a standard definition (if one already exists); example sentences; synonyms, antonyms, and other related words; space for community-added tags, lists, and comments; images from Flickr and tweets from Twitter; and statistics on usage, including how many times a word has been favorited, listed, tagged, and commented upon, and, of course, whether or not it’s valid in Scrabble (and how many points it scores).

Wordnik also has an open API! Instead of licensing a less-complete dictionary or finding and adapting public-domain resources, web and native app developers can use our API to incorporate information from our site in search results or as full definitions.

At Wordnik, we don’t want to be a gatekeeper between you and the word you’re interested in; we just want to show you the data that exists and let you make up your own mind! And we want to do this for every word.

How exactly will Wordnik do this?

We’ll start by taking the thousands of free-range-definition examples (FRDs, or “freds”) we’ve already found and using them as a training set for a new machine-learning-based tool.

We have lots of example sentences like this one!

Then we’ll start looking for more great sentences, beginning with the hundreds of thousands of words that have already been looked up on Wordnik but for which we don’t have good data. (If a real human looked something up, it is more likely to be interesting.)

We’ll also update Wordnik so that any time a word is looked up that we’ve never seen before, we’ll kick off a search to find more data about it.
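One way to picture this lookup-triggered search is a small tracker that queues a background search the first time an unknown word is requested. This is a sketch under stated assumptions: the class `LookupTracker`, its return values, and the in-memory queue are hypothetical and say nothing about how Wordnik's servers actually work.

```python
from collections import deque


class LookupTracker:
    """Hypothetical sketch: queue one search per never-before-seen word."""

    def __init__(self, known_words):
        self.known = {w.lower() for w in known_words}  # words we already have data for
        self.queued = set()                            # unknown words already scheduled
        self.search_queue = deque()                    # pending searches, oldest first

    def lookup(self, word):
        key = word.strip().lower()
        if key in self.known:
            return "known"
        if key not in self.queued:
            # First time this word has been looked up: schedule a search.
            self.queued.add(key)
            self.search_queue.append(key)
            return "queued"
        return "pending"


tracker = LookupTracker(["dictionary"])
print(tracker.lookup("cookprint"))   # first lookup schedules a search
print(tracker.lookup("cookprint"))   # later lookups do not schedule it again
```

Keeping a separate set of already-queued words means a popular new word triggers one search rather than one per lookup.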

We won’t limit ourselves to words that are more frequent than one-in-a-billion, either. If a word exists at all, we’ll show you what we can! 

A million words missing from dictionaries? How can there be that many?

In 2010, Harvard researchers published findings in the journal Science that began to quantify the number of definition-less words in English. Using the Google Books Corpus (5 million books, 361 billion words) and comparing samples to major dictionaries (including the Oxford English Dictionary [OED] and the Merriam-Webster Unabridged Dictionary [MWD]), the researchers estimated that “52% of the English lexicon—the majority of the words used in English books—consists of lexical ‘dark matter’ undocumented in standard references.”

Jean-Baptiste Michel, Erez Lieberman Aiden, and the other authors of the Science paper defined a word as a meaningful string of letters that occurs more often than once every billion words. (1 in a billion is the frequency of the least common words in the dictionaries they analyzed.) Using this definition, they used their data to estimate that the English language contains 1,022,000 words—each of which appeared at least 362 times in their 361 billion word dataset.

The second edition of the OED has 615,100 “word forms defined and/or illustrated”; MWD has around 470,000 main entries (not including word forms).

Once you take into account words that appear less frequently than once every billion words, there may be a million or more missing words!
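The arithmetic behind these figures can be checked in a few lines. This is a back-of-the-envelope sketch that takes the numbers quoted above at face value; the variable names are ours, and applying the 52% share directly to the 1,022,000-word estimate is an assumption made for illustration.

```python
# Figures quoted from the Science paper as described above.
CORPUS_WORDS = 361_000_000_000      # 361 billion words in the dataset
ONE_BILLION = 1_000_000_000
ESTIMATED_LEXICON = 1_022_000       # words more frequent than 1 in a billion
DARK_MATTER_PERCENT = 52            # share undocumented in standard references

# A word occurring exactly once per billion words would appear 361 times,
# so "more often than once every billion" means at least 362 occurrences.
min_occurrences = CORPUS_WORDS // ONE_BILLION + 1

# Integer arithmetic avoids floating-point rounding in the estimate.
undocumented = ESTIMATED_LEXICON * DARK_MATTER_PERCENT // 100

print(min_occurrences)   # 362
print(undocumented)      # 531440
```

On these assumptions, roughly 531,000 words are undocumented before counting anything rarer than one in a billion, which is where the rest of the million would have to come from.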

Do we really need all these words?

We already have all these words in English! They exist right now in articles, books, blog posts, and even tweets. But they’ve never all been recorded in one place where they can be discovered and loved.

Have you ever felt that the right word was out there, but you just couldn’t find it? 

Have you ever learned a weird word that made your whole day? Perhaps a word like thoil, which means ‘to be able to justify the expense of a purchase’? Or pandiculation, which means ‘yawning and stretching (as when first waking up)’?

Many of these missing words aren’t words we use every day, but their existence enriches the whole language. We believe every word should have the right to be included in the dictionary.

At Wordnik, we want to gather data about every word—and then you can use that data to decide whether a word is worth adding to your vocabulary. We’ll just find the words: what happens next is up to you!

What Will Backers Get?

In addition to adding all these new words to Wordnik for everyone to enjoy, we will be releasing a report to backers in March of 2016 detailing our progress and giving an overview of the words we’ve found and added. Every backer will receive this (digital) report.

We have some other splendiferous rewards, too!

  • RANDOM BACKER: for one measly greenback, you can be a random word adopter. We’ll add your name to a “random sponsor” list that will display one random sponsor’s name every time someone clicks the “Random Word” link at Wordnik.com. We’ll also choose one random backer to receive all the other under-$500 rewards!
  • WE ❤️ STICKERS BACKER: At $10, we’ll send you a complete set of Wordnik stickers, plus a sticker conferring membership in the Semicolon Appreciation Society. We’ll also add a “Backer” badge to your Wordnik profile page.
  • ADOPT A WORD: For $25, we’ll list you as the proud adopter of your word for a year, and send you a full set of Wordnik stickers, plus special word adoption stickers and a downloadable commemorative adoption certificate. We’ll tweet about your adoption to Wordnik’s wordy followers (more than nineteen thousand of them!), and we’ll also add an “Adopter” badge to your Wordnik profile page. Words are first-come, first-served, so back early!
  • YOU DESERVE A MEDAL!: Seriously. For helping Wordnik and adopting a word, backers at the $45 level will get an honest-to-goodness backer medal, plus all the Adopt a Word rewards!
Your medal will look something like this, only more Wordnik-y.
  • OOH, POSTER: For $75, we’ll send you an 18×24 poster featuring a selection of the new words added to Wordnik! What will it look like? We don’t know! But we’ll be sending regular updates to backers at this level to get your input on the design! (Extra $15 shipping for international backers) [Limited reward: only 500]
  • NOMINATE A WORD: want to suggest a specific missing word? At the $100 level, we’ll add your candidate to our research list and update it (data permitting) in the first batch. You’ll also be able to record the audio pronunciation for your word! Of course, you’ll also get all the $25-level adopter perks, and your Wordnik user page will show a “Nominator” badge! [Limited reward: only 1000]
  • WORDSMITH: For $250, not only can you suggest a specific missing word and record the audio pronunciation, we’ll also include the example sentence of your choice and link to its source. (Great for writers!) And we’ll make your word one of our words of the day for 2016, through the Wordnik site, our email list, and Twitter. You’ll also be the adopter of record for your word for TWO years, and get the full set of $25 adopter perks. (Your Wordnik user badge will read “Wordsmith”.) [Limited reward: only 45]
  • WORD-OF-THE-DAY TAKEOVER! At the $1000 level, you choose our words of the day for a whole week. Yep, choose any seven words you want (with the examples of your choice!), and we’ll send them out to thousands of word-hungry recipients! [Limited reward: only 12]
  • FOREVER ADOPTION: For $5000, adopt the word of your choice … FOREVER. We’re only making ten slots available! Obviously you’ll get all the other adoption perks, and your Wordnik user badge will read “Patron”. [Limited reward: only 10]
  • NEOLOGISM FOR YOU: Looking for a word that just doesn’t exist? At the $7500 level, we will create one for you to your specifications! Obviously you’ll get all the other adoption perks, and your Wordnik user badge will read “Neologist”. [Limited reward: only 5]
  • SPONSOR A LETTER: For $10,000, your name will appear on every word beginning with the letter you sponsor! Letters will be first-come, first-served. (The letter S has already been sponsored.) [Limited reward: only 25]

Your donation is tax-deductible!

Wordnik is a non-profit effort. For US backers, your donation is tax-deductible and may also qualify for matching funds from your employer if they support such a program for charitable contributions. While we file for our own non-profit status, we have teamed up with a well-known 501(c)(3) fiscal sponsor, Planetwork (they were also the fiscal sponsor for Hypothes.is). For any contribution you make, you will receive a letter from Planetwork/Wordnik verifying that your contribution qualifies as tax-deductible.

How Will We Spend This Money?

A rough breakdown

Our biggest expenses for this project are the extra servers and storage needed to find, process, and serve data for a million (or more!) new words. Although we believe we can be more efficient than our current setup, we’re basing our costs on what we currently spend to run Wordnik’s servers.

We are also hiring some amazing machine learning consultants to ensure that we’re doing this The Right Way. (We expect to make their names public very shortly after the launch of the campaign.)

We will also be spending a little money to make cool stuff to reward our backers, and, of course, there are Kickstarter fees.

Wait, What Do You Mean By “Word”?

This is a very good question! Linguists, lexicographers, and grammarians often struggle with what makes something “a word”. Here are some types of words that are A-OK with us and that we will include when we find them:

  • Nonce-words (words used just once)
  • Affixes (prefixes and suffixes), and words created by adding affixes to existing words
  • Fixed phrases and idioms
  • Words borrowed from other languages and not yet considered naturalized in English by other dictionaries, but used by English speakers and writers.

Here are some things we DON’T consider “words” for the purposes of this project:

  • obvious typos or OCR errors. Of course, then you have to figure out what’s “obvious” … we’ll be looking at words that are very similar to existing words and show up in identical contexts, e.g.:
  • keysmash, e.g. asjdlkfjasdf. We’ll be doing some filtering to see if new strings we find are “shaped” like English words.
  • Emoji(s). Sorry! Go look them up at Emojipedia.

Even More Things To Look At! 

If you want even more information about why MORE WORDS ARE BETTER, you probably will like these two TED videos: 

 Thanks so much for making it all the way to the end! 

Contact Information:

Erin McKean of Wordnik
