Introducing “Silver”

Amiable introduction

Greetings, all. Over the next few months — and, God-willing, for ages to come — we will be rolling out a series of open-source tools to contribute to the growing, fecund field of journo-coding/data-driven-reporting/technocratic inquiry. See, we are in the business of standing on the shoulders of giants, and we think you should be as well. Chances are we are already standing on your shoulders. We look forward to the ouroboros imagery that this code dependency encourages.

But enough introduction. Is this a prologue, or the poesy of a data-driven-reporting-driver? Without further yammering, we are proud to introduce Silver, a Redis-backed database cacher, indexer and searcher.

Boilerplate blues

So, what is Silver, and why does anyone need it?

Sometime in 2009 (the exact details are lost to my youth, inexperience and languor), Salvatore Sanfilippo released Redis, “an open source, advanced key-value store.” Many people have already heard of Redis, but for those who don’t anxiously refresh Hacker News for the latest scoop on NoSQL databases, angels, b-trees and heated bit-twiddling, here’s a brief introduction. Redis is similar to MongoDB, Cassandra or any of the other hot, new alternatives to relational databases such as MySQL or PostgreSQL. Rather than storing data in a logical system of schemas, joins, associations and all that jazz, so-called NoSQL databases, such as Redis, simply store data as the values of keys. Of course, this is a gross simplification. The ways these products accomplish this, and the various sophisticated abstractions of this basic idea, are manifold. You are welcome to read about the differences that make each of these products unique. Just not here.

What are the advantages of Redis and its ilk?

  • They are deadly fast. Redis keeps all of its key-value pairs in memory. Tim Lossen compares the speed advantage to the difference between taking the Eurostar from Berlin to London and pushing a broken car the same distance. (10 hours vs. 114 years)
  • They are light, easy to set up and maintain. Replication, for example, requires a single line of configuration.
  • Redis, at least, does more than just store data. Need a simple PubSub messenger? Redis does that as well, making it perfect for event-driven applications, real-time, socketed webapps and whatnot.

However, these fancy new data stores are not perfect and likely won’t replace traditional relational databases. They are not ideal for large datasets because they hold all or much of their data in memory. Furthermore, they are new territory. Whereas there are legions of MySQL sages meditating inside the decommissioned shells of old Oracle machines, even the NoSQL forerunners have only a few years’ experience. This means troubleshooting is rarely routine, especially at enterprise scale. Just look at what happened at Digg when they flipped the switch to Cassandra. (Note: this is not a criticism of Digg or Cassandra, just a demonstration that new things are sometimes touch-and-go at scale.)

But there are some (read: many, many) things that Redis is great at. We already use it for real-time stats tracking and as a caching layer for many of our applications. In fact, we are so loyal to Redis that we found ourselves writing boilerplate for Redis caching in almost every new app we wrote. Need to slurp tweets? You shouldn’t have to hit Twitter every time, so why not store the tweets in Redis? Just gotta access stories and pictures from a CMS database? Tens of thousands of entries later, this becomes a Herculean task for MySQL, especially at high traffic. Solution: cache it in Redis. Need to index your images by their captions? You could use Sphinx or … just use Redis. It’s really a Swiss Army key-value-storing, pubsubbing wonder.

See the issue? When you have a hammer and everything is a nail without that big, flat top part, you get tired of having to weld heads.

A challenger appears

Enter Silver. Silver is designed to be a simple, lightweight wrapper for all the database calls that you want to cache or index with Redis. It is completely database/web-service agnostic, so you should be able to use it for anything you can imagine caching. This is all well and good, but you are just here to see how it works. Right. (What follows is taken from Silver’s README. If you have already read it, you already get it.)

First make sure you have Silver installed.

gem install silver

Now, let’s pretend you have an app that queries your database for entries frequently. Entries are added frequently. Furthermore, you only want Entries that come from a specific blog, blog #12. Also you want to grab something from an association of the Entry row in the database. Let’s say the author’s name.

First, instantiate a new cache object.

cache = Silver::Cache.new("12_entries", "created_time") do |date|
  Entry.all(:order => :created_time.desc,
            :created_time.gt => date,
            :blog_id => 12)
end

The first parameter passed to the constructor is the name you want to give to this cache in Redis. Silver lets you create as many caches for as many different queries as you would like. The second parameter is the name of the field that you will be using to determine if there are new entries. Finally, you pass the constructor a block that will receive the date of the newest cached entry from Redis. The block must return the entries in reverse chronological order for Silver to be able to keep them in order. Silver will then query the database/service for newer entries when the instance’s find method is called. This is what a find call looks like:

results = cache.find do |entry|
  attrs = entry.attributes
  author = {:author_name => entry.author[:name]}
  attrs.merge author
end

The find method of a cache instance takes a block that will be called for every new entry. The results of the block call should be a hash that will be stored in the cache. The whole thing will be converted into JSON and stashed in the Redis cache. From now on the database will never have to be hit again to return this value. The find method returns an array of all the results old and new from the Redis cache.
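One thing worth knowing about the JSON step: a hash that goes through Ruby's standard json library comes back with string keys, even if it went in with symbol keys. The sketch below shows only that round trip. The round_trip helper and the sample values are made up for illustration, and whether Silver converts keys back to symbols for you is not something the README excerpt here says, so treat that part as an open question.

```ruby
require "json"

# Hypothetical helper: serialize a hash the way a JSON-backed cache would,
# then read it back the way a later cache hit would.
def round_trip(hash)
  stored = JSON.generate(hash)  # the string that would sit in the cache
  JSON.parse(stored)            # what comes back out on a later read
end

attrs  = { :id => 7, :title => "Hello" }
author = { :author_name => "Jane Doe" }

cached = round_trip(attrs.merge(author))
puts cached["author_name"]        # string key works after the round trip
puts cached[:author_name].inspect # symbol key does not; prints nil
```

If you prefer symbols, JSON.parse accepts symbolize_names: true when you do the parsing yourself.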

If you just want to read from the cache without hitting the database, simply call find without a block and with a single param: false

results = cache.find(false)

Currently, the cache does not support the changing of cached entries and is, thus, intended for data that is unlikely to change once it has been written to the database. This feature will be included in future releases of Silver.

Finally, Silver provides a cull method.

cache.cull(30)

This will cut the Redis cache down to the 30 most recent items.
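If it helps to see the moving parts in one place, here is a rough sketch of the idea behind the calls above: ask the source only for rows newer than the newest cached one, prepend them, and trim when asked. This is not Silver's source. TinyCache is a made-up class, a plain Array stands in for Redis, and the rows are invented sample data.

```ruby
require "json"
require "time"

# Hypothetical in-memory stand-in; an Array plays the part of the Redis list.
class TinyCache
  def initialize(time_field, &query)
    @time_field = time_field  # name of the field that marks an entry as new
    @query = query            # block that fetches rows newer than a given time
    @store = []               # JSON strings, newest first
  end

  # Time of the newest cached entry, or the epoch when the cache is empty.
  def newest_time
    return Time.at(0) if @store.empty?
    Time.parse(JSON.parse(@store.first)[@time_field])
  end

  # With refresh: ask the source only for newer rows, cache them, return all.
  # With refresh = false: return what is cached without touching the source.
  def find(refresh = true)
    if refresh
      fresh = @query.call(newest_time).map { |row| JSON.generate(yield(row)) }
      @store = fresh + @store   # source rows arrive newest first, so prepend
    end
    @store.map { |json| JSON.parse(json) }
  end

  # Keep only the `keep` most recent entries.
  def cull(keep)
    @store = @store.first(keep)
  end
end

# Invented sample rows, newest first, standing in for a database table.
rows = [
  { :id => 2, :created_time => Time.utc(2010, 9, 2) },
  { :id => 1, :created_time => Time.utc(2010, 9, 1) }
]
cache = TinyCache.new("created_time") do |since|
  rows.select { |row| row[:created_time] > since }
end
to_entry = lambda do |row|
  { :id => row[:id], :created_time => row[:created_time].iso8601 }
end

cache.find(&to_entry)   # first call pulls both rows from the source
rows.unshift({ :id => 3, :created_time => Time.utc(2010, 9, 3) })
cache.find(&to_entry)   # second call pulls only the new row
cache.cull(2)           # drop everything but the two newest
puts cache.find(false).map { |entry| entry["id"] }.inspect  # [3, 2]
```

The sketch stores timestamps as ISO 8601 strings so they survive the JSON round trip. How Silver itself serializes the time field is not covered here.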

However, Silver is not just a simple cache. It can also be used to index a database. It is optimized to index based on short text, such as names, captions, tag lists, excerpts, tweets etc. There is nothing stopping you from using it on longer fields such as body text except the amount of memory allotted to Redis. Silver uses a stupidly simple fuzzy text search. The search will likely be augmented in the future.

Here’s how you would index a mess of photos by their captions, falling back on their filename if no caption is given. First, instantiate a new index object.

index = Silver::Index.new("blog_pictures","created_time") do |date|
  Picture.all(:order => :created_time.desc, :created_time.gt => date)
end

This is the same deal as before with Silver::Cache: Redis key name, time field, query block that returns results newest first. Next, call the find_and_update method of the instance.

index.find_and_update do |result|
  output = result.label || result.filename || ""
  id = result.id
  [id,output]
end

The find_and_update method takes a block that will be called for each db-fetched result. This block should return a two-item array of the row’s id, first, and the value we are using for indexing, second. As you can see in the example, Silver lets you mix the fields you index on. It lets you do anything you want, actually, as long as an id and a corresponding value are returned. After calling find_and_update, your database is indexed and ready to be searched. Say we wanted to search for photos of “Barack Obama”:

search = Silver::Search.new("Barack Obama","blog_pictures")

The constructor takes a string to search for and the name of the Redis key storing the index. To actually perform the search:

search.perform { |id| Picture.get(id) }

The perform method takes a block that will be passed the id of each entry whose indexed value matches the query. It returns an array of database/service objects for you to then interact with as you please.
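For the curious, the index-then-search flow can be sketched in a few lines of plain Ruby. To be clear about what is assumed: Silver's actual matching rule is not described in this post, so the prefix matching below is just one simple way to get forgiving matches, TinyIndex is a made-up class, a Hash of Sets stands in for Redis, and the captions are invented.

```ruby
require "set"

# Hypothetical in-memory index; a Hash of Sets stands in for Redis.
class TinyIndex
  def initialize
    @ids_by_word = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Lowercase the text and split it into alphanumeric words.
  def self.words(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  # Record that every word of `text` points at `id`.
  def add(id, text)
    self.class.words(text).each { |word| @ids_by_word[word] << id }
  end

  # Ids whose indexed text has, for every query word, a word starting with it.
  def search(query)
    matches = self.class.words(query).map do |term|
      @ids_by_word.select { |word, _| word.start_with?(term) }
                  .values.reduce(Set.new, :|)
    end
    matches.empty? ? [] : matches.reduce(:&).to_a.sort
  end
end

# Invented [id, value] pairs, the same shape the find_and_update block returns.
index = TinyIndex.new
[[1, "Barack Obama at the podium"],
 [2, "obama_press.jpg"],
 [3, "Senate floor"]].each { |id, value| index.add(id, value) }

puts index.search("Barack Obama").inspect  # [1]
puts index.search("obam").inspect          # [1, 2]
```

The returned ids are what you would then hand to a block like the one passed to perform to load the real records.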

There are more features of Silver that you can check out at the GitHub repo or in the Rocco-annotated source.

Sharing is caring

Now, Silver is a young piece of software. We heard once that a good strategy is to release early and iterate often. In that spirit, Silver should be considered strictly beta. It passes all of its specs but has only been tested with DataMapper and SQLite/MySQL stores. We hope you will all start using Silver, file bug reports, contribute patches and follow the evolution of this embryonic product.

Oh, this is also my first gem. Hurrah, I’m becoming a real man.

We hope you enjoy. Write now, write often to erik@talkingpointsmemo.com

Love, Erik
