Hacker News and SaaS

An interesting thing happened on Hacker News yesterday that highlights some complexities around innovation and compensation in today’s world.

Someone posted a link to visitor.js, which is a hosted piece of JavaScript that gives details on your visitor (like which city they’re in, date of last visit, etc.).  The creators set it up as a paid service.  You’d have to pay at least $10 per month to use it.

Within hours, someone posted a free open source version of visitor.js.

However you feel about this emotionally, here are two facts I believe to be true:

  • The creation of the open source version of visitor.js is good for the industry as a whole.  It’s yet another tool that developers everywhere can use and repurpose however they see fit.
  • The open source version makes it harder for the people behind the original visitor.js to make money from their efforts.  Hacker News has provided a disincentive for their innovation.

The economics of software are funny.  If you build a service that’s so deep that no one can easily copy it (like Twilio), you can survive.  If you build a site that accumulates millions of users, no one can easily take them away from you, even if they replicate your product.

But if you do something that’s kind of cool, like visitor.js, you’ve added value to the ecosystem of technical tools, but… you don’t get compensated for this value.

This dynamic creates a world where independent developers are increasingly effective at building things, but making money from what we build is more difficult.

Fun with Reddit Image Data

Reddit is one of my favorite sites.  It has a fantastic community that generates a wealth of interesting data.

Spidering image posts to Reddit

I wanted to play around with a slice of this data (image posts).  Getting all these images was kind of annoying, particularly because Reddit’s API only returns 1000 search results, and Reddit rate limits you to one request every two seconds.  So I wrote a cron script to suck down all new image posts every hour.

But I also wanted accurate scores and comment counts.  So I had another cron script re-download information on every post 1 week after it went live (to allow time to gather up/down votes and comments).
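Here is a minimal Python sketch of the bookkeeping those two cron jobs need. One part paces requests to at most one every two seconds. The other picks out posts that went live at least a week ago and haven’t been re-fetched yet. The class and function names and the `created`/`refreshed` fields are hypothetical, and the actual HTTP calls to Reddit are left out.

```python
import time
from datetime import timedelta

MIN_INTERVAL_SECONDS = 2.0          # one request every two seconds
REFRESH_DELAY = timedelta(weeks=1)  # re-fetch a post one week after it went live


class RateLimiter:
    """Makes successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL_SECONDS, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time the previous request was allowed through, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def posts_due_for_refresh(posts, now):
    """Posts (hypothetical dicts with a 'created' datetime) a week old and not yet refreshed."""
    return [p for p in posts
            if not p.get("refreshed") and now - p["created"] >= REFRESH_DELAY]
```

The clock and sleep functions are injectable, so the pacing can be checked without actually waiting.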

I wish Reddit would distribute an occasional database snapshot.  That would be a great data source to play with.  But, alas, they don’t, so I’ve had these scripts running for about 5 months.

If you’d like to play with the image links I spidered, you can use my snapshot (230,248 images, 19 megs compressed, JSON output from SOLR):

links.json.gz

The structure of each entry looks like:

{
        "permalink":"/r/pics/comments/jjpng/the_singlemost_sad_moment_of_my_childhood/",
        "author":"A_Slow_Descent",
        "url":"http://i.imgur.com/Gmjyq.gif",
        "num_comments":3,
        "sequence":628406,
        "subreddit":"pics",
        "score":4,
        "over_18":0,
        "title":"the single-most sad moment of my childhood",
        "thumbnail":"http://thumbs.reddit.com/t3_jjpng.png"
}
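One way to read the snapshot is sketched below. The post describes the file only as “JSON output from SOLR,” so the sketch assumes the file holds one of two layouts: a bare list of entries like the one above, or SOLR’s usual `{"response": {"docs": [...]}}` envelope. The helper names are hypothetical.

```python
import gzip
import json
from collections import Counter


def load_links(path):
    """Load image-post entries from a gzipped JSON snapshot.

    Assumption: the file is either a JSON list of entries or a SOLR-style
    response envelope of the form {"response": {"docs": [...]}}.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        data = data["response"]["docs"]
    return data


def top_subreddits(entries, n=10):
    """Count entries per subreddit and return the n most common as (name, count) pairs."""
    return Counter(e["subreddit"] for e in entries).most_common(n)
```

Calling `top_subreddits(load_links("links.json.gz"))` gives a quick sense of which subreddits dominate the image posts.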

Example Application

I wanted to build something that surfaced interesting new photos with minimal effort.  Reddit is great first thing in the morning when it’s full of fresh content.  But what if I want to goof off and it’s only been 15 minutes since I last browsed Reddit?

I can keep paginating further and further away from the front page.  But then I’m constantly scanning links to see if I’ve visited them.  And there’s no way to tell when new content is on the front page.

What I really wanted was a single button that I could click as much as I wanted, each time getting a new photo.  I don’t really care to see the most recent photo — just give me a reasonably good one, and make it easy to keep seeing more.

Try it out!

Random Next

Every time you want a new image, just click the “NEXT” button.  To make that work:

  • I threw the content into a SOLR index, which allows filtering by subreddit as well as random sorting.
  • JavaScript handles resizing images (for smooth user experience), randomizing SOLR results (so you get a new sequence of images even if you reload the page), and updating the DOM with the image.
  • To improve quality, the default view filters out images with a score under 100 (which eliminates the bottom 80%).
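The query behind the NEXT button might look something like the following SOLR request parameters. This is a sketch, not the actual implementation. It assumes the schema declares a random sort field, for example a `random_*` dynamic field of type `solr.RandomSortField` as in Solr’s example schema. Putting a fresh seed in the field name gives a different ordering on each page load. The function name, the `score` and `subreddit` field names, and the default of 20 rows are assumptions.

```python
import random
from urllib.parse import urlencode

MIN_SCORE = 100  # default cutoff from the post (drops roughly the bottom 80%)


def build_random_query(subreddit=None, min_score=MIN_SCORE, rows=20, seed=None):
    """Return an encoded SOLR /select query string for a random batch of image posts.

    Assumes a `random_*` dynamic field backed by solr.RandomSortField; each
    distinct seed in the field name produces a different, repeatable order.
    """
    if seed is None:
        seed = random.randrange(1_000_000)  # fresh order on every page load
    params = [
        ("q", "*:*"),
        ("fq", f"score:[{min_score} TO *]"),  # quality filter
        ("sort", f"random_{seed} asc"),       # random ordering keyed by the seed
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if subreddit:
        params.append(("fq", f"subreddit:{subreddit}"))  # optional subreddit filter
    return urlencode(params)
```

The resulting string would be appended to the core’s `/select?` URL. Subreddit names are letters, digits, and underscores, so they need no query-syntax escaping here.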

Synchronized Viewing Experience

Perhaps the most novel feature is “synchronized viewing”.  Everyone who visits the image-viewing page gets a distinct channel as a URL.  If you share this URL with a friend, you’ll see the same images.  Each time one of you clicks “NEXT”, you’ll both see the same new image.

To build this feature, I used PubNub, which is a great service for publishing and subscribing to channels.  It’s really easy to set up.

Every time someone visits the page, I give them a random channel (if they don’t already have one).  And then every time they view another image, I publish that image’s id on the channel.

The page also subscribes to this channel, so everyone else on the same channel will receive the image id and display it.
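The channel logic can be sketched without any network at all. The in-memory bus below is a toy stand-in for a hosted service like PubNub and does not mirror PubNub’s actual API. The names `channel_for`, `Viewer`, and the session dict are hypothetical.

```python
import secrets
from collections import defaultdict


class InMemoryPubSub:
    """Toy stand-in for a hosted pub/sub service (not PubNub's API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


def channel_for(session):
    """Give a visitor a random channel name unless they already have one."""
    if "channel" not in session:
        session["channel"] = secrets.token_hex(8)
    return session["channel"]


class Viewer:
    """One browser page: displays whatever image id arrives on its channel."""

    def __init__(self, bus, channel):
        self.bus = bus
        self.channel = channel
        self.current_image = None
        bus.subscribe(channel, self.show)

    def show(self, image_id):
        self.current_image = image_id

    def click_next(self, image_id):
        # Only publish; this page is subscribed too, so it updates the same way as everyone else.
        self.bus.publish(self.channel, image_id)
```

Because clicking NEXT only publishes, the clicker’s own page and every other page on the channel go through the same subscription path.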

Results

I found the image explorer fun when the images have immediate impact (like a beautiful landscape photo rather than a rage comic).  I also discovered lots of subreddits I never knew existed.

Some images I like:

A bullet going through some M&Ms

Butterfly Tongue

Just a baby hippo taking a bath

Attack!

Sunrise reflected in a bubble

Mount Bromo

Acknowledgements

In addition to PubNub, I’d also like to thank imgur.com for hosting all these images for free.

Hey YouTube, Implement This!

There’s practically an infinite supply of videos on the Internet.  But there are limited ways to discover and view video content.

Rather than waste words trying to explain visual concepts, let me just show you.  I took the top 1000 videos of all time on YouTube, and threw them into this special player I built:

Try it out!

Play around with it for a while, then come back to me.

The Product

People are great at processing moving images.  And we should be, since our visual cortex takes up 1/3 of our brains.  I wanted to create a viewing experience that took advantage of all this neural hardware.  I wanted users to feel a pleasant flow as they decide what to watch, and quickly jump from video to video.

To that end, I implemented the minimum set of features to provide this experience.  I didn’t want any other features — such as a speed control — that might introduce an additional cognitive loop.  I want viewers to just let the visual data flow, and not concern themselves with whether to speed up or reverse the stream.

Another aspect of the user experience is that the flow is infinite.  The player shows random slices of random videos, and it goes on forever.  1000 videos is only a tiny fraction of YouTube’s library (let alone the entire web’s).  But you can watch this player for a long time without the content losing its novelty.
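The “random slices of random videos” loop can be sketched as an endless Python generator. The 10-second slice length and the video-id-to-duration mapping are assumptions for illustration, since the post doesn’t say how long each slice plays.

```python
import random

SLICE_SECONDS = 10.0  # hypothetical slice length


def random_slices(durations, slice_seconds=SLICE_SECONDS, rng=None):
    """Yield (video_id, start, end) forever: a random slice of a random video.

    `durations` maps hypothetical video ids to their lengths in seconds.
    Videos shorter than the slice length are played in full.
    """
    if rng is None:
        rng = random.Random()
    ids = list(durations)
    while True:
        vid = rng.choice(ids)
        length = durations[vid]
        span = min(slice_seconds, length)
        start = rng.uniform(0.0, length - span)  # any start that keeps the slice inside the video
        yield vid, start, start + span
```

A player would pull the next slice from the generator whenever the current one ends, so the stream never runs out.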

The Technology

Some highlights from the technical side:

  • I use ActionScript because I already had code for a polished player interface (e.g., displaying thumbnails while scrubbing the video).  The main reason to use HTML5 instead of ActionScript is to run on iOS devices.  But the browser on iOS doesn’t let videos play inline within a page, so this player wouldn’t have worked on those devices anyway.  For what it’s worth, I really enjoy working in ActionScript, and I’m sad it’s going away.  But that’s a topic for another post.
  • The player takes advantage of high bandwidth.  Broadband speeds have been quietly increasing over the past few years, and most people have much bigger pipes than a typical video player uses.  For instance, a DVD-quality video stream is roughly 1000 kilobits per second (kbps).  I’ve got a cable modem, and I just measured over 20,000 kbps.  In other words, my computer could stream 20 DVD-quality videos at once.
  • All the thumbnail videos you see scrolling by are compressed to 50 kbps (they’re 160×120 pixels with no audio).
  • I used YouTube’s API to get these videos, but unfortunately, their API only returns 1000 search results.  So I have no practical way of adding more content.  I really wish API designers would allow access to entire content libraries, but that’s also a topic for another post :)
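The bandwidth figures above come down to simple integer division. This sketch uses the post’s numbers, ignores protocol overhead, and uses hypothetical function names.

```python
DVD_QUALITY_KBPS = 1_000     # "roughly 1000 kilobits per second"
THUMBNAIL_KBPS = 50          # 160x120 thumbnail stream, no audio
MEASURED_LINK_KBPS = 20_000  # the cable-modem measurement quoted above


def concurrent_streams(link_kbps, stream_kbps):
    """Number of whole streams at stream_kbps that fit in link_kbps (no overhead)."""
    return link_kbps // stream_kbps


def thumbnails_alongside(link_kbps, main_kbps, thumb_kbps):
    """Thumbnail streams that still fit after reserving bandwidth for one main video."""
    return max(link_kbps - main_kbps, 0) // thumb_kbps
```

At 50 kbps, twenty scrolling thumbnails together cost about as much as a single DVD-quality stream.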

Next Steps

I would love to see this “video wall” player as an option on all video sites.  I think it works really well with the enhanced interactivity possible on the Internet.

Technically, I can hook this player into different content libraries.  But there are a couple of practical obstacles.

First, to make scrubbing work well, I need sprites that show different frames from the video.  Second, I need tiny 50 kbps thumbnail streams that show the videos scrolling across the wall.  For full integration, the site hosting the content needs to generate the sprites and tiny videos as part of its encoding pipeline.
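The sketch below shows how a scrub position could map to a tile in such a sprite sheet. The layout is an assumption, since the post doesn’t specify one: frames sampled evenly across the video, packed row by row into a grid of 160×120 tiles.

```python
from dataclasses import dataclass


@dataclass
class SpriteSheet:
    """Hypothetical layout: `frames` stills sampled evenly, packed row by row."""
    frames: int
    columns: int
    tile_width: int = 160
    tile_height: int = 120


def tile_for_position(sheet, position, duration):
    """Return the (x, y) pixel offset of the sprite tile to show while scrubbing."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    fraction = min(max(position / duration, 0.0), 1.0)      # clamp to the video's extent
    index = min(int(fraction * sheet.frames), sheet.frames - 1)
    row, col = divmod(index, sheet.columns)
    return col * sheet.tile_width, row * sheet.tile_height
```

Because the whole sheet is one image, the player only shifts a crop window as the mouse moves and fetches nothing new while scrubbing.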

Search vs. Browse

The two dominant metaphors for exploring information are searching and browsing.  Search is great when you have a clear intention behind what you’re looking for, and the information is well structured.  The “clear intention” often involves spending money, and that’s why Google is a $200 billion company.

But I’ve always had a soft spot for browsing.  When browsing, your intention is often a more diffuse desire for entertainment.  The information can be unstructured, like images and videos.  Search is serious, browse is fun.

In particular, I find browsing an interesting problem when your set of content is so large as to essentially be infinite.  I’ll go into more detail on aspects of browsing infinite content, but first here are some concepts that guide how I approach browsing.

Content

Content is the set of items available for browsing.  On Netflix, it’s the movie library.  On Facebook, it’s all the posts that are visible to you.

Filters

A filter selects a subset of the content.  Common filters include:

  • Text searches
  • Category matching
  • Recommendations (or other relevancy filtering)
  • Date ranges
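The filters above compose naturally as predicates that are ANDed together. This is a minimal sketch with a hypothetical `Item` record. A recommendation filter would plug in the same way, as another predicate, but it is omitted here because it depends on a scoring model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """A hypothetical piece of browsable content."""
    title: str
    category: str
    added: date


def text_filter(query):
    needle = query.lower()
    return lambda item: needle in item.title.lower()  # case-insensitive substring match


def category_filter(*categories):
    wanted = set(categories)
    return lambda item: item.category in wanted


def date_filter(start, end):
    return lambda item: start <= item.added <= end  # inclusive range


def apply_filters(content, *filters):
    """Keep the items that pass every filter; with no filters, keep everything."""
    return [item for item in content if all(f(item) for f in filters)]
```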

Sorts

Once you apply a filter, you need to sort the results.  Now things get a little more interesting, based on how you sort:

  • Alphabetically — rarely the right choice.  Even in the case of searching text, you’re better off sorting by relevancy (which any decent text indexing tool like SOLR will provide).
  • Most popular — good in that you’re showing high-quality results, but bad in that the “most popular” view doesn’t change often.  If a user is likely to look at this slice of content regularly, you need to shake it up.  Some sites solve the problem of stale results by combining a “most popular” sort with a “content added in past week” filter.
  • Most recent — the opposite trade-off from “most popular”.  You get dynamic results, but their quality is low.  Works well if timeliness is important and the user has other ways of pruning the content for quality (like your Twitter feed).
  • Recently popular — I personally like to blend popularity and recency.  The easiest way to blend these two metrics is to discount the popularity of each piece of content i by its age: $\text{recently\_popular}_i = \frac{\text{popular}_i}{\text{age}_i + k}$, where k is a constant that modulates how drastically popularity falls off with age.  It’s inherently subjective, so you’ll need to dial it in.
  • Random — rarely seen, but can be a great choice if novelty is the most important factor in the user experience.
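The five sorts can be sketched as Python functions over hypothetical item dicts with `title`, `popularity`, and `added` (a datetime) keys. Measuring age in days and defaulting `k` to 2.0 are assumptions, and both are values you’d tune, as the text says.

```python
import random


def alphabetical(items):
    return sorted(items, key=lambda i: i["title"].lower())


def most_popular(items):
    return sorted(items, key=lambda i: i["popularity"], reverse=True)


def most_recent(items):
    return sorted(items, key=lambda i: i["added"], reverse=True)


def recently_popular(items, now, k=2.0):
    """Sort by popularity / (age + k), with age in days.

    k keeps brand-new items from dividing by zero and sets how fast
    popularity decays; 2.0 is an arbitrary starting value.
    """
    def score(i):
        age_days = (now - i["added"]).total_seconds() / 86400
        return i["popularity"] / (age_days + k)
    return sorted(items, key=score, reverse=True)


def random_order(items, rng=None):
    if rng is None:
        rng = random.Random()
    shuffled = list(items)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled
```

With these defaults, an item with 1000 votes that is 100 days old scores about 9.8. An item with 50 votes from yesterday scores about 16.7 and sorts first. Raising `k` to 1000 flips that order, which is the dial the formula describes.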

Views

Now that we’ve filtered and sorted, there are interesting and underappreciated choices involved in how to display the content.

  • Fixed number per page or infinite scroll?  Infinite scroll has become fashionable in the past couple of years, but it’s not without drawbacks.  As you add more elements to the page, it becomes heavier and less responsive.  An even bigger issue with infinite scroll is that state is lost when a user clicks to view details on something and then clicks back.
  • What metadata to show?  It’s easy to scan information on the browse page, so all the stakeholders involved in the project will likely argue for their piece to get on the browse page.  But you also don’t want a cluttered look.

Default Values

One thing I’ve learned from watching users interact with web sites is the vast majority of them (like 95%) never change default values.  This is so important, I’ll say it again: your users will not change default values.

So while you could present lots of choices on how to filter and sort and view, the only one that matters is the default.

Infinite Content

As I mentioned earlier, I find some of these choices around browsing interesting in domains where there is so much content as to be essentially infinite, and when the user is just seeking novelty.

Here are a couple examples:

  • An endless flow of videos, which serves random moments from the top 1000 YouTube videos of all time.  It’s instantly engaging — no need to think about what you want to watch.
  • A social way to browse pictures.  I got these pictures from Reddit, which is one of my favorite sites.  But I often get frustrated browsing pictures on Reddit, because their “recently popular” sort doesn’t change enough.  I wanted an interface with a simple “NEXT” button to give me a fix whenever I click.