But how do you keep track of everything?

[image]

Here's an interesting paradox for you: despite the slow and steady pull away from RSS and Atom news feeds, it seems that there are more people than ever who are actually reading 'feeds' in one form or another. People who, just a few years ago, you'd never expect to be scrolling through a river of news, can't wait to log in to Twitter or Facebook, scrolling and clicking on links shared by their friends or pushed by custom news 'apps' right into their stream (which they then turn around and share with all their friends). Mobile apps like Flipboard or Pulse then collect all these social links, plus custom feeds from magazines and newspapers, and create a good-looking booklet full of current news. It's all so seamless and amazing. Honestly, it's the most efficient way ever invented to see the latest amusing cat videos.

But here's the thing: what if you actually want to keep track of important information without relying on your social network to find it for you? What happens if you want to see EVERYTHING? It seems right now there are very few options - mostly just Google Reader, really, which relies on ever-harder-to-find RSS feeds to keep it going.

I mean, imagine you're at one of the analyst firms, or at Engadget, or in the marketing department of a large multinational corporation - technical or not - and your job is to keep track of all the news, all the conversations and anything else that happens in your particular area of interest. How do you do it? Flipboard, as pretty as it is, really isn't going to help a manager of a large dentist's office keep track of the latest amazing developments in the world of oral hygiene, and a business analyst isn't going to use Google Reader to keep track of the news concerning the various Fortune 500 companies on their watch list. Well, I hope not, anyways.

How do people keep track of everything? Are there any tools that I don't know about? Do they pay the thousands of dollars a month for those custom 'news tracking' services which send market summaries of every product or brand mention? Is there such a thing as a Bloomberg Terminal for tech and industry news? Is there something else I'm missing?

You know what I think? I think the vast majority of information workers out there are simply winging it. I think they browse the web randomly, read Yahoo's home page (in secret of course) or browse Google News, maybe check out a newspaper every other day, a magazine now and then, have 5,375 unread items in their Google Reader, and otherwise hope someone emails them interesting links, or pray they happen to be online at the right time when one of their Twitter contacts posts a relevant link to what they do for a living.

Because unless I'm missing something, and I'm pretty sure I'm not (as I do, in fact, read everything), they really don't have many other options.

Over the past several years, I've been continually developing and tweaking a personal news reader web app for exactly this reason. At first it was just hosted under /feeds on this site but eventually I decided to give it its own domain, which was flip.io first (before Flipboard launched) and has wavered between xs.io, maven.io and the latest, magnet.io since. If you've ever seen one of those three domains in your server logs and wondered WTF it was, that was me.

Now this is one of those projects that is simply about scratching an itch. Though I'm always thinking about ways to transform it into some sort of money-making commercial effort, it really is simply an app that I want for my own use. I'm continually tweaking it and have a list of things I need to polish here and there, but even in its continual work-in-progress state it's surprisingly effective at what it's supposed to do. At first, what I wanted was a web-based news reader that was cross-platform, mobile-enabled and, for various reasons, not Google Reader. But as soon as I had written the basic crawler, database and river of news pages, I immediately started adding features to help me deal with the deluge of information out there.

Here's just a few features:

24 hour expiration of news items. Why was I constantly going in and marking a huge list of backlogged items as 'read'? You know that massive number of articles and posts that you'll never, ever, ever get to? Why give myself an ulcer? If I'm not able to get to them, then they're just automatically out of my main feed. They're not deleted, or marked read, they just sort of fall off the end. I've adjusted the window back and forth from 12 hours to a week, but decided that, for me, 24 hours is good. It's not the exact number that's important, it's just the functionality.
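A minimal sketch of that expiration idea in Python. The item layout (a dict with a `published` datetime) and the function name are assumptions made for illustration, not the reader's actual code. The point is that expiry is only a view filter: nothing is deleted or marked read.

```python
from datetime import datetime, timedelta

# Assumed default; the post says the exact number matters less than the behavior.
EXPIRY = timedelta(hours=24)

def visible_items(items, now, expiry=EXPIRY):
    """Return the items still inside the expiry window.

    The input list is left untouched, so older items simply
    'fall off the end' of the main feed without being deleted.
    """
    cutoff = now - expiry
    return [item for item in items if item["published"] >= cutoff]

# Example with made-up data
now = datetime(2024, 1, 2, 12, 0)
feed = [
    {"title": "Yesterday's backlog", "published": now - timedelta(hours=30)},
    {"title": "Fresh post", "published": now - timedelta(hours=2)},
]
fresh = visible_items(feed, now)
```

Changing the window to 12 hours or a week is then just a different `expiry` argument.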

Duplicate detection. Oh, I'd love to tell you I found some magic way of grouping and deleting items from multiple blogs that are basically the same summary of a news item on yet another site, but I haven't done that. I tried various ways, but none seemed to work for me. I know it's possible (look at Techmeme), but I haven't been able to get it yet. Some day. What I *did* do was stupid, basic and *incredibly* noticeable - which is simply look for the same exact damn item title and whack the second or third version in a day. You know how Engadget or TechCrunch have their mobile-focused sub-blogs? But then they end up linking to the same thing half the time? Well, they always use the same title, so I just filter that crap out. Simple, yet oh-so-effective.
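The exact-title trick is small enough to sketch in a few lines of Python. The dict layout and the function name are hypothetical, and this is only one way to do it: keep the first item with a given title and drop any later item with the identical title published less than a day after the one that was kept.

```python
from datetime import datetime, timedelta

def drop_duplicate_titles(items, window=timedelta(days=1)):
    """Keep the first item per exact title; drop repeats inside `window`."""
    last_kept = {}  # title -> published time of the most recently kept item
    kept = []
    for item in sorted(items, key=lambda i: i["published"]):
        previous = last_kept.get(item["title"])
        if previous is not None and item["published"] - previous < window:
            continue  # same exact title seen recently: skip it
        last_kept[item["title"]] = item["published"]
        kept.append(item)
    return kept

# Example with made-up data
t0 = datetime(2024, 1, 2, 9, 0)
items = [
    {"title": "New phone announced", "feed": "main blog", "published": t0},
    {"title": "New phone announced", "feed": "mobile sub-blog",
     "published": t0 + timedelta(hours=1)},
    {"title": "Something else", "feed": "main blog",
     "published": t0 + timedelta(hours=2)},
]
deduped = drop_duplicate_titles(items)
```

No fuzzy matching here on purpose: the comparison is plain string equality, which is all the paragraph above describes.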

Thumbnails. I've had this for years, and I'm starting to see that some of the iPad news reader apps are starting to get the idea. Every tech blog out there - including the best like Ars Technica - seems to REQUIRE that they have some obnoxious, huge, usually stolen, image accompanying every single item they post. I'd say 75% of the time, the image has absolutely nothing to do with the article and is usually nothing more than some stupid clip art, web meme, or upside-down brand logo (WHY?!??). It's very annoying. So the easiest thing to do is find the first image in every blog post and make a thumbnail out of it. If it's valuable, I'll be able to click and expand it, if not I can easily ignore it.
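Finding the first image is the easy half of this. Here is a rough sketch using only Python's standard `html.parser`; the class and function names are made up for the example, and the actual thumbnail resizing (which needs an image library) is left out.

```python
from html.parser import HTMLParser

class FirstImageFinder(HTMLParser):
    """Remembers the src of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")

def first_image_url(post_html):
    """Return the URL of the first image in a post, or None if there is none."""
    finder = FirstImageFinder()
    finder.feed(post_html)
    finder.close()
    return finder.src

# Example with made-up markup
post = '<p>Big news!</p><img src="hero.jpg" alt=""><p>More text</p><img src="chart.png">'
thumb_source = first_image_url(post)
```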

Summaries. Here's another one that some iPad readers are starting to grasp just now. Google Reader - like pretty much every other reader ever made - has two views: Titles or Full Posts. That's it. You either have to slog through every single massive image and lengthy text, or you get the bare information and usually a favicon (which I've decided are nothing but visual noise). What I do is detect the first paragraph (or a reasonable number of words) and use that in my main view - it allows you to fly through items, still grasping the main idea of each post, without having to slog forever. I also have the option of marking some feeds as 'default full', which I do for some link feeds, like the ones from Reddit, which don't have useful summaries. This is a no-brainer feature, and I don't understand why readers don't have it.
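One way the "first paragraph, or a reasonable number of words" rule could look in Python, again with the standard `html.parser`. The names and the 60-word default are assumptions for illustration, not the reader's real settings.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects plain text and notes the text up to the first closed <p>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.first_paragraph = None

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.first_paragraph is None:
            text = " ".join("".join(self.parts).split())
            if text:  # ignore empty leading paragraphs
                self.first_paragraph = text

def summarize(post_html, max_words=60):
    """Return the first paragraph of a post, capped at max_words words."""
    collector = _TextCollector()
    collector.feed(post_html)
    collector.close()  # flush any buffered trailing text
    # Fall back to all of the text if the post has no <p> markup.
    text = collector.first_paragraph or " ".join("".join(collector.parts).split())
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "..."
```

A 'default full' feed would simply skip this call and show the whole post.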

Visual Sparseness. There's no reason to have 'chrome', buttons, favicons, outlines, shading, colors, etc. bombarding your eyes as you're trying to work through a few thousand news items every day. As time has gone by, I've continually stripped down what my main stream shows me so that I can suck in as much information as possible without having to deal with a lot of crap. The side-benefit of this is there's nothing you have to worry about when you adapt the same interface to the phone, tablet or web. Without a lot of extraneous crap in the middle of your feed, it makes for an incredibly useful and light interface.

[image]

Logical and prioritized grouping. This isn't much of an innovation, just a note that after playing with various ways of sorting the items, I decided that having them organized into folders, and displayed by their published date, but within the order of the folders, was the most efficient. I keep the 'work' stuff up top, like my feeds for mobile-specific news items, and then I have the news, and link feeds like Y-Combinator down at the bottom. If I've made it down there, I'm most likely looking for novelty by that point. :-)
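That ordering is really just a two-level sort. A quick Python sketch, where the folder names are hypothetical and newest-first within a folder is an assumption (the direction isn't spelled out above):

```python
from datetime import datetime, timedelta

# Hypothetical folder names, in the order they should appear in the river.
FOLDER_ORDER = ["work", "news", "links"]

def river_order(items, folder_order=FOLDER_ORDER):
    """Sort by folder priority, then by published date within each folder."""
    rank = {name: position for position, name in enumerate(folder_order)}
    # Python's sort is stable, so sorting by date first and folder second
    # keeps the date order inside each folder.
    by_date = sorted(items, key=lambda item: item["published"], reverse=True)
    # Items in unknown folders sink to the bottom.
    return sorted(by_date, key=lambda item: rank.get(item["folder"], len(rank)))

# Example with made-up data
t0 = datetime(2024, 1, 2, 9, 0)
items = [
    {"title": "Link roundup", "folder": "links", "published": t0 + timedelta(hours=3)},
    {"title": "Older work item", "folder": "work", "published": t0},
    {"title": "Headline", "folder": "news", "published": t0 + timedelta(hours=2)},
    {"title": "Newer work item", "folder": "work", "published": t0 + timedelta(hours=1)},
]
ordered_titles = [item["title"] for item in river_order(items)]
```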

[image]

Integrated bookmarks. At one point I thought, "Why re-create the wheel? Use Delicious, or Pinboard to keep track of your links...", but the speed and efficiency of keeping stuff I want to read later in the same interface is worth the loss of the extra features a full-fledged link tracking app would give me. Also, to make sure I don't ignore the links I've saved, I automatically forward the main feed to the Links page once I'm done with my daily feeds. You know how it is, sometimes you save links that you think you should read, though you never seem to be in the mood to actually read them. Getting reminded of them every time you're jonesing for another good thing to read is quite useful.

[image]

Social Network Grouping. I subscribe to Twitter and Facebook by using their APIs in a different script than the main RSS crawler (which just uses SimplePie to grab the feed into my database before I process it). I don't just dump the entries from social feeds directly into my feeds though, as they'd overwhelm the rest of the links, or be so disorganized as to be incomprehensible. I'd say 90% of the people who post links and thoughts to Twitter or Facebook do so in bursts. So, I run the scripts every hour, collect all the tweets, and then group them together by poster. Four or five tweets together looks, and reads, pretty much like a short blog post. It makes keeping track of everyone in your social network muuuuuch easier. Note, I used to actually keep appending tweets until I got around to reading them, but that was too much. The system I have now means I still end up missing a few tweets now and then that hit the expiration limit, but chopping everything up is a bit more manageable - and it also lets me respond within a reasonable time.
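The grouping step on its own is simple. Here is a Python sketch of it, assuming the hourly script has already fetched a batch of tweets into dicts with `user`, `text` and `posted` keys (those names are invented for the example, and no real Twitter or Facebook API calls are shown):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_by_poster(tweets):
    """Turn one batch of tweets into one feed item per poster."""
    grouped = defaultdict(list)
    for tweet in sorted(tweets, key=lambda t: t["posted"]):
        grouped[tweet["user"]].append(tweet["text"])
    # Each poster's burst becomes a single item that reads like a short post.
    return [
        {"author": user, "body": "\n".join(texts), "count": len(texts)}
        for user, texts in grouped.items()
    ]

# Example with made-up data
t0 = datetime(2024, 1, 2, 9, 0)
batch = [
    {"user": "alice", "text": "Reading about feed readers", "posted": t0},
    {"user": "bob", "text": "Lunch", "posted": t0 + timedelta(minutes=5)},
    {"user": "alice", "text": "Here's the link", "posted": t0 + timedelta(minutes=1)},
]
posts = group_by_poster(batch)
```

Because each hourly batch is grouped separately, items stay short instead of growing until they are read.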

[image]

Filters. This is a biggie, and not really finished yet. But the general idea is that I want to be able to 'mute' certain keywords for 24 hours if I've seen way too friggin' much of that topic that day. For example, today is E3 and it also seems to be a slow news day, so I've seen the same news about the Xbox 30 times. If I thought this was going to continue (as it sometimes does when Apple launches something), I'd mute it. Also, I want to be able to auto-expand items that have keywords I want to keep track of.
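A sketch of what a 24-hour keyword mute might look like in Python. Since the feature is described as unfinished, everything here is hypothetical: the class name, the case-insensitive substring match on titles, and the default duration.

```python
from datetime import datetime, timedelta

class MuteList:
    """Keywords that hide matching items until their mute expires."""

    def __init__(self, duration=timedelta(hours=24)):
        self.duration = duration
        self.expires = {}  # lowercased keyword -> time the mute ends

    def mute(self, keyword, now):
        self.expires[keyword.lower()] = now + self.duration

    def is_muted(self, title, now):
        lowered = title.lower()
        return any(
            keyword in lowered and now < until
            for keyword, until in self.expires.items()
        )

# Example with made-up data
now = datetime(2024, 1, 2, 9, 0)
mutes = MuteList()
mutes.mute("Xbox", now)
titles = ["Xbox news, again", "Something about dentistry"]
shown = [t for t in titles if not mutes.is_muted(t, now + timedelta(hours=1))]
```

The auto-expand half of the idea could reuse the same matching with a different action attached.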

But here's a lesson I learned from this so far about filtering - there's no magic. The corpus just isn't big enough. What I mean is, I've yet to find a way to train the news reader to highlight things I'm interested in. The total number of news items and the signals that you give the app (both positive and negative) never seem to make enough of a difference to actually hone future results. I absolutely *suck* at this sort of thing (aka 'math'), so I could be, and probably am, incredibly wrong, but I've tried and tried again to get something working. In my reader, in addition to the links that I can explicitly mark with a star, *every* link I click on is recorded. So if I click on a link in the middle of a post, I record that URL, link title, and the parent item's full text in a history table. I've then used that and my explicit favorites to try and train a Bayesian filter, but even with thousands of clicks and hundreds of thousands of example items (which is about 6 months or so of feeds), I can't seem to get the damn thing to give any sort of reliable results. My goal was sort of a reverse SPAM filter - items that matched the Bayesian training would get marked - either displayed as a full post (instead of a summary) or getting a colored title bar, or whatever. But I've yet to get it working.
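For anyone who hasn't seen one, this is roughly what a 'reverse spam filter' looks like: a tiny naive Bayes classifier with add-one smoothing, sketched in Python. It is a generic textbook version with invented names and labels, not the filter described above, and it does nothing to fix the too-few-signals problem; it only shows the mechanics.

```python
import math
from collections import Counter

class InterestFilter:
    """Naive Bayes over words: 'interesting' (clicked/starred) vs 'ignored'."""

    LABELS = ("interesting", "ignored")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_likelihood(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing keeps unseen words and classes away from log(0).
        log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for word in words:
            log_p += math.log((counts[word] + 1) / (total_words + vocab_size + 1))
        return log_p

    def score(self, text):
        """Log-odds that text is interesting; above zero means 'highlight it'."""
        words = text.lower().split()
        vocab = set(self.word_counts["interesting"]) | set(self.word_counts["ignored"])
        return (self._log_likelihood(words, "interesting", len(vocab))
                - self._log_likelihood(words, "ignored", len(vocab)))

# Example with made-up training data
flt = InterestFilter()
flt.train("android phone launch", "interesting")
flt.train("celebrity gossip video", "ignored")
```

An item scoring above zero could then be shown as a full post or given a colored title bar.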

I suspect though, it might not be me. I think the amount of effort you'd need to make to mark items as like/dislike, and the amount of items you'd need to have as an example to compare against aren't practical. For SPAM, for which there are already massive examples of both, and highly tuned keyword filters, etc., it works (though for anyone who's managed their own SpamAssassin server, not always particularly well), but the meager numbers a personal news reader pulls in? I don't think it's going to happen. Then again, I visit Techmeme, and see all the automatically grouped, prioritized and filtered articles and I think I must be doing something wrong - even with the human editors, there's a lot of automated categorization as well. That's basically the same problem I'm trying to solve, so it must be possible with the same sort of inputs, or at least more feasible than I think.

Okay, so that's just some of the stuff. There's more: Full-text search, a JSON-based API and JavaScript front-end with infinite scrolling which works on pretty much every smartphone I've tried it on, and auto-marking of read items in batches, so when the iPad's Safari crashes (as it does quite often) or I'm on my phone and reading an item and get interrupted, I never actually lose those items as they're not marked read until they've scrolled well past that spot.

Anyways, the end result of all this is a custom-focused web application which allows me to keep track of hundreds of news feeds, social items, and even 'firehose' feeds without losing my mind or dedicating 20 hours a day to reading. My 24 hour timeout rarely gets used, as I'm able to zip through the feeds very efficiently - but I also don't have to fear opening up my reader after I spend a weekend out and about with my kid either. I can happily say it's the perfect news reader for me (because, of course, I made it for me).

I think the next steps will probably be more focus on the filters - I imagine a full set of "If This, Then That" style item filters like you can create using an email client. Figuring out how to bring more intelligence - not necessarily magic, but at least automation - to the filtering process is key. Also, I see a time shortly when web scraping, rather than just feed parsing, will be something I'll need to add, as well as more API integration, as it's pretty obvious RSS is going away. (Though I'd love to see it replaced with something like a JSON version that had added bits for requesting deltas - sort of like PubSubHubbub, but easier.)

Could this ever be a business? It depends - I'm not sure it'd be a consumer-level offering, but I bet there'd be a lot of info worker professionals who'd be willing to pay a few bucks - Pinboard.in style - to leave Google Reader behind for something more customizable and efficient. We might be coming full circle on a cycle of feed aggregators. It started with the original ones like AmphetaDesk and Radio UserLand, then NewsGator and FeedDemon and Bloglines, then Google Reader came and seemed to suck the air out of the whole area. Now maybe after a break, there's another wave coming. Honestly, it doesn't seem like anyone has ever figured out how to make any real money from a feed reader though, which is probably why they all seem to fade away over time. This makes sense in a way, as the core focus of an aggregator is someone else's content. Even if you think of a feed reader as akin to a full web browser in its functionality, again, when's the last time anyone made any money making a web browser? Still, there's value there, you can feel it. It's just a matter of figuring out what it is.

So if you made it this far, now you know what I do in my spare time. I'm either reading everything I can, or figuring out ways I can read more. (Yes, I probably need help... Is there an Info-Junkies Anonymous? Sign me up.)

-Russ
