As the number of Internet services and web apps grows, so does the number of statuses and indicators to keep track of. At the end of the day, you end up logging into 20 different websites just to read a single value from each of them. And there are many of them, to name just a few: StatCounter, Google Analytics, Apple iTunes Connect, Flurry, etc.
I am really getting tired of logging into all these websites, and here is my plan to fix it.
Here is what is needed:
- Extraction scripts. A script for each website from which data has to be extracted. Best case, the website has an API that provides you with clean data, and you can get down to business. Average case, you have to code a crawler for that website, extract the data yourself and store it in some file or database.
- Regular downloads. A server that runs the extraction scripts regularly and gets the most up-to-date data from the websites you want to follow. This server might hold all the API keys and passwords to the services and websites you are using, and that can be a serious security risk.
- Normalization. A normalization process, which transforms all the extracted data into some common format that is independent of the source, so it can be displayed and transformed easily.
- Dashboard. A display interface or dashboard, which can take data from multiple sources and display it on one common screen.
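To make the normalization and dashboard steps more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Metric` record, the `normalize` and `render_dashboard` functions, and the field names are hypothetical, and the raw values are made up rather than taken from any real service.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Metric:
    """Hypothetical common format: one value from one source at one time."""
    source: str
    name: str
    value: float
    fetched_at: datetime


def normalize(source, raw, fetched_at=None):
    """Turn a source-specific dict of raw values into a list of Metric records."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    # Sort by name so the output order does not depend on the source.
    return [
        Metric(source, name, float(value), fetched_at)
        for name, value in sorted(raw.items())
    ]


def render_dashboard(metrics):
    """Render all metrics, whatever their source, as one plain-text table."""
    return "\n".join(
        f"{m.source:<12} {m.name:<16} {m.value:>12,.0f}" for m in metrics
    )


if __name__ == "__main__":
    # Made-up values standing in for what two extraction scripts returned.
    metrics = normalize("statcounter", {"visits": "1200", "pageviews": 3400})
    metrics += normalize("flurry", {"sessions": 87})
    print(render_dashboard(metrics))
```

The point of the common record is that the dashboard code never needs to know which website a number came from. A real setup would probably store the records in a file or database between the download and display steps.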
Part 1 can be done in any language, and with any library. I would rather do it using scripting languages such as Perl, Python or Ruby.
Parts 2 and 3 are relatively trivial and are just administration tasks.
Obviously, Parts 2, 3 and 4 are a one-time task. Once they are done, most of the code will be stable and won’t need many updates, so it can be written once and then distributed as an open source project.
In addition, the extraction scripts needed for Part 1 can be easily shared and reused. Indeed, once a user has coded a crawler for some service that does not have a public API, this code can be made publicly available on the Internet so others can save time. In the end, what is needed is a crawler repository or some sort of wiki on which crawlers can be posted.
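For shared crawlers to be reusable, they would need to agree on some common interface. Here is one possible sketch in Python, assuming each extraction script is a function that takes a dict of credentials and returns a dict of raw values. The `EXTRACTORS` registry, the `extractor` decorator, `run_all` and the `example_site` crawler are all hypothetical names, and the crawler returns a placeholder value instead of contacting any website.

```python
from typing import Callable, Dict

# Hypothetical registry: source name -> extraction function.
EXTRACTORS: Dict[str, Callable[[dict], dict]] = {}


def extractor(name):
    """Decorator that registers an extraction function under a source name."""
    def register(func):
        EXTRACTORS[name] = func
        return func
    return register


@extractor("example_site")
def fetch_example_site(credentials):
    # A real crawler would log in with the credentials and scrape the page.
    # This placeholder returns a fixed value so the sketch runs offline.
    return {"visits": 0}


def run_all(credentials_by_source):
    """Run every registered extractor and collect its raw values by source."""
    return {
        name: func(credentials_by_source.get(name, {}))
        for name, func in EXTRACTORS.items()
    }


if __name__ == "__main__":
    print(run_all({"example_site": {"user": "me", "password": "secret"}}))
```

With a convention like this, a crawler posted by someone else would be a single file to drop in, and the server from Part 2 would only need to call `run_all` on a schedule.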
Not re-inventing the wheel
Commercial solutions already exist. The best I have found so far is Geckoboard. The cheapest plan at the time of writing is $9 per month, which is not that much considering the time it saves. The advantage is that they already offer crawlers for services that have public APIs. For all the others, well, you have to code your own scripts. I am still holding off, as I think $9 is a bit expensive for personal use (even though that’s an acceptable price for business use). But honestly, I’d rather pay a few bucks than spend dozens of hours recoding some boring framework.
I also found a discussion about open source dashboard software on Quora. So far, nothing seems to come even close to Geckoboard.
I also took a sneak peek at Piwik (an open source alternative to Google Analytics), to see if there was any way to plug in some custom data. It seems there are ways, but it’s not that convenient, as you have to put the data into some specific MySQL table format. I found a blog post that talks about it.
Why there will never be startups offering this service for the masses
Social network aggregators have been around for a while. But offering a service that would centralize any other website one may need is something else, and would be really hard. Indeed, how many people do you know who can code a crawler? Not that many, right? So when a regular user needs to monitor some data on whatever website, they will be completely unable to do so. In addition, having an online app doing all this means that it will have to store the private access keys and possibly passwords of thousands of people. It won’t take long before some guy gets in and steals everybody’s passwords. For these two reasons alone (and I could find more), there is no mass market for such software. The best that can be done is what Geckoboard and its clones did, which is selling the service with a B2B model and hoping that enough businesses will adopt it.