With the number of Internet services and web apps growing, the number of statuses and indicators to keep track of grows with it. At the end of the day, you end up logging into 20 different websites just to get a single value from each of them. And there are so many of them, to name a few: StatCounter, Google Analytics, Apple iTunes Connect, Flurry, etc.
I am really getting tired of logging into all these websites, and here is my plan to fix it.
The Plan
Here is what is needed:
- Extraction scripts. A script for each website from which data has to be extracted. Best case, the website has an API that provides you with clean data, and you can get down to business. Average case, you have to code a crawler for that website, extract the data yourself and store it in some file or database.
- Regular downloads. A server that runs the extraction scripts regularly and gets the most up-to-date data from the websites you want to follow. This server might hold all the API keys and passwords to the services and websites you are using, which can be a serious security risk.
- Normalization. A normalization process, which transforms all the extracted data into some common format, independent of the source, so it can be displayed and manipulated easily (a sketch of such a format follows this list).
- Dashboard. A display interface or dashboard, which can take data from multiple sources and display them on one common screen.
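To make the "common format" concrete, here is a minimal sketch of what a normalized record could look like, assuming a flat JSON-lines file as storage. The field names are only an illustration, not a fixed spec.

```python
# A minimal sketch of a source-independent record (field names are an
# assumption): every extraction script, whatever the website, would emit
# entries shaped like this one.
import json
from datetime import datetime, timezone

record = {
    "source": "statcounter",          # which crawler produced the value
    "metric": "daily_page_views",     # what is being measured
    "value": 1234,                    # the single number we logged in for
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

# Appending one JSON object per line to a flat file is enough for a start;
# a real setup could use a small database instead.
with open("metrics.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```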
Part 1 can be done in any language, and with any library. I would rather do it with a scripting language such as Perl, Python or Ruby.
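As an illustration of the "best case" (a service with a clean API), here is a rough Python sketch. The URL, token and field names are made up; a real script would follow the documentation of the actual service and map its payload into the common format sketched above.

```python
# Sketch of an extraction script for a hypothetical analytics API.
import json
import urllib.request

API_URL = "https://api.example-analytics.com/v1/stats?token=SECRET"

def extract():
    # Fetch the service-specific JSON payload.
    with urllib.request.urlopen(API_URL) as response:
        data = json.load(response)
    # Map it to the common, source-independent format.
    return {
        "source": "example-analytics",
        "metric": "daily_page_views",
        "value": data["pageviews"]["today"],
        "timestamp": data["generated_at"],
    }

if __name__ == "__main__":
    print(json.dumps(extract()))
```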
Parts 2 and 3 are relatively trivial and are mostly administration tasks.
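For the regular downloads (Part 2), something as simple as one cron entry per extraction script would do; the paths and file names below are just an assumption.

```
# crontab entry: run the extraction script once per hour and append its
# JSON output to the shared metrics file (paths are an assumption).
0 * * * * python3 /home/me/crawlers/example_analytics.py >> /home/me/metrics.jsonl 2>> /home/me/crawler_errors.log
```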
Finally, for Part 4, the best would be some kind of Javascript app, so the dashboard can be accessed from anywhere. Here is a list of Javascript data visualization libraries. Among these, I'll admit that Highcharts looks pretty cool.
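Whichever library ends up rendering the charts, the dashboard only needs to fetch the normalized data over HTTP. As a prototyping shortcut (and only that, given the security concerns above), the file written by the scripts can be exposed with Python's built-in server:

```python
# Quick-and-dirty way to serve the current directory (including
# metrics.jsonl) on port 8000 so a Javascript dashboard can fetch it.
from http.server import HTTPServer, SimpleHTTPRequestHandler

HTTPServer(("", 8000), SimpleHTTPRequestHandler).serve_forever()
```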
Sharing work
Obviously, Parts 2, 3 and 4 are a one-time effort. Once they are done, most of the code will be stable and won't need many updates, so they could be built once and distributed as an open source project.
In addition, the extraction scripts needed for Part 1 can be easily shared and reused. Indeed, once a user has coded a crawler for some service that does not have a public API, that code can be made publicly available so others can save time. In the end, what is needed is a crawler repository, or some sort of wiki on which crawlers can be posted.
Not re-inventing the wheel
Commercial solutions already exist. The best I have found so far is Geckoboard. The cheapest plan at the time I am writing this is $9 per month, which is not that much if it saves you millions of seconds. The advantage is that they already offer crawlers for services that have public APIs. For all the others, well, you have to code your own scripts. I am still holding off, as I think $9 is a bit expensive for personal use (even though that's an acceptable price for business use). But honestly, I'd rather pay a few bucks than spend dozens of hours recoding some boring framework.
I also found some discussion about open source dashboard software on Quora. So far, nothing seems to come even close to Geckoboard.
I also took a sneak peek at Piwik (an open source alternative to Google Analytics) to see if there was any way to plug in some custom data. It seems there are ways, but it's not that convenient, as you have to put the data into some specific MySQL table format. I found a blog post that talks about it.
So here is the deal: if the websites and services you want to monitor are listed on Geckoboard as already having a crawler, then go for Geckoboard. If not, then you are better off coding your own thing, since you will have to code the crawlers yourself anyway. Putting together a little dashboard with Javascript and Highcharts shouldn't take much time.
Why there will never be startups offering this service to the masses
Social network aggregators have been around for a while. But offering a service that would centralize any other website one may need is something else, and would be really hard. Indeed, how many people do you know who can code a crawler? Not that many, right? So when a regular user needs to monitor some data on whatever website, he will be completely unable to do so. In addition, having an online app do all this means storing the private access keys and possibly the passwords of thousands of people. It won't take long before some guy gets in and steals everybody's passwords. For these two reasons alone (and I could find more), there is no market for such software. The best that can be done is what Geckoboard and its clones did: sell the service with a B2B model and hope that enough businesses adopt it.