Email Correspondence with Chris Dixon, CEO of SiteAdvisor

A few weeks ago Chris Dixon, CEO of SiteAdvisor.com, replied by email to my non-user review of SiteAdvisor. His reply and my subsequent responses are printed with his permission.

Hi Mat,

Thanks for taking time to write such a thoughtful review.

Let me try to address each of the main points you raise:

#1 - "low area web coverage". Note that (as of today) we have analyzed 1.5M _sites_, not _pages_. When sites like Google say they have crawled "5 billion pages," they are using a very different metric. A site like Wikipedia, for example, is (according to our nomenclature) 1 site but has about 144 MILLION pages (you can see this by typing "site:wikipedia.org" into Google). In fact, there are far fewer than 5 billion sites in the world (see http://www.whois.sc/internet-statistics/), and even among registered domains, the vast majority of sites are (sadly) squatter sites and the like that get very little traffic. The reality is that we have tens of thousands of users right now and we know that, as of today, we have analyzed 97% of the sites they request data for (note that we track _which_ sites users ask for data on, but do not track or store _who_ asks for that data - see our very explicit privacy policy about this). I could give you lots more arguments as to how our coverage is quite good, but the best way for you to see this for yourself is simply to try out our product. I think you'll be quite surprised by how much of the web we cover.

My Reply: I have been really trying to challenge you on this issue for the last couple of days, and I can't :) It's very hard to find sites that are untested, especially in Google (Yahoo seems to have more). Your claim of 97% matches my own experience and the testing I have been running. It's fantastic :)
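To make the sites-versus-pages distinction and the 97% figure concrete, here is a minimal Python sketch. It is only an illustration, not SiteAdvisor's method: the function names are my own, and treating a URL's host as the "site" is a simplifying assumption (it would count en.wikipedia.org and de.wikipedia.org as separate sites).

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url):
    """Return the host of a URL as a rough stand-in for a 'site' (assumption)."""
    return (urlsplit(url).hostname or "").lower()


def pages_per_site(urls):
    """Count distinct page URLs under each site; many pages collapse into one site."""
    return Counter(site_of(u) for u in set(urls))


def coverage(requested_sites, analyzed_sites):
    """Fraction of the distinct sites users asked about that are already analyzed."""
    requested = set(requested_sites)
    if not requested:
        return 0.0
    return len(requested & set(analyzed_sites)) / len(requested)
```

Measured this way, coverage depends only on the sites users actually request, not on the total number of pages on the web, which is why a database of 1.5M sites can still score highly.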


#2 - data freshness - actually, we have the capacity right now to analyze millions of sites quite frequently (multiple times per month). We have algorithms that adjust the frequency we analyze a site based on its popularity, threat level, frequency of updates, etc. We think this is quite sufficient to cover most cases when sites change their practices. Also, I should point out that in many cases we don't necessarily want to re-analyze sites TOO often. If a site had spyware yesterday and today didn't, would you trust that site today? We think that a web site's reputation should, in some sense, be "sticky."

My Reply: I'm not convinced of this, but who can know how much is sufficient; the task you are undertaking is so gargantuan. I also imagine that aspects of SiteAdvisor will emerge (I allude to these below) that are targeted less at web-based malware and more at capturing the "ethos" of a website. For these aspects (site birthdate, country, etc.) data freshness is not a priority.
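For readers wondering what a scheduler like the one described in #2 might look like, here is a rough Python sketch. The weights, the 0-to-1 scales, and every name in it are assumptions of mine for illustration; Chris did not describe the actual algorithm, and this is not it.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    popularity: float         # 0.0 (obscure) to 1.0 (very popular); hypothetical scale
    threat_level: float       # 0.0 (clean) to 1.0 (known bad); hypothetical scale
    updates_per_month: float  # observed content changes per month


def reanalysis_interval_days(stats, base_days=30.0, min_days=1.0, max_days=90.0):
    """Shorten the base interval as popularity, threat level, and churn rise."""
    # Made-up weights: each factor increases urgency, which divides the interval.
    urgency = (
        1.0
        + 4.0 * stats.popularity
        + 4.0 * stats.threat_level
        + stats.updates_per_month / 10.0
    )
    interval = base_days / urgency
    # Clamp so no site is checked absurdly often or left unchecked for too long.
    return max(min_days, min(max_days, interval))
```

With these made-up weights, a quiet, clean, static site is revisited every 30 days, while a popular, risky site that changes ten times a month is revisited every 3 days.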



#3 - "the flaccidness of protection" I suppose if you think that the problems of spyware, viruses, spam, etc. have already been 100% solved by existing software, you won't find much use for SiteAdvisor. I've personally been infected by spyware that none of the popular spyware removers could remove. Most experts seem to think spyware removers are not nearly 100% effective today. We think the best way to avoid many of these problems is simply to prevent them. There is a long tradition in computer security of having multiple layers of defense. We definitely recommend that users have anti-virus, spyware removers, etc., but think we can add an additional, valuable layer of security. Also note that we address issues like online scams (e.g. see our most recent blog entry on freedownloadhq.com at blog.siteadvisor.com) that no existing security products (that I know of) even attempt to protect against. We will also soon be rolling out exploit protection which will add an additional layer of protection against problems like the recent WMF exploit before patches are released.

My Reply: In essence, my criticism is that the threat from web-based malware is considerably smaller than the threat from P2P, email, or network-based attacks, and I think the evidence of the last 2 years really supports this. (Going back 7 years - happyhippo.com etc. - yes, it's different.) The WMF exploit turned out to be a zero threat (is there any conclusive evidence of a payload being delivered using it?), but I agree: if it could have been exploited, then SiteAdvisor would have been the most effective layer of protection until patching.


#4 - "the problem of faith." Our goal is simply to build such a good database that users' faith in us will be justified. That said, as with any security product, there are all sorts of ways the bad guys might try to beat us. We have already addressed many of these ways and plan to address many more over time. Of course, whether we succeed or not remains to be seen, but we believe we've got a very good set of plans for this.

Thanks again for your comments. I wasn't sure from your review if you had tried the product yet. If not, I'd encourage you to, and, when you do, feel free to send/post additional comments. (If you don't like SiteAdvisor, I assure you it uninstalls quite easily).

My Reply: Agreed. As I said, if SiteAdvisor is successful then there may be people who get scammed because of it. But the majority will gain a proportionate amount of security and safety.

I have also been thinking about some things to do with SiteAdvisor that are loosely considered points and queries. I have put these at the bottom of the email if you are interested:)




Points and Queries: The SiteAdvisor Netmap

I think the real asset you have isn't the increased security layer you can offer but rather the fact that you are mapping the web in a systematic, non-commercial way and that the data is available for inspection and use under CC. That's the gem: the fact that you or others can use your netmap for all manner of useful, fun or funky reasons :) Nobody else is doing that.

  • Imagine if stumbleupon.com augmented your plugin so that when you do a Google search, any highly "stumbledupon" sites are flagged in the results and you can choose to visit them - without having to go through the actual activity of "stumbling".
  • You cover Yahoo, MSN and Google searches (maybe some others?). I would love a plugin that, in say a Google search, showed the corresponding rankings for the same sites in Yahoo or MSN.
  • Linking with Whois data. People in the web industry would love to know when sites expire. Actually, I guess you must link with Whois data already.
  • Linking with contacts data. Imagine if you spidered out various email or phone number details from sites and listed them as clickable in the popup. I want to contact Microsoft customer support: "onmouseover" I have the number there in my window. I don't need to navigate the site; it's been done for me.
  • The SiteAdvisor ratings are great, but you need to be able to enter them on the site, from the browser.
  • A plugin for email. If I could have SiteAdvisor pop up its icon on my webmail messages or in Outlook, this could prove to be the best anti-spam/phishing solution possible.



Open Search Engine

I always think it's good if you can imagine how an application can change the world, especially one with the scope of SiteAdvisor. SiteAdvisor's netmap could be used for something that until this week I thought was not possible.

That something is an open-source, distributed search engine that can compete with Google, Yahoo et al. You have the web spidered, and that's in many ways the really hard part of running a search engine. Google started in dark space; because of the SiteAdvisor data, the web is already illuminated for analysis. Isn't that fantastic? Is this something you have considered?