Decoding the Architecture of the Internet


The size of the internet is hard to fathom. The incomprehensible amounts of structured and unstructured data generated online (120 trillion gigabytes every year) make it the single most important source of intelligence in corporate investigations today. This vastness creates a fundamental challenge in gathering open-source intelligence (OSINT): knowing where and how to find the data points that will unlock the next stage of an investigation.

Intelligence rarely comes from a single source. Rather, it is the result of piecing together information found in court records, social media, offshore records, and dozens of other sources.

In addition to intelligence from content, we have also had remarkable breakthroughs by analysing the back-end architecture of web pages, web protocols, and servers.

Here are a few examples:

  1. Website developer files
  • Robots.txt files instruct search engines on which pages to crawl and which to avoid. They can be useful in directing investigators to pages the site wants to keep hidden, and so we often analyse these files when conducting investigations.
  • We also explore website sitemaps. Though these may seem innocuous, sitemaps may contain sensitive information about the site’s creation. Accessing the sitemap could also yield behind-the-scenes caches of data that would not otherwise be accessible.
  2. Web protocols
  • One of the best-known sources of intelligence from the internet’s “back office”, WHOIS databases often contain extensive details about a domain and its creators. We draw on these databases to find the origins of sites, making them crucial data points to build upon during an investigation. However, data protection rules may mean that sensitive information is hidden from investigators.
  • If this is the case, we find alternatives to WHOIS, exploring archived database records, domain transfers, and hidden caches where data may only be public for a short period of time. Finding these alternatives means that even where a website’s creator has deliberately tried to stay anonymous, we can still uncover actionable leads.
  • SSL certificates are a final powerful resource. These can be used to uncover links between seemingly unrelated websites and contain reams of information that may lead to previously undiscovered internet sources.
  3. Mail servers
  • Finally, mail server information may be used to discover obfuscated connections between entities. In fact, it provided a breakthrough piece of evidence in Neon’s investigation into Wirecard, supporting our client’s suspicions of inflated revenue months before the tech unicorn was exposed as the biggest fraud in German history. We were able to link Wirecard’s CEO to one of its UAE-based ‘clients’ by showing that both entities’ mail services were hosted on the same network and likely paid for by the same account.
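
To make the mail server idea concrete, here is a minimal Python sketch of one way such overlaps might be surfaced. It is an illustration under stated assumptions, not a description of the method used in any actual investigation: it assumes the mail exchanger (MX) hostnames for each domain have already been looked up, and the function name and the sample domains are hypothetical.

```python
from collections import defaultdict


def group_by_shared_mail_host(mx_records):
    """Return mail hosts that are used by two or more domains.

    mx_records: dict mapping a domain to the MX hostnames already
    looked up for it (the lookup itself is outside this sketch).
    """
    host_to_domains = defaultdict(set)
    for domain, hosts in mx_records.items():
        for host in hosts:
            # Normalise case and the trailing dot of fully qualified names
            # so that equivalent hostnames compare equal.
            host_to_domains[host.lower().rstrip(".")].add(domain.lower())
    # Keep only hosts shared by more than one domain.
    return {
        host: sorted(domains)
        for host, domains in host_to_domains.items()
        if len(domains) > 1
    }


# Made-up records for illustration only.
sample = {
    "example-a.test": ["mx1.mailhost.test."],
    "example-b.test": ["MX1.mailhost.test", "mx2.other.test"],
    "example-c.test": ["mail.example-c.test"],
}

shared = group_by_shared_mail_host(sample)
print(shared)  # {'mx1.mailhost.test': ['example-a.test', 'example-b.test']}
```

A shared mail host is only a lead: many unrelated organisations use the same large email providers, so an overlap like this would need to be corroborated with other sources before it is treated as a connection.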

Ultimately, beyond the abundance of content on the internet, its “behind the scenes” workings are a rich source of intelligence and one that, as we’ve shown, can yield information that is worth tens of millions of dollars to our clients.

The value of the internet in corporate investigations is increasingly recognised. Yet only by combining form and content can we claim to be fully leveraging the insights of open-source intelligence.

Article by Andrew Knight.
