With the explosion of social networking, the amount of publicly available information open to data mining is growing exponentially, but the real challenge for governments and businesses is what to do with the intelligence it provides.
“Open source intelligence” (OSINT) is insight derived from collating and analysing unclassified, publicly available information. It is the process of analysis to identify patterns and hot spots that transforms mere information into intelligence.
OSINT has long been used by governments and businesses as a method to build intelligence on their industry, target market and competitors.
However, in today’s online world the detail and sheer amount of data available from public information sources has exploded and the internet has made it much more accessible. Such sources include media, web-based communities, social networking sites, wikis and blogs, government and official reports, professional and industry associations, and academic papers. The result is a massive, constantly growing stream of published data. This provides both an opportunity and a challenge.
Patterns Transform Information into Intelligence
Such a plethora of information offers great opportunity for data mining to gain intelligence. But as I noted above, it is not the gathering of lots of information that provides insight – it is separating the “good” data from the noise, prioritising that data by relevance, and then analysing what remains to determine what it really means, before the data ages to the point that it is no longer relevant. The good news is that much of this analysis can be automated using advanced data analytics and heuristics tools such as neural-network technology – advanced computing techniques based on our growing understanding of how brains store information as patterns and use those patterns to solve problems. In essence, the software mimics how an analyst would work through the data manually, by creating and continuously evolving business rules that automate the analytical process.
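The filter-prioritise-analyse loop described above can be sketched in a few lines. This is a deliberately minimal illustration, not a real analytics engine: the keyword rules, weights and decay rate are all hypothetical, standing in for the business rules an analyst would create and evolve.

```python
from dataclasses import dataclass

# Hypothetical business rules: keywords of interest and their weights.
# Real systems evolve rules like these continuously.
RULES = {"smuggling": 3.0, "fraud": 2.5, "protest": 1.5}

@dataclass
class Item:
    text: str
    age_hours: float  # how long ago the item was published

def relevance(item: Item) -> float:
    """Score an item by keyword rules, decaying with age so stale data drops out."""
    base = sum(w for kw, w in RULES.items() if kw in item.text.lower())
    decay = 0.5 ** (item.age_hours / 24)  # halve the score every 24 hours
    return base * decay

# Separating the "good" data from the noise is then a sort-and-threshold step.
items = [Item("Reports of fraud ring expanding", 2), Item("Weather update", 2)]
ranked = sorted(items, key=relevance, reverse=True)
```

The age decay captures the point that intelligence loses value as the underlying data ages; a production system would use far richer features than keyword matching.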
Neural-network technology is already being used today for fraud prevention and customer behaviour prediction. For example, fraud prevention software can help identify patterns of suspicious behaviour to calculate a claim’s degree of risk. Similarly, pattern analytics are being incorporated into smart video surveillance solutions to identify and act on threats quickly.
It is no surprise that the same approach is being used in the intelligence community, leading to the development of dedicated OSINT agencies – many of which grew out of traditional media monitoring services.
In the U.S., the Director of National Intelligence’s Open Source Center is an agency based within the Central Intelligence Agency (CIA) to provide analysis of open sources for U.S. intelligence. The U.S. Secure Border Initiative (SBINet) used open source intelligence to provide actionable indicators of changes in illegal migration and smuggling, including analytical tools to detect patterns and predict illegal activity.
In Australia the National Open Source Intelligence Centre was established in 2001 to provide state and federal agencies with a dedicated open source issue monitoring, research and analytical support capability.
Of course, automated technology does not remove the need for the human analyst – but no human could wade through the massive amount of publicly available data, let alone do it quickly enough, and share it with the right people, for it to be acted upon in real time. Rather, automation is a tool that enables open source intelligence gathering and analysis to be effective.
Understanding the Context
In addition to sheer volume, open source information is also characterised by the potential for erroneous or misleading information. The ability to automatically recognise patterns of interest in open source information is insufficient if the source information itself is untrue. As such, it is critical that context be taken into account when determining what weight to give a particular piece of open source information.
For example, is the source of the information considered to be a highly reliable or typically unreliable source? Does the source have “an agenda”? What was the timeframe and location in which the information was published and what other events or activities might bias the source? Are there corroborating alternative sources, and if so how reliable are they?
Although open source can provide a wealth of information not easily obtained through targeted intelligence collection, it can also be used as a tool for counter intelligence through disinformation and distraction. As such, open source information needs to be treated with a healthy dose of scepticism.
While managing the sheer volume of publicly available data is a huge challenge in itself, one of the more critical issues arising, particularly in government circles, is the classification of the intelligence gained by aggregating and analysing that data. With analysis, its value and potential sensitivity increase.
This raises questions: what is the threshold for moving from “open” to “secret”? Who decides how to classify data obtained from open sources? And with whom can that data be shared? Consider the advantages of being able to share intelligence across departments, between the public and private sectors, and across international borders. Then consider the risks (think Wikileaks)!
Much of this comes back to the need to find balance between protecting sensitive data while being able to securely share that data between trusted parties. Sensitive data in the wrong hands can pose real danger. But valuable intelligence that is unavailable to decision makers is worthless.
Intelligence may need to be secured to protect the source. It may also need to be secured if it could reveal sensitive information to adversaries. Classified data is information that a government deems sensitive – for example, in the U.S. classified data is defined as data whose release would damage the national security of the U.S.
Post 9/11 there was a public call for greater cooperation between governments globally to share information against common threats. Many agencies publicly promoted how they were using open source intelligence to better identify potential threats. However, in the last few years there has been debate about the degree to which data should be shared – particularly if doing so would remove a competitive edge or “decision advantage”.
There is also some public backlash, fuelled by fears of privacy being breached by governments creating digital dossiers on their citizens. Others argue that if the government is funding OSINT with taxpayer money, then the results should be made available to its citizens. Perhaps this comes down to what the intelligence is being used for: national security versus deciding where to invest in new schools.
How to Share Data Securely
Once it is determined who should have access to the intelligence, the issue becomes how to share data securely. Traditionally, restricting access to data has been achieved by restricting access to the network: access to data on a network is restricted based on the highest level of classification found on that network (i.e. the high water mark). This inevitably led to the proliferation of multiple networks servicing different classification levels, which in turn required data to be replicated, maintained and updated across those networks. However, in today’s reality of constrained IT budgets, remote workers, cloud computing and mobility, the focus needs to shift to more efficient and effective means of securing the data itself, based on the person trying to access it and the circumstances in which access is being requested.
Not all data is the same – its level of sensitivity varies. Data can be grouped by classification level according to the degree of confidence needed to keep it secure, and security measures then applied to match each classification.
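One way to picture matching security measures to classification is a simple lookup from level to controls. The level names and controls below are illustrative assumptions, not any government’s actual scheme.

```python
# Hypothetical classification levels mapped to the controls they demand.
CONTROLS = {
    "public":    {"encrypt_at_rest": False, "audit_access": False},
    "sensitive": {"encrypt_at_rest": True,  "audit_access": True},
    "secret":    {"encrypt_at_rest": True,  "audit_access": True,
                  "need_to_know": True},
}

def controls_for(level: str) -> dict:
    """Return the security measures required for a given classification level."""
    return CONTROLS[level]
```

The point is that controls scale with sensitivity: the higher the classification, the stronger (and costlier) the protections applied.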
You must assume that even the internal network is a hostile environment. Rather than rely solely on controlling access to the data, look at securing the data itself via encryption. That way even if the wrong people gain access to where the data resides, they still can’t read the data. For example, Unisys Stealth, which was built for the U.S. Department of Defense to secure sensitive information, uses certified encryption, then bit-splits data into multiple slices or shares as it moves through the network or is written to local or remote storage.
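To illustrate the bit-splitting idea in the abstract – this is a toy XOR secret-sharing sketch, not Stealth’s actual algorithm – data can be split into shares such that every share is needed to reconstruct it, and any single share on its own is indistinguishable from random noise.

```python
import secrets

def split(data: bytes, n: int = 3) -> list:
    """Split data into n shares; all n are required to reconstruct it.
    Toy XOR secret sharing for illustration only."""
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    final = data
    for s in shares:
        final = bytes(a ^ b for a, b in zip(final, s))
    return shares + [final]

def combine(shares: list) -> bytes:
    """XOR all shares back together to recover the original data."""
    out = shares[0]
    for s in shares[1:]:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out
```

An attacker who obtains fewer than all the shares learns nothing about the data, which is why splitting encrypted data across storage or network paths adds a layer beyond encryption alone.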
Stealth also provides a means of controlling access to data by setting up communities of interest based on “need-to-know”. Stealth allows multiple communities of interest to share the same IT infrastructure without fear of another group accessing their data. That way only those roles that need to have access to data can do so.
In addition, attribute-based access control is an emerging technology that grants access based not only on the nature of the data and the individual requesting access, but also on the context of the request: the location from which access is being requested; the method used to authenticate the requester’s identity (for example, a password offers a lower level of identity assurance than a biometric fingerprint, and so may restrict access to sensitive information); and whether anything about the request falls outside the user’s normal pattern, such as accessing information they don’t normally access, or working at hours outside their normal schedule.
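A toy policy check makes the attribute-based idea concrete. The attribute names and rules here are hypothetical examples, not a real ABAC policy language.

```python
def allow(request: dict) -> bool:
    """Grant access only if data sensitivity, authentication strength,
    location and behaviour pattern all pass. Illustrative rules only."""
    strong_auth = request["auth_method"] in {"biometric", "smartcard"}
    if request["sensitivity"] == "high" and not strong_auth:
        return False  # a password alone is too weak for sensitive data
    if request["location"] not in request["approved_locations"]:
        return False  # unexpected location
    if request["anomalous_hours"]:
        return False  # outside the user's normal working pattern
    return True
```

Each denial here corresponds to one of the contextual factors described above; a real system would combine many more attributes, and typically return a risk score rather than a hard yes/no.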
These are all ways for access to be based on the data and the person trying to access it, rather than the ability to get into the network.
So open source intelligence is not a new concept, but with the explosion of online information and social media it has become a much bigger job than it used to be. Neural-network and related technologies are playing a key role, in both commercial and national security contexts, in turning mountains of information into nuggets of valuable intelligence. The key is being able to securely share that intelligence with the right people.