Recorded Future, Data Mining, Intelligence Agencies

by **Wombaticus Rex** » Sun Aug 28, 2011 9:35 am

GOOD ANTIDOTE TO PARANOIA....

source: http://www.antipope.org/charlie/blog-st ... ction.html

Beyond Prediction

By Karl Schroeder

I've just spent two years working toward a Master's degree in Strategic Foresight and Innovation.

Because most people look at me blankly when I tell them this, I've developed two ways to describe what I'm doing, and what foresight is. The first is to say that foresight used to be called futurism, but that futurism has increasingly become associated with the idea of predicting the future. Foresight is not about predicting the future, it's about minimizing surprise. The second way I usually put it is that foresight is not about predicting the future; it's about designing the future.

Actually, I'll say it's just about anything, as long as it's understood that foresight is not about predicting the future.

The reason is that, frankly, I'm pretty tired of all those, "Dude, where's my flying car!" digs. There's always been a certain brand of futurist who's obsessed with getting it right: with racking up successful predictions like some modern-day Nostradamus. I'm sure you know who I'm talking about; some futurists play the prediction game very well, but in the end it is a game, and closer to charlatanism than it is to science. There's actually no method for seeing the future, and nobody's predictions are more reliable than anybody else's.

If actual prediction were possible, the insurance companies would be all over it. They don't try to predict how you're going to die, though, do they? They look at trends and probabilities, and try to minimize surprise for their investments. That's exactly how strategic foresight works--as a kind of institutional insurance policy against disruptive surprise. There's a whole raft of methodologies for this, ranging from Delphi polls to trends analysis and scenarios. For me, this way of looking at the future is complementary to my other way of looking, which is the more fun and disreputable wild-eyed prophet--that is, as a science fiction writer.

There are no limits on me when I write SF. In contrast, doing foresight is a disciplined activity. I like this combination; I'm finding that each way of looking forward influences and improves the other--as long as I don't get the two confused.

I'm still coming to grips with how these two years will affect my writing. One result of undertaking the programme is that I've developed a different attitude toward writing near-future SF. Most writers I know avoid at all costs writing about the near future, because nothing goes out of date quicker than next year. I've always tended to agree with this assessment and--because SF writers aren't in the job of predicting the future either--have tended to set my novels and short stories very, very far in the future. Thousands of years, usually.

I'm no longer satisfied with doing that. There's the little matter of my second way of describing what foresight is: not as prediction, but design. If you're afraid of being a poor predictor of the near future, you'll avoid writing about it. But what if you were never out to predict in the first place? What if you don't care if a story you set in 2012 gets immediately overtaken by events? What if you set the action there not to predict some event or outcome, but to encourage some action on the part of your readers?

In other words I have a new ambition for my own SF: not as prediction, and not cautionary, either--but aspirational.

The fact is that if I've learned one thing in two years of studying how we think about the future, it's that the one thing that's sorely lacking in the public imagination is positive ideas about where we should be going. We seem to do everything about our future except try to design it. It's a funny thing: nobody ever questions your credentials if you predict doom and destruction. But provide a rosy picture of the future, and people demand that you justify yourself. Increasingly, though, I believe that while warning people of dire possibilities is responsible, providing them with something to aspire to is even more important. The foresight programme has given me a lot of tools to do that in a justifiable way, so I might as well use them.

Now all I have to do is put my money where my mouth is. By, say, writing an optimistic, aspirational novel set in the near future and unflinchingly accurate to the possibilities, both positive and negative, of the next few years?

Yeah, okay. --At least, I'm going to try.

by **Searcher08** » Sun Aug 28, 2011 11:56 am

82_28 wrote:
Wombaticus Rex wrote:^^Actually, I am continually amazed how few data points exist online. The fact that Google Books exists in a wholly castrated state bothers me a lot -- opening those floodgates would radically increase the amount of actual information on the internet. There is a vast gulf separating information from mere content, which the internet has in kaleidoscopic abundance.

It would also be a good step to force all the academic archives currently behind paywalls onto the public net ASAP.

Very well said and me too! New information is bunk. It's only those who have taken the time to archive the real info, the stuff that came before the "information age" and have it be free of charge that is worth jack shit. And this is because you use their own self searching capabilities and know how when it comes to drilling down. Google is quickly becoming more and more of a joke to me. I still avidly use it, but google sucks anymore.

The data points that do exist are mundane as hell. More on this later. Gotta run to work. . .

There is also a 'cognitive immune system' effect.

I think that we could get a Saudi - Mossad joint WTC demolition team come forward with them filming setting explosives "Hey Ahmed, look at you with your arms out like a jet" "Ah Benny, those were the days, eh? Then off to the strip clubs in the evening!"
"Look here, I am setting nanothermate on the elevator, then you will set us documenting the event -fnarr fnarr!" cutting to Wolfie B,
anchored in his yacht off Tuvalu with a 20 tonne smack cargo
"Me and Atta went waay back - shout out to MadCow and Peter Dale Scott for the Chechen drug link! Please excuse me, I have a runway to finish and building on sand ain't the easiest!"
From a bunker somewhere in the Middle East - we have the pTech crew
"Yo! Indira Singh! We DID do da wikkid false radar injects in the basement of the FAA! Respect!"
From the White House "Dick Cheney - Awww you guys - after my NDE's, I thought life's too short. Tell the truth - so, I started it - I kicked it off - went a bit out of control though - those naughty Saudi Mossad guys and their demolition shenanigans!"
From the Tomb at Yale George and Stevie "Hey do you like our music selection? Goat's Head Soup! Only kidding! Shout out to all the egregore appreciators in da house!"

etc etc -
It would then cut to Barack Obama

"Mistakes have been made
in the past
but lessons
will be learned
I have asked Prosecutor
Patrick Fitzgerald
to see
if there was any
criminal wrongdoing
if there was - we will take action
if there wasn't -
we will hunt them down,
corner them
and kill them (*)
(*) sorry I misspoke
we will move on and grow together
as the American people."

Next back to the MAIN HEADLINES
Kim Kardashian says she is in love with J-LO
J-LO has admitted she feels the same way.

by **Wombaticus Rex** » Tue Apr 03, 2012 6:46 pm

Solid introduction to Data Mining, via today's Atlantic:
http://www.theatlantic.com/technology/a ... sk/255388/

Everything You Wanted to Know About Data Mining but Were Afraid to Ask

Big data is everywhere we look these days. Businesses are falling all over themselves to hire 'data scientists,' privacy advocates are concerned about personal data and control, and technologists and entrepreneurs scramble to find new ways to collect, control and monetize data. We know that data is powerful and valuable. But how?

This article is an attempt to explain how data mining works and why you should care about it. Because when we think about how our data is being used, it is crucial to understand the power of this practice. Without data mining, when you give someone access to information about you, all they know is what you have told them. With data mining, they know what you have told them and can guess a great deal more. Put another way, data mining allows companies and governments to use the information you provide to reveal more than you think.

To most of us data mining goes something like this: tons of data is collected, then quant wizards work their arcane magic, and then they know all of this amazing stuff. But, how? And what types of things can they know? Here is the truth: despite the fact that the specific technical functioning of data mining algorithms is quite complex -- they are a black box unless you are a professional statistician or computer scientist -- the uses and capabilities of these approaches are, in fact, quite comprehensible and intuitive.

For the most part, data mining tells us, about very large and complex data sets, the kinds of information that would be readily apparent about small and simple things. For example, it can tell us that "one of these things is not like the other," a la Sesame Street, or it can show us categories and then sort things into pre-determined categories. But what's simple with 5 datapoints is not so simple with 5 billion datapoints.

And these days, there's always more data. We gather far more of it than we can digest. Nearly every transaction or interaction leaves a data signature that someone somewhere is capturing and storing. This is, of course, true on the internet; but ubiquitous computing and digitization have made it increasingly true about our lives away from our computers (do we still have those?). The sheer scale of this data has far exceeded human sense-making capabilities. At these scales patterns are often too subtle and relationships too complex or multi-dimensional to observe by simply looking at the data. Data mining is a means of automating part of this process to detect interpretable patterns; it helps us see the forest without getting lost in the trees.

Discovering information from data takes two major forms: description and prediction. At the scale we are talking about, it is hard to know what the data shows. Data mining is used to simplify and summarize the data in a manner that we can understand, and then allow us to infer things about specific cases based on the patterns we have observed. Of course, specific applications of data mining methods are limited by the data and computing power available, and are tailored for specific needs and goals. However, there are several main types of pattern detection that are commonly used. These general forms illustrate what data mining can do.

Anomaly detection: in a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For instance, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.
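
To make the idea concrete, here is a minimal sketch (not from the article) of one very simple form of anomaly detection: flag any value that sits more than a chosen number of standard deviations from the mean. The deduction figures, the `flag_anomalies` name and the threshold of 2.0 are all assumptions made up for illustration; nothing here describes how the IRS actually selects returns.

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Return the indices of values whose z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical claimed deductions (in dollars) on ten tax returns.
deductions = [4200, 3900, 4100, 4500, 4000, 3800, 4300, 4150, 25000, 4050]

flagged = flag_anomalies(deductions)
print(flagged)  # index of the return that differs sharply from the rest
```

Real systems model many variables at once and use more robust statistics, but the principle is the same: describe the typical case, then measure distance from it.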

Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
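
A rough sketch of the underlying arithmetic, assuming a handful of invented shopping baskets: the "confidence" of a rule is the share of baskets containing the first items that also contain the second. The basket contents and the `rule_confidence` helper are hypothetical, and this is not a description of Amazon's or Netflix's actual systems.

```python
def rule_confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing every antecedent item that also contain the consequent."""
    antecedent = set(antecedent)
    matching = [b for b in baskets if antecedent <= set(b)]
    if not matching:
        return 0.0
    return sum(1 for b in matching if consequent in b) / len(matching)

# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book"},
    {"recipe book", "garden hose"},
    {"martini glasses"},
]

confidence = rule_confidence(
    baskets, {"cocktail shaker", "recipe book"}, "martini glasses"
)
print(f"{confidence:.2f}")  # share of shaker-and-book buyers who also bought glasses
```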

Cluster detection: one type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups. This is one of the black-box types of algorithm that are hard to understand. But in a simple example - again with purchasing behavior - we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.
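
For a feel of what "letting the data determine the groups" can look like, here is a small sketch using k-means, one common clustering method (the article does not say which algorithm any particular company uses). Each hypothetical customer is a pair of numbers, spend on gardening gear and spend on fishing gear, and the starting centroids are simply assumed to be the first and last customers.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternately assign points to the nearest centroid and move centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid where it was
        centroids = updated
    return labels, centroids

# Hypothetical (gardening spend, fishing spend) per customer.
purchases = [(90, 5), (85, 10), (95, 0), (5, 80), (10, 95), (0, 90)]

labels, centers = kmeans(purchases, centroids=[purchases[0], purchases[-1]])
print(labels)   # cluster index per customer
print(centers)  # the "typical" gardener and the "typical" angler
```

Nobody told the code which customers were gardeners; the two groups fall out of the spending numbers alone.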

Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this - large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.
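
The spam example can be sketched with a toy naive Bayes classifier, one standard technique for this job (an assumption on my part; the article does not name a method). The four training messages are invented, and a real filter would learn from vastly more mail and more signals than word counts.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability, using add-one smoothing."""
    vocab = set().union(*word_counts.values())
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical pre-classified messages.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
word_counts, label_counts = train(training)

print(classify("claim your free money", word_counts, label_counts))
print(classify("agenda for the team meeting", word_counts, label_counts))
```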

Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
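
As a stripped-down sketch of the regression idea, the snippet below fits a straight line through one predictor by ordinary least squares. The numbers are invented, the single-variable setup is a simplification of the many-variable model described above, and none of it reflects how Facebook actually models engagement.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: photos tagged this month vs. visits next month.
photos_tagged = [1, 2, 3, 4, 5]
visits_next_month = [3, 5, 7, 9, 11]

slope, intercept = fit_line(photos_tagged, visits_next_month)
predicted = slope * 6 + intercept  # forecast for a user who tagged 6 photos
print(slope, intercept, predicted)
```

Comparing `predicted` against what the user later does is the feedback loop the paragraph above describes: the gap between forecast and reality is what gets used to re-weight the model.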

The patterns detected and structures revealed by the descriptive data mining are then often applied to predict other aspects of the data. Amazon offers a useful example of how descriptive findings are used for prediction. The (hypothetical) association between cocktail shaker and martini glass purchases, for instance, could be used, along with many other similar associations, as part of a model predicting the likelihood that a particular user will make a particular purchase. This model could match all such associations with a user's purchasing history, and predict which products they are most likely to purchase. Amazon can then serve ads based on what that user is most likely to buy.

Data mining, in this way, can grant immense inferential power. If an algorithm can correctly classify a case into a known category based on limited data, it is possible to estimate a wide range of other information about that case based on the properties of all the other cases in that category. This may sound dry, but it is how most successful Internet companies make their money and from where they draw their power.

by **Wombaticus Rex** » Wed Apr 04, 2012 5:38 pm

HUGE info-dump goldmine going up on public intelligence this week.

Meet Catalyst: IARPA's Entity and Relationship Extraction Program

The Office of the Director of National Intelligence (ODNI) is building a computer system capable of automatically analyzing the massive quantities of data gathered across the entire intelligence community and extracting information on specific entities and their relationships to one another. The system, which is called Catalyst, is part of a larger effort by ODNI to create software and computer systems capable of knowledge management, entity extraction and semantic integration, enabling greater analysis and understanding of complex, multi-source intelligence throughout the government.

The intelligence community has been working for years to develop software and analytical frameworks capable of large-scale data analysis and extraction. Technological advances have now made it possible for spy agencies to not just capture the incredible amount of data flowing through public and private networks around the world, but to parse, contextualize and understand the intelligence that is being gathered. Automated software programs are now capable of integrating data into semantic systems, providing context and meaning to names, dates, photographs and practically any kind of data you can imagine.

Many agencies within the intelligence community have already created systems to do this sort of semantic integration. The Office of Naval Intelligence uses a system called AETHER “to correlate seemingly disparate entities and relationships, to identify networks of interest, and to detect patterns.” The NSA runs a program called APSTARS that provides “semantic integration of data from multiple sources in support of intelligence processing.” The CIA has a program called Quantum Leap that is designed to “find non-obvious linkages, new connections, and new information” from within a dataset. Several similar programs were even initiated by ODNI including BLACKBOOK and the Large Scale Internet Exploitation Project (LSIE).

Catalyst is an attempt to create a unified system capable of automatically extracting complex information on entities as well as the relationships between them while contextualizing this information within semantic systems. According to its specifications, Catalyst will be capable of creating detailed histories of people, places and things while mapping the interrelations that detail those entities’ interactions with the world around them. A study conducted by IARPA states that Catalyst is designed to incorporate data from across the entire intelligence community, creating a centralized repository of available information gathered from all agencies:

Many IC organizations have recognized this problem and have programs to extract information from the resources, store it in an appropriate form, integrate the information on each person, organization, place, event, etc. in one data structure, and provide query and analysis tools that run over this data. Whereas this is a significant step forward for an organization, no organization is looking at integration across the entire IC. The DNI has the charter to integrate information from all organizations across the IC; this is what Catalyst is designed to do with entity data. The promise of Catalyst is to provide, within the security constraints on the data, access to “all that is known” within the IC on a person, organization, place, event, or other entity. Not what the CIA knows, then what DIA knows, and then what NSA knows, etc., and put the burden on the analyst to pull it all together, but have Catalyst pull it all together so that analysts can see what CIA, DIA, NSA, etc. all know at once. The value to the intelligence mission, should Catalyst succeed, is nothing less than a significant improvement in the analysis capability of the entire IC, to the benefit of the national security of the US.

To fully grasp the capabilities of such a system, it is important to understand the concepts of “semantic integration” and “entity extraction” that Catalyst will perform. Using an example described in the IARPA study, we will follow data through the stages of processing in a Catalyst system:

For example, some free text may include “… Joe Smith is a 6’11″ basketball player who plays for the Los Angeles Lakers…” from which the string “Joe Smith” may be delineated as an entity of class Athlete (a subclass of People) having property Name with value JoeSmith and Height with value 6’11″ (more on this example below). Note that it is important to distinguish between an entity and the name of the entity, for an entity can have multiple names (JoeSmith, JosephSmith, JosephQSmith, etc.).

Once entities and their associated relationship values are determined, the information is then integrated into a knowledge base to produce a semantic graph:

To continue the example, one entry in the knowledge base is the entity of class Athlete with (datatype property) Name having value JoeSmith, another is the entity of class SportsFranchise with Name having value Lakers, and another is an entity of class City having value LosAngeles. If each of these is viewed as a node in a graph, then an edge connecting the node (entity) with Name JoeSmith to the node with Name Lakers is named MemberOf and the edge connecting the node with Name Lakers to the node with Name LosAngeles is named LocatedIn. Such edges correspond to relationships (object properties) and have a direction; for example, JoeSmith is a MemberOf the Lakers, but the Lakers are not a MemberOf JoeSmith (there may be an inverse relationship, such as HasMember, that is between the Lakers and JoeSmith).
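
The graph in that example is small enough to write out. This is only an illustrative sketch of a directed, labeled graph stored as triples; the dictionary layout and the `has_edge` and `targets` helpers are my own assumptions, not anything taken from the IARPA study or from Catalyst itself.

```python
# Nodes: each entity has a class and some datatype properties.
entities = {
    "JoeSmith": {"class": "Athlete", "Name": "JoeSmith", "Height": "6'11\""},
    "Lakers": {"class": "SportsFranchise", "Name": "Lakers"},
    "LosAngeles": {"class": "City", "Name": "LosAngeles"},
}

# Directed edges as (subject, relationship, object) triples.
edges = [
    ("JoeSmith", "MemberOf", "Lakers"),
    ("Lakers", "LocatedIn", "LosAngeles"),
]

def has_edge(subject, relationship, obj):
    """True only if the edge exists in this exact direction."""
    return (subject, relationship, obj) in edges

def targets(subject, relationship):
    """Follow one relationship outward from a node."""
    return [o for s, r, o in edges if s == subject and r == relationship]

print(has_edge("JoeSmith", "MemberOf", "Lakers"))  # the stated direction
print(has_edge("Lakers", "MemberOf", "JoeSmith"))  # the reverse is a different claim
print(targets("Lakers", "LocatedIn"))
```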

Data that has been extracted and integrated can then produce patterns that determine unknown relations between an entity and other entities that may be of concern to a particular intelligence agency:

Another simple pattern could be: JoeSmith Owns Automobile, or Person Owns an instance of the class Automobile with Manufacturer Lexus and LicensePlate VA-123456 or even JoeSmith has-unknown-relationship-with an instance of the class Automobile with Manufacturer Lexus and LicensePlate VA-123456. In these last three examples, one of the entities or the relationship is uninstantiated. Note that JoeSmith Owns an instance of the class Automobile with Manufacturer Lexus and LicensePlate VA-123456 is not a pattern, for it has no uninstantiated entities or relationships. A more complex pattern could be: Person Owns Automobile ParticipatedIn Crime HasUnknownRelationshipWith Organization HasAffiliationWith TerroristOrganization. Any one or more of the entities and the has-unknown-relationship-with relationship (but not all) can be instantiated and it would still be a pattern, such as JoeSmith Owns Automobile ParticipatedIn Crime PerpetratedBy Organization HasAffiliationWith HAMAS.
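
A toy version of that pattern idea, under heavy simplifying assumptions: a query is a triple in which `None` marks an uninstantiated slot, and matching returns every stored fact that fits. The study's patterns also constrain slots by class (Person, Automobile) and chain several relationships together, which this sketch does not attempt. All facts and names below are hypothetical.

```python
# Hypothetical facts as (subject, relationship, object) triples.
facts = [
    ("JoeSmith", "Owns", "Lexus VA-123456"),
    ("JoeSmith", "MemberOf", "Lakers"),
    ("JaneDoe", "Owns", "Lexus VA-654321"),
]

def is_pattern(query):
    """A query counts as a pattern only if at least one slot is uninstantiated (None)."""
    return any(part is None for part in query)

def match(query, facts):
    """Return every fact that agrees with the query on all instantiated slots."""
    return [
        fact for fact in facts
        if all(q is None or q == value for q, value in zip(query, fact))
    ]

# "JoeSmith Owns <something>": the object is left open.
print(match(("JoeSmith", "Owns", None), facts))

# "JoeSmith has-unknown-relationship-with this car": the relationship is left open.
print(match(("JoeSmith", None, "Lexus VA-123456"), facts))

# Fully instantiated, so by the study's definition it is not a pattern.
print(is_pattern(("JoeSmith", "Owns", "Lexus VA-123456")))
```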

While this example only provides a limited view of Catalyst functionality, it nonetheless helps to demonstrate the potential capabilities of the system. Far more detailed explanations of the system, as well as a useful overview of similar government systems across the intelligence community, are provided in IARPA’s 122-page study.

http://publicintelligence.net/ufouo-iar ... al-report/

by **Hugh Manatee Wins** » Wed Apr 04, 2012 10:37 pm

The Atlantic is a CIA-gatekeeper of the intellectual Left.
So their datamining primer is here for the same reason as the CIA-Washington Post's 'Top Secret America' series, to maintain credibility and audience.

"Minimizing surprise." Good definition of the psyops used as psychological shock-absorber and forestalling counterpropaganda.

by **Wombaticus Rex** » Wed Apr 04, 2012 11:04 pm

Moving this to the Data Dump: Data Mining & Intelligence Agencies

by **Hugh Manatee Wins** » Wed Apr 04, 2012 11:13 pm

Good stuff to know, though.

Just don't credit 'The Atlantic' as 'on our side.'

I learned lots about datamining from reverse-engineering the movie, 'K-Pax.'
http://upload.wikimedia.org/wikipedia/e ... x-Kpax.jpg

by **Wombaticus Rex** » Wed Apr 04, 2012 11:28 pm

I'd normally be offended that you're attributing things to my beautiful mouth I never said, anywhere, ever, but I'm more fascinated by your closer.

What the hell did you learn about data mining from K-Pax? Seriously.

by **Hugh Manatee Wins** » Thu Apr 05, 2012 1:54 am

Wombaticus Rex wrote:I
.....
What the hell did you learn about data mining from K-Pax? Seriously.

P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In Technical Report SRI-CSL-98-04. CS Laboratory, SRI International, 1998.

Data mining terms reveal social control strategies.
In 1999 two cover-ups of intelligence used for social control intersected in 'K-Pax.'

1) Dr. Colin Ross' book on the CIA's intentional creation of dissociative identity evoked decoys.
2) Data mining surveillance algorithms on the now-mainstream internet evoked decoys.

#1 was obvious in 'K-Pax.' The mental patients all joyously yelling "blue bird!" The mystery patient hiding from a major trauma. Cliche.
#2 was a learner. And was recently expanded on by NSA whistleblower Thomas Drake, who described how a surveillance plan that retained citizen privacy was rejected in favor of a surveillance plan that didn't.

The NSA experienced the same schism over domestic operations that the CIA did when Richard Ober ran CIA's Operation CHAOS against Vietnam War dissenters.
