#SocialNetworkingAnalysis | Explore Tumblr posts and blogs

projectthinker · 12 years ago

Using Social Networking Analysis To Measure Influence

Social Networking Analysis (SNA) employs analytical methods, techniques and models to analyze, understand and predict online social networking patterns, structure and dynamics through mining the wealth of data being constantly shared across the web. SNA also leverages sociology, psychology and statistics concepts to build specific network models for various types of social networks. Even though human relationships are governed by random interactions and motivations, most human networks both online and offline follow very similar patterns.

Network Models

One of the most important approaches to understanding social networks is to determine which mathematical models best describe typical social network structures. It turns out that the Normal or Gaussian distribution model (Figure 1), which is good for modeling many real-world quantities, is not an adequate model for degree distributions. In the study of networks, the degree of a node (vertex) is the number of connections it has to other nodes, and the degree distribution is the probability distribution of these degrees over the entire network.

The Power Law distribution is a much better model for the degree distribution of such networks, and many social networks exhibit it. Understanding this phenomenon is important because a Power Law distribution behaves very differently from the Normal distribution: most nodes have only a few connections, while a small number of hubs have very many.
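To make the idea of a degree distribution concrete, here is a minimal Python sketch that computes one from an edge list. The toy network and the function name `degree_distribution` are assumptions made for illustration only. In a Power Law network the fraction of nodes with degree k falls off roughly as k raised to a negative exponent, so a few hubs coexist with many low-degree nodes.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: fraction of nodes with that degree} for an undirected edge list.

    Nodes that appear in no edge are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical toy network: one hub linked to eight nodes, plus one extra edge.
edges = [("hub", f"n{i}") for i in range(8)] + [("n0", "n1")]
dist = degree_distribution(edges)
for k, fraction in dist.items():
    print(k, round(fraction, 3))
```

Plotting such a distribution on log-log axes is a common informal check: a roughly straight line suggests Power Law behavior, although a sample this small proves nothing.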

Centrality

Often, regardless of the industry or organization performing social networking analysis, it is important to understand which models govern the specific target network. It is also critical to understand the smaller, local relationships between the actors (nodes). For example, for intelligence analysis purposes, it is critical to identify how information flows through the network and which nodes are the most active in collecting or sharing information. As such, the centrality of a node describes how important or influential that node is to the network.

Highly centralized networks operate much like highly centralized governments such as theocracies or monarchies, while the least centralized networks mimic democratic systems of government. The centralization of a network measures how far the most central node stands above all the others: it sums the differences between the maximum node centrality and the centrality of every other node, normalized by the largest such sum possible for a network of that size. It can be calculated by Freeman’s general formula.
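As a rough illustration of Freeman's approach, the sketch below computes centralization using degree as the centrality measure, for an undirected network without self-loops or duplicate edges. The function name and the two toy networks are assumptions for this example. For degree centrality the largest possible sum of differences occurs in a star network and equals (n - 1)(n - 2).

```python
def degree_centralization(adjacency):
    """Freeman degree centralization for an undirected simple graph.

    adjacency maps each node to the set of its neighbours.
    Returns 0.0 when all nodes are equally central and 1.0 for a star.
    """
    n = len(adjacency)
    if n < 3:
        raise ValueError("need at least 3 nodes")
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    max_degree = max(degrees)
    # Sum of the gaps between the most central node and every node.
    observed = sum(max_degree - d for d in degrees)
    # Largest possible sum, reached by a star graph.
    return observed / ((n - 1) * (n - 2))

# Hypothetical examples: a fully centralized star and a fully decentralized ring.
star = {"c": {"a", "b", "d", "e"}, "a": {"c"}, "b": {"c"}, "d": {"c"}, "e": {"c"}}
ring = {"a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d", "a"}}
print(degree_centralization(star), degree_centralization(ring))
```

The star plays the role of the "monarchy" in the analogy above, and the ring the fully distributed case.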

For practical purposes, it is not always necessary to calculate this number to gauge how centralized a network is. For example, comparing today’s terrorist groups to traditional ones, it can be observed without going through the calculations that today’s groups are much less centralized and hence harder to target. Less centralized networks, though sometimes not as effective in terms of governance and implementation of an overall strategy, are much more resilient (‘anti-fragile’) to shocks. For example, it is much easier to contain a virus in a highly centralized network than in a less centralized one.

The other concepts of centrality, ‘Closeness’ and ‘Betweenness’, are both based on the shortest paths that information or a meme would have to travel to get from one node to another. ‘Closeness’ measures how few steps separate a node from all the others, while ‘Betweenness’ measures how often a node sits on the shortest paths between other nodes. A very close network with many well-connected intermediary nodes (high ‘Betweenness’) would be much better and faster at spreading information, a virus, knowledge, a tradition or a meme across its entire network. A network with very low ‘Closeness’ would hence be less effective and efficient in doing the same.
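A minimal sketch of the closeness idea, assuming an unweighted, undirected network stored as an adjacency dictionary (the function name and the four-node path are illustrative assumptions). It uses breadth-first search to find shortest-path lengths, then divides the number of reachable nodes by the total distance to them. Betweenness, which needs shortest-path counts between all pairs, is not covered here.

```python
from collections import deque

def closeness(adjacency, source):
    """Closeness of `source`: reachable nodes divided by the sum of shortest-path lengths."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    # An isolated node reaches nobody, so its closeness is defined here as 0.
    return (len(distance) - 1) / total if total else 0.0

# Hypothetical path network: a - b - c - d
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(closeness(path, "a"), closeness(path, "b"))
```

Nodes in the middle of the path score higher than those at the ends, which matches the intuition that they can reach everyone in fewer steps.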

Influence

One of the most important outcomes of SNA is determining influencers across a network, as well as their level of influence. There are various ways to locate influencers, such as the number of followers, friends or connections, as well as the level of activity on social media. However, more refined models are needed to locate influencers reliably.

PageRank

One such model is the PageRank algorithm developed by Google. It assigns a ‘PageRank’ value to each web page based on how many times it is linked to by other web pages. The ‘PageRank’ value of a site is also a function of the ‘PageRank’ values of the sites that link to it. For example, an article on CNN.com linking to a small news organization’s website increases the ‘PageRank’ value of that site much more than a link from a local blogger’s site with few followers. Applying the ‘PageRank’ concept, and assuming the right information is available, it can be a simple process to identify influential bloggers or social media actors across the web. For example, looking at the number of followers a Twitter account has, and measuring how many times its tweets have been retweeted and by whom, can give an estimate of how influential the person is on Twitter. The same can be done on LinkedIn by measuring a person’s number of contacts and endorsements, as well as the seniority and employers of those contacts.
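The following is a simplified power-iteration sketch of the PageRank idea, not Google's production algorithm. The damping factor of 0.85, the iteration count and the site names are assumptions chosen for illustration. It shows the key property described above: a link from a highly ranked page passes on more value than a link from a low-ranked one.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: three blogs link to big_news, which links to small_site.
links = {
    "big_news": ["small_site"],
    "blog_a": ["big_news"],
    "blog_b": ["big_news"],
    "blog_c": ["big_news", "other_site"],
}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

Here `small_site` and `other_site` each receive exactly one link, but `small_site` ends up ranked higher because its single link comes from the well-linked `big_news`.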

In vs. Out Degree

This method is simpler and more traditional than ‘PageRank’. It implies that the people whom the rest of the network engages with the most have the most influence. For example, in online forums, experts are those who reply to the most questions and whose answers are in turn replied to and commented on by others. This is also observed in online reviews, where others can vote on the accuracy and reliability of a review. Another example is that thousands of people may reply to a celebrity’s tweet, but celebrities often only acknowledge or engage with replies from other celebrities or popular figures. Hence, for the purposes of this paper, determining which users an already known influencer or media outlet replies to or engages in conversation with, regardless of platform, could point to a new influential node or user.
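A small sketch of the in-degree versus out-degree idea, assuming interactions are available as (source, target) pairs such as (replier, original author). The account names and the function name are hypothetical. In-degree counts how often others engage with an account, out-degree counts how often the account engages with others, and the last step lists accounts that a known influencer chose to engage with.

```python
from collections import Counter

def in_out_degree(interactions):
    """interactions is a list of (source, target) pairs, e.g. (replier, author)."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in interactions:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree

# Hypothetical reply data.
replies = [
    ("fan1", "celebrity"), ("fan2", "celebrity"), ("fan3", "celebrity"),
    ("celebrity", "rising_star"), ("fan1", "fan2"),
]
in_deg, out_deg = in_out_degree(replies)

# Accounts that an already known influencer engages with are candidate influencers.
candidates = sorted({target for source, target in replies if source == "celebrity"})
print(in_deg.most_common(1), candidates)
```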

Community

In the crowded world of online social networking, people are often influenced not by a single friend or influential figure but by their communities. Communities facilitate the spread of some information and memes while minimizing outside influence and information flow. In the social media age, however, it is hard not to be exposed to diverse thoughts or ideas. Social media also allows people to diversify their communities, or sometimes to become fanatics by adopting extreme versions of their offline communities. For example, a religious person belonging to a small offline religious community might become radicalized only after being exposed online to extreme communities and views of his or her faith.

To track how influential a community is, a few factors need to be considered. First, it is important to consider how the community is structured and where within the community network the target people are located. People closer to the community hubs are much more likely to be active and to follow community traditions than those on the edges. It can also be argued that the more friends and family members a person has in the same community, the more likely his or her identity is to resemble that of the community.

As such, communities can be very influential as long as they face little competition from other communities, which matters especially when the target person’s social network is made up of diverse communities. To illustrate, someone born and raised in Saudi Arabia will have a very different community and social network diversity, and hence identity, than someone born and raised in Lebanon or Turkey.

Lastly, social networks are ‘Assortative’, meaning people gravitate toward others with characteristics similar to their own. The Internet, however, is disassortative and allows for a more diverse flow of information.

Implications

The methods and concepts above can be leveraged effectively to collect and analyze online social networks, as well as to predict future contagions and viral memes. Specifically, identifying network ‘hubs’ or influencers and understanding the contagion threshold across various networks can be a very effective way of mitigating risks and exploiting opportunities. We have to look no further than the YouTube video of the Florida pastor attempting to burn Islam’s Holy book, and the protests and deaths it caused across the Muslim world, to comprehend how important social networking analysis can be.

There are thousands of offensive videos uploaded to YouTube every day; the question SNA tries to answer is why one particular video goes viral. In the case of the Florida pastor video, an Egyptian TV channel had found the video and broadcast it across Egypt and eventually the Muslim world, causing it to go viral, and there is no guarantee that they won’t do it again.

The question then becomes how to use social media effectively to mitigate and undermine such ideological and destructive contagions. Would the escalation of violence have been as bad if some influential clerics had dismissed the video as foolish and not worth the death and destruction it eventually caused? Could the Department of State have done a better job on social media to calm the tensions across the Muslim world? Or, by monitoring Twitter activity and hashtags, could the news media have known about the extreme anger that was building up, further fueled by various Middle Eastern governments?

Social networking data and analysis methodology are great for collecting and analyzing trends and contagions, but SNA will have to further incorporate social, cultural and behavioral sciences to be most effective. Different cultures or ethnicities behave differently online, and especially on social media. For example, Americans, being used to advertising and commercials, are much less likely to click on Facebook ads than their eastern counterparts, for whom the Internet, Facebook and advertising are all new concepts. Further, some cultures have a very indirect way of signaling or communicating, as opposed to the direct communication style of westerners and the English language. For example, in Persian and Arabic it is common to use unusual or exaggerated analogies and metaphors to communicate an idea or a thought. A smart SNA approach would use customized profiles that capture these cultural nuances and differences.

#SNA #SocialNetworkingAnalysis #Influence #Intelligence #Analysis

projectthinker · 12 years ago

How Twitter figures out the world with machine intelligence and Mechanical Turks

On Twitter's engineering blog, a fascinating description of how Twitter uses a blend of machine intelligence and Mechanical Turk tasks to figure out, in real time, what is going on in the world:

Overview

Before we delve into the details, here's an overview of how the system works.

First, we monitor for which search queries are currently popular. Behind the scenes: we run a Storm topology that tracks statistics on search queries. For example, the query [Big Bird] may suddenly see a spike in searches from the US.

As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query. Behind the scenes: when the Storm topology detects that a query has reached sufficient popularity, it connects to a Thrift API that dispatches the query to Amazon's Mechanical Turk service, and then polls Mechanical Turk for a response. For example: as soon as we notice "Big Bird" spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information (e.g., whether there are likely to be interesting pictures of the query, or whether the query is about a person or an event) that helps us serve relevant Tweets and ads.

Finally, after a response from an evaluator is received, we push the information to our backend systems, so that the next time a user searches for a query, our machine learning models will make use of the additional information. For example, suppose our evaluators tell us that [Big Bird] is related to politics; the next time someone performs this search, we know to surface ads by @barackobama or @mittromney, not ads about Dora the Explorer.

Monitoring for popular queries

Storm is a distributed system for real-time computation. In contrast to batch systems like Hadoop, which often introduce delays of hours or more, Storm allows us to run online data processing algorithms to discover search spikes as soon as they happen. In brief, running a job on Storm involves creating a Storm topology that describes the processing steps that must occur, and deploying this topology to a Storm cluster. A topology itself consists of three things:

Tuple streams of data. In our case, these may be tuples of (search query, timestamp).

Spouts that produce these tuple streams. In our case, we attach spouts to our search logs, which get written to every time a search occurs.

Bolts that process tuple streams. In our case, we use bolts for operations like updating total query counts, filtering out non-English queries, and checking whether an ad is currently being served up for the query.

Here’s a step-by-step walkthrough of how our query topology works:

Whenever you perform a search on Twitter, the search request gets logged to a Kafka queue.

The Storm topology attaches a spout to this Kafka queue, and the spout emits a tuple containing the query and other metadata (e.g., the time the query was issued and its location) to a bolt for processing.

This bolt updates the count of the number of times we've seen this query, checks whether the query is "currently popular" (using various statistics like time-decayed counts, the geographic distribution of the query, and the last time this query was sent for annotations), and dispatches it to our human computation pipeline if so.
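The walkthrough above can be sketched in plain Python, with a list standing in for the spout's tuple stream and a generator standing in for the bolt. This is not Twitter's code and does not use Storm or Kafka: the class name, the half-life of 600 seconds and the threshold of 2.5 are all assumptions for illustration. It implements only the time-decayed count. The geographic distribution and re-annotation logic mentioned in the post are left out, so each query is dispatched at most once.

```python
import math

class DecayedCounter:
    """Exponentially time-decayed count per query (half-life in seconds)."""

    def __init__(self, half_life=600.0):
        self.decay_rate = math.log(2) / half_life
        self.state = {}  # query -> (decayed count, last update timestamp)

    def add(self, query, timestamp):
        """Decay the stored count to `timestamp`, add one, and return the new count."""
        count, last = self.state.get(query, (0.0, timestamp))
        count = count * math.exp(-self.decay_rate * (timestamp - last)) + 1.0
        self.state[query] = (count, timestamp)
        return count

def popular_queries(tuples, threshold=2.5, half_life=600.0):
    """Stand-in for the bolt: yield a query the first time its decayed count
    reaches the threshold. Timestamps are assumed to be non-decreasing."""
    counter = DecayedCounter(half_life)
    dispatched = set()
    for query, timestamp in tuples:
        if counter.add(query, timestamp) >= threshold and query not in dispatched:
            dispatched.add(query)
            yield query, timestamp

# Hypothetical (search query, timestamp) tuples, as a spout might emit them.
stream = [("big bird", 0), ("weather", 5), ("big bird", 10),
          ("big bird", 20), ("weather", 5000)]
spikes = list(popular_queries(stream))
print(spikes)
```

Because old searches decay away, a query searched twice within a day never crosses the threshold, while three searches within seconds do, which is the behavior a spike detector needs.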

One interesting feature of our popularity algorithm is that we often re-judge queries that have been annotated before, since the intent of a search can change. For example, people may normally search for [Clint Eastwood] because they're interested in his movies, but during the 2012 Republican National Convention users may have wanted to see Tweets related to his speech there.

Read More @ the Twitter Engineering Blog

#twitter #SocialNetworkingAnalysis #DataMining
