This tutorial follows Melanie Walsh’s Networks tutorial from her course Introduction to Cultural Analytics with some minor modifications.
Network analysis is all about looking at the patterns of relationships we find in our data, and seeing how individual ‘nodes’ are positioned vis-a-vis all the other ones, or what the emergent, macroscopic patterns might imply about how the entire population of …whatever it is you’re studying… might experience things like group identity or flows of information. In this tutorial, we’re looking at a network of correspondence between important figures in the Republic of Texas.
Grab this data (derived from the data first encountered in the regex tutorial and then cleaned further in the OpenRefine tutorial):
List of links (this downloads a CSV file)
List of nodes (this downloads a CSV file)
Fire up your notebook
Start a new Jupyter notebook (you can use the Anaconda Navigator to start one up).
First, we’re going to install the libraries that we need:
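The install cell did not survive on this page. In a notebook it is usually a `!pip install` line; as a hedged alternative, here is a small sketch that checks which libraries are missing before you install anything. The list of three package names is an assumption based on what the rest of the tutorial uses.

```python
import importlib.util

# The libraries this tutorial relies on (assumed from the surrounding text)
needed = ["pandas", "networkx", "matplotlib"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in needed if importlib.util.find_spec(name) is None]

if missing:
    print("Run this in a notebook cell: !pip install " + " ".join(missing))
else:
    print("All the libraries are already installed.")
```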
Then we’ll import them, along with matplotlib so we can make nice diagrams:
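A minimal sketch of the import cell. The plain `import networkx` (rather than an alias) is chosen to match the `networkx.write_graphml` call quoted later in the tutorial; the `pd` and `plt` aliases are the usual conventions, assumed here.

```python
import pandas as pd              # tables (dataframes)
import networkx                  # graph building and metrics
import matplotlib.pyplot as plt  # plotting

# Print the versions the notebook picked up, handy when something misbehaves
print("pandas", pd.__version__)
print("networkx", networkx.__version__)
```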
(You might need to !pip install matplotlib too, depending on your setup.)
Now we’ll load up the data describing our network of letter-writers. The nodes are the people who sent and received letters; the links are the letters that connect them. Make sure texaslinks.csv and texasnodes.csv are in the same folder as your notebook .ipynb file; or you can read them directly off the web:
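Here is a sketch of the loading step. So that it runs without the downloads, it reads a few made-up rows from a string; the column names `Source`, `Target` and `Id` and the letter names are hypothetical, so check the header row of your own files.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for texaslinks.csv and texasnodes.csv
links_csv = "Source,Target\nA,B\nB,C\nA,C\nC,D\nD,E\nE,F\nD,F\n"
nodes_csv = "Id\nA\nB\nC\nD\nE\nF\n"

links_df = pd.read_csv(io.StringIO(links_csv))
nodes_df = pd.read_csv(io.StringIO(nodes_csv))

# With the real files sitting next to the notebook you would write instead:
# links_df = pd.read_csv('texaslinks.csv')
# nodes_df = pd.read_csv('texasnodes.csv')

print(links_df.head())
```

`pd.read_csv` also accepts a URL in place of a file name, which is how you would read the files straight off the web.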
Now let’s build a network from that data:
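One way to do this, sketched with a tiny hypothetical links table (the `Source` and `Target` column names are assumptions): `from_pandas_edgelist` turns each row into a connection, and `networkx.draw` gives a first rough picture.

```python
import pandas as pd
import networkx

# Hypothetical stand-in for the links table: one row per letter
links_df = pd.DataFrame({
    "Source": ["A", "B", "A", "C", "D", "E", "D"],
    "Target": ["B", "C", "C", "D", "E", "F", "F"],
})

# Each row becomes an edge between the sender and the recipient
G = networkx.from_pandas_edgelist(links_df, "Source", "Target")

# A quick, unlabelled first look at the network
networkx.draw(G)

print(G.number_of_nodes(), "people,", G.number_of_edges(), "connections")
```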
Let’s see if we can make that a bit more readable:
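A sketch of a more legible drawing, using a small hypothetical graph in place of the Texas one. The particular values are just starting points; note that the networkx keyword for label size is spelled `font_size`.

```python
import matplotlib.pyplot as plt
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# A bigger canvas, name labels, thinner lines and smaller text
fig = plt.figure(figsize=(8, 8))
networkx.draw(G, with_labels=True, node_color="skyblue", width=0.3, font_size=8)
```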
That’s a bit better. Try fiddling with the figsize, the width, and the fontsize.
Network metrics
By looking at how different nodes connect or not, we can begin to examine how individuals were in positions to control information (for instance) or we might ask, are there any subgroups implied by these connections? The simplest indication of importance in a network might be ‘degree’ or the number of connections a node has. (And, historically, what might being a prolific letter writer or receiver-of-letters imply for a politician in the Republic of Texas? Always try to imagine what these metrics actually imply.)
We can calculate degree like so:
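A minimal sketch, assuming the graph is stored in a variable called `G`; the small graph built here is a hypothetical stand-in so the cell runs on its own.

```python
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# Returns a 'degree view': pairs of (node, number of connections)
degree_view = networkx.degree(G)
print(degree_view)
```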
…but the result is a bit hard to read. We’ll convert it to a kind of list and then add the result as another attribute of the individual nodes in the graph:
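Sketched on the same kind of hypothetical stand-in graph: wrapping the result in `dict()` gives a name-to-degree lookup, and `set_node_attributes` stores each value on its node. The attribute name `degree` is our choice.

```python
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# Convert the degree view into a plain dictionary: {name: degree}
degrees = dict(networkx.degree(G))

# Attach each value to its node under the attribute name 'degree'
networkx.set_node_attributes(G, name="degree", values=degrees)

print(G.nodes["C"])
```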
Then it’d be nice to have this information in a dataframe (a table) as well:
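A sketch of one way to get there, again on a hypothetical stand-in graph. The dataframe name `degree_df` and its column labels are assumptions made for illustration.

```python
import pandas as pd
import networkx

# Hypothetical stand-in graph with degree stored on every node
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])
networkx.set_node_attributes(G, name="degree", values=dict(networkx.degree(G)))

# G.nodes(data='degree') yields (name, degree) pairs, one per row
degree_df = pd.DataFrame(list(G.nodes(data="degree")), columns=["node", "degree"])

# Best-connected correspondents first
degree_df = degree_df.sort_values(by="degree", ascending=False)
print(degree_df)
```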
(Now, if you want to write your graph to file, you’d run this little snippet: networkx.write_graphml(G, 'Texas-network.graphml'). You can open a graphml file using Gephi to make it pretty; see this help with Gephi.)
We can make a graph of the individuals with the highest degree values:
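A sketch of such a chart using pandas' built-in plotting, on hypothetical data. Taking the first ten rows after sorting is an assumption about how many names you want to show.

```python
import pandas as pd
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])
degrees = dict(networkx.degree(G))
degree_df = pd.DataFrame(list(degrees.items()), columns=["node", "degree"])

# Keep (up to) the ten highest values and draw them as horizontal bars
top = degree_df.sort_values(by="degree", ascending=False)[:10]
ax = top.plot(x="node", y="degree", kind="barh")

# Flip the axis so the highest value sits at the top of the chart
ax.invert_yaxis()
```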
We might ask, who is most central? And by ‘most central’, we mean, who sits on the greatest number of shortest paths between any two correspondents? This measure is called ‘betweenness centrality’. Such a person might be in a prime position to influence or control information flow.
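A minimal sketch using networkx's `betweenness_centrality`, on a hypothetical stand-in graph. By default the scores are normalized, so they fall between 0 and 1.

```python
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# Share of shortest paths between other pairs that pass through each node
betweenness = networkx.betweenness_centrality(G)

for name, score in betweenness.items():
    print(name, round(score, 2))
```

In this toy graph only C and D, the two people holding the trios together, score above zero.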
(Compare that command with the one for ‘degree’. What changed? You can search the networkx documentation for other metrics to calculate, and you’d form the command in much the same way.)
Let’s add it to our data frame, same as we did before:
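The same two moves as for degree, sketched on a hypothetical stand-in graph: store the scores on the nodes, then pull them into a table. The name `betweenness_df` is an assumption.

```python
import pandas as pd
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# Store each score on its node under the attribute name 'betweenness'
betweenness = networkx.betweenness_centrality(G)
networkx.set_node_attributes(G, name="betweenness", values=betweenness)

# One row per person, highest score first
betweenness_df = pd.DataFrame(list(G.nodes(data="betweenness")),
                              columns=["node", "betweenness"])
betweenness_df = betweenness_df.sort_values(by="betweenness", ascending=False)
print(betweenness_df)
```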
Let’s plot the highest betweenness values:
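A sketch that mirrors the degree chart, with hypothetical data and an assumed cut-off of ten names.

```python
import pandas as pd
import networkx

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])
betweenness = networkx.betweenness_centrality(G)
betweenness_df = pd.DataFrame(list(betweenness.items()),
                              columns=["node", "betweenness"])

# Keep (up to) the ten highest scores and draw them as horizontal bars
top_between = betweenness_df.sort_values(by="betweenness", ascending=False)[:10]
ax = top_between.plot(x="node", y="betweenness", kind="barh")
ax.invert_yaxis()  # highest score at the top
```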
How do the results change? What might this mean, historically?
Let’s see if there are any ‘communities’ implied by the network.
First we get the piece of code we want from networkx, then we use a particular community detection routine to find them, then we’ll print them out:
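The routine used in the original cell is not shown here. As a sketch, this uses networkx's `greedy_modularity_communities`, which is one reasonable choice and should be read as an assumption; the graph is a hypothetical stand-in.

```python
import networkx
from networkx.algorithms import community

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# Each community comes back as a set of node names
communities = community.greedy_modularity_communities(G)

for number, members in enumerate(communities):
    print(number, sorted(members))
```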
Then we’ll add this information into our graph:
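A sketch of that step on a hypothetical stand-in graph: number the communities, build a name-to-number lookup, and store it on the nodes. The attribute name `modularity_class` and the detection routine are assumptions.

```python
import networkx
from networkx.algorithms import community

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])
communities = community.greedy_modularity_communities(G)

# Build {name: community number} by walking through each community in turn
modularity_class = {}
for community_number, members in enumerate(communities):
    for name in members:
        modularity_class[name] = community_number

# Store the community number on every node
networkx.set_node_attributes(G, modularity_class, "modularity_class")
print(G.nodes["A"])
```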
Then we’ll add it to our dataframe as well:
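Sketched the same way as the earlier tables, on hypothetical data. The name `communities_df` and the column labels are assumptions.

```python
import pandas as pd
import networkx
from networkx.algorithms import community

# Hypothetical stand-in graph with a community number stored on every node
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])
communities = community.greedy_modularity_communities(G)
modularity_class = {name: number
                    for number, members in enumerate(communities)
                    for name in members}
networkx.set_node_attributes(G, modularity_class, "modularity_class")

# One row per person: their name and the community they were placed in
communities_df = pd.DataFrame(list(G.nodes(data="modularity_class")),
                              columns=["node", "modularity_class"])
print(communities_df.sort_values(by="modularity_class"))
```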
Then, if you want to see who is a member of a particular group, you can select it like so:
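A minimal sketch of that selection, assuming a table with a `modularity_class` column. The names and group numbers below are made up.

```python
import pandas as pd

# Hypothetical table of people and the community each was assigned to
communities_df = pd.DataFrame({
    "node": ["A", "B", "C", "D", "E", "F"],
    "modularity_class": [0, 0, 0, 1, 1, 1],
})

# Keep only the rows whose community number is 0
group = communities_df[communities_df["modularity_class"] == 0]
print(group)
```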
…where you change the numeral to whichever group you want. You might want to see who is in what group by making a plot:
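One possible plot, offered as a sketch rather than the original cell: colour each node in the network drawing by its community number. The graph, the detection routine, and the `tab10` colour map are all assumptions.

```python
import matplotlib.pyplot as plt
import networkx
from networkx.algorithms import community

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])
communities = community.greedy_modularity_communities(G)
modularity_class = {name: number
                    for number, members in enumerate(communities)
                    for name in members}

# One colour number per node, in the same order networkx draws the nodes
colours = [modularity_class[name] for name in G.nodes()]

fig = plt.figure(figsize=(8, 8))
networkx.draw(G, with_labels=True, node_color=colours,
              cmap=plt.cm.tab10, font_size=8)
```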
Finally, get all of your metrics into a nice table:
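A sketch of one way to gather everything, assuming the metrics were stored as node attributes named `degree`, `betweenness` and `modularity_class`; the graph is a hypothetical stand-in.

```python
import pandas as pd
import networkx
from networkx.algorithms import community

# Hypothetical stand-in graph: two trios of correspondents joined by one letter
G = networkx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
                    ("C", "D"),
                    ("D", "E"), ("E", "F"), ("D", "F")])

# Recompute the three metrics and store them on the nodes
networkx.set_node_attributes(G, dict(networkx.degree(G)), "degree")
networkx.set_node_attributes(G, networkx.betweenness_centrality(G), "betweenness")
communities = community.greedy_modularity_communities(G)
modularity_class = {name: number
                    for number, members in enumerate(communities)
                    for name in members}
networkx.set_node_attributes(G, modularity_class, "modularity_class")

# dict(G.nodes(data=True)) is {name: {attribute: value}}: one table row per person
nodes_df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")
print(nodes_df)
```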
You can sort it like so: nodes_df.sort_values(by='betweenness', ascending=False)
Go Even Further
Try adapting Melanie Walsh’s code for an interactive network visualization built on Game of Thrones characters for your Texas data! Follow this link.