Friday, March 16, 2018

Data Analysis Using Gephi, a Digital Humanities Case Study

Chinese Canadian Stories – Uncommon Histories from a Common Past was a collaborative project that I took part in earlier in my career as a librarian, and one I'm revisiting in the context of digital humanities (DH). Interestingly, when we began, we had never heard the term DH. I was more involved in the community engagement aspect of the project (which is also an important ingredient in DH projects). Between 2006 and 2008, a team of student researchers at UBC working with Prof. Peter Ward and Prof. Henry Yu spent two years painstakingly recording the data for every one of the over 97,000 Chinese in the Chinese Head Tax Register. For each of the Chinese who entered Canada, the data included name, age, height, and village and county of origin. Through a digital database, the project gave us a powerful research tool for understanding who these migrants were, where they came from, and where they were going in Canada. The irony is that the practice of restricting immigration actually left researchers a rich data collection on those early Canadian migrants.

While the project collaborated with the local Chinese Canadian community to preserve its culture and history through outreach and by actively collecting materials for its web portal, an unintended yet innovative result was that digital tools and techniques normally used in the sciences enabled us to examine the records of the migrants. Through the project, the researchers published a few peer-reviewed research papers documenting their use of Gephi, a network analysis and visualization tool. They used it to plot locations in Saskatchewan by longitude and latitude, and they used its Ego Network filter, which allows us to select any node in the network and filter the network to see only that node's connections.
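
To make the Ego Network idea concrete, here is a minimal sketch in plain Python of what such a filter does: pick one node and keep only the nodes within a given number of hops of it. This is not Gephi's own code. The function name `ego_network`, the `depth` parameter, and the toy network with nodes "A" to "E" are all assumptions made up for illustration, not data from the Register.

```python
def ego_network(adjacency, ego, depth=1):
    """Return the part of the network within `depth` hops of `ego`.

    `adjacency` maps each node to the set of its neighbours (undirected).
    """
    keep = {ego}
    frontier = {ego}
    for _ in range(depth):
        # Neighbours of the current frontier that we have not kept yet.
        frontier = {nbr for node in frontier for nbr in adjacency.get(node, ())} - keep
        keep |= frontier
    # Keep only the links whose two endpoints both survived the filter.
    return {
        node: sorted(nbr for nbr in adjacency.get(node, ()) if nbr in keep)
        for node in sorted(keep)
    }


# Hypothetical toy network, invented for this sketch.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

ego_a = ego_network(adjacency, "A")
print(ego_a)
```

With the default depth of 1, only "A" and its direct connections "B" and "C" remain, along with the links among them.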

In a Gephi network, a high betweenness centrality score means that a node is integral in connecting elements within the network: it lies on many of the shortest paths between other nodes. In the Saskatchewan network produced by Gephi, three major destinations emerge: Saskatoon, Regina, and Moose Jaw. Swift Current, while a major destination, does not have as many links to highly connected places as Saskatoon, Regina, and Moose Jaw (which predictably are very connected to each other). Some of the major families in this network also start to emerge: there was a strong Ma clan association that had spread across Saskatchewan through chain migration. Combined with oral histories and analog research, this method of DH inquiry is a supremely powerful way to enhance discovery and to visually tell the story of our findings.
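
For readers curious how a betweenness centrality score is computed, the following is a rough sketch in plain Python using Brandes' algorithm on an undirected, unweighted network. It is not Gephi's implementation, and the scores are left unnormalized. The function name and the five-node toy network (a "hub" with neighbours "a", "b", "c", and a further node "d" hanging off "c") are invented for illustration and are not taken from the Saskatchewan data.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted network."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adjacency[node]:
                if distance[nbr] < 0:  # first time we reach nbr
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
                if distance[nbr] == distance[node] + 1:  # node is on a shortest path to nbr
                    path_count[nbr] += path_count[node]
                    predecessors[nbr].append(node)
        # Walk back from the farthest nodes, accumulating each node's share.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each end, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical toy network, invented for this sketch.
adjacency = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}

scores = betweenness_centrality(adjacency)
print(scores)
```

In this toy network "hub" scores highest because five of the six pairs of other nodes can only reach each other through it, while "c" scores 3 as the sole route to "d".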

The Ma family appears to be much more important in the Regina network than in the Moose Jaw network. In order to produce these networks using Gephi, the researchers combed through all the immigrants that listed Saskatchewan as their destination but had to deduce the Romanized form of their surnames. There are quite a few Romanizations with multiple possible Chinese surnames, including those associated with Ma. For instance, in analyzing the Regina network, it becomes clear that the Luo and Liu who are from Yuemingcun in Sen Ning are probably the same family: either Luo or Liu, not both. My colleagues at the Asian Library, including the now retired Eleanor Yuen, have pioneered the way for future research by mapping the villages and towns recorded in the Register of Chinese Immigration to Canada from 1885 to 1949 in their original Chinese character names.
The project was furthered with great help from the Spatial History Project at the Center for Spatial and Textual Analysis (CESTA) at Stanford University.

Using the variables of family name, village origin, and destination in Saskatchewan, Stanford researcher Stephanie Chan used Gephi to produce network patterns for four Chinese family lineages that visualize the weighted correspondence of family name and village origin in creating family chains and connections between destinations. Of course, this preliminary visualization is limited to describing only the Ma family in Saskatchewan. There's still much data to be analyzed. The work has just begun.
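
As a rough illustration of how records with a family name, a village of origin, and a destination might be turned into a weighted network, the sketch below counts how often each village and destination occur together and writes the result as an edge list with Source, Target, and Weight columns, a CSV layout that Gephi can import. This is an assumption about the general approach, not the researchers' actual workflow. The field names, the helper functions, and the placeholder records are all hypothetical and do not come from the Register.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholder records, invented for this sketch.
records = [
    {"surname": "surname_1", "village": "village_1", "destination": "dest_1"},
    {"surname": "surname_1", "village": "village_1", "destination": "dest_1"},
    {"surname": "surname_1", "village": "village_1", "destination": "dest_2"},
    {"surname": "surname_2", "village": "village_2", "destination": "dest_1"},
]


def weighted_edges(rows, source_field, target_field):
    """Count how many rows share each (source, target) pair."""
    counts = Counter((row[source_field], row[target_field]) for row in rows)
    return [(src, dst, weight) for (src, dst), weight in sorted(counts.items())]


def to_edge_list_csv(edges):
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


edges = weighted_edges(records, "village", "destination")
edge_csv = to_edge_list_csv(edges)
print(edge_csv)
```

The same helper could pair surname with village, or surname with destination, by changing the two field names passed to `weighted_edges`.
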
Historical Chinese Language Materials in British Columbia (HCLMBC) was a collaboration between UBC and SFU to digitize historical records and images related to Chinese settlement and life in British Columbia.

Sources for more reading:

Yu, Henry, and Stephanie Chan. "The Cantonese Pacific: Migration Networks and Mobility Across Space and Time." Trans-Pacific Mobilities: The Chinese and Canada (2017): 25. [Link]

Hermansen, S. and H. Yu. "The Irony of Discrimination: Mapping Historical Migration Using Chinese Head Tax Data." In Historical GIS Research in Canada, J. Bonnell and M. Fortin (Eds.), University of Calgary Press, 2014. [Link]

Murphy, Nathan. "Review of a Digital History Tool: Gephi–Networking through History."

Calma, Angelito, and Martin Davies. 2017. Geographies of influence: A citation network analysis of higher education 1972–2014. Scientometrics 110 (3): 1579-99. [Link]

More Networks in the Humanities or Did books have DNA? [Link]
Stanford Gephi Workshop materials. [Link]