Friday, March 16, 2018

Data Analysis Using Gephi, a Digital Humanities Case Study

Chinese Canadian Stories – Uncommon Histories from a Common Past was a collaborative project that I was part of earlier in my career as a librarian, and one I'm revisiting in the context of digital humanities (DH). Interestingly, when we began, we had never heard the term DH. I was more involved in the community engagement aspect of the project (which is also an important ingredient in DH projects). Between 2006 and 2008, a team of student researchers at UBC working with Prof. Peter Ward and Prof. Henry Yu spent two years painstakingly recording the data for every one of the over 97,000 Chinese in the Chinese Head Tax Register. For each of the Chinese who entered Canada, the data included name, age, height, and village and county of origin, and through a digital database the project gave us a powerful research tool for understanding who these migrants were, where they came from, and where they were going in Canada. The irony is that the practice of restricting immigration actually left researchers a rich data collection on those early migrants to Canada.

While the project collaborated with the local Chinese Canadian community to preserve its culture and history through outreach and by actively collecting materials for its web portal, an unintended yet innovative result was that digital tools and techniques normally used in the sciences enabled us to examine the records of the migrants. Through the project, the researchers published a few peer-reviewed research papers documenting their use of Gephi, a network visualization and analysis tool, which plotted locations in Saskatchewan based on longitude and latitude, and of a tool called Ego Network, which allows us to select any node in the network and filter the network to see only its connections.
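For readers curious about what an ego-network filter does under the hood, here is a minimal Python sketch of the idea. It is not Gephi's code and not the researchers' workflow; the function name `ego_network`, the `depth` parameter, and the toy node names are all my own assumptions for illustration, and the edges are invented rather than taken from the Head Tax Register.

```python
from collections import deque

def ego_network(edges, center, depth=1):
    """Return the nodes and edges within `depth` hops of `center`."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search outward from the chosen node, stopping at `depth`.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    # Keep only the edges whose two endpoints both survived the filter.
    kept_nodes = set(distance)
    kept_edges = [(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes]
    return kept_nodes, kept_edges

# Hypothetical toy edges, NOT data from the Head Tax Register.
toy_edges = [
    ("village_1", "town_a"),
    ("village_1", "town_b"),
    ("village_2", "town_b"),
    ("village_3", "town_c"),
]

nodes, edges = ego_network(toy_edges, "village_1")
print(sorted(nodes))
```

Raising `depth` to 2 would also pull in the neighbours of the neighbours, which is handy when the direct connections alone look too sparse to tell a story.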



In Gephi, a high betweenness centrality score means that a node is integral in connecting elements within the network. In the Saskatchewan network produced by Gephi, three major destinations emerge: Saskatoon, Regina, and Moose Jaw. Swift Current, while a major destination, does not have as many links to highly connected places as Saskatoon, Regina, and Moose Jaw (which predictably are very connected to each other). Some of the major families in this network also start to emerge: there was a strong Ma clan association that had spread across Saskatchewan through chain migration. Combined with oral histories and analog research, this method of DH inquiry is a supremely powerful way to enhance discovery and to visually tell the story of our findings.
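To make the idea of betweenness centrality a little more concrete, here is a small Python sketch that counts, for each node, how many shortest paths between other pairs of nodes pass through it. It uses Brandes' algorithm on an unweighted, undirected graph and reports raw (unnormalized) scores. This is only an illustration under those assumptions, not a reproduction of Gephi's implementation, and the node names and edges are invented placeholders rather than the Saskatchewan data.

```python
from collections import deque

def betweenness_centrality(edges):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that also counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)

        # Walk back from the farthest nodes, passing credit to predecessors.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]

    # Each unordered pair was counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical toy edges, NOT data from the Head Tax Register.
toy_edges = [
    ("town_a", "hub"),
    ("town_b", "hub"),
    ("hub", "town_c"),
    ("town_c", "town_d"),
]

scores = betweenness_centrality(toy_edges)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, scores[name])
```

In this toy graph the node called `hub` scores highest because every route between the left-hand and right-hand towns has to pass through it, which is the same intuition behind reading the well-connected destinations in the Saskatchewan network.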


The Ma family appears to be much more important in the Regina network than in the Moose Jaw network. In order to produce these networks using Gephi, the researchers combed through all the immigrants who listed Saskatchewan as their destination, but they had to deduce the Romanized form of their surnames. Quite a few Romanizations, Ma among them, have multiple possible Chinese surnames associated with them. For instance, in analyzing the Regina network, it becomes clear that the Luo and Liu who are from Yuemingcun in Sen Ning are probably the same family: either Luo or Liu, not both. My colleagues at the Asian Library, including the now retired Eleanor Yuen, have paved the way for future research by mapping the villages and towns recorded in the Register of Chinese Immigration to Canada from 1885 to 1949 in their original Chinese character names.
The project was furthered with great help from the Spatial History Project at the Center for Spatial and Textual Analysis (CESTA) at Stanford University.

Using the variables of family name, village origin, and destination in Saskatchewan, Stanford researcher Stephanie Chan used Gephi to produce network patterns for four Chinese family lineages that visualize the weighted correspondence of family name and village origin in creating family chains and connections between destinations. Of course, this preliminary visualization is limited to describing only the Ma family in Saskatchewan. There's still much data to be analyzed. The work has just begun.
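As a rough illustration of how records with these three variables could be turned into a weighted network, the Python sketch below counts how many records share the same origin (family name plus village) and destination, then writes the result as an edge list with Source, Target, and Weight columns, a layout that Gephi's spreadsheet import accepts. This is my own simplified guess at the general approach, not the method used at Stanford; the helper names and the records themselves are hypothetical placeholders.

```python
import csv
import io
from collections import Counter

# Hypothetical records: (family name, village of origin, destination).
# These are placeholders, NOT entries from the Head Tax Register.
toy_records = [
    ("Family_X", "village_1", "town_a"),
    ("Family_X", "village_1", "town_a"),
    ("Family_X", "village_1", "town_b"),
    ("Family_Y", "village_2", "town_b"),
]

def weighted_edges(records):
    """Count how many records share the same origin and destination."""
    counts = Counter()
    for family, village, destination in records:
        # Treat each family-and-village pairing as one origin node.
        origin = f"{family} ({village})"
        counts[(origin, destination)] += 1
    return counts

def to_edge_csv(counts):
    """Render the weighted edges as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()

edge_counts = weighted_edges(toy_records)
print(to_edge_csv(edge_counts))
```

The heavier an edge, the more people from that family and village went to that destination, which is one simple way to let chain migration show up as thick lines in a visualization.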
Historical Chinese Language Materials in British Columbia (HCLMBC) was a collaboration between UBC and SFU to digitize historical records and images related to Chinese settlement and life in British Columbia.

Sources for more reading:

Yu, Henry, and Stephanie Chan. "The Cantonese Pacific: Migration Networks and Mobility Across Space and Time." Trans-Pacific Mobilities: The Chinese and Canada (2017): 25. [Link]

Hermansen, S. and H. Yu. “The Irony of Discrimination: Mapping Historical Migration Using Chinese Head Tax Data.” In Historical GIS Research in Canada, J. Bonnell and M. Fortin (Eds.), University of Calgary Press, 2014. [Link]

Murphy, Nathan. "Review of a Digital History Tool: Gephi–Networking through History."

Calma, Angelito, and Martin Davies. 2017. Geographies of influence: A citation network analysis of higher education 1972–2014. Scientometrics 110 (3): 1579-99. [Link]

More Networks in the Humanities or Did books have DNA? [Link]
Stanford Gephi Workshop materials. [Link]

Monday, January 08, 2018

DH Projects in East Asian Studies


The ‘Digital Humanities’ is still a young and highly contested area. Furthermore, as Tom Mullaney has argued, within Digital Humanities there is an “Asia deficit,” which is in no small part the outcome of more entrenched divides within the platforms and digital tools that form the foundation of DH itself. This divide between East and West runs very deep, and is not primarily a question of scholarly interest or orientation. Learning more about the projects below, I was pleasantly impressed by the progress being made in DH.

A couple of the projects that I came across recently were introduced in a presentation by Michael Hunter of Yale University. He introduced the Life of the Buddha (LOTB) project, which presents and analyzes for the first time monumental Tibetan murals depicting the Buddha’s life, their related literature, and their architectural and historical settings. LOTB also offers scholarly and learning communities the first tool to research and engage image, text, architecture, and history as an integrated and meaning-rich whole. The project’s impact on the humanities and the study of Buddhism is thus twofold: the largest study to date on visual and textual Buddha narratives in Tibet, and a new digital tool for synthetic teaching and research of Buddhist images and texts in context. The murals in question, the Jonang murals, date from the first decades of the 17th century and are among only a handful of fully preserved narrative paintings in Central Tibet. They are also among the few murals in Tibet explicitly linked to an extant collection of narrative, poetic, ritual, and technical painting literature about the Buddha. Practically nothing has been written about the Jonang murals, and no complete visual documentation has ever been attempted.

The Ten Thousand Rooms Project (廣廈千萬間項目), led by Michael Hunter, is a collaborative workspace (but not a database) for pre-modern textual studies. Building on the Mirador Viewer developed by Stanford University, the platform allows users to upload images of manuscript, print, inscriptional, and other sources and then organize projects around their transcription, translation, and/or annotation. Both as a workspace for crowd-sourcing core textual research and as a publishing venue for scholarly contributions that are less well suited to conventional book formats, the Ten Thousand Rooms Project is one of the early DH projects at Yale that establishes an international online community committed to making the East Asian textual heritage more accessible to a wider audience. All users are free to view projects on the site, and registered users can create their own projects and also contribute to others' projects.

In all, DH in Asian Studies is coming along, and at a pace that suggests much is happening: at conferences, in podcasts, and in the network of scholars and practitioners coming together as a vibrant community of practice in an area of scholarship that has long been overlooked.

Wednesday, December 27, 2017

Horizon Report 2017 - Will Be Missed. Rest in Peace NMC



Happy holidays and New Year to you all. While I look forward to 2018, I'm sad that the New Media Consortium, after so many years of great work, has ended. The NMC was best known for producing its “Horizon Reports” on the future of technology at K-12 schools, universities, and museums, and I remember anxiously anticipating, every year about this time, the latest technology trends in the report. Unfortunately, the NMC has abruptly shut down this month after officials discovered the organization was out of money. It's an embarrassing end to an illustrious organization. How "because of apparent errors and omissions by its former Controller and Chief Financial Officer, the organization finds itself insolvent" is beyond me. I'm too surprised to even be angry at this point, though I think we should be upset at this development. There was not even time for us to say good-bye or give it a proper sendoff.

Started in 1994, the NMC served hundreds of college and university organizations, organizing conferences and events and publishing reports toward its goal of encouraging exploration and use of new media and technologies for learning and creative expression, particularly from 2002, when it began publishing its must-read Horizon Reports. With that said, I hope these technology trends for academic libraries will not be the last findings we see from the report.

1. Big Data (Time-to-Adoption Horizon: One Year or Less)

2. Digital Scholarship Technologies (Time-to-Adoption Horizon: One Year or Less)

3. Library Services Platforms (Time-to-Adoption Horizon: Two to Three Years)

4. Online Identity (Time-to-Adoption Horizon: Two to Three Years)

5. Artificial Intelligence (Time-to-Adoption Horizon: Four to Five Years)

6. The Internet of Things (Time-to-Adoption Horizon: Four to Five Years)

Though it has filed for Chapter 7 bankruptcy, it's still too early to say that the NMC is completely gone, since the NMC’s assets may still be sold as part of the bankruptcy process and another entity (maybe another nonprofit?) may yet go forward with the annual summer conference. Who knows? Maybe that will be our goal moving into 2018: to wish the NMC back into existence.