Wednesday, October 30, 2019

Using Palladio and Gephi as Data Visualization Tools

Much has been published about data visualization tools. Miriam Posner has written on the subject, and I often use her work as a reference. Some have even commented on the differences between Gephi and Palladio.

Over the last year, I've been using Palladio to examine datasets from the Chinese headtax project. Palladio makes it easy to create bivariate network graphs that illustrate relationships between two dimensions, and any dimension of the data can be used as the source or target of a graph. By default, Palladio creates a force-directed layout, which is different from Gephi. At the same time, Palladio is limited to this one layout. The platform has no way of doing computational or algorithmic analysis of your graphs; you will need a more powerful program like Gephi to do that work. The most powerful methods for creating networks come from programming languages such as R, Python, and JavaScript, which allow you to control various algorithmic and aesthetic aspects of network visualizations.
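To make the idea of choosing a source and a target dimension concrete, here is a minimal Python sketch of how a weighted edge list might be built from tabular records before loading it into a network tool. The column names (origin_county, destination) and the sample rows are hypothetical and are not taken from the head tax register.

```python
from collections import Counter

# Hypothetical records: the column names and values are invented for
# illustration and are not taken from the head tax register.
records = [
    {"origin_county": "Taishan", "destination": "Vancouver"},
    {"origin_county": "Taishan", "destination": "Victoria"},
    {"origin_county": "Xinhui", "destination": "Vancouver"},
    {"origin_county": "Taishan", "destination": "Vancouver"},
]


def build_edges(rows, source, target):
    """Count how often each (source, target) pair occurs in the rows."""
    return Counter((row[source], row[target]) for row in rows)


# Any two columns can play the roles of source and target.
edges = build_edges(records, "origin_county", "destination")
for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst} (weight {weight})")
```

Swapping the two column names in the call to build_edges reverses the direction of the graph, which is the sense in which any dimension can serve as source or target.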

Regardless, I still find that knowing a bit about each of these data visualization tools is helpful for any researcher, at any phase of the research lifecycle. The following video tutorials help keep me informed not only about how to use the tools, but also about how to weigh the strengths and weaknesses of a particular approach to playing around with the data. I'd be interested in hearing how you approach your data. How do you learn the tools of your trade and then decide which would be best for your own analyses?





Thursday, October 10, 2019

Was Shakespeare Really Shakespeare? "Shakespeare has now fully entered the era of Big Data."

Is Shakespeare really Shakespeare? This is a question I pose whenever I'm asked what digital humanities is. In Shakespeare Beyond Doubt: Evidence, Argument, Controversy, two chapters are devoted to the application of stylometry to Shakespeare's works and go into much detail: "Authorship and the evidence of stylometrics" by MacDonald Jackson and "What does textual evidence reveal about the author?" by James Mardock and Eric Rasmussen. An interesting aspect of these studies is that computer models using different algorithms come to conclusions similar to those of scholars from the "analog" era.

In 2016, The New Oxford Shakespeare made ripples in the literary world when it credited Christopher Marlowe as a co-author of Shakespeare’s “Henry VI,” Parts 1, 2, and 3. I, along with many others in literary studies, have long been told that there's an inevitable Marlowe-Shakespeare connection, but it is only more recently that scholars using distant reading techniques have applied computer-aided analysis of linguistic patterns across databases to further this argument. As Gary Taylor proposes, "Shakespeare has now fully entered the era of Big Data." Daniel Pollack-Pelzner points out that writing a play in the sixteenth century was a bit like writing a screenplay today, with many hands revising a company’s product. The difference is that the scholars of the New Oxford Shakespeare take up a hypothesis held since the Victorian era and argue that algorithms can truly tease out the work of individual hands.

I'm fascinated by this facet of literary studies and want to keep exploring it, though I'm just at the beginning of my own journey. I'm currently using R (which is also used in stylometry) to study data on the early Chinese migrants coming to Canada, in order to discern patterns of migration and kinship networks. Certainly, dipping into both literary and historical analysis is very much in the spirit of DH.


Thursday, June 20, 2019

Mining Register of Headtax Records using R and Palladio

In 2009, I began working with researchers and librarians at UBC Library and SFU Library on a project that sought to collect and digitize materials from Chinese Canadian organizations across Canada. That project ended in 2012 when funding from the federal government ran out. Recently, Sarah Zhang and I began examining the records of the 97,123 migrants who arrived in Canada between 1886 and 1949. These records were painstakingly transformed into a Microsoft Excel spreadsheet, but apart from a few research papers they have been largely untouched by researchers.

Between 1885 and 1923, the Canadian government imposed a head tax on Chinese immigrants entering Canada in order to restrict immigration. While a print register was created to keep track of the influx of migrants, its detailed entries hold years of demographic information about the immigrants and have become a rich source of data for researchers and historians. Thanks to two scholars, Peter Ward and Henry Yu, and their teams at the History Department of the University of British Columbia, the Register of Chinese Immigrants to Canada (1886-1949) has been transformed into a digital spreadsheet, openly accessible from UBC Open Collection, and a searchable database accessible from Library and Archives Canada.

The main challenge of this headtax project from its inception has been that, for an impressively large-scale dataset, the records are for the most part inconsistent: they reflect the idiosyncratic dialects of the immigrants, which result in variations of place names and titles. The inconsistencies in place names, unfortunately, create difficulties for anyone who wishes to carry out any analysis associated with the immigrants’ origins. In other words, while there is a treasure trove of data, it may be unusable for most researchers unless the data can be manipulated in ways that fill in the gaps. Not much sense could be made of the data even though it was readily available.

https://osf.io/9zr6f/


To address these inconsistencies, in 2008 Eleanor Yuen from the UBC Asian Library initiated a project to normalize the various transliterations of the immigrants’ origins, laying the groundwork for more in-depth work by future researchers. The immigrants’ origins are represented at two hierarchical levels: county and villages/towns; there are eight counties and numerous villages in the registry. Of the eight counties, the names of villages/towns in three counties have been mapped: Sun Woy (now known as Xinhui), Zhongshan, and Taishan. Although it covers just a portion of the records, this normalized data offers a glimpse of what the full register could make possible for research.
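As a rough illustration of what normalizing transliterations can involve, the following Python sketch maps variant spellings of a county name to a single canonical form. It is not the method used in the UBC project. Only the pairing of Sun Woy with Xinhui comes from the description above; the other variant spellings and the function name normalize_county are assumptions made for the example.

```python
# Lookup table from variant spelling to canonical county name.
# Only "Sun Woy" -> "Xinhui" comes from the post itself; the other
# variants are illustrative assumptions, not entries from the register.
VARIANTS = {
    "sun woy": "Xinhui",
    "xinhui": "Xinhui",
    "toi shan": "Taishan",
    "taishan": "Taishan",
    "chung shan": "Zhongshan",
    "zhongshan": "Zhongshan",
}


def normalize_county(raw):
    """Return the canonical county name, or None if the spelling is unmapped."""
    # Lowercase and collapse runs of whitespace so that "  Sun  WOY "
    # and "Sun Woy" resolve to the same lookup key.
    key = " ".join(raw.lower().split())
    return VARIANTS.get(key)


for spelling in ["Sun Woy", "  sun  WOY ", "Taishan", "Unknown Place"]:
    print(repr(spelling), "->", normalize_county(spelling))
```

Returning None for unmapped spellings, instead of guessing, leaves those entries visible for manual review.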

Since the completion of the digitization work, scholarship has drawn on the digital records from the project, with differing methods and research findings. W. Peter Ward’s publication in 2013 focused on changes in the wellbeing of Chinese headtax immigrants, particularly analyzing the immigrants’ stature, a statistical indicator for wellbeing. He contrasted mean height by age of different age cohorts (one decade apart), and found a rising trend in stature over time: “a slow but significant increase in stature within the immigrant population from the middle of the 19th century to the early years of the Sino-Japanese War.” This increase in height, Ward speculated, can be attributed to the migration process itself.
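The cohort comparison described above can be sketched in a few lines of Python. This is a simplified illustration and not Ward's actual analysis: it groups records by decade of birth only and ignores age at measurement, and the sample birth years and heights are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (birth_year, height_cm) pairs, invented for illustration;
# they are not values from the register.
records = [
    (1852, 160.0),
    (1858, 162.0),
    (1861, 161.0),
    (1867, 163.0),
    (1873, 164.0),
]


def mean_height_by_decade(rows):
    """Group heights by decade of birth and return the mean for each cohort."""
    cohorts = defaultdict(list)
    for birth_year, height_cm in rows:
        decade = birth_year // 10 * 10  # 1858 -> 1850
        cohorts[decade].append(height_cm)
    return {decade: mean(heights) for decade, heights in sorted(cohorts.items())}


for decade, avg in mean_height_by_decade(records).items():
    print(f"{decade}s cohort: {avg:.1f} cm")
```

A fuller analysis would also need to control for age, since people measured before reaching adult height would pull a cohort's mean down.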

In terms of methodology, Sarah and I felt that the previous studies discussed above haven’t yet demonstrated the potential of the great variety of computational tools available, such as R, a statistical computing language, and Palladio, a network analysis tool developed by the Humanities + Design Lab at Stanford University. We decided to continue the research by building some datasets and sharing our discoveries on the Open Science Framework, with the intention that our study can demonstrate the untapped potential of the head tax data while also showing new ways in which librarians can help shape digital scholarship and create promising new research questions for researchers. Stay tuned for more! In the meantime, please download the data and try it out!

Friday, May 31, 2019

Supporting Diversity, Equity, and Inclusion in Our Canadian Libraries - Reflections From The Last Decade


I recently presented at the Saskatchewan Libraries Association (SLA) 2019 with my colleagues Maha Kumaran and Jian Wang. It was a self-reflective exercise, distilling a decade's worth of professional work as an academic librarian. Perhaps Miu Chung Yan, a social work scholar, puts it best when he asserts that a profession such as social work has its roots deeply embedded in colonialist origins, with a history steeped in British methodologies. Librarianship invites similar comparisons, as it is deeply influenced by British and Anglo-American thinkers and practitioners. As far back as 1946, Sidney Ditzion had already proposed that since America drew much of its cultural influence from the European continent, it is not surprising that librarianship should be among those influences. To understand the bridge between librarianship and cultural diversity, one also needs to understand that the phenomenon is intrinsically tied to society as much as to the profession. ALA leaders constituted an elite corps of White Anglo-Saxon Protestants (WASP), mostly male, middle-class professionals who were immersed in the disciplinary and literary canons of the dominant culture and shared a common ideology. However, when a profession lacks diversity, it, unfortunately, loses relevance for many of its users. Libraries are a microcosm of society, and if libraries are not a reflection of our society, then there is real cause for concern.

As a librarian earning my stripes in a profession steeped in tradition and unwritten rules, I find it overwhelming at times. But I have survived, and although I am still on my journey as a visible minority librarian, I have found some strategies that have worked for me in coping and performing at a high level as a professional librarian. Not only is being connected to fellow colleagues critical, but one must be committed to one's own professional and personal development at all times. Keeping abreast of technical knowledge and other developments in the field of librarianship is important, but equally vital are soft skills such as interpersonal relations, confidence, and a positive mindset. As the oft-quoted screenwriter Melissa Rosenberg puts it, “It doesn't matter if you're the smartest person in the room: If you're not someone who people want to be around, you won't get far.”

I've written and published some of these strategies in Aboriginal and Visible Minority Librarians: Oral Histories From Canada, and shared some of these thoughts and reflections at SLA 2019. I've been a part of ViMLoC for a number of years now, and I'm encouraged and proud to see how far it's come, but also aware of how much further it needs to go to truly make an impact in Canadian libraries (and beyond). Is it enough? What more do we need to help us do more? I encourage us all as librarians to think more broadly about our place not only in the profession but also in society: how do we help shape a future that is so highly influenced by the past? How do we instill change, even though we are powerless in our own ways? I challenge each and every one of you to start making a positive contribution by changing our perceptions of the status quo.

Thursday, April 11, 2019

The Emotional Fatigue of Unseen Labour in Librarianship

Photo by Fahrul Azmi on Unsplash
Recently, there was an article that sent ripples across the academic community at UBC. Although it's not a revelation that racialized faculty dedicate a huge amount of time, energy and passion to helping students of colour who struggle, often much more than their white colleagues do, the work is often invisible, or unseen, labour. In my years as a professional librarian, I have consulted with a fair share of students of colour who either want to enter the profession or are already studying in a graduate program. Although thankfully there is no need for me to offer boxes of Kleenex in these meetings, my long conversations with students often do veer into serious confessions of identity, self-doubt, and then, experiences of discrimination.

As a Canadian-born Chinese (CBC), I have personally experienced and seen some of the barriers that visible minority librarians face entering the library profession. Like most ethnic minority librarians, I have faced the misperceptions and biases that are attached to librarians of colour, and like most, I strive to be as professional as possible in dealing with and learning from cultural barriers in the workplace. Oftentimes, I have heard from mentors and colleagues that librarians such as myself need to be more outgoing and sociable, or to “break out of the shell” and engage them more. Research studies have supported these conceptions of certain Asian groups as a “model minority,” with labels such as “conservative” and “lacking in interpersonal skills.”

In fact, one case study found that supervisors can evaluate performance differently for different ethnic groups because of preconceived biases. This is especially problematic as librarianship is a social profession. Without opportunities for social expression, the career of an individual is at a severe disadvantage. But that's just the way things are, and librarians such as myself do our best to listen to and empathize with visible minorities breaking into the profession. It's emotional labour, and hearing stories of racial or gender discrimination takes a toll on one's psyche. It demands time and creates emotional fatigue. I often come out of it tearing up on the inside, but remaining calm on the outside. It's important work, unpaid and unrecognized, but work I am proud to do on my own time if it helps another individual and advances my profession in the future.

Thanks to some great mentors and relationships with colleagues, I have for the most part had positive and rewarding experiences as a librarian, but it has not been without its rocky moments. Perfecting the craft of reference work, collection development techniques, and best practices for information literacy instruction is challenging enough as it is, requiring vast amounts of time and dedication. In addition to that, visible minority librarians must also learn the nuances of fitting firmly into a particular organizational culture while still feeling comfortable in one’s own skin. I've written about this in the past and will be sharing my thoughts and research at the Saskatchewan Libraries Association in May 2019. One of the initiatives I'm proudest of, and one I'll be talking about, is the Visible Minority Librarians Network of Canada (ViMLoC), which I've been a part of for many years and which offers advice and guidance to visible minority librarians in the areas of education, training, and mentorship. The panel will also examine the Census of Canadian Academic Librarians of 2016 and 2018, share its view of the two censuses, and discuss how much we have progressed with diversity as a profession in light of the recent controversy at ALA Midwinter in Seattle. I look forward to reporting back. Stay tuned.


Tuesday, April 09, 2019

The Big Academic Publishers Going Into Data Analytics Business

The latest SPARC Landscape Analysis is a fascinating read. It's surprising to learn that the so-called big-three academic publishers (Elsevier, Pearson and Cengage) are not only doing extremely well financially, they are also keeping ahead of the curve by radically transforming themselves into data analytics companies built atop their content, continuously looking for ways to monetize that content. It relates to a question I often get from students about the citation manager Mendeley (owned by Elsevier): why is it free, and why does it offer 2 gigabytes of free storage?

None of these companies shows any inclination to abandon its traditional content business, and for sound reasons. These publishers continue to offer data and data analytics services to their customers, not content with just growing their traditional core business. Why should we as academics care? Well, the move by publishers into the core research and teaching missions of colleges and universities, with tools aimed at evaluating productivity and performance, means that the academic community could lose control over vast areas of its core activities. While Elsevier is the example here, it could be followed by any of the other big publishers. Here's the type of influence that publishers could have:

(1) Research Prediction - Publishers could identify, through the analysis of research and publication patterns and the quality and reach of their collaboration networks, which researchers are likely to grow into future leaders in their respective fields and offer them editorial board positions on their journals ahead of other publishers.

(2) Disciplines - They could also identify which segments of various disciplines are likely to evolve into the next growth area for research by looking (for example) at project participation patterns, size, and quality of teams, and funding bodies’ decisions, targeting these segments with new, dedicated journals ahead of other publishers.

(3) Funding - They could isolate in advance new trends in interdisciplinary studies, allowing them to establish publication forums where none exist today and even to drive funding decisions that lead to accelerated growth for those types of research.

As we can see, we are heading into uncharted territory, at least in the digital and data age. While Elsevier and these other publishers have been duly noted for their questionable practices and growing influence in academic publishing (for better or worse, mostly for the worse), publishers need to face more scrutiny, as do the types of data services they offer disguised as better services. The question is: will we listen?