Friday, November 30, 2007

Digital Libraries in the Semantic Age

Brian Matthews of CCLRC Appleton Laboratory offers some interesting insights into Semantic Web technologies. In particular, he argues that libraries are increasingly converting themselves into digital libraries. A key aspect of the digital library is the provision of shared catalogues which can be published and browsed. This requires the use of common metadata to describe the fields of the catalogue (such as author, title, date, and publisher), and common controlled vocabularies to allow subject identifiers to be assigned to publications.

As Matthews proposes, by publishing controlled vocabularies in one place, where they can be accessed by all users across the Web, library catalogues can use the same Web-accessible vocabularies for cataloguing, marking up items with the most relevant terms for the domain of interest. Search engines can then use the same vocabularies in their searches to ensure that the most relevant items of information are returned.

The Semantic Web makes such an approach possible. It offers open standards that can enable vendor-neutral solutions, with a useful flexibility (allowing structured and semi-structured data, formal and informal descriptions, and an open and extensible architecture), and it helps to support decentralized solutions where that is appropriate. In essence, RDF can be used as the common interchange format for catalogue metadata and shared vocabularies, which can then be used by all libraries and search engines across the Web.

But in order to use the Semantic Web to its best effect, metadata needs to be published in RDF formats. There are several initiatives involved with defining metadata standards, and some of them are well known to librarians:

(1) Dublin Core Metadata Initiative

(2) MARC

(3) ONIX

(4) PRISM
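To make the idea of RDF as a common interchange format a little more concrete, here is a minimal sketch in Python that writes one catalogue record as N-Triples using Dublin Core element names. The record, the example.org URI, and the to_ntriples helper are all made up for illustration; a real catalogue would more likely use an RDF library and a richer record structure.

```python
# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"


def to_ntriples(subject_uri, record):
    """Serialize a flat dict of Dublin Core fields as N-Triples lines.

    Hypothetical helper: every value is treated as a plain string literal.
    """
    lines = []
    for field, value in sorted(record.items()):
        # Escape backslashes and double quotes so the literal stays valid.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{DC}{field}> "{literal}" .')
    return "\n".join(lines)


# Made-up catalogue record; "author" in a catalogue maps to dc:creator.
record = {
    "title": "An Example Book",
    "creator": "Doe, Jane",
    "date": "2007",
    "publisher": "Example Press",
}

print(to_ntriples("http://example.org/catalogue/item/1", record))
```

Each printed line is one statement (subject, property, value), which is the shape any library or search engine reading the shared catalogue would need to agree on.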

Wednesday, November 21, 2007

Postmodern Librarian - Part Two

To continue where we left off: true, Digital Libraries and the Future of the Library Profession intimates that libraries, and perhaps librarianship, have entered the postmodern age. But Joint isn't the first to make such an argument; many others have argued likewise. In fact, I have written about it before, too. But I believe that stopping at the modernist-postmodernist dichotomy misses the point.

In my opinion, perhaps this is where Web 2.0 comes in. Although the postmodern information order is not clear to us, it seems to be the dynamic behind Web 2.0, in which interactive tools such as blogs, wikis, and RSS facilitate social networking and the anarchic storage and unrestrained distribution of content. According to Joint, much of our professional effort to impose a realist-modernist model on our libraries will fail. The old LIS model needs to be re-theorized, just as Newtonian Physics had to evolve into Quantum Theory, in recognition of the fact that super-small particles simply were not physically located where Newtonian Physics said they should be. In this light, perhaps this is where we can start to understand what exactly Web 2.0 is. And beyond.

Friday, November 16, 2007

Semantic Web: A McCool Way of Explaining It

Yahoo's Rob McCool argues in Rethinking the Semantic Web, Part 1 that the Semantic Web will never happen. Why? Because the Semantic Web has three fundamental parts, and they just don't fit together based on current technologies. Here is what we have. The foundation is the set of data models and formats that provide semantics to applications that use them (RDF, RDF Schema, OWL). The second layer is composed of services - purely machine-accessible programs that answer Web requests and perform actions in response. At the top are the intelligent agents, or applications.

Reason? Knowledge representation is a technique with mathematical roots in the work of Edgar Codd, widely known as the author of the original paper whose use of set theory and predicate calculus led to the relational database revolution in the 1980s. Knowledge representation uses the fundamental mathematics of Codd's theory to translate information, which humans represent with natural language, into sets of tables that use well-defined schema to define what can be entered in the rows and columns.

The problem is that this creates a fundamental barrier, in terms of richness of representation as well as creation and maintenance, compared to the written language that people use. Logic, which forms the basis of OWL, suffers from an inability to represent exceptions to rules and the contexts in which they're valid.

Databases are deployed only by corporations whose information-management needs require them or by hobbyists who believe they can make some money from creating and sharing their databases. Because information theory removes nearly all context from information, both knowledge representation and relational databases represent only facts. Complex relationships, exceptions to rules, and ideas that resist simplistic classifications pose significant design challenges to information bases. Adding semantics only increases the burden exponentially.

Because it's a complex format that requires users to sacrifice expressivity and pay enormous costs in translation and maintenance, McCool believes the Semantic Web will not achieve widespread support. Never? Not until another Edgar Codd comes along. So we wait.

Wednesday, November 14, 2007

The Postmodern Librarian?

Are we in the postmodern era? Nicholas Joint's Digital Libraries and the Future of the Library Profession seems to think so. In it, he argues that unique contemporary cultural shifts are leading to a new form of librarianship that can be characterized as "postmodern" in nature, and that this form of professional specialism will be increasingly influential in the decades to come.

According to Joint, the idea of the postmodern digital library is clearly very different from the interim digital library. In the summer of 2006, at a workshop at the eLit conference in Loughborough on the cultural impact of mobile communication technologies, there emerged the Five Theses of Loughborough. Here they are:

(1) There are no traditional information objects on the internet with determinate formats or determinate qualities: the only information object and information format on the internet is "ephemera"

(2) The only map of the internet is the internet itself, it cannot be described

(3) A hypertext collection cannot be selectively collected because each information object is infinite and infinity cannot be contained

(4) The problem of digital preservation is like climate change; it is man-made and irreversible, and means that much digital data is ephemeral; but unlike climate change, it is not necessarily catastrophic

(5) Thus, there is no such thing as a traditional library in a postmodern world. Postmodern information sets are just as accessible as traditional libraries: there are no formats, no descriptions, no hope of collection management, no realistic possibility of preservation. And they work fine.

Monday, November 12, 2007

New York City In a Semantic Web

Tim Krichel, in The Semantic Web and an Introduction to Resource Description Framework, makes a very astute analogy for understanding the technology behind the Semantic Web, particularly the nuances of XML and RDF, where the goal is to move away from the present Web - where pages are essentially constructed for human consumption - to a Web where more information can be understood and treated by machines. The analogy goes like this:
We fit each car in New York City with a device that lets a reverse geographical position system read its movements. Suppose, in addition, that another machine can predict the weather or some other phenomenon that impacts traffic. Assume that a third kind of device has the public transport timetables. Then, data from a collaborative knowledge picture of these machines can be used to advise on the best means of transportation for reaching a certain destination within the next few hours.
The computer systems doing the calculations required for the traffic advisory are likely to be controlled by different bodies, such as the city authority or the national weather service. Therefore, there must be a way for software agents to process the information from the machine where it resides, to proceed with further processing of that information to a form in which a software agent of the final user can be used to query the dataset.

Wednesday, November 07, 2007

Genre Searching

At today's SLAIS colloquium, Dr. Luanne Freund gave a presentation on Genre Searching: A Pragmatic Approach to Information Retrieval. Freund argues for taking a pragmatic approach to genre searching and genre classification. But there are two perspectives on pragmatics: socio-pragmatic and cognitive-pragmatic. Using a high-tech firm as a case study, Freund and her colleagues built a unique search engine called X-Cite, which pulls together tagged documents from the corporate intranet (anything from FAQs to specialized manuals). By ranking documents on title, abstract, and keywords, the algorithm cuts down on the ambiguity and guesswork of searching. Using a software engineering workplace domain as its starting point, Freund believes that genre searching has the potential to make a significant contribution to the effectiveness of workplace search systems, by incorporating genre weights into the ranking algorithm.
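The idea of folding genre weights into a ranking function can be sketched in a few lines of Python. This is not X-Cite's algorithm, which the talk did not spell out; the scoring formula (term count multiplied by a per-genre weight), the rank function, and the sample documents and weights are all assumptions made up for illustration.

```python
def rank(documents, query_terms, genre_weights):
    """Return titles ordered by term-match score scaled by a genre weight.

    Hypothetical scoring: count query-term occurrences in the text, then
    multiply by the weight of the document's genre (1.0 if unlisted).
    """
    scored = []
    for doc in documents:
        words = doc["text"].lower().split()
        text_score = sum(words.count(term.lower()) for term in query_terms)
        weight = genre_weights.get(doc["genre"], 1.0)
        scored.append((text_score * weight, doc["title"]))
    # Highest score first; documents with no matching terms are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored if score > 0]


# Made-up intranet documents tagged with a genre.
documents = [
    {"title": "Install FAQ", "genre": "faq",
     "text": "how to install the server"},
    {"title": "Server Manual", "genre": "manual",
     "text": "install and configure the server install guide"},
    {"title": "Release Notes", "genre": "news",
     "text": "new features in this release"},
]

# Assumed weights: for a quick how-to task, FAQs count for more than manuals.
genre_weights = {"faq": 3.0, "manual": 1.0}

print(rank(documents, ["install"], genre_weights))
```

With these weights the FAQ outranks the manual even though the manual mentions the query term more often, which is the kind of effect a genre-aware ranking is after.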

In genre analysis, three steps must be taken:

(1) Identify - The core genre repertoire of the work domain

(2) Develop - A standard taxonomy to represent it

(3) Develop - Operational definitions of the genre classes in the taxonomy, including identifying features in terms of form, function and content to facilitate manual and automatic genre classification.

Throughout the entire presentation, my mind kept returning to the question: is this not another specialized form of social searching? A tailored search engine which narrows its search to a specific genre? Although the two are entirely different things, I keep thinking that creating your own search engine is certainly much easier.

Simple Knowledge Organization System (SKOS) & Librarians

Miles and Perez-Aguera's SKOS: Simple Knowledge Organization for the Web introduces SKOS, a Semantic Web language for representing structured vocabularies, including thesauri, classification schemes, subject heading systems, and taxonomies -- tools that cataloguers and librarians use every day in their line of work.

It's interesting that the very essence of librarianship and cataloging will play a vital role in the upcoming version of the Web. It's hard to fathom how this works: how can MARC records and the DDC have anything to do with the intelligent agents which form the layers of architecture of the Semantic Web and Web 3.0? The answer: metadata.

And even more importantly: the messiness and disorganization of the Web will require information professionals with the techniques and methods to reorganize everything coherently. Web 1.0 and 2.0 were about creating -- but the Semantic Web will be about orderliness and regulating. Take a closer look at Miles & Perez-Aguera's article -- it's well worth a read. SKOS is designed to represent the following kinds of controlled, structured vocabulary:

(1) Thesauri - Broadly conforming to the ISO 2788:1986 guidelines such as the UK Archival Thesaurus (UKAT, 2004), the General Multilingual Environmental Thesaurus (GEMET), and the Art and Architecture Thesaurus

(2) Classification Schemes - Such as the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC), and the Bliss Classification (BC2)

(3) Subject Heading Systems - The Library of Congress Subject Headings (LCSH) and the Medical Subject Headings (MeSH)
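To give a feel for what a vocabulary entry looks like in SKOS, here is a minimal Python sketch that prints two concepts in Turtle syntax, using the real skos:Concept, skos:prefLabel, and skos:broader terms. The ex: namespace, the two concepts, and the concept_to_turtle helper are made up for illustration and are not taken from Miles & Perez-Aguera's article.

```python
# Prefix declarations: the SKOS namespace is real, the ex: one is made up.
PREFIXES = "\n".join([
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
    "@prefix ex: <http://example.org/vocab/> .",
])


def concept_to_turtle(concept_id, pref_label, broader=None, lang="en"):
    """Render one SKOS concept as a Turtle statement block.

    Hypothetical helper: assumes ids and labels need no escaping.
    """
    lines = [
        f"ex:{concept_id} a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@{lang}',
    ]
    if broader:
        # Link the concept to its broader term, as a thesaurus would.
        lines[-1] += " ;"
        lines.append(f"    skos:broader ex:{broader}")
    lines[-1] += " ."
    return "\n".join(lines)


print(PREFIXES)
print()
print(concept_to_turtle("mammals", "Mammals"))
print()
print(concept_to_turtle("cats", "Cats", broader="mammals"))
```

The broader link is the same relationship a cataloguer records as BT (broader term) in a thesaurus, only published in a form that software can follow.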

Friday, November 02, 2007

New Librarians, New Possibilities?

Are newer, incoming librarians changing the profession? Maybe. But not yet. University Affairs has published an article called The New Librarians, which highlights some of the new ideas that newer librarians are bringing into academic libraries. Everyone's favourite University Librarian (at least for me), Jeff Trzeciak, who has his own blog, is featured in the piece, and in it, he describes how he has swiftly hired new Library 2.0-ready librarians and overturned the traditional decor and culture of McMaster Library. In the article's words, a "café, diner-style booths, stand-up workstations, oversized ottomans, and even coffee tables with pillows on the floor will take their place, all equipped for online access. Interactive touch-screen monitors will line the wall."

University of Guelph Chief Librarian Michael Ridley similarly sees a future where the university library serves as an “academic town square,” a place that "brings people and ideas together in an ever-bigger and more diffuse campus. Services in the future will include concerts, lectures, art shows – anything that trumpets the joy of learning."

Is this the future of libraries? Yes, it's a matter of time. That's where we're heading -- that's where we'll end up. Change is difficult, particularly in larger academic institutions where bureaucracy and politics play an essential role in all aspects of operations. There is great skepticism towards Jeff Trzeciak's drastic changes to McMaster Library -- he's either a pioneer if he succeeds, or an opportunist if he fails. A lot is riding on Jeff's shoulders.