Thursday, February 14, 2008

The Googling Librarian

An article from the Chronicle of Higher Education popped up which once again highlighted the information needs (or lack thereof) of college students. It has become a recent phenomenon -- this round of arguments and counter-arguments about the necessity of libraries and librarians in the face of Google-ization. For every viewpoint that the Internet has replaced the information services of libraries, there is the stance that users are even more confused by information overload and the mess that is the Web.

I tend to agree with what Dennis Dillon says in a new article, Google, Libraries, and Knowledge Management: From the Navajo to the National Security Agency. Libraries and the 'Net are different entities: libraries play the library game, not the information game. Google is the same for everyone. It is not tailored for different user groups, and it does not change as local users' needs shift. Google's very nature is different from that of libraries.

Here's the kicker, folks: we could wake up tomorrow to the news that a banking conglomerate has purchased Google, intends to turn it into a private corporate information tool, and wants to convert the content to French. Although this is just a silly hypothetical, Dillon makes a good point: organizations such as Google are simply not playing the same game as libraries.

Perhaps this is what libraries with foresight, such as McMaster University Libraries, are doing. They're integrating new technologies to supplement and complement existing facilities -- before it's too late. I personally talk a great deal about emergent technologies, particularly Web 3.0 and the Semantic Web, but in the end, I believe these are mere tools that facilitate the growth of the library as an organism. Interior design is every bit as relevant to how users perceive the physical spaces of the library as Facebook is to increasing outreach to students. But put the two together, and we pack a powerful punch. Dillon leaves us with a fresh yet somewhat disconcerting comment:
Libraries have become so enamoured of technology that we sometimes cannot see what is in front of our faces, which is that there are still people in our buildings and they are there for a reason.

Wednesday, February 06, 2008

The Future of Digital Librarians

My colleague and mentor The Google Scholar discussed a bit about the Semantic Web and Web 2.0. Is it relevant to the profession of librarianship? Absolutely. How do we achieve it? Edie Rasmussen and Youngok Choi released a study in 2006, What is Needed to Educate Future Digital Librarians, that surveys the skills practitioners lack. The two authors found that while many librarians are young and fresh out of graduate LIS school, they often lack the skills necessary to thrive in the increasingly digital world of libraries. LIS curricula are often limited to introductory classification and rudimentary information technology courses. There appears to be a real disconnect between the job descriptions for newer positions and the skills that librarians actually receive in LIS school. Rasmussen and Choi's study finds that respondents are often frustrated over "training gaps" during their studies in the following areas:

(1) Overall understanding of the complex interplay of software

(2) Lack of vocabulary to communicate with technical staff

(3) Knowledge of Web-related languages and technologies

(4) Web design

(5) Digital imaging and formatting

(6) Digital technology

(7) Programming and scripting languages

(8) XML standards and technologies

(9) Basic systems administration

In my own experience as an information professional, I find that these skills were sorely lacking in my own education. Increasingly, it falls to my own initiative to catch up on the literature and the technologies. Who really has time to learn OAI-PMH, XML, EAD, and TEI? Many librarians keep abreast of their field -- but on top of their current duties. The problem remains that LIS schools are not meant to train technicians -- their mandate is to nurture scholars -- even though technical skills are what the jobs now demand. Which I can understand. Yet we can't fit a square peg into a round hole. Therein lies the conundrum: something's got to give. But what? That tension has persisted in the field of LIS since its inception, and with the advent of the Web and newer technologies, the gap will only widen.

Thursday, January 31, 2008

Web 3.0 as in Automation?

I often wonder what kind of automation will make the Semantic Web possible. I know there needs to be an automated web browser (or something similar), but what would it look like? The solution could look something like Automatic Character Switch (ACtS), which is a strategy and a philosophy rather than a standard, meaning community moderators can independently implement their own ACtS methods. Similar to AJAX, ACtS is invoked only when necessary; that is, only when a web space is connected to a community.

So what is ACtS? According to Yihong Ding, ACtS allows different communities to recognize whatever they can identify from a web space. A web user sets up a local web space that stores his web resources. When he subscribes to a new web community, he uploads his local web space to the site, and the site customizes its resources based on the community specifications. ACtS begins with a user subscribing a web space to a community. The community server then performs a community-sensitive resource identification procedure to categorize (information retrieval) and annotate (semantic annotation) the public resources stored in the web space. The local web space thus gains a community-specific view over its resources, which composes a community-specific sub-space. But ACtS is only a theory. For it to be realized, two premises must hold:

(1) A uniform representation - Web spaces similar to what is on Web 1.0. This requires advancement on HTML encoding. In particular, this means independent HTML encoding of individual web resources.

(2) Character recognition and casting technology - A combination of information retrieval and semantic annotation methods.

Wednesday, January 30, 2008

Public Library 2.0?

Much has been discussed about the role of public libraries as they face budget cuts alongside greater needs for technological innovation. Some have argued that this is natural, as we have entered Library 2.0, which is all about rethinking library services in light of re-evaluated user needs and the opportunities produced by new technologies. Although there have been great resources written about Library 2.0, there hasn't been one as thorough in its analysis of public libraries as Public Library 2.0: Towards a new mission for public libraries as a "network of community knowledge"? Chowdhury, Poulter, and McMenemy propose Public Library 2.0, inspired by Ranganathan's famous five principles. They make great fodder for further discussion, don't they?

(1) Community knowledge is for use - Since the value of a community lies in the knowledge it possesses, people who leave a community take their memories with them. Yet little has been carried out in public libraries to digitize local resources.

(2) Every user should have access to his or her community knowledge - Knowledge is for sharing; community knowledge becomes valuable only when it can be accessed and used by others. Facilitating the creation and wider use of this knowledge should be the new role of public libraries.

(3) All community knowledge should be made available to its users - No community knowledge should be allowed to be wasted. Rather, public libraries should facilitate the creation of such knowledge so that it is recorded and preserved. Nothing should be lost.

(4) Save the time of the user in creating and finding community knowledge - Just like the paper records of past lives, the digital records of current lives are accumulating in an ad hoc manner but in a much greater quantity and variety. Hence, public library staff should fill the role of advisors on local content creation, management, and implementation of controlled description, as well as access schemes.

(5) Local community knowledge grows continually - Because community knowledge creation is a continual process, public libraries, acting as local knowledge hubs, must use existing standards and technology for digitization, as well as metadata for the management of, and access to, the digitized resources.

Sunday, January 27, 2008

The Semantic Catalogue

It's important that librarians keep at the back of their minds how to integrate the Semantic Web into the catalogue, which is ultimately the bridge that users cross to access the library's resources. But it's easy to forget about it, particularly since many libraries have difficulty keeping up with Web 2.0 technologies. But regardless of how far we've come along, it's necessary to peer into the future and see what kinds of changes we'll need to embrace. It could be ten years down the road before we hit the Semantic Web . . . or five . . . or even less. Take a look at Campbell and Fast's Academic Libraries and the Semantic Web: What the Future May Hold for Research Supporting Library Catalogues. They make an excellent case for integrating existing web resources into a dynamic, information-rich, and user-centred catalogue.

By meshing services such as IMDb, Amazon, and the AFI Catalog, the authors suggest, academic libraries could use the Semantic Web as a source of rich metadata that can be retrieved and inserted into bibliographic records to enhance users' information searches and to expand the role of the library catalogue as a research tool rather than a mere locating device (something along the lines of the Pipl search engine). In doing so, the cataloguer acts as an information intermediary, using a combination of subject knowledge and information expertise to facilitate the growth of semantically encoded metadata. In a Web 3.0 world, the cataloguer's new responsibilities would include the following:

(1) Locate - RDF-encoded information on specific subjects, scrutinizing its reliability, and assessing its usefulness in meeting cataloguing objectives

(2) Select - RDF resources for the specific item being catalogued

(3) Participate - In markup projects within a specific knowledge domain, thus promoting the growth of open-access domain-specific metadata
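To make step (2) concrete, here's a little sketch of what "selecting" harvested RDF-style metadata for a record might look like. The triples, predicates, and film titles below are entirely my own invented examples, not from Campbell and Fast:

```python
# Sketch of the "Select" step: given RDF-style triples harvested from
# the wider web, keep only those about the item in hand and fold them
# into its bibliographic record. All names here are hypothetical.

harvested = [
    ("film:Casablanca", "dc:subject", "Romance"),
    ("film:Casablanca", "rev:rating", "8.5"),
    ("film:Metropolis", "dc:subject", "Science fiction"),
]

def select_for(item, triples):
    """Keep only the triples whose subject is the item being catalogued."""
    return {pred: obj for subj, pred, obj in triples if subj == item}

record = {"title": "Casablanca", "year": "1942"}
record.update(select_for("film:Casablanca", harvested))
print(record)
```

The real work, of course, lies in scrutinizing the reliability of what is harvested (step 1) before any of it touches a record.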

Thursday, January 24, 2008

Google Scholar, Windows Live Academic Search, and LIS 2.0

The School of Information and Library Science at the University of North Carolina at Chapel Hill sure churns out some great theses. The latest, Josiah Drewey's Google Scholar, Windows Live Academic Search, and Beyond: A Study of New Tools and Changing Habits in ARL Libraries, offers remarkable insight into these two academic search engines. Little has been written about Windows Live Academic Search -- so little that it appears most people have forgotten about it (including its own creators). Drewey's paper reveals that such neglect is undeserved, and it's worth a read. Here are my favourite points that Drewey makes about GS and WLAS:

(1) Citation Ranking - Search results are largely influenced by citation counts generated by Google's link-analysis, which means that users see the most highly cited (and therefore, the most influential) articles

(2) Citation Linking - GS rivals Web of Science and Scopus with its ability to link to each article through a "cited by" feature that allows users to see which other authors have cited that particular article. GS is superior in this aspect as it stretches into the Humanities as well.

(3) Versioning - GS compiles the different versions of a particular article or other work in one place. Different versions can come from publishers' databases, preprint repositories, or even faculty homepages.

(4) Open Access - GS increasingly brings previously unknown or unpublicized content to users.

(5) Ability to link to libraries - GS has the ability to link to content already paid for by libraries. Thus, search results from GS can lead directly to the libraries' databases.

(6) Pre-compiled index - Instead of searching many databases as each query is made, as a federated search engine would, GS compiles its resources prior to the search, so results return very quickly.

In contrast, Drewey offers some great insights into Windows Live Academic Search. Here are the main strengths of WLAS:


(1) Better interface - WLAS uses a "preview pane" to display initial search results; the user can mouse over a citation to show its abstract in another pane to the right, whereas GS is inflexible

(2) Names of authors are hyperlinked - Search results take the user to other works by each author

(3) Citations Export - Although GS allows this too, WLAS makes exporting to BibTeX, RefWorks, and EndNote much more visible and easier

(4) User-friendly - In many ways, WLAS offers more features tailored for users. Not only does it offer RSS feeds, it enables users to store their preferences and save search parameters. GS surprisingly does not have such features.

Tuesday, January 22, 2008

The Long Tail and Libraries

To date, Lorcan Dempsey's Libraries and the Long Tail has offered the most insightful analysis of the Long Tail's importance in libraries. As I've written before, the Long Tail is an effective strategy to utilize when implementing Library 2.0 for the modern library. The question is: could it be implemented without a huge overhaul of most existing libraries? These are some points that Dempsey argues:

(1) Transaction Costs - The better connected libraries are, the lower the transaction costs

(2) Data about choice and behaviour - Transactional behavioural data is used to adapt and improve systems. Examples of such data are holdings data, circulation and ILL data, and database usage data.

(3) Inventory - As more materials are available electronically, we will see more interest in managing the print collection in a less costly way. Although historical library models have been based on the physical distribution of materials, resources decreasingly need to be distributed in advance of need; they can be held in consolidated stores.

(4) Navigation - There are better ways to exploit large bibliographic resources. Ranking, recommendations, and relation help connect users to relevant material and also help connect the more heavily used materials to potentially useful, but less used, materials

(5) Aggregation of Demand - The library resource is fragmented. In the new network environment, this fragmentation reduces gravitational pull, which means that resources are prospected by the persistent or knowledgeable user, but they may not be reached by others to whom the resources are potentially useful. What OCLC is doing is making metadata about those books available to the major search engines and routing users back to library services

Saturday, January 19, 2008

Google = God?

Maybe Google got it right all along. But is it God? That often appears to be the way most people search online nowadays, expecting to find the answer to just about anything. Yihong Ding calls this "oracle-based" web searching, in which search engines such as Google are assumed to know everything. This worked relatively well in the early days of the Web because it was a pragmatic and affordable strategy; at that time, the quantity of web resources was comparatively small, and we rarely searched for meaning. On this premise, to build a semantic oracle (i.e. a Semantic Google) is equivalent to creating a real God (who knows everything) for human beings.

Perhaps, according to Ding, a better alternative is collaborative searching. Whereas the current answer-based search strategy is motivated by questions, collaborative search is motivated by answers. In our answer-based search model, the ones who answer questions may not have passion for (or enough knowledge of) the questions. But an inanimate search engine such as Google doesn't know this -- nor does it care.

However, Web 2.0 is slowly changing this course of searching. Already, search engines such as ChaCha are harvesting collective intelligence and the wisdom of crowds to retrieve more "relevant" results. Ding goes one point further: Web 3.0 will be based on community-sensitive link resources. It will reverse the relation between horizontal and vertical search engines. The current model of vertical search engines built upon generic search engines is not working well, because those engines are too immature to provide community-specific search by themselves. (Just look at the limitations of Rollyo). What will the Semantic Web search engine look like? Maybe something like this.

Friday, January 18, 2008

The Future of I.S.

Meet Ramesh Srinivasan, professor of Information Studies at UCLA. During my trip to Los Angeles, I met with the IS faculty and visited some of the libraries at UCLA. My conversation with this up-and-coming academic star was fascinating, to say the least. Ramesh's interests include exploring connections between diasporic/indigenous communities and new media, and how information technologies shape, transform, and differentially impact nations, cultures, and societies along educational, political, health-related, social, and infrastructural dimensions.

Among his more interesting projects is Emerging Databases, Emerging Diversity (ED2), a National Science Foundation-funded initiative to study methods by which digital collections can be shared via systems that maintain diverse tags, ontologies, and interfaces. In collaboration with Cambridge University's Museum of Archaeology and Anthropology and the Zuni community of New Mexico, the $300,000 project asks how digital access to ancestral objects affects diverse communities. Ramesh's work involves extensive fieldwork in places like Kyrgyzstan and India. (Exciting!)

The faculty at UCLA represents Library and Information Science's gradual shift towards the iSchool movement. Academics such as Ramesh Srinivasan represent the new face of LIS. This has important implications for librarians, who will ultimately be bred and nurtured by these new scholars' nontraditional perspectives on LIS. Rather than basing their studies on users of libraries, newer scholars such as Srinivasan, whose background is as diverse as his research (his PhD is in Design), go beyond the traditional domain of LIS. Inevitably, librarianship will change because of this new approach. New ways of thinking and research will be injected into the profession -- perhaps this is where innovation in libraries will come from as well: from the classroom.

Wednesday, January 16, 2008

Metcalfe's Law

As I have opined in previous posts, the next stage of the Web will be built on the existing infrastructure of Web 2.0. One of the foremost thinkers of the Semantic Web offers an insightful analysis of the progress from Web 2.0 to the Semantic Web. Along with Jennifer Golbeck, James Hendler invokes Metcalfe's Law, which holds that a network's value increases as the number of users increases: every new person who joins adds potential links for every existing user. Not surprisingly, Metcalfe's Law is the essence of Web 2.0.

As the number of people in the network grows, the connectivity increases, and if people can link to each other's content, the value grows at an enormous rate. The Web, if it were simply a collection of pages of content, would not have the value it has today. Without linking, the Web would be a blob of disconnected pages.
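As a back-of-the-envelope illustration of Metcalfe's Law (my own sketch, not from Hendler and Golbeck's paper), the number of potential pairwise links among n users grows roughly quadratically, which is why value explodes as a network fills in:

```python
# Metcalfe's Law: a network's value tracks the number of potential
# links among its members, which for n users is n * (n - 1) / 2.

def potential_links(n: int) -> int:
    """Number of distinct pairwise connections among n users."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(f"{n:>5} users -> {potential_links(n):>8,} potential links")
```

Ten times the users yields roughly a hundred times the potential links, which is the quadratic growth the argument rests on.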

As information professionals and librarians, we shouldn't miss out on the obvious links between Web 2.0 and the Semantic Web. Social networking is critical to the success of Web 2.0; but by combining the social networks of Web 2.0 with the semantic networks of the Semantic Web, a tremendous value is possible. Here's a scenario from Tom Gruber which I find very compelling:

Real Travel "seeds" a Web 2.0 travel site with the terms from a gazetteer ontology. This allows the coupling of place names and locations, linked together in an ontology structure, with the dynamic content and tagging of a Web 2.0 travel site. The primary user experience is of a site where travel logs (essentially blogs about trips), photos, travel tools and other travel-related materials are all linked together. Behind this, however, is the simple ontology that knows that Warsaw is a city in Poland, that Poland is a country in Europe, etc. Thus a photo taken in Warsaw is known to be a photo from Poland in a search, browsing can traverse links in the geolocation ontology, and other "fortuitous" links can be found. The social construct of the travel site, and communities of travelers with like interests, can be exploited by Web 2.0 technology, but it is given extra value by the simple semantics encoded in the travel ontology.
Genius.
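Gruber's Warsaw example can be sketched in a few lines. This toy (with invented place names and photo files) shows how a simple part-of ontology lets a search for "Poland" find a photo tagged only "Warsaw":

```python
# A toy version of the gazetteer ontology: a photo tagged "Warsaw"
# turns up in a search for "Poland" because the ontology knows that
# Warsaw is in Poland, and Poland is in Europe.

PART_OF = {"Warsaw": "Poland", "Poland": "Europe"}  # city -> country -> continent

def broader_terms(place):
    """Yield the place itself plus every broader region it belongs to."""
    while place is not None:
        yield place
        place = PART_OF.get(place)

photos = {"old_town.jpg": "Warsaw", "eiffel.jpg": "Paris"}

def search(region):
    """Return photos whose tagged place falls inside the given region."""
    return [p for p, place in photos.items() if region in broader_terms(place)]

print(search("Poland"))
```

The social tagging stays as simple as Web 2.0 ever was; the extra value comes entirely from the small, static part-of hierarchy behind it.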

Monday, January 07, 2008

Pragmatic Web as HD TV

The Pragmatic Web: A Manifesto makes a return to simplification. For all the hype about Web 3.0, we've still seen very little substantial evidence that it exists. Schoop, De Moor, and Dietz propose a "Pragmatic Web" as a solution that does not replace the current web but rather extends the Semantic Web.

Rather than waiting for everyone to come together and collaborate -- that could take forever or, worse yet . . . never -- the best hope might be to encourage the emergence of communities of interest and practice that develop their own consensus knowledge, on the basis of which they will standardize their representations. The vision of the Pragmatic Web, then, is to augment human collaboration effectively through appropriate technologies. In this way, the Pragmatic Web complements the Semantic Web by improving the quality and legitimacy of collaborative, goal-oriented discourses in communities.

I liken this scenario to high-definition television. By 2010, the majority of programming in North America will move to HDTV specifications, effectively phasing out older broadcast formats. In the meantime, consumers are free to continue using their existing TV sets. The Web could very well follow this model, as it's logical and takes the path of least disruption. Under the HDTV scenario, web users can continue using their current browsers and existing ways of surfing, while those who want to maximize the full potential of the Web will use Semantic Web browsers (e.g. Piggy Bank) designed specifically to utilize the portion of the Web that is "Semantic Web-compliant."

Meanwhile, in the background, semantic annotation will be slowly integrated into Web pages, programs, and services. As time progresses, users will eventually catch onto the "rave" that is the Semantic Web . . .

Saturday, January 05, 2008

E-Commerce 2.0

Web 2.0 has been quite the hype over the past few years -- perhaps too much. Much of it pertains to best practices using blogs, wikis, RSS feeds, and mashups. But not very much has been discussed -- well, not enough in my opinion -- about practical commercial applications other than the ubiquitous eBay and Amazon. Not anymore. Meet Zopa, the world's first social finance company. In 2005, Zopa pioneered a way for people to lend and borrow directly with each other online, part of its mission to give people around the world the power to help themselves financially while helping others. According to Kupp and Anderson's Zopa: Web 2.0 Meets Retail Banking, here's how Zopa works:

(1) Zopa looks at the credit scores of people looking to borrow and determines whether they're an A*, A, B, or C-rated borrower. If they're none of those, then Zopa's not for them

(2) Lenders make lending offers such as "I'd like to lend this much to A-rated borrowers for this long and at this rate"

(3) Borrowers review the rates offered to them and accept the ones they like. If they are dissatisfied with the rates offered on any particular day, they can come back on subsequent days to see if rates have changed

(4) To reduce risk, Zopa spreads lender capital widely. A lender putting forth, for instance, 500 pounds or more would have his or her money spread across at least 50 borrowers

(5) Borrowers enter into legally binding contracts with their lenders

(6) Borrowers repay monthly by direct debit. If repayments are defaulted, a collections agency uses the same recovery process that the High Street banks use
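Step (4) is easy to picture in code. This is only my own simplification -- Zopa's actual allocation rules aren't spelled out here -- but an even split shows why spreading capital caps the damage of any single default:

```python
# Sketch of step (4): split a lender's capital evenly across the
# minimum number of borrowers, so one default costs only a sliver.
# The even split is an assumption for illustration, not Zopa's rule.

def spread_capital(amount, min_borrowers=50):
    """Return the per-borrower slices of an evenly split loan amount."""
    share = amount / min_borrowers
    return [share] * min_borrowers

slices = spread_capital(500)
worst_case_loss = max(slices) / 500  # if one borrower defaults entirely
print(f"{len(slices)} borrowers, {worst_case_loss:.0%} of capital at risk each")
```

With 500 pounds across 50 borrowers, a single total default costs the lender only 2% of the stake, which is the whole point of the diversification rule.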

Thursday, January 03, 2008

Mashups for '09

It's already been two years since I published an article on library web mashups. There have been developments, but still no breakthrough -- no killer application that could popularize mashups for the masses. The main challenge with mashups is that they remain a programmer's world. By merging two or more web programs together, web mashups are the next stage of Web 2.0 and are changing the way the web is used. There are already several mashup editors that help users create or edit mashups: Yahoo! Pipes, Google Mashup Editor, Microsoft Popfly, and Mozilla Ubiquity. But they all require some programming skills. I believe mashups point toward the next stage of the web, the Semantic Web. Why? Because mashups open up data, breaking down information silos.

I've updated my last article with Mashups, Social Software, and Web 2.0: How Remixing Programming Code Has Changed The Web. In taking a look at mashups, I think libraries need to pay attention, as they open up virtual information services to a much larger audience.
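For readers wondering what a mashup actually does under the hood, here is one in miniature: joining records from two imagined services -- a catalogue search and a cover-image service -- on ISBN. Both "responses" are stubbed in as plain data; a real mashup would fetch them over HTTP from live APIs:

```python
# A mashup in miniature: decorate catalogue records with cover-image
# URLs from a second (hypothetical) service, joined on ISBN.

catalogue = [
    {"isbn": "0131103628", "title": "The C Programming Language"},
    {"isbn": "0596007973", "title": "Head First Design Patterns"},
]
covers = {"0131103628": "http://covers.example.org/0131103628.jpg"}

def mashup(records, cover_index):
    """Merge the two sources: attach a cover URL where one exists."""
    return [dict(r, cover=cover_index.get(r["isbn"])) for r in records]

for rec in mashup(catalogue, covers):
    print(rec["title"], "->", rec["cover"])
```

The join itself is trivial; the hard parts in practice are the plumbing (authentication, rate limits, mismatched identifiers), which is exactly why mashups remain a programmer's world.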

When Times Are Tough . . .

I love libraries -- everything from the smell of books, to the warmth of the staff, the comfy carpets, and the great DVD collections, all free to borrow with just a library card and nothing more. But times are tough lately, and the economic downturn has proven just how useful libraries are to society. As the Los Angeles Times has reported, although retail stores may be quiet these days, libraries are hopping as people look for ways to save money. The Los Angeles Public Library is “experiencing record use,” said spokesman Peter Persic, with 12% more visitors during fiscal 2008 than the previous year. At the San Francisco Public Library, about 12% more items were checked out in October than a year earlier. The Chicago Public Library system experienced a 35% increase in circulation. The New York Public Library saw 11% more print items checked out (a spokesman said that could be partly explained by extended hours) . . .

And I've begun to experience this myself. Patrons are starting to use the collections more as they feel the financial pinch the economy has given us. Fear not: the library isn't going anywhere anytime soon.

Wednesday, January 02, 2008

Mashups for '09

It's been almost two years since I first researched web mashups. I still remember having a working draft of an article I was writing for the Journal of Canadian Health Libraries on New Year's Eve. (Hey, it was a slow day). Lo and behold, two years later, there have still been only a handful of articles on mashups. My idol Michelle "The Krafty Librarian" Kraft has written an excellent chapter in Medical Librarian 2.0, which is perhaps the most concise treatment to date.

I've recently written another entry on mashups, Mashups, Social Software, and Web 2.0: How Remixing Programming Code Has Changed The Web. The challenge with mashups is that they are still, unfortunately, a web programmer's tool. However, the next stage of the Web will be mashups. It's about opening data for others and breaking down information silos.

11 Ways to the Library of 2012

Don't blink -- it's only five years away. Being inundated with the day-to-day duties of working in a large academic library has sometimes removed me from the "larger" picture of what libraries look like, not only to users today but also in the future. I've written a great deal about the Semantic Web and Web 2.0; but how do they fit libraries, physically and conceptually? Visions: The Academic Library in 2012 offers a meta-glimpse of how libraries might look in 2012. As you'll notice, some of the features are suspiciously Web 2.0 and Library 2.0. Let's take a look, shall we?

(1) Integrated Library System - the system will recognize the patron and quickly adapt and respond to the patron's new questions and needs (A Semantic Web portal?)

(2) Information Available - collections will undergo dramatic transformations, as they will be largely patron-selected, featuring multimedia resources and databases, many provided collaboratively through extensive consortial arrangements with other libraries and information providers (Think long tail?)

(3) Access to Information - print-on-demand schemes will be developed utilizing the dissertation production experience of UMI but providing mechanisms by which the user can return the fresh, undamaged manuscript for credit, and for binding and future use (Kindle?)

(4) Study Space - Space for work and study will be adaptable, with easily reconfigured physical and virtual spaces (Information Commons? Learning commons?)

(5) Information Instruction - Training and learning support, delivered both in person and through appliances (desktop, hand-held, and small-group videoconferencing), will characterize instruction

(6) Information Printouts - Articles, videos, audio, and on-demand printing of various formats will not only be commonplace; displays of titles will be coordinated with publishers and booksellers to enhance information currency, to market small-run monographs, and to generate revenues

(7) Organizational Aspects - Library staff will be engaged, networked, matrix-structured, and largely "transparent" unless the patron is standing inside the facility facing the individual

(8) Orientation - Library's perspective will be "global" - ubiquitous automatic translators will facilitate truly global information-accessing programs

(9) Computer Access - From OPACS to wireless access for collapsible laptops and personal appliances

(10) Financial - the viable library will have developed dependable revenue streams to facilitate ongoing innovation and advancement (Library as Bookstore model?)

(11) Consortia - Collaborating to create and publish academic journals and resources, particularly e-journals, e-books, and collections of visual resources in various media (Open Access?)

Tuesday, December 25, 2007

Happy Holidays and Seasons Greetings

Season's Greetings to all. It has indeed been a wonderful holiday, as the Google Scholar has published an important piece of Semantic Web literature. He's done it again, writing a concise and cogent piece on the key elements that differentiate Web 3.0 from Web 2.0. In other news, a reader recently made a comment on a previous entry which I found very interesting. Here's what he said:

I (as a librarian) found the article and the whole topic very important. I especially enjoyed the conclusion. You wrote that "Web 3.0 is about bringing the miscellaneous back together meaningfully after it's been fragmented into a billion pieces." I was wondering if in your opinion this means that the semantic web may turn a folksonomy into some kind of structured taxonomy. We all know the advantages and disadvantages of a folksonomy. Is it possible for web 3.0 to minimize those disadvantages and maybe even make good use out of them?

My response? It'll sound clichéd and tired: it's really too early to tell. But although it's murky what the Semantic Web will look like, all signs point to folksonomies playing a key role. Here's why:

(1) Underneath the messiness of the Web is a fairly organized latent structure, whose backbone is the web's link structure. A scale-free network is dominated by a few highly connected hubs.

(2) What this means is that folksonomies and tagging are, in effect, controlled vocabularies in their own right. Much has been written about this. Recent studies have shown that the frequency distribution of tags in folksonomies tends to stabilize into power-law distributions. When a substantial number of users tag content over a long period of time, stable tags start appearing in the resulting folksonomy.

(3) Such a use of folksonomies could help overcome some of the inherent difficulties in ontology construction, thus potentially bridging Web 2.0 and the Semantic Web. By using folksonomies' collective categorization scheme as an initial knowledge base for constructing ontologies, the ontology author could then use the tagging distribution's most common tags as concepts, relations, or instances. Folksonomies do not a Semantic Web make -- but it's a good start.
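A toy sketch of how point (3) might work in practice (the tag log, threshold, and tag names are all invented for illustration): count tag frequencies and promote the most common tags to candidate concepts for an ontology author to review.

```python
from collections import Counter

# Hypothetical tag log: each entry is one user's tag on one resource.
tag_log = [
    "python", "programming", "python", "code", "python",
    "programming", "snake", "python", "code", "python",
]

frequencies = Counter(tag_log)  # python: 5, programming: 2, code: 2, snake: 1

# In a real folksonomy the distribution is heavy-tailed: a few tags
# dominate. Promote tags above a frequency threshold to concept candidates.
threshold = 2
concepts = [tag for tag, n in frequencies.most_common() if n >= threshold]

print(concepts)  # most frequent tag first
```

The long tail ("snake") drops out, which is exactly the noise-filtering that makes a stabilized folksonomy a plausible seed for an ontology.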




Thursday, December 20, 2007

Information Science As Web 3.0?

In the early and mid-1950’s, scientists, engineers, librarians, and entrepreneurs started working enthusiastically on the problem and solution defined by Vannevar Bush. There were heated debates about the “best” solution, technique, or system. What ultimately ensued became information retrieval (IR), a major subfield of Information Science.

In his article Information Science, Tefko Saracevic makes a bold prediction:

fame awaits the researcher(s) who devises a formal theoretical work, bolstered by experimental evidence, that connects the two largely separated clusters i.e. connecting basic phenomena (information seeking behaviour) in the retrieval world (information retrieval). A best seller awaits the author that produces an integrative text in information science. Information Science will not become a full-fledged discipline until the two ends are connected successfully.

As Saracevic puts it, IR is one of the most widespread applications of any information system worldwide. So how come Information Science has yet to produce a Nobel Prize winner?

But the World Wide Web changed everything, particularly IR. Because the Web is a mess, everybody is interested in some form of IR as a solution to fix it. A number of academic efforts were initiated to develop mechanisms such as search engines, “intelligent” agents, and crawlers. Some of these were IR scaled up and adapted to the problem; others were extensions of IR in one variety or another.

Out of all this emerged commercial ventures, such as Yahoo!, whose basic objective was to provide search mechanisms for finding something of relevance for users on demand. Not to mention making lots of money. Disconcertingly, their connection to the information science community is tenuous, almost non-existent – the flow of knowledge is one-sided, from IR research results into proprietary search engines. The reverse contribution to public knowledge is zero. A number of evaluations of these search engines have been undertaken simply by comparing results between them or comparing their retrieval against some benchmarks.

As I've opined before, LIS will play a prominent role in the next stage of the Web. So who's it gonna be?

Tuesday, December 18, 2007

The Semantic Solution - A Browser?

In a recent discussion with colleagues about Web 2.0, we ran into the conundrum of what lies beyond Web 2.0 that would solve some of its limitations. I offered the idea of an automated Web browser - a portal - not unlike Internet Explorer, with which a user could simply sign in with a password and then freely surf the Semantic Web (or whatever parts of it exist). It would be an exciting journey. Dennis Quan and David Karger's How to Make a Semantic Web Browser proposes the following:

Semantic Web browser—an end user application that automatically locates metadata and assembles point-and-click interfaces from a combination of relevant information, ontological specifications, and presentation knowledge, all described in RDF and retrieved dynamically from the Semantic Web. With such a tool, naïve users can begin to discover, explore, and utilize Semantic Web data and services. Because data and services are accessed directly through a standalone client and not through a central point of access . . . . new content and services can be consumed as soon as they become available. In this way we take advantage of an important sociological force that encourages the production of new Semantic Web content by remaining faithful to the decentralized nature of the Web

I like this idea of a portal. To have everyone agree on how to implement W3C standards - RDF, SPARQL, OWL - is unrealistic. Not everyone will accept the extra work for no real sustainable incentive. That is perhaps why companies and private investors have shown no real vested interest in channelling funding to Semantic Web research. However, the Semantic Web portal is one method to combat the malaise. In many ways, it resembles the birth of Web 1.0, before Yahoo!'s remarkable directory and the search engines. All we need is one Jim Clark and one Marc Andreessen, I guess.

(Maybe a librarian and an information scientist, or two?)
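To make the browser idea concrete, here is a minimal Python sketch (not Quan and Karger's actual system; the resource names are invented): metadata lives as subject-predicate-object triples, and the "browser" assembles a view of any resource on demand, just as their point-and-click interfaces are assembled from dynamically retrieved RDF.

```python
# Toy triple store: (subject, predicate, object) statements, in the
# spirit of RDF but without any real RDF machinery.
TRIPLES = [
    ("ex:Ottawa", "rdf:type", "ex:City"),
    ("ex:Ottawa", "ex:capitalOf", "ex:Canada"),
    ("ex:Canada", "rdf:type", "ex:Country"),
]

def view(resource, triples=TRIPLES):
    """Collect everything asserted about a resource, keyed by predicate,
    the way a Semantic Web browser would build a panel for it."""
    panel = {}
    for s, p, o in triples:
        if s == resource:
            panel.setdefault(p, []).append(o)
    return panel

print(view("ex:Ottawa"))
# {'rdf:type': ['ex:City'], 'ex:capitalOf': ['ex:Canada']}
```

Because the view is computed at request time from whatever triples are reachable, new statements become browsable the moment they are published - the decentralizing property Quan and Karger emphasize.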

Friday, December 14, 2007

"Web 3.0" AND OR the "Semantic Web"

Although I have worked in health research centres and medical libraries, I have never worked professionally as a librarian in a health setting. That is why I have great admiration for health librarians such as The Google Scholar, who can multitask, working as a top-notch librarian while at the same time keeping up with cutting edge technology. The Google Scholar recently made a wonderful entry about Web 3.0 and the Semantic Web:

In medicine, there is virtually no discussion about web 3.0 (see this PubMed search for web 3.0 (zero results) and most of the discussion on the semantic web (see this PubMed search - ~100 results) is from the perspective of biology/ bioinformatics.

The dichotomy in the literature is both perplexing and unsurprising. On the one hand, semanticists are looking at a new intelligent web that has 'added meaning' to documents and supports machine interoperability. On the other, web 3.0 advocates use '3.0' to be trendy, hip, or to market themselves or their websites. That said, I prefer the web 3.0 label to the semantic web because it follows web 2.0 and suggests continuity.

I find it perplexing, too, that academics tend to subscribe to the term "Semantic Web" whereas practitioners and technology experts tend to refer to "Web 3.0." For example, the Journal of Cataloging and Classification recently had an entire issue devoted to the Semantic Web - without one mention of the term "Web 3.0."

Although the dichotomy in the literature is apparent, it's interesting that most of us associate Web 3.0 and the Semantic Web with each other. It's not unlike a decade ago, when we used the terms "Internet" and "Web" interchangeably -- even though they are not the same thing.

Tim Berners-Lee and the W3C envisioned the Web eventually progressing to become the Semantic Web. Standards such as RDF and DAML+OIL emerged as early as 1998, long before Web 2.0. Web 2.0 is not even mentioned by the W3C because it has no standards. In my opinion, Web 3.0 and the Semantic Web are separate entities: Web 3.0 goes one step further in that it will extend beyond the web browser and will not be limited to the personal computer.

It is important that medical librarians -- all librarians for that matter -- join in (and even lead) the discourse, particularly since the Semantic Web & Web 3.0 will be based heavily on the principles of knowledge and information organization. Whereas Web 1.0 and 2.0 could not distinguish among Acetaminophen, Paracetamol, and Tylenol -- Web 3.0 will.
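A toy illustration of that last claim (the mapping below is hand-built for the example; a real Semantic Web system would derive it from an ontology via constructs like owl:sameAs or skos:altLabel): every known label for the drug resolves to one canonical identifier, so a semantic search treats all three names as the same query.

```python
# Hypothetical label-to-concept mapping. The identifiers use the drugs'
# ATC codes for flavour, but the table itself is invented for this sketch.
SAME_AS = {
    "acetaminophen": "drug:N02BE01",  # North American generic name
    "paracetamol": "drug:N02BE01",    # international nonproprietary name
    "tylenol": "drug:N02BE01",        # brand name
    "ibuprofen": "drug:M01AE01",      # a different drug entirely
}

def resolve(term):
    """Map any surface label to its canonical concept, if known."""
    return SAME_AS.get(term.lower())

# All three names resolve to the same concept, so a semantic search
# engine can merge results for them -- a keyword engine cannot.
print(resolve("Tylenol") == resolve("Paracetamol") == resolve("Acetaminophen"))
```

Web 1.0 and 2.0 search matches strings; the sketch shows the extra layer - labels resolved to concepts - that lets Web 3.0 match meanings.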

Tuesday, December 11, 2007

Google and End of Web 2.0

Google Scholar recently celebrated its third birthday. There were some old friends who showed up at the party (the older brother Google arrived a bit late though) -- but overall, it was a fairly quiet evening atop Mountain View. So where are we now with Google Scholar? Has the tool lived up to its early hype? What improvements have been made to Scholar in the past year? In a series of fascinating postings, my colleague, The Google Scholar, made some insightful comments, particularly when he argues:
What Google scholar has done is bring scholars and academics onto the web for their work in a way that Google alone did not. This has led to a greater use of social software and the rise of Web 2.0. For all its benefits, Web 2.0 has given us extreme info-glut which, in turn, will make Web 3.0 (and the semantic web) necessary.

I agree. Google Scholar (and Google) are very much Web 2.0 products. As I elaborated in a previous entry, AJAX (a cornerstone of Web 2.0) produced many remarkable programs such as Gmail and Google Earth.

Was this destiny? Not really. As Yihong Ding proposes, Web 2.0 did not choose Google; rather, it was Google that decided to follow Web 2.0. Had Yahoo! understood the politics of the Web a little earlier, it might have pre-empted Google. (But that's for historians to analyze.) Yahoo! realized the potential of Web 2.0 too late; it purchased Flickr without really understanding how to fit it into Yahoo!'s Web 1.0 universe.

Back to Dean's point. Google's strength might ultimately lead to its own demise. The PageRank algorithm may have a drawback similar to Yahoo!'s once-dominant directory. Just as Yahoo! failed to catch up with the explosion of the Web, Google's PageRank will slowly lose its dominance amid the explosion of content caused by Web 2.0. Even with richer semantics available, Google might not be willing to drastically alter the algorithm that is its bread and butter. That is why Google and Web 2.0 might be feeling the weight of the future fall too heavily on their shoulders.

Sunday, December 09, 2007

AJAX'ing our way to Web 2.0

Part of my day job entails analyzing technologies and how they can better serve users. One of the things we seem to forget when promoting Web 2.0 is the flaws it brings with it. Because one of the core technologies of Web 2.0 is AJAX, I've been looking for a good analysis of it, and David Best's Web 2.0: Next Big Thing or Next Big Internet Bubble seems to do the job. AJAX is a core component of Web 2.0 because it introduces an engine that runs on the client side - in the Web browser. Certain actions can be carried out in the engine with no data transfer to the server; they execute only on the client's computer and are thus quite fast, comparable to desktop applications. In the HTML world of Web 1.0, by contrast, a Web page has to reload completely after every user action, such as clicking a link or submitting a form.

Gmail, Google Maps, and Flickr are all AJAX (and therefore Web 2.0) applications. Yet, just because it's got the Web 2.0 label does not necessarily mean it is "better." Why? Let's take a look at Gmail and Flickr, and see the advantages and disadvantages of their reliance on AJAX-technology:

(1) Rich User Experience - Fast! Responses to user actions are quick, and the Web applications behave like desktop applications (e.g. drag and drop).

(2) JavaScript - AJAX is built on JavaScript. Unlike Web 1.0 applications, JavaScript-dependent pages exclude roughly ten percent of all Web users, an issue the W3C is concerned about. Without going into the technology, JavaScript bars many users from AJAX applications, much as ActiveX - a known security problem in Internet Explorer - does.

(3) The Back Button - Because Web browsers in the Web 1.0 model keep a history of whole pages, many users are surprised that the back button misbehaves in Gmail: as an AJAX application, its individual actions are not cacheable by the browser.

(4) Bookmarking - Web 2.0 is based on a rich user experience; unfortunately, as with many dynamically generated pages, bookmarking or linking to a certain state of such a page is nearly impossible, because those states are not uniquely identified by a URL. (Try bookmarking on Flickr!)

Thursday, December 06, 2007

Are You Ready For Library 3.0?

Are you ready for Library 2.0? We might just be too late, because according to some observers Library 3.0 is already around the corner. How can libraries learn from other service industries? How will librarians keep up with subject-specific skills (evidence-based medicine, law, problem-based learning)? Are librarians' skills out of alignment with these trends? As Saw and Todd point out in Library 3.0: Where Are Our Skills?, the future of academic libraries will be a digital one, where the successful librarian will be flexible, adaptable, and multi-skilled in order to survive in an environment of constant and rapid change. Drivers for change will require this new generation of librarians to navigate not only new technologies but also their users' behaviour - and ultimately their own, as Generations X and Y. So what are some attributes of Librarian 3.0?

(1) Institutionalization – Creating the right culture: flexible hours and attractive salaries without micromanagement, while encouraging teamwork and offering individual praise and recognition for accomplishments. The key to retaining these employees is the quality of their relationships with their managers - Gen X and Y workers demand a better balance between their work and personal lives.

(2) Innovation – Doing things differently – Innovative services will mean taking the service to the clients. An example is the “Librarian With a Latte” program at the University of Michigan at Ann Arbor.

(3) Imagination – Changing the rules. Collaboration with a wide range of information providers, where rethinking the catalogue means recognizing it is no longer relevant in its current form – the catalogue should be a “one-stop shop” for searching resources, providing access beyond local collections and to different types of resources in a seamless way

(4) Ideation – A culture that encourages ideas – Creating the appropriate working environment, supported also by professional associations.

(5) Inspiration – As competition for the future workforce increases, ongoing professional development - as opposed to formal training in a library school alone - becomes necessary. Free web-based instruction such as the popular Five Weeks to a Social Library is already popping up.

So what does this all mean? It might sound like an eye-rolling cliché: information professionals of the future will have to be prepared for lifelong learning. This is a challenge for many professionals, who argue that their plates are already full to the brim. What to do? The authors leave us with a daunting quotation often attributed to Charles Darwin:

It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change

Tuesday, December 04, 2007

I See No Forests But the Trees . . .

"So where is it?" is the question that most information professionals and scholars say when they approach the topic of the Semantic Web. Everyone's favourite Computer Scientist, Yihong Ding's Web Evolution Theory and The Next Stage: Part 2 makes an interesting observation, one which I agree wholeheartedly:

The transition from Web 1.0 to Web 2.0 is not supervised. W3C had not launched a special group for a plot of Web 2.0; and neither did Tim O'Reilly though he was one of the most insightful observers who caught and named this transition and one of the most anxious advocates of Web 2.0. In comparison, W3C did have launched a special group about Semantic Web that was engaged by hundreds of brilliant web researchers all over the world. The progress of WWW in the past several years, however, shows that the one lack of supervision (Web 2.0) advanced faster than the one with lots of supervision (Semantic Web). This phenomenon suggests the existence of web evolution laws that is objective to individual willingness.

Even Tim O'Reilly pointed out that Web 2.0 largely came out of a conference where exhausted software engineers and programmers from the dot-com disaster recognized common trends happening on the Web. Nothing in Web 2.0 is scripted. Perhaps that's why there can never be a definitive agreement on what it constitutes. As I give instructional sessions and presentations on Web 2.0 tools, I sometimes wonder what wikis, blogs, social bookmarking, and RSS feeds will look like two years from now. Will they still be relevant? Will they transmute into something entirely different? Or will we continue with the status quo?

Is Web 2.0 merely an interim to the next planned stage of the Web? Are we seeing trees, but missing the forest?

Friday, November 30, 2007

Digital Libraries in the Semantic Age

Brian Matthews of CCLRC Appleton Laboratory offers some interesting insights in Semantic Web Technologies. In particular, he argues that libraries are increasingly converting themselves into digital libraries. A key aspect of the digital library is the provision of shared catalogues which can be published and browsed. This requires common metadata to describe the fields of the catalogue (such as author, title, date, and publisher), and common controlled vocabularies so that subject identifiers can be assigned to publications.

As Matthews proposes, by publishing controlled vocabularies in one place, accessible to all users across the Web, library catalogues can use the same Web-accessible vocabularies for cataloguing, marking up items with the terms most relevant to the domain of interest. Search engines can then use the same vocabularies to ensure that the most relevant items of information are returned.

The Semantic Web opens up the possibility of taking such an approach. It offers open standards that can enable vendor-neutral solutions with useful flexibility (allowing structured and semi-structured data, formal and informal descriptions, and an open and extensible architecture), and it helps to support decentralized solutions where appropriate. In essence, RDF can serve as this common interchange for catalogue metadata and shared vocabularies, which can then be used by all libraries and search engines across the Web.
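A small sketch of what such an interchange could look like (the record and its identifier are invented; only the Dublin Core element names are real): a catalogue record flattened into triples that any system sharing the dc: vocabulary can consume without a custom import format.

```python
# A hypothetical catalogue record keyed by Dublin Core element names.
record = {
    "dc:title": "Weaving the Web",
    "dc:creator": "Berners-Lee, Tim",
    "dc:date": "1999",
    "dc:publisher": "HarperCollins",
}

def to_triples(subject, fields):
    """Flatten a field/value record into (subject, predicate, object)
    triples -- the RDF-style interchange form."""
    return [(subject, predicate, value) for predicate, value in fields.items()]

triples = to_triples("ex:book/123", record)
print(triples[0])  # ('ex:book/123', 'dc:title', 'Weaving the Web')
```

The point of the exercise: once the predicates come from a shared, published vocabulary, two libraries (or a library and a search engine) can merge their triples directly.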

But in order to use the Semantic Web to its best effect, metadata needs to be published in RDF formats. There are several initiatives involved with defining metadata standards, and some of them are well known to librarians:

(1) Dublin Core Metadata Initiative

(2) MARC

(3) ONIX

(4) PRISM

Wednesday, November 21, 2007

Postmodern Librarian - Part Two

To continue where we left off. True, Digital Libraries and the Future of the Library Profession intimates that libraries, and perhaps librarianship, have entered the postmodern age. But Joint isn't the first to make such an argument; many others have argued likewise. In fact, I have written about it before, too. But I believe that stopping at the modernist-postmodernist dichotomy misses the point.

In my opinion, perhaps this is where Web 2.0 comes in. Although the postmodern information order is not yet clear to us, it seems to be the dynamic behind Web 2.0, in which interactive tools such as blogs, wikis, and RSS facilitate social networking and the anarchic storage and unrestrained distribution of content. According to Joint, much of our professional effort to impose a realist-modernist model on our libraries will fail. The old LIS model needs to be re-theorized, just as Newtonian physics had to evolve into quantum theory in recognition of the fact that super-small particles simply were not located where Newtonian physics said they should be. In this light, perhaps we can start to understand what exactly Web 2.0 is. And beyond.

Friday, November 16, 2007

Semantic Web: A McCool Way of Explaining It

Yahoo's Rob McCool argues in Rethinking the Semantic Web, Part 1 that the Semantic Web will never happen. Why? Because the Semantic Web has three fundamental parts, and they just don't fit together based on current technologies. Here is what we have. The foundation is the set of data models and formats that provide semantics to applications that use them (RDF, RDF Schema, OWL). The second layer is composed of services - purely machine-accessible programs that answer Web requests and perform actions in response. At the top are the intelligent agents, or applications.

Reason? Knowledge representation is a technique with mathematical roots in the work of Edgar Codd, widely known for the original paper, grounded in set theory and predicate calculus, that led to the relational database revolution of the 1980s. Knowledge representation uses the fundamental mathematics of Codd's theory to translate information, which humans represent with natural language, into sets of tables that use well-defined schemas to define what can be entered in the rows and columns.

The problem is that this creates a fundamental barrier, in terms of richness of representation as well as creation and maintenance, compared to the written language that people use. Logic, which forms the basis of OWL, suffers from an inability to represent exceptions to rules and the contexts in which they're valid.

Databases are deployed only by corporations whose information-management needs require them or by hobbyists who believe they can make some money from creating and sharing their databases. Because information theory removes nearly all context from information, both knowledge representation and relational databases represent only facts. Complex relationships, exceptions to rules, and ideas that resist simplistic classifications pose significant design challenges to information bases. Adding semantics only increases the burden exponentially.

Because it's a complex format that requires users to sacrifice expressivity and pay enormous costs in translation and maintenance, McCool believes the Semantic Web will not achieve widespread support. Never? Not until another Edgar Codd comes along. So we wait.

Wednesday, November 14, 2007

The Postmodern Librarian?

Are we in the postmodern era? Nicholas Joint's Digital Libraries and the Future of the Library Profession seems to think so. In it, he argues that unique contemporary cultural shifts are leading to a new form of librarianship that can be characterized as "postmodern" in nature, and that this form of professional specialism will be increasingly influential in the decades to come.

According to Joint, the idea of the postmodern digital library is clearly very different from the interim digital library. In the summer of 2006, at a workshop on the cultural impact of mobile communication technologies at the eLit conference in Loughborough, there emerged the Five Theses of Loughborough. Here they are:

(1) There are no traditional information objects on the internet with determinate formats or determinate qualities: the only information object and information format on the internet is "ephemera"

(2) The only map of the internet is the internet itself; it cannot be described

(3) A hypertext collection cannot be selectively collected because each information object is infinite and infinity cannot be contained

(4) The problem of digital preservation is like climate change; it is man-made and irreversible, and means that much digital data is ephemeral; but unlike climate change, it is not necessarily catastrophic

(5) Thus, there is no such thing as a traditional library in a postmodern world. Postmodern information sets are just as accessible as traditional libraries: there are no formats, no descriptions, no hope of collection management, no realistic possibility of preservation. And they work fine.

Monday, November 12, 2007

New York City In a Semantic Web

Tim Krichel, in The Semantic Web and an Introduction to Resource Description Framework, makes a very astute analogy for understanding the technology behind the Semantic Web, particularly the nuances of XML and RDF, where the goal is to move away from the present Web - where pages are essentially constructed for human consumption - to a Web where more information can be understood and processed by machines. The analogy goes like this:
We fit each car in New York City with a device that lets a reverse geographical positioning system read its movements. Suppose, in addition, that another machine can predict the weather or some other phenomenon that impacts traffic. Assume that a third kind of device holds the public transport timetables. Then, data from the collective knowledge picture of these machines can be used to advise on the best means of transportation for reaching a certain destination within the next few hours.
The computer systems doing the calculations required for the traffic advisory are likely to be controlled by different bodies, such as the city authority or the national weather service. Therefore, there must be a way for software agents to process the information on the machine where it resides, and to pass it along in a form that a software agent of the final user can query.
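Krichel's analogy can be caricatured in a few lines of Python (all routes and readings are invented): three independently maintained datasets, joined only by shared identifiers, let a simple agent produce a travel advisory that none of the individual systems could produce alone.

```python
# Three datasets held by different bodies, linked only by route IDs --
# a stand-in for RDF data merged across independent publishers.
traffic = {"route:A": "heavy", "route:B": "light"}   # city authority
weather = {"route:A": "snow", "route:B": "clear"}    # weather service
transit = {"route:B": "subway every 4 min"}          # transit agency

def advise(route):
    """A toy user-side agent: consult all three sources and recommend."""
    if traffic.get(route) == "heavy" or weather.get(route) == "snow":
        # Driving is a bad idea; fall back to public transport if any.
        return transit.get(route, "no good option")
    return "drive"

print(advise("route:B"))  # 'drive'
print(advise("route:A"))  # 'no good option'
```

The interesting part is not the logic but the join: because the three publishers agree on identifiers, an agent nobody planned for can combine their data.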

Wednesday, November 07, 2007

Genre Searching

At today's SLAIS colloquium, Dr. Luanne Freund gave a presentation on Genre Searching: A Pragmatic Approach to Information Retrieval. Freund argues for a pragmatic approach to genre searching and genre classification, noting two perspectives on pragmatics: socio-pragmatic and cognitive-pragmatic. Using a high-tech firm as a case study, Freund and her colleagues built a search engine called X-Cite, which culls documents from the corporate intranet (anything from FAQs to specialized manuals) together with tags. By ranking documents on title, abstract, and keywords, the algorithm cuts down on the ambiguity and guesswork of searching. Using a software-engineering workplace domain as its starting point, Freund believes that genre searching can contribute significantly to the effectiveness of workplace search systems by incorporating genre weights into the ranking algorithm.

In genre analysis, three steps must be taken:

(1) Identify - The core genre repertoire of the work domain

(2) Develop - A standard taxonomy to represent it

(3) Develop - Operational definitions of the genre classes in the taxonomy, including identifying features in terms of form, function and content to facilitate manual and automatic genre classification.
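One way to picture "incorporating genre weights into the ranking algorithm" (this is my own toy sketch, not X-Cite's actual formula; the weights, documents, and scores are invented): multiply each document's base relevance score by a weight for its genre, so that genres known to be useful for the task rise in the result list.

```python
# Hypothetical per-genre weights, e.g. learned for a "how do I..." task
# where FAQs are more useful than internal memos.
GENRE_WEIGHTS = {"faq": 1.5, "manual": 1.2, "memo": 0.6}

docs = [
    {"id": "d1", "genre": "memo", "base_score": 0.9},
    {"id": "d2", "genre": "faq", "base_score": 0.7},
    {"id": "d3", "genre": "manual", "base_score": 0.6},
]

def rank(documents):
    """Order documents by genre-weighted relevance, best first."""
    return sorted(
        documents,
        key=lambda d: d["base_score"] * GENRE_WEIGHTS.get(d["genre"], 1.0),
        reverse=True,
    )

print([d["id"] for d in rank(docs)])  # ['d2', 'd3', 'd1']
```

Note how the memo's high keyword score (0.9) is overruled by its low genre weight: that is the pragmatic signal genre classification adds on top of plain relevance.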

Throughout the presentation, my mind kept returning to one question: is this not another specialized form of social searching - a tailored search engine which narrows its search to a specific genre? Although the two are entirely different things, I kept thinking that creating your own search engine is certainly much easier.

Simple Knowledge Organization System (SKOS) & Librarians

Miles and Perez-Aguera's SKOS: Simple Knowledge Organization for the Web introduces SKOS, a Semantic Web language for representing structured vocabularies, including thesauri, classification schemes, subject heading systems, and taxonomies -- tools that cataloguers and librarians use every day in their work.

It's interesting that the very essence of librarianship and cataloging will play a vital role in the upcoming version of the Web. It's hard to fathom how this works: how can MARC records and the DDC have anything to do with the intelligent agents which form the layers of architecture of the Semantic Web and Web 3.0? The answer: metadata.

And even more importantly: the messiness and disorganization of the Web will require information professionals with the techniques and methods to reorganize everything coherently. Web 1.0 and 2.0 were about creating -- the Semantic Web will be about ordering and regulating. As a language for controlled, structured vocabularies, SKOS is built to represent the following. Take a closer look at Miles & Perez-Aguera's article -- it's well worth a read.

(1) Thesauri - Broadly conforming to the ISO 2788:1986 guidelines, such as the UK Archival Thesaurus (UKAT, 2004), the General Multilingual Environmental Thesaurus (GEMET), and the Art and Architecture Thesaurus

(2) Classification Schemes - Such as the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC), and the Bliss Classification (BC2)

(3) Subject Heading Systems - The Library of Congress Subject Headings (LCSH) and the Medical Subject Headings (MeSH)
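A minimal sketch of the kind of structure SKOS encodes (the concept labels are invented; real SKOS expresses this in RDF, not a Python dict): skos:broader links chain concepts into the hierarchies that thesauri and subject heading systems already maintain, in a form any Web application can traverse.

```python
# Toy skos:broader relation: each concept points to its broader concept.
BROADER = {
    "Cats": "Mammals",
    "Dogs": "Mammals",
    "Mammals": "Animals",
}

def broader_chain(concept):
    """Walk skos:broader links up to the top concept, the way a search
    engine might expand a query to broader subject headings."""
    chain = []
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain

print(broader_chain("Cats"))  # ['Mammals', 'Animals']
```

This is exactly the broader/narrower-term structure cataloguers have built for decades; SKOS's contribution is a shared Web format for publishing it.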

Friday, November 02, 2007

New Librarians, New Possibilities?

Are newer, incoming librarians changing the profession? Maybe, but not yet. University Affairs has published an article called The New Librarians, which highlights some of the ideas newer librarians are bringing into academic libraries. Everyone's favourite University Librarian (at least for me), Jeff Trzeciak, who has his own blog, is featured in the piece. In it, he describes how he has swiftly hired new Library 2.0-ready librarians and overturned the traditional decor and culture of McMaster Library: a "café, diner-style booths, stand-up workstations, oversized ottomans, and even coffee tables with pillows on the floor will take their place, all equipped for online access. Interactive touch-screen monitors will line the wall."

University of Guelph Chief Librarian Michael Ridley similarly sees a future where the university library serves as an “academic town square,” a place that "brings people and ideas together in an ever-bigger and more diffuse campus. Services in the future will include concerts, lectures, art shows – anything that trumpets the joy of learning."

Is this the future of libraries? Yes - it's only a matter of time. That's where we're heading, and that's where we'll end up. But change is difficult, particularly in larger academic institutions where bureaucracy and politics play an essential role in all aspects of operations. There is great skepticism about Jeff Trzeciak's drastic changes to McMaster Library -- he will be seen as either a pioneer if he succeeds or an opportunist if he fails. A lot is riding on his shoulders.

Tuesday, October 30, 2007

Introducing Semantic Searching

Just as we had Google and Web 2.0 nearly figured out, the Semantic Web is just around the corner. Introducing hakia, one of the first truly semantic search engines. As I have argued, the Semantic Web is a digital catalogue, and among its key components is an understanding of ontologies and taxonomies. Built on Semantic Web technologies, hakia is a new "meaning-based" (semantic) search engine that aims to improve search relevancy and interactivity -- the potential benefits for end users are search efficiency, richness of information, and saved time. Here are the elements that make up hakia. Will the hakia team be the next Brin and Page? Why don't you try it and see?

(1) Ontological Semantics (OntoSem) - A formal and comprehensive linguistic theory of meaning in natural language. As such, it bears significantly on philosophy of language, mathematical logic, and cognitive science

(2) Query Detection and Extraction (QDEX) - A system invented to bypass the limitations of the inverted index approach when dealing with semantically rich data

(3) SemanticRank Algorithm - Deploys a collection of methods to score and rank paragraphs that are retrieved from the QDEX system for a given query. The process includes query analysis, best sentence analysis, and other pertinent operations

(4) Dialogue - In order to establish a human-like dialogue with the user, the dialogue algorithm aims to convert the search engine into a computerized assistant with advanced communication skills, utilizing the largest amount of information resources in the world

(5) Search Mission - Google's mission was to organize the world's information and make it universally accessible and useful. hakia's mission is to search for better search.