Wednesday, November 07, 2007

Simple Knowledge Organization System (SKOS) & Librarians

Miles and Perez-Aguera's SKOS: Simple Knowledge Organization for the Web introduces SKOS, a Semantic Web language for representing structured vocabularies, including thesauri, classification schemes, subject heading systems, and taxonomies -- the tools that cataloguers and librarians use every day in their work.

It's interesting that the very essence of librarianship and cataloguing will play a vital role in the upcoming version of the Web. It's hard to fathom how this works: how can MARC records and the DDC have anything to do with the intelligent agents that form the architectural layers of the Semantic Web and Web 3.0? The answer: metadata.

And even more importantly: the messiness and disorganization of the Web will require information professionals with the techniques and methods to reorganize it all coherently. Web 1.0 and 2.0 were about creating; the Semantic Web will be about ordering and regulating. SKOS is designed to represent the following kinds of controlled structured vocabulary (a small code sketch follows the list). Take a closer look at Miles & Perez-Aguera's article -- it's well worth a read.

(1) Thesauri - Broadly conforming to the ISO 2788:1986 guidelines, such as the UK Archival Thesaurus (UKAT, 2004), the General Multilingual Environmental Thesaurus (GEMET), and the Art and Architecture Thesaurus

(2) Classification Schemes - Such as the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC), and the Bliss Classification (BC2)

(3) Subject Heading Systems - The Library of Congress Subject Headings (LCSH) and the Medical Subject Headings (MeSH)
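
To make this concrete, here is a minimal sketch of how a SKOS concept and its broader/narrower relationships might be expressed, using Python and the rdflib library. The vocabulary and URIs are invented for illustration and are not taken from the article or from any of the schemes above.

# A minimal SKOS sketch using Python's rdflib.
# The concepts and URIs below are invented for illustration only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

# Declare two concepts, give each a preferred label, and link them
# with the broader/narrower relationships a thesaurus would use.
g.add((EX.Libraries, RDF.type, SKOS.Concept))
g.add((EX.Libraries, SKOS.prefLabel, Literal("Libraries", lang="en")))

g.add((EX.AcademicLibraries, RDF.type, SKOS.Concept))
g.add((EX.AcademicLibraries, SKOS.prefLabel, Literal("Academic libraries", lang="en")))
g.add((EX.AcademicLibraries, SKOS.broader, EX.Libraries))
g.add((EX.Libraries, SKOS.narrower, EX.AcademicLibraries))

print(g.serialize(format="turtle"))

Run as-is, this prints the little concept scheme in Turtle -- essentially what a cataloguer's thesaurus entry boils down to on the Semantic Web.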

Friday, November 02, 2007

New Librarians, New Possibilities?

Are newer, incoming librarians changing the profession? Maybe. But not yet. University Affairs has published an article called The New Librarians, which highlights some of the new ideas that newer librarians are bringing into academic libraries. Everyone's favourite University Librarian (mine, at least), Jeff Trzeciak, who has his own blog, is featured in the piece. In it, he describes how he has swiftly hired new Library 2.0-ready librarians and overturned the traditional decor and culture of McMaster Library, with items such as a "café, diner-style booths, stand-up workstations, oversized ottomans, and even coffee tables with pillows on the floor will take their place, all equipped for online access. Interactive touch-screen monitors will line the wall."

University of Guelph Chief Librarian Michael Ridley similarly sees a future where the university library serves as an “academic town square,” a place that "brings people and ideas together in an ever-bigger and more diffuse campus. Services in the future will include concerts, lectures, art shows – anything that trumpets the joy of learning."

Is this the future of libraries? Yes -- it's only a matter of time. That's where we're heading, and that's where we'll end up. Change is difficult, though, particularly in larger academic institutions where bureaucracy and politics play an essential role in all aspects of operations. There is great skepticism towards Jeff Trzeciak's drastic changes to McMaster Library -- he'll be hailed as a pioneer if he succeeds, or dismissed as an opportunist if he fails. A lot is riding on Jeff's shoulders.

Tuesday, October 30, 2007

Introducing Semantic Searching

Just as we had Google and Web 2.0 nearly figured out, the Semantic Web is just around the corner. Introducing hakia, one of the first truly Semantic Web search engines. As we have argued, the Semantic Web is a digital catalogue, and one of its key components is the understanding of ontologies and taxonomies. Built on Semantic Web technologies, hakia is a new "meaning-based" (semantic) search engine with the purpose of improving search relevancy and interactivity -- the potential benefits for end users are search efficiency, richness of information, and time saved. Here are the elements that make hakia work (a toy ranking sketch follows the list). Will the hakia team be the next Brin and Page? Why don't you try it and see?

(1) Ontological Semantics (OntoSem) - A formal and comprehensive linguistic theory of meaning in natural language. As such, it bears significantly on philosophy of language, mathematical logic, and cognitive science

(2) Query Detection and Extraction (QDEX) - A system invented to bypass the limitations of the inverted index approach when dealing with semantically rich data

(3) SemanticRank algorithm - Deploys a collection of methods to score and rank paragraphs that are retrieved from the QDEX system for a given query. The process includes query analysis, best sentence analysis, and other pertinent operations

(4) Dialogue - In order to establish a human-like dialogue with the user, the dialogue algorithm aims to convert the search engine into a computerized assistant with advanced communication skills, while utilizing the largest amount of information resources in the world

(5) Search mission - Google's mission was to organize the world's information and make it universally accessible and useful; hakia's mission is to search for better search.
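
Hakia's QDEX and SemanticRank are proprietary, so the details are out of reach, but the general step described above -- scoring retrieved passages against a query and ranking them -- can be illustrated with a deliberately crude bag-of-words sketch in Python. This is emphatically not hakia's algorithm; the query and passages are invented for illustration.

# Toy sketch of scoring candidate passages against a query.
# This is NOT hakia's SemanticRank -- just a bag-of-words cosine
# similarity illustration of the general "score and rank" step.
from collections import Counter
from math import sqrt

def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, passages):
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(p.lower().split())), p) for p in passages]
    return sorted(scored, reverse=True)

passages = [
    "SKOS is a language for representing thesauri on the Semantic Web",
    "The library cafe opens at nine in the morning",
]
for score, passage in rank("semantic web thesauri", passages):
    print(round(score, 3), passage)

A real meaning-based engine would replace the word counts with semantic analysis (ontologies, sentence parsing, and so on), but the scoring-and-ranking skeleton is the same.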

Monday, October 22, 2007

A Definition of the Semantic Web

Parker, Nitse, and Flowers' Libraries as Knowledge Management Centers makes a good point about special libraries: libraries need to be at the forefront of technology, or else they'll become an endangered species. As libraries struggle with the fallout of the digital age, they must find creative ways to remain relevant to the twenty-first-century user, who has the ability and means to find vast amounts of information without ever setting foot in a library. The authors go on to suggest that an understanding of the Semantic Web is necessary for those working in libraries. They offer an excellent definition of the Semantic Web -- one of the best I've seen so far:

Today's web pages are designed for human use, and human interpretation is required to understand the content. Because the content is not machine-interpretable, any type of automation is difficult. The Semantic Web augments today's web to eliminate the need for human reasoning in determining the meaning of web-based data. The Semantic Web is based on the concept that documents can be annotated in such a way that their semantic content will be optimally accessible and comprehensible to automated software agents and other computerized tools that function without human guidance. Thus, the Semantic Web might have a more significant impact in integrating resources that are not in a traditional catalog system than in changing bibliographic databases.
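
To give a flavour of what "machine-interpretable" annotation looks like in practice, here is a small sketch in Python using the rdflib library: it attaches Dublin Core metadata to a web page as RDF triples that an automated agent could consume without human guidance. The URL and values are invented for illustration, not taken from the article.

# Sketch: describing a web page with Dublin Core metadata as RDF,
# so a software agent can read its subject and creator directly.
# The URL and literal values are invented for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

page = URIRef("http://example.org/articles/semantic-web-libraries")

g = Graph()
g.bind("dc", DC)
g.add((page, DC.title, Literal("Libraries as Knowledge Management Centers")))
g.add((page, DC.creator, Literal("Parker, Nitse, and Flowers")))
g.add((page, DC.subject, Literal("Semantic Web")))

# The agent does not need to "read" the page the way a human would;
# it simply queries the triples.
for subject in g.objects(page, DC.subject):
    print(subject)

The point of the quotation above is exactly this: once the annotation exists, no human reasoning is needed to determine what the page is about.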

Thursday, October 11, 2007

Three Perspectives of the Semantic Web

Catherine Marshall and Frank Shipman offer interesting insight in Which Semantic Web? In it, they argue that the plethora of interpretations of the Semantic Web can be traced back to three different perspectives. Here they are:

(1) A Universal Library - Readily accessed and used by humans in a variety of information-use contexts. This perspective arose as a reaction to the disorder of the Web, which had no ordering or categorization until search engines came along. Metadata, cataloguing, and schemas were seen as the answer.

(2) Computational Agents - Completing sophisticated activities on behalf of their human counterparts. Tim Berners-Lee envisioned an infrastructure for knowledge acquisition, representation, and utilization across diverse use contexts. This global knowledge base will be used by personal agents to collect and reason about information, assisting people with tasks common to everyday life.

(3) Federated Data and Knowledge Base - In this vision, federated components are developed with some knowledge of one another, or at least with a shared anticipation of the types of applications that will use the data. In essence, this Web rests on languages for syntactically sharing data, rather than on specialized converters written for each pair of formats.
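
The federated-data perspective is easiest to see in code. Here is a small sketch in Python with rdflib: two independently produced catalogues both speak RDF, so they can be merged directly, with no bespoke converter between them. The catalogues and URIs are invented for illustration.

# Sketch of the "federated data" idea: two datasets share a common
# data language (RDF), so they merge without a pairwise converter.
# The catalogues and URIs are invented for illustration.
from rdflib import Graph

catalogue_a = Graph().parse(data="""
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/book/1> dc:title "Linked" .
""", format="turtle")

catalogue_b = Graph().parse(data="""
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/book/1> dc:creator "Albert-Laszlo Barabasi" .
""", format="turtle")

merged = catalogue_a + catalogue_b   # set union of the two graphs' triples
for s, p, o in merged:
    print(s, p, o)

With n data sources in a shared language you need n publishers, not n-squared converters -- which is precisely the appeal of this third perspective.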

Wednesday, October 10, 2007

Knowledge Management 3.0

Michael Koenig and T. Kanti Srikantaiah proffer the idea that Knowledge Management is in its third phase. Here are the different stages:

Stage 1 - Internet of Intellectual Capital - this initial stage of KM was driven primarily by IT. In this stage, organizations realized that their stock in trade was information and knowledge -- yet the left hand rarely knew what the right hand did. When the Internet emerged, KM was about how to deploy the new technology to accomplish those goals.

Stage 2 - Human & Cultural dimensions - the hallmark phrase is communities of practice. KM during this stage was about knowledge creation as well as knowledge sharing and communication.

Stage 3 - Content & Retrievability - consists of structuring content and assigning descriptors (index terms). In content management and taxonomies, KM is about the arrangement, description, and structure of that content. Interestingly, taxonomies are perceived by the KM community as emanating from natural scientists, when in fact they are the domain of librarians and information scientists. To take this one step further, the Semantic Web is also built on taxonomies and ontologies. Anyone see a trend? Perhaps a convergence?

Monday, October 08, 2007

When is an Apple, an Apple?

In Linked: How Everything Is Connected to Everything Else and What It Means, Albert-Laszlo Barabasi proposes that the ultimate search engine is one that can tap into the input of every person on Earth. Although no such search engine yet exists, he argues that Google is the closest we have to a “worldly” search engine because of its PageRank algorithm.

I argue that we can go one step further because with the advent of Web 2.0, social search is actually the closest that we have to gathering input from all of the world’s users. How? Why? Let me explain with an analogy.

It’s not a matter of how, but a matter of when. Web 2.0 is very much like an apple. An apple can be food, a paperweight, a target, or a weapon if needed. It can be whatever you want it to be when you want it to be. The same goes for social searching. It is not search engines.

Del.icio.us is a social bookmarking web service, but it can be a powerful search tool if used properly; essentially, it taps into the social preferences of other users. The same goes for YouTube: it's a video-sharing website, but who's to say it can't be used to search videos on relevant topics, or to find related videos based on what others have bookmarked? Social search is not based on a program; it is a mindset -- a metaphorical sweet fruit, if you will.

In many ways, social searching is not unlike what librarians did (and still do) in the print-based world, where an elegant craft of creativity and perseverance was required to find the right materials and put them into the hands of the patron; the only difference is that the search has become digital.

Friday, October 05, 2007

Youtube University

UC Berkeley has become the first university to formally offer videos of full course lectures via YouTube. Two hundred clips, representing eight full classes, have been uploaded so far. Here is "SIMS 141 - Search, Google, and Life: Sergey Brin - Google." Enjoy.


Wednesday, October 03, 2007

Of Ontologies + Taxonomies

In 2002 -- two years before Tim O'Reilly's famous coining of the term "Web 2.0" -- Katherine Adams of the Los Angeles Public Library had already argued that librarians would be an essential piece of the Semantic Web equation. In The Semantic Web: Differentiating Between Taxonomies and Ontologies, Adams makes a few strong arguments that are strikingly ahead of their time. Long before wikis, blogs, and RSS feeds had come to prominence (five years ago!), Adams had the foresight to point out the importance of librarians in reply to Berners-Lee et al.'s vision. Here are Adams' main points, all of which I find fascinating given her pre-Web 2.0 vantage point:

(1) Taxonomies: An Important Part of the Semantic Web - The new Web entails adding an extra layer of infrastructure to the current HTML Web: metadata in the form of vocabularies, and the relationships that exist between selected terms, will make it possible for machines to understand conceptual relationships as humans do.

(2) Defining Ontologies and Taxonomies - Ontologies and taxonomies are used synonymously: computer scientists refer to hierarchies of structured vocabulary as an "ontology," while librarians call them a "taxonomy" (a small sketch of such a hierarchy follows the list below).

(3) Standardized Language and Conceptual Relationships - Both taxonomies and ontologies consist of a structured vocabulary that identifies a single key term to represent a concept that could be described using several words.

(4) Different Points of Emphasis - Computer science is concerned with how software and associated machines interact with ontologies; librarians are concerned with how patrons retrieve information with the aid of taxonomies. However, they are essentially different sides of the same coin.

(5) Topic Maps As New Web Infrastructure - Topic maps will ultimately point the way to the next stage of the Web's development. They represent a new international standard (ISO 13250). In fact, even the OCLC is looking to topic maps in its Dublin Core Initiative to organize the Web by subject.
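
As promised above, here is a small sketch of the shared idea behind "ontology" and "taxonomy": a hierarchy of terms, each with one preferred label standing in for a concept. It uses Python with rdflib and the RDFS vocabulary; the classes and URIs are invented for illustration and are not drawn from Adams' article.

# Sketch of a tiny hierarchy of structured vocabulary in RDFS.
# The classes and URIs are invented for illustration only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/schema/")

g = Graph()
g.bind("ex", EX)

g.add((EX.Publication, RDF.type, RDFS.Class))
g.add((EX.Publication, RDFS.label, Literal("Publication")))

g.add((EX.Monograph, RDF.type, RDFS.Class))
g.add((EX.Monograph, RDFS.label, Literal("Monograph")))      # the single preferred term
g.add((EX.Monograph, RDFS.subClassOf, EX.Publication))       # the hierarchical relationship

print(g.serialize(format="turtle"))

Whether you read the output as an ontology or as a taxonomy depends, as Adams notes, mostly on which department you trained in.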

Monday, October 01, 2007

Web 3.0 Librarian

My colleague Dean Giustini and I have collaborated on an article, The Semantic Web as a large, searchable catalogue: a librarian’s perspective. In it, we argue that librarians will play a prominent role in Web 3.0. The current Web is disjointed and disorganized, and searching is much like looking for a needle in a haystack.

It's not unlike the library before Melvil Dewey introduced the idea of organizing and cataloguing books in a classification system, and in many ways we see the parallels 130 years later. It's not surprising at all to see the OCLC at the forefront of developing Semantic Web technologies: many of the same techniques of bibliographic control apply to the possibilities of the Semantic Web. It was computer scientists and computer engineers who created Web 1.0 and 2.0, but it will ultimately be individuals from library and information science who play a prominent role in organizing the messiness into a coherent whole for users. Are we saying that Web 2.0 is irrelevant? Of course not. Web 2.0 is an intermediary stage. Folksonomies, social tagging, wikis, blogs, podcasts, mashups, and the rest are all essential building blocks of the Semantic Web.

Thursday, September 27, 2007

Libraries and the Semantic Web

Interestingly, not much has been said about librarianship and Semantic Web technologies together. It's as if there's a gap that can never be bridged: the rustic gatekeeper of books on one side, high-end, cutting-edge programmer-speak on the other. Quite recently, Jane Greenberg, professor of Library and Information Science at the University of North Carolina at Chapel Hill, has pointed out in Advancing the Semantic Web via Library Functions that there are many similarities between the library and the Semantic Web. Here are some:

(1) Each has developed as a response to an abundance of information

(2) Both have mission statements grounded in service, information access, and knowledge discovery

(3) Both have advanced as a result of international and national standards

(4) Both have grown due to a collaborative spirit

(5) Both have become a part of society's fabric (although not so much yet for the Semantic Web)

Monday, September 24, 2007

Four Ways to Look at the Web

The Semantic Web is far from a monolithic, artificially intelligent machine that can seemingly process a user's every whim. Cade Metz's Web 3.0: Tomorrow's Web, Today offers an excellent and concise glimpse into the multitude of possibilities for this new Web. Although Web 3.0 is still in its hyper-conceptual stages, Metz envisions four directions it could take:

(1) The Semantic Web - A Web where machines can read sites as easily as humans read them. You ask your machine to check your schedule against the schedules of all the dentists and doctors within a 10-mile radius -- and it obeys (a toy sketch of this scenario follows the list).

(2) The 3D Web - A Web you can walk through. Without leaving your desk, you can go house hunting across town or take a tour of Europe. Or you can walk through a Second Life–style virtual world, surfing for data and interacting with others in 3D.

(3) The Media-Centric Web - A Web where you can find media using other media—not just keywords. You supply, say, a photo of your favorite painting and your search engines turn up hundreds of similar paintings.

(4) The Pervasive Web - A Web that's everywhere. On your PC. On your cell phone. On your clothes and jewelry. Spread throughout your home and office. Even your bedroom windows are online, checking the weather, so they know when to open and close.
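
The scheduling scenario in (1) sounds exotic, but the core operation is just intersecting two sets of data that today live in incompatible silos. Here is a toy Python sketch; on a real Semantic Web the slots would be published as machine-readable data by each office rather than typed in by hand, and every name and time below is invented for illustration.

# Toy sketch of the agent scenario in (1): intersect my free slots
# with each practitioner's open slots. All data is invented; a real
# agent would harvest it as RDF published by the offices themselves.
my_free_slots = {"Mon 10:00", "Tue 14:00", "Thu 09:00"}

dentists = {
    "Dr. Chan (2 miles)": {"Tue 14:00", "Wed 11:00"},
    "Dr. Singh (8 miles)": {"Fri 15:00", "Thu 09:00"},
}

for dentist, open_slots in dentists.items():
    matches = sorted(my_free_slots & open_slots)
    if matches:
        print(dentist + ":", ", ".join(matches))

The hard part, of course, is not the set intersection -- it is getting every dentist's office to publish its calendar in a form a machine can read, which is exactly what the Semantic Web is meant to standardize.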

Tuesday, September 18, 2007

The Seminal on The Semantic

Before Tim O'Reilly, there was Sir Tim Berners-Lee, who is often credited as the creator of the World Wide Web. What many do not know is that Berners-Lee also preceded many so-called Web 2.0 experts when he envisioned the Semantic Web (which many refer to synonymously as "Web 3.0"). While O'Reilly came along in 2004 to coin Web 2.0, Berners-Lee had laid the conceptual foundations years earlier in an article co-written with James Hendler and Ora Lassila, titled The Semantic Web, published in Scientific American in 2001. Although librarians and information professionals don't need to know the specifics of the technology behind the Semantic Web (that would be asking too much, since much of it is still in development), it is important to have a good grasp of the concepts and a strong understanding of the history and evolution of the Web. Thus, it is important to know that the Semantic Web will be defined by five concepts:

(1) Expressing Meaning - Bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

(2) Knowledge Representation - For Web 3.0 to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. This is where XML and RDF come in -- but are they only preliminary languages? (A minimal sketch of rule-based inference follows the list.)

(3) Ontologies - For a program that wants to compare or combine information across two databases, it has to know when two terms are being used to mean the same thing. This means the program must have a way to discover common meanings for whatever databases it encounters. Hence, an ontology has a taxonomy and a set of inference rules.

(4) Agents - The real power of the Semantic Web will be the programs that actually collect Web content from diverse sources, process the information and exchange the results with other programs. Thus, whereas Web 2.0 is about applications, the Semantic Web will be about services.

(5) Evolution of Knowledge - The Semantic Web is not merely a tool for conducting individual tasks; rather, its ultimate goal is to advance the evolution of human knowledge as a whole. Whereas human endeavour is caught between the eternal struggle of small groups acting independently and the need to mesh with the greater community, the Semantic Web is a process of joining together subcultures when a wider common language is needed.
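
As flagged in (2), here is a minimal sketch of what "sets of inference rules" means in practice: given a few explicit subclass statements, a program can infer facts that were never stated directly. It uses Python with rdflib; the classes and URIs are invented for illustration, and a real deployment would use a proper rule engine or OWL reasoner rather than a single library call.

# Minimal sketch of rule-based inference over RDF data: from the
# explicit subClassOf triples below, a program can conclude that a
# Monograph is also a Publication, even though no triple says so.
# Classes and URIs are invented for illustration.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/schema/")

g = Graph()
g.add((EX.Monograph, RDFS.subClassOf, EX.Book))
g.add((EX.Book, RDFS.subClassOf, EX.Publication))

# Walk the subClassOf links transitively to apply the inference rule.
for ancestor in g.transitive_objects(EX.Monograph, RDFS.subClassOf):
    if ancestor != EX.Monograph:
        print(ancestor)

Scale that idea up from three triples to the whole Web, and you have the automated reasoning that the agents in (4) are supposed to rely on.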

Saturday, September 15, 2007

Web 3.0 & the Sem-antic Web

Ready or not, like it or not, Web 3.0 is around the corner. It's coming, so it's best to understand the technologies. Librarians in particular need to understand the technologies behind the Semantic Web: what it will look like, how it will run, and what to expect from its much-anticipated (though still hyper-theoretical) features.

Ora Lassila and James Hendler, who co-authored the 2001 article with Tim Berners-Lee that predicted what the Semantic Web would look like, argue in their most recent article, Embracing "Web 3.0", that the technologies that make the Semantic Web possible are slowly but surely maturing. In particular:

As RDF acceptance has grown, the need has become clear for a standard query language to be for RDF what SQL is for relational data. The SPARQL Protocol and RDF Query Language (SPARQL), now under standardization at the W3C, is designed to be that language.
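
To see why SPARQL really is "SQL for RDF," here is a small, self-contained sketch in Python using rdflib, which ships with a SPARQL engine. The triples and URIs are invented for illustration.

# Sketch of SPARQL as "SQL for RDF": load a few triples, then query them.
# The data and URIs are invented for illustration.
from rdflib import Graph

g = Graph().parse(data="""
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/post/1> dc:title "Web 3.0 and the Semantic Web" .
<http://example.org/post/2> dc:title "Six Kinds of (Social) Searching" .
""", format="turtle")

results = g.query("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?post ?title
    WHERE { ?post dc:title ?title . }
""")

for post, title in results:
    print(post, "--", title)

Swap the toy triples for a real RDF store and the query stays exactly the same -- which is the point of standardizing the language.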


But that doesn't mean that Web 2.0 technologies are obsolete. Rather, they are an intermediate stage in the evolution toward Web 3.0. In particular, it is interesting that the authors note:

(1) Folksonomies - tagging provides an organic, community-driven means of creating structure and classification vocabularies.

(2) Microformats - the use of HTML markup to encode structured data is a step toward "semantic data." Although not in Semantic Web formats, microformatted data is easy to transform into something like RDF or OWL.


As you can see, we're moving along. Take a look at this: on the surface, Yahoo Food looks just like any other Web service; underneath, it is built on SPARQL, which really does "sparkle."

Monday, September 10, 2007

Six Kinds of (Social) Searching

Librarians need to be aware of social searching. It's important and it's here to stay. What makes social searching so integral to librarians' information retrieval skills is that it requires knowledge of Web 2.0 (mashups, the wisdom of crowds, the long tail, etc.). That doesn't mean "traditional" search skills are obsolete -- far from it. Rather, social searching adds another layer to the librarian's toolkit. Here are some of my favourites.

1. Social Q&A sites - Cha Cha, Live QnA, Yahoo! Answers, Answer Bag, Wondir

2. Shared bookmarks and web pages - Del.icio.us, Shadows, Yahoo's MyWeb, Furl, Diigo, Connotea

3. Collaborative directories - Open Directory Project, Prefound, Zimbio, Wikipedia

4. Taggregators - Technorati, Bloglines, Wikipedia

5. Personalized verticals - PogoFrog, Eurekster, Rollyo

6. Collaborative harvesters - iRazoo, Digg, Flickr, Youtube, Netscape, Reddit, Tailrank, popurls.com