Allan's Library: December 2007

Tuesday, December 25, 2007

Happy Holidays and Season's Greetings

Season's Greetings to all. It is indeed a wonderful holiday season, as The Google Scholar has published an important contribution to the Semantic Web literature. He's done it again, writing a concise and cogent piece on the key elements that differentiate Web 3.0 from Web 2.0. In other news, a reader recently left a comment on a previous entry that I found very interesting. Here's what he said:

I (as a librarian) found the article and the whole topic very important. I especially enjoyed the conclusion. You wrote that "Web 3.0 is about bringing the miscellaneous back together meaningfully after it's been fragmented into a billion pieces." I was wondering if in your opinion this means that the semantic web may turn a folksonomy into some kind of structured taxonomy. We all know the advantages and disadvantages of a folksonomy. Is it possible for web 3.0 to minimize those disadvantages and maybe even make good use out of them?

My response? It'll sound cliched and tired: it's really too early to tell. But although it's murky as to what the Semantic Web will look like, all directions point to the possibility that folksonomies will play a key role. Here's why:

(1) Underneath the messiness of the Web is a fairly organized latent structure whose backbone is made up of web threads. The Web behaves like a scale-free network, which is dominated by a few highly connected hubs.

(2) What this means is that folksonomies and tagging are in fact controlled vocabularies in their own right. A lot has been written about this. Recent studies have shown that the frequency distribution of tags in folksonomies tends to stabilize into power-law distributions. When a substantial number of users tag content over a long period of time, stable tags start appearing in the resulting folksonomy.

(3) Such a use of folksonomies could help overcome some of the inherent difficulties in ontology construction, thus potentially bridging Web 2.0 and the Semantic Web. By using a folksonomy's collective categorization scheme as an initial knowledge base, the ontology author could then take the most common tags in the tagging distribution as concepts, relations, or instances. Folksonomies do not a Semantic Web make -- but they're a good start.
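To make point (3) a little more concrete, here is a minimal sketch in Python of how the most common tags might be pulled out of a folksonomy as candidate concepts. Everything in it is an assumption made for illustration: the function name candidate_concepts, the min_share threshold, and the sample tagging data are all hypothetical, and a real ontology author would still have to review the candidates by hand.

```python
from collections import Counter

def candidate_concepts(taggings, min_share=0.05):
    """Return (tag, count) pairs whose share of all tag assignments
    is at least min_share, most frequent first.

    taggings is a list of tag lists, one list per tagged item.
    Tags are lower-cased so that 'SemanticWeb' and 'semanticweb' count together.
    """
    counts = Counter(tag.strip().lower() for tags in taggings for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(tag, n) for tag, n in counts.most_common() if n / total >= min_share]

# Hypothetical tagging data, invented for this example.
sample_taggings = [
    ["semanticweb", "rdf", "ontology"],
    ["SemanticWeb", "web3.0"],
    ["rdf", "semanticweb", "toread"],
    ["semanticweb", "ontology"],
]

for tag, n in candidate_concepts(sample_taggings, min_share=0.2):
    print(tag, n)
```

The long tail of rare or personal tags (such as "toread" in the sample) falls below the threshold, which is roughly what a power-law distribution of tags would lead one to expect.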

Thursday, December 20, 2007

Information Science As Web 3.0?

In the early and mid-1950s, scientists, engineers, librarians, and entrepreneurs started working enthusiastically on the problem and solution defined by Vannevar Bush. There were heated debates about the “best” solution, technique, or system. What ultimately ensued became information retrieval (IR), a major subfield of Information Science.

In his article Information Science, Tefko Saracevic makes a bold prediction:

fame awaits the researcher(s) who devises a formal theoretical work, bolstered by experimental evidence, that connects the two largely separated clusters i.e. connecting basic phenomena (information seeking behaviour) in the retrieval world (information retrieval). A best seller awaits the author that produces an integrative text in information science. Information Science will not become a full-fledged discipline until the two ends are connected successfully.

As Saracevic puts it, IR is one of the most widely spread applications of any information system worldwide. So how come Information Science has yet to produce a Nobel Prize winner?

But the World Wide Web changed everything, particularly IR. Because the Web is a mess, everybody is interested in some form of IR as a solution to fix it. A number of academic-based efforts were initiated to develop mechanisms such as search engines, “intelligent” agents, and crawlers. Some of those were IR scaled and adapted to the problem; others were a variety of extensions of IR.

Out of all this emerged commercial ventures, such as Yahoo!, whose basic objective was to provide search mechanisms for finding something of relevance for users on demand. Not to mention making lots of money. Disconcertingly, the connection to the information science community is tenuous, almost non-existent: the flow of knowledge is one-sided, from IR research results into proprietary search engines. The reverse contribution to public knowledge is zero. A number of evaluations of these search engines have been undertaken simply by comparing some results between them or comparing their retrieval against some benchmarks.

As I've opined before, LIS will play a prominent role in the next stage of the Web. So who's it gonna be?

Tuesday, December 18, 2007

The Semantic Solution - A Browser?

In a recent discussion with colleagues about Web 2.0, we ran into the conundrum of what lies beyond Web 2.0 that would solve some of its limitations. I offered the idea of an automated Web browser - a portal - not unlike Internet Explorer, with which a user could just sign in, enter his or her password, and then freely surf the Semantic Web (or whatever parts of it exist). It would be an exciting journey. Dennis Quan and David Karger's How to Make a Semantic Web Browser proposes the following:

Semantic Web browser—an end user application that automatically locates metadata and assembles point-and-click interfaces from a combination of relevant information, ontological specifications, and presentation knowledge, all described in RDF and retrieved dynamically from the Semantic Web. With such a tool, naïve users can begin to discover, explore, and utilize Semantic Web data and services. Because data and services are accessed directly through a standalone client and not through a central point of access . . . . new content and services can be consumed as soon as they become available. In this way we take advantage of an important sociological force that encourages the production of new Semantic Web content by remaining faithful to the decentralized nature of the Web

I like this idea of a portal. To have everyone agree about how to implement W3C standards - RDF, SPARQL, OWL - is unrealistic. Not everyone will accept the extra work with no real, sustainable incentive. That is perhaps why companies and private investors currently have no real vested interest in channeling funding to Semantic Web research. However, the Semantic Web portal is one method to combat the malaise. In many ways, it resembles the birth of Web 1.0, before Yahoo!'s remarkable directory and search engines. All we need is one Jim Clark and one Marc Andreessen, I guess.

(Maybe a librarian and an information scientist, or two?)

Friday, December 14, 2007

"Web 3.0" AND OR the "Semantic Web"

Although I have worked in health research centres and medical libraries, I have never worked professionally as a librarian in a health setting. That is why I have great admiration for health librarians such as The Google Scholar, who can multitask, working as a top-notch librarian while at the same time keeping up with cutting edge technology. The Google Scholar recently made a wonderful entry about Web 3.0 and the Semantic Web:

In medicine, there is virtually no discussion about web 3.0 (see this PubMed search for web 3.0 (zero results) and most of the discussion on the semantic web (see this PubMed search - ~100 results) is from the perspective of biology/ bioinformatics.
The dichotomy in the literature is both perplexing and unsurprising. On the one hand, semanticists are looking at a new intelligent web has 'added meaning' to documents, and machine interoperability. On the other, web 3.0 advocates use '3.0' to be trendy, hip or to market themselves or their websites. That said, I prefer the web 3.0 label to the semantic web because it follows web 2.0 and suggests continuity.

I find it perplexing, too, that academics tend to subscribe to the term "Semantic Web" whereas practitioners and technology experts tend to refer to "Web 3.0." For example, the Journal of Cataloging and Classification recently had an entire issue devoted to the Semantic Web - without one mention of the term "Web 3.0."

Although the dichotomy in the literature is apparent, it's interesting that most of us associate Web 3.0 and the Semantic Web with each other. It's not unlike a decade ago, when we used the terms "Internet" and "Web" interchangeably -- even though they are not the same thing.

Tim Berners-Lee and the W3C envisioned the Web eventually progressing to become the Semantic Web. Standards such as RDF emerged in the late 1990s, with DAML+OIL following soon after, long before Web 2.0. Web 2.0 is not even mentioned by the W3C because it has no standards. In my opinion, Web 3.0 and the Semantic Web are separate entities. Web 3.0 goes one step further in that it will extend beyond the web browser and will not be limited to just the personal computer.

It is important that medical librarians -- all librarians for that matter -- join in (and even lead) the discourse, particularly since the Semantic Web & Web 3.0 will be based heavily on the principles of knowledge and information organization. Whereas Web 1.0 and 2.0 could not recognize how Acetaminophen, Paracetamol, and Tylenol relate to one another (two generic names and a brand name for the same drug) -- Web 3.0 will.
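As a rough illustration of the kind of knowledge organization involved, the Python sketch below maps the three names to one preferred term, since acetaminophen and paracetamol are two generic names for the same drug and Tylenol is a brand name for it. This is only a toy: the PREFERRED_TERM table and the helper names are hypothetical, and an actual Semantic Web application would presumably express such relationships in a shared vocabulary or ontology rather than a hand-written dictionary.

```python
# Hypothetical hand-built vocabulary: each surface term points to one preferred label.
PREFERRED_TERM = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
}

def normalize(term):
    """Return the preferred label for a term, or None if the term is unknown."""
    return PREFERRED_TERM.get(term.strip().lower())

def same_concept(a, b):
    """True only when both terms are known and share a preferred label."""
    label_a, label_b = normalize(a), normalize(b)
    return label_a is not None and label_a == label_b

print(same_concept("Tylenol", "Paracetamol"))
print(same_concept("Tylenol", "Aspirin"))
```

A keyword search treats the three names as three unrelated strings; the lookup above is the smallest possible version of what a librarian's controlled vocabulary adds.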

Tuesday, December 11, 2007

Google and End of Web 2.0

Google Scholar recently celebrated its third birthday. There were some old friends who showed up at the party (the older brother Google arrived a bit late though) -- but overall, it was a fairly quiet evening atop Mountain View. So where are we now with Google Scholar? Has the tool lived up to its early hype? What improvements have been made to Scholar in the past year? In a series of fascinating postings, my colleague, The Google Scholar, made some insightful comments, particularly when he argues:

What Google scholar has done is bring scholars and academics onto the web for their work in a way that Google alone did not. This has led to a greater use of social software and the rise of Web 2.0. For all its benefits, Web 2.0 has given us extreme info-glut which, in turn, will make Web 3.0 (and the semantic web) necessary.

I agree. Google Scholar (and Google) are very much Web 2.0 products. As I elaborated in my previous entry, AJAX (which is Web 2.0-based) produced many remarkable programs such as Gmail and Google Maps.

Was this destiny? Not really. As Yihong Ding proposes, Web 2.0 did not choose Google; rather, it was Google that decided to follow Web 2.0. If Yahoo! had only known about the politics of the Web a little earlier, it might have precluded Google. (But that's for historians to analyze.) Yahoo! realized the potential of Web 2.0 too late; it purchased Flickr without really understanding how to fit it into Yahoo!'s Web 1.0 universe.

Back to Dean's point. Google's strength might ultimately lead to its own demise. The PageRank algorithm might have a drawback similar to that of Yahoo!'s once-dominant directory. Just as Yahoo! failed to keep up with the explosion of the Web, Google's PageRank will slowly lose its dominance due to the explosion of content caused by Web 2.0. Even as the Web gains richer semantics, Google might not be willing to drastically alter its algorithm, since it is Google's bread and butter. So that is why Google and Web 2.0 might be feeling the weight of the future fall too heavily on their shoulders.

Sunday, December 09, 2007

AJAX'ing our way to Web 2.0

Part of my day job entails analyzing technologies and how they better serve users. But one of the things we seem to forget when promoting Web 2.0 is the flaws it brings with it. Because one of the core technologies of Web 2.0 is AJAX, I've been looking around for a good analysis of it. David Best's Web 2.0: Next Big Thing or Next Big Internet Bubble seems to do the job. AJAX is a core component of Web 2.0, as it introduces an engine that runs on the client side, in the Web browser. Certain actions can be carried out in the engine and need no data transfer to the server; they are carried out only on the client's computer and are thus quite fast, comparable to desktop applications. In the HTML world of Web 1.0, by contrast, a Web page has to reload completely after a user action, such as clicking on a link or entering data in a form.

Gmail, Google Maps, and Flickr are all AJAX (and therefore Web 2.0) applications. Yet just because something has the Web 2.0 label does not necessarily mean it is "better." Why? Let's take a look at Gmail and Flickr, and see the advantages and disadvantages of their reliance on AJAX technology:

(1) Rich User Experience - Fast! Responses to user actions are quick, and the Web applications behave like desktop applications (e.g. drag and drop).

(2) JavaScript - AJAX is built on JavaScript. Unlike Web 1.0 applications, AJAX applications therefore exclude the ten percent of all Web users whose browsers do not run JavaScript, an issue the W3C is concerned about. Without going into the technology, this dependence bars many users from AJAX use (older versions of Internet Explorer, for instance, rely on ActiveX, a known security problem).

(3) The Back Button - In Web 1.0, Web browsers keep a history of whole Web pages, so many users are surprised that the Back button does not behave this way in Gmail: because it is an AJAX application, single actions are not cacheable by the browser.

(4) Bookmarking - Web 2.0 is based on rich user experience; unfortunately, this means that as with many dynamically generated pages, bookmarking or linking to a certain state of such a page is nearly impossible, as those states are not uniquely identifiable by URL. (Try bookmarking on Flickr!)
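One general workaround for the bookmarking problem in point (4) is to write the application's current state into the part of the URL after the "#", so that each state gets its own address. The Python sketch below shows only the encoding and decoding idea; it is not how Gmail or Flickr actually work, and the function names, the example.com address, and the state keys are all made up for illustration. In a real AJAX application the same logic would run as JavaScript in the browser.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def state_to_url(base_url, state):
    """Append the view state to base_url as a fragment (the part after '#')."""
    return f"{base_url}#{urlencode(sorted(state.items()))}"

def url_to_state(url):
    """Recover the view state from the fragment of a bookmarked URL."""
    return dict(parse_qsl(urlsplit(url).fragment))

# Hypothetical state of a photo-browsing page.
bookmark = state_to_url("http://example.com/photos", {"album": "vacation", "photo": "12"})
print(bookmark)
print(url_to_state(bookmark))
```

Because the fragment is never sent to the server, changing it does not force a full page reload, which is why it is a natural place to keep this kind of client-side state.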

Thursday, December 06, 2007

Are You Ready For Library 3.0?

Are you ready for Library 2.0? We might just be too late, because Library 3.0 is just around the corner, according to some observers. How can libraries learn from other service industries? How will librarians keep up with subject-specific skills (evidence-based medicine, law, problem-based learning)? Are librarian skills out of alignment with these trends? As Saw and Todd point out in Library 3.0: Where Are Our Skills, the future of academic libraries will be a digital one, where the successful librarian will be flexible, adaptable, and multi-skilled in order to survive in an environment of constant and rapid change. Drivers for change will require this new generation of librarians to understand not only new technologies and their users’ behaviour, but ultimately themselves (Generations X and Y). So what are some attributes of Librarian 3.0?

(1) Institutionalization – Creating the right culture. This means flexible hours and attractive salaries without micromanagement, while encouraging teamwork and giving individuals praise and recognition for their accomplishments. The key to retaining these employees is the quality of the relationships they have with their managers; Gen X and Y employees demand a better balance between their work and personal lives.

(2) Innovation – Doing things differently. Innovative services will mean taking the service to the clients. An example would be the “Librarian With a Latte” program at the University of Michigan at Ann Arbor.

(3) Imagination – Changing the rules. This means collaborating with a wide range of information providers and rethinking the catalogue, which is no longer relevant in its current form: the catalogue should be a “one-stop shop” for searching resources, providing seamless access beyond local collections and to different types of resources.

(4) Ideation – A culture that encourages ideas. Creating the appropriate working environment also requires support from professional associations.

(5) Inspiration – Doing things differently. As competition for the future workforce increases, ongoing professional development, as opposed to formal training in a library school, is necessary. Free web-based courses similar to the popular Five Weeks to a Social Library are already popping up.

So what does this all mean? It might sound like an eye-rolling cliche: information professionals of the future will have to be prepared for lifelong learning. This is a challenge for many professionals, who argue that their plates are already full to the brim. What to do? The authors leave us with a daunting reference from Charles Darwin:

It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change

Tuesday, December 04, 2007

I See No Forests But the Trees . . .

"So where is it?" is the question that most information professionals and scholars ask when they approach the topic of the Semantic Web. In Web Evolution Theory and The Next Stage: Part 2, everyone's favourite computer scientist, Yihong Ding, makes an interesting observation, one with which I wholeheartedly agree:

The transition from Web 1.0 to Web 2.0 is not supervised. W3C had not launched a special group for a plot of Web 2.0; and neither did Tim O'Reilly though he was one of the most insightful observers who caught and named this transition and one of the most anxious advocates of Web 2.0. In comparison, W3C did have launched a special group about Semantic Web that was engaged by hundreds of brilliant web researchers all over the world. The progress of WWW in the past several years, however, shows that the one lack of supervision (Web 2.0) advanced faster than the one with lots of supervision (Semantic Web). This phenomenon suggests the existence of web evolution laws that is objective to individual willingness.

Even Tim O'Reilly pointed out that Web 2.0 largely came out of a conference where software engineers and computer programmers, exhausted from the dot-com disaster, saw common trends happening on the Web. Nothing is scripted in Web 2.0. Perhaps that's why there can never be a definitive agreement on what it constitutes. As I give instructional sessions and presentations on Web 2.0 tools, I sometimes wonder what wikis, blogs, social bookmarking, and RSS feeds will look like two years from now. Will they be relevant? Or will they transmute into something entirely different? Or will we continue on with the status quo?

Is Web 2.0 merely an interim step toward the next planned stage of the Web? Are we seeing trees, but missing the forest?
