Tuesday, July 29, 2008


I've written about the potential of Resource Description & Access playing a role in the Semantic Web, and the importance of librarians in this development. Not only that, but Resource Description Framework would be the crux of this new Web. Brett Bonfield, a graduate student in the LIS program at Drexel University, intern at the Lippincott Library at the University of Pennsylvania and an aspiring academic librarian, has pointed out that the WHATWG, the "Web Hypertext Application Technology Working Group," is a growing community of people interested in evolving the Web. It focuses primarily on the development of HTML and the APIs needed for Web applications, and it might have some influence on how things will play out.

The WHATWG was founded by individuals from Apple, the Mozilla Foundation, and Opera Software in 2004, after a W3C workshop. Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML, lack of interest in HTML and apparent disregard for the needs of real-world authors. So, in response, these organisations set out with a mission to address these concerns and the Web Hypertext Application Technology Working Group was born.

There was a time when RDF’s adoption would have been a given, when the W3C was seen as nearly infallible. Its standards had imperfections, but their openness, elegance, and ubiquity made it seem as though the Semantic Web was just around the corner. Unfortunately, that future has yet to arrive: we’re still waiting on the next iteration of basic specs like CSS; W3C bureaucracy persuaded the developers of Atom to publish their gorgeous syndication spec with IETF instead of W3C; and, perhaps most alarmingly, the perception that W3C’s HTML Working Group was dysfunctional encouraged Apple, Mozilla, and Opera to team with independent developers in establishing WHATWG to create HTML’s successor spec independently from the W3C. As more non-W3C protocols took on greater prominence, W3C itself seemed to be suffering a Microsoft-like death of a thousand cuts.

This is interesting indeed. As Bonfield reveals, on April 9, WHATWG’s founders proposed to W3C that it build its HTML successor on WHATWG’s draft specification. On May 9, W3C agreed. W3C may never again be the standard bearer it once was, but this is compelling evidence that it is again listening to developers and that developers are responding. The payoff in immediate gratification—the increased likelihood of a new and better HTML spec—is important, but just as important is the possibility of renewed faith in W3C and its flagship project, the Semantic Web. Things are moving along just fine, I think.

Fascinating. There're two roads that lead to the same path. But the question remains. Are we any closer to the SemWeb?

Tuesday, July 22, 2008

Web 3.0 in 600 words

I've just penned an article on Web 3.0 from a librarian's standpoint. In my article, What is Web 3.0? The Next Generation Web: Search Context for Online Information, I lay out what I believe are the essential ingredients of Web 3.0. (Note I don't believe the SemWeb and Web 3.0 are synonymous even though some may believe them to be so - and I explain why). Writing it challenged me tremendously in coming to grips with what exactly constitutes Web 3.0. It forced me to think more concisely and succinctly about the different elements that bring it together.

It's conceptual; therefore, it's murky. And as a result, we overlook the main elements which are already in place. One of the main points I make is that, whereas Web 2.0 is about information overload, Web 3.0 will be about regaining control. So, without further ado, please take a look at this article, and let me know your thoughts. I should not leave out the excellent help of the legendary librarian, the Google Scholar, Dean. He helped me out a great deal in fleshing out these ideas. Thanks DG.

Sunday, July 20, 2008

Web 3.0 and Web Parsing

Ever wondered how Web 3.0 and the SemWeb can read webpages in an automated, intelligent fashion? Take a look at how Website Parse Template (WPT) works. WPT is an XML-based open format which describes the HTML structure of a website's pages. The WPT format allows web crawlers to generate Semantic Web RDF for web pages.

Website Parse Template consists of three main entities:

1) Ontologies - The content creator defines the concepts and relations which are used on the website.

2) Templates - The creator provides templates for groups of web pages which are similar in content category and structure. The publisher provides the XPath or tag IDs of the HTML elements and links them with the website's ontology concepts.

3) URLs - The creator provides URL patterns which collect a group of web pages and link them to a parse template. In the URLs section, the publisher can also separate out a part of the URL as a concept and link it to the website's ontology.
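
To make the three entities above a little more concrete, here is a rough Python sketch of the general idea. It does not use the real WPT XML schema. The TEMPLATE dictionary, the "ex:" concept names, the example.org URL and the sample page are all assumptions I made up for illustration. A URL pattern selects the pages a template applies to, XPath-style paths tie HTML elements to ontology concepts, and the result is a list of RDF-like triples.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template, NOT the real WPT format: every name here is made up.
TEMPLATE = {
    # URLs: which pages this template covers; the named group is treated as a concept.
    "url_pattern": r"^https://example\.org/books/(?P<book_id>\d+)$",
    # Ontologies: the concept every matching page is an instance of.
    "concept": "ex:Book",
    # Templates: ontology property -> path to the HTML element holding its value.
    "fields": {
        "ex:title": ".//h1[@id='title']",
        "ex:author": ".//span[@id='author']",
    },
}

# Placeholder page; it must be well-formed XML for ElementTree to parse it.
PAGE = """<html><body>
<h1 id="title">Example Book</h1>
<span id="author">A. Author</span>
</body></html>"""


def extract_triples(url, page_source, template):
    """Return (subject, predicate, object) triples for a page, or [] if the URL does not match."""
    match = re.match(template["url_pattern"], url)
    if match is None:
        return []
    root = ET.fromstring(page_source)
    triples = [
        (url, "rdf:type", template["concept"]),
        (url, "ex:identifier", match.group("book_id")),
    ]
    for predicate, path in template["fields"].items():
        element = root.find(path)
        if element is not None and element.text:
            triples.append((url, predicate, element.text.strip()))
    return triples


for triple in extract_triples("https://example.org/books/42", PAGE, TEMPLATE):
    print(triple)
```

A real crawler would need a forgiving HTML parser and a proper RDF library. The standard library is used here only to keep the sketch self-contained.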

Friday, July 18, 2008

Kevin Kelly on Web 3.0

At the Northern California Grantmakers & The William and Flora Hewlett Foundation Present: Web & Where 2.0+ on Feb. 14th, 2008, Kevin Kelly talks about Web 3.0. Have a good weekend everyone. Enjoy.

Thursday, July 17, 2008

EBSCO in a 2.0 World

EBSCOhost 2.0 is here. It's got a brand new look and feel, based on extensive user testing and feedback, and provides users with a powerful, clean and intuitive interface. This is the first redesign of the EBSCOhost interface since 2002, and its functionality incorporates the latest technological advances.

1) Take a look at EBSCOhost 2.0 Flash demonstration here.

2) Its spiffy marketing web site also features new EBSCOhost 2.0 web pages, where you can learn more about its key features, here. (http://www.ebscohost.com/2.0)

EBSCO has really moved into the 2.0 world: simple, clean, and Googleized. But perhaps that's the way that information services need to go. We simply must keep up. I went to a presentation at Seattle SLA '08, and EBSCO gave an excellent presentation (not to mention a lunch) in which it showed the 2.0 features of the new EBSCO interface. In essence, it's customizable for users: you can have it as simple as a search box or as complex as it is currently. The retrieval aspects have not changed that much. Yet, perception is everything, don't you think?

Wednesday, July 09, 2008

Why Be a Librarian?

There seems to be a real fear by some to be called 'librarians.' There's a mysterious aura around what a librarian does. In fact, some have cloaked their librarian status as 'metadata specialist' or 'information specialist' or even 'taxonomist.' Why be a librarian? That's a good question. I like some of the answers offered by Singapore Library Association's Be A Librarian :

As technology allows the storage and uploading of information at ever greater speeds and quantities, people are becoming overwhelmed by the “information overload”. The information professional is a much needed guide to aid people in their search for knowledge.

The librarian learns to seek, organize and locate information from a wide variety of sources, from print materials such as books and magazines to electronic databases. This knowledge is needed by all industries and fields, allowing librarians flexibility in choosing their working environments and in developing their areas of expertise.

The librarian keeps apace with the latest technological advances in the course of their work. They are web authors, bloggers, active in Second Life. They release podcasts, produce online videos and instant message their users. The librarian rides at the forefront of the technology wave, always looking out for new and better ways to organize and retrieve information for their users.

At the same time, librarians remember their roots, in traditional print and physical libraries, and continue to acquire and preserve books, journals and other physical media for their current users and for future generations.

Well said. I like it!

Tuesday, July 08, 2008

Expert Searching in a SemWeb World

If we are to move into a Web 3.0 SemWeb-based world, taking a closer look at initiatives such as Expert System makes sense. This company is a provider of semantic software, which discovers, classifies and interprets text information. I like the approach it's taking, by offering a free online seminar to make its pitch. In "Making Search Work for a Living," the webinar shows users how to improve searching. Here's what it is:

As an analyst or knowledge worker you are busy every day searching for information, often in onerous and time consuming ways. The goal of course is to locate the strategic nuggets of information and insight that answer questions, contribute to reports and inform all levels of management. Yet current search technology proves to be a blunt tool for this task. What you are looking for is trapped in the overwhelming amount of information available to you in an endless parade of formats and forced user interfaces. Immediate access to strategic information is the key to support monitoring, search, analysis and automatic correlation of information.

Join this presentation and roundtable discussion with Expert System on semantic technology that solves this every day, every business problem.

This is a free webinar brought to you by Expert System.
To register send an e-mail to webinar@expertsystem.net

  • You are looking for a semantic indexing, search and analysis innovative tool to manage your strategic internal and external information.
  • You want to overcome the limits of traditional search systems to manage the contents of large quantities of text.
  • You have ever wondered how you can improve the effectiveness of the decision making process in your company.

DATE/TIME: July 10th 2008, 9:00 am PT, 12:00 pm ET USA; 5:00 pm UK.
Duration: 60 Minutes
Focus On: semantics as a leading technology to understand, search, retrieve, and analyze strategic contents.

The webinar will teach how to:

  • Conceptualize search and analysis on multilingual knowledge bases;
  • Investigate the documents in an interactive way through an intuitive web interface;
  • Highlight all the relations, often unexpected, that link the elements across the documents.
  • Monitor specific phenomena constantly and then easily generate and distribute ways for others to understand them.

It's worth a look-see, I think.

Sunday, July 06, 2008

End of Science? End of Theory?

Chris Anderson has done it again, this time with an article about the end of theory. How? In short: raw data. In End of Theory, Anderson argues that, with massive data, the millennia-old scientific model of hypothesize, model, test is becoming obsolete.

Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

And according to Anderson, biology is heading in the same direction. What does this say about science and humanity? In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including IBM's Tivoli and open source versions of Google File System and MapReduce.

Anderson's been right before. See Long Tail and Free. But this one's just speculation of course. Perhaps one commentator hit the point when he says, "Yeah, whatever. We still can't get a smart phone with all the bells and whistles to be able to use any where in the world with over 2 hours worth of use and talk time...so get back to me when you've perfected all of that." Well said. Let's wait and see some more.

Tuesday, July 01, 2008

Catalogue 2.0

It's blogs like Web 2.0 Catalog that keep me going. Catalogues have been the crux of librarianship, from the card catalogue to the OPAC. But for libraries, the catalogue has always seemed to be a separate entity. It's as if there is a dichotomy: the Social Web and the catalogue -- and never the twain shall meet. What would a dream catalogue look like to me? I have 8 things I’d like to see. Notice that none of it is beyond the stretch of imagination. Here they are:

(1) Wikipedia – What better way to get the most updated information for a resource than the collective intelligence of the Web? Can we integrate this into the OPAC records? We should try.

(2) Blog – “Blog-noting” as I call it. To a certain extent, some catalogues already allow users to scribble comments on records. But blog-noting allows users to actually write down reflections of what they think of the resource. The catalogue should be a “conversation” among users.

(3) Amazon.ca - Wouldn’t it be nice to have an idea what a book costs out on the open market? And wouldn’t it make sense to throw in an idea of how much the used cost would be?

(4) Worldcat - Now that you know the price, wouldn’t it be useful to have an idea of what other libraries carry the book?

(5) Google-ability – OPAC resources are often online, but “hidden” in the deep web. If opened up to search engines, they become that much more accessible.

(6) Social bookmarking – If the record is opened to the Web, then it naturally makes sense for it to be linked to Delicious, RefShare & CiteULike (or a similar bibliographic management service).

(7) Cataloguer’s paradise – Technical servicemen and women are often hidden in the pipelines of the library system, their work often unrecognized. These brave men and women should have their profiles right on the catalogue, for everyone to see, to enjoy. Makes for good outreach, too. (Photo is optional).

(8) Application Programming Interface - APIs are sets of declarations of the functions (or procedures) that an operating system, library or service provides to support requests made by computer programs. They're like the interoperable sauce which adds taste to a web service. They're the crux of Web 2.0, and will be important for the Semantic Web when the Open Web finally arrives. As a result, APIs need to be explored in detail by OPACs, as ways to integrate different programs and provide open data for others to reuse.
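
For readers who haven't seen one, here is a tiny Python sketch of what "providing open data for reuse" through an API could look like for a catalogue. It is only an illustration, not how any real OPAC works. The CATALOGUE dictionary, the placeholder ISBN and the get_record function are all hypothetical names I invented. The point is simply that a program, rather than a person, asks for a record and gets structured data back.

```python
import json

# Hypothetical in-memory catalogue with placeholder data;
# a real OPAC would look records up in its own database.
CATALOGUE = {
    "0000000000000": {"title": "Example Title", "author": "A. Author", "copies": 2},
}


def get_record(isbn):
    """Return a catalogue record as a JSON string, or a JSON error object if it is missing."""
    record = CATALOGUE.get(isbn)
    if record is None:
        return json.dumps({"error": "not found", "isbn": isbn})
    return json.dumps({"isbn": isbn, **record}, sort_keys=True)


# Another program (a mashup, a blog widget, a price checker) can reuse the data.
print(get_record("0000000000000"))
```

Because the answer is plain JSON, the same record could feed any of the other seven ideas on this list without the catalogue needing to know about them in advance.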

Are these ideas out of the realms of possibility? Your thoughts?

Monday, June 23, 2008

Seth Godin at SLA in Seattle

Seth Godin is a best-selling author, entrepreneur and agent of change. He is the author of Permission Marketing, a New York Times best seller that revolutionized the way corporations approach consumers. Fortune Magazine named it one of their Best Business Books, Promo magazine called Godin "The Prime Minister of Permission Marketing," and Business Week Magazine called him the "ultimate entrepreneur for the information age."

Best known as the author of books such as Unleashing the Ideavirus, Purple Cow, and Permission Marketing, Godin writes one of the most popular blogs in the world. He also helped create the popular website Squidoo, which is a network of user-generated lenses -- single pages that highlight one person's point of view, recommendations, or expertise. According to Godin, the way marketing works now is not by interrupting large numbers of people; rather, it is through soliciting a small segment of rabid fans who can eagerly spread the word about one's idea. The challenge is how to engage each person to go and bring five friends. What tools do we give them so that they can reach out to colleagues? A website like Zappos is so successful not because it sells shoes, but because it connects consumers to products, and then encourages consumers to spread the word to their friends and colleagues -- and hence, more consumers.

In this new era of permission marketing, spamming no longer works. Services such as PayPal, which connects users to products, or Sonos, which engages users as customers by turning data into knowledge and producing a conversation using the web as its platform, are the new models of success. "Be remarkable," Godin argues, and "tell a story to your sneezers" so that they can spread the word and "get permission" from consumers for their attention to the product. Godin concluded with a controversial assertion. "Books are souvenirs," he said, to a hushed audience. Most people find everyday facts and information from digital documents. "When was the last time you got your information from a book?" Although Godin might have made a gross generalization, his assertion of the divergence between the digital and the physical is a reality. In the Web 2.0 world, our enemy is obscurity, not piracy.

Together, Abram and Godin's sessions at SLA 2008 in Seattle were both rewarding experiences. They ultimately propose that information professionals need to shift their mentality from one of passivity to one of actively promoting themselves, of engaging information services in new ways, and of accepting change with an open mind.

Thursday, June 19, 2008

Stephen Abram at SLA in Seattle

Day #2 of SLA was full of fascinating discussions. Stephen Abram's session, "Reality 2.0 - Transforming ourselves and our Associations," offered the most thought-provoking ideas - definitely the highlight of my experience at this conference.

For those who don't already know, Stephen Abram is President 2008 of SLA and was past-President of the Canadian Library Association. He is Vice President Innovation for SirsiDynix and Chief Strategist for the SirsiDynix Institute.

Here's a flavour of what I thought were key points that really gave me food for thought:

(1) What's wrong with Google and Wikipedia? - It's okay for librarians to refer to Google or Wikipedia. Britannica has 4% error; Wikipedia has 4% error, plus tens of thousands of more entries. It's not wrong to start with Wikipedia & Google, but it is wrong when we stop there.

(2) Don't dread change - This is perhaps the whiniest generation this century. The generation that dealt with two world wars and a depression did fine learning new tools like refrigerators, televisions, radios, and typewriters. And they survived. Why can't we? Is it so hard to learn to use a wiki?

(3) Focus! - We need to focus on the social rather than the technology. Wikis, blogs, and podcasts will come and go. But connecting with users won't. We must not use technology just for the sake of catching up. There has to be a reason to use them.

(4) Don't Be Anonymous - Do we give our taxes to a nameless accountant? Our teeth to a nameless dentist? Our hearts to a surgeon who has no title? If these professionals don't hide, then why are information professionals hiding behind their screens? Go online! Use social networking tools to reach out to users!

(5) Millennials - This is perhaps the first time in human history that a younger generation is teaching the generation before it. However, though there is much to learn from youths about technology, there is also much need to mentor and train for this profession to prosper and flourish.

(6) Change is to come! - Expect the world to be even more connected than it already is. With HDTV, that means more cables are freed up for telecommunications. Google's endgame is to provide wireless access through electricity. There're already laser keyboards where you can type on any surface. The world is changing. So must information professionals.

(7) Build paths, not barriers - When there are pathlines created by pedestrians, libraries commonly erect fences to prevent walking. Why not create a path where one exists already so that the library becomes more accessible? Librarians must go to the user, not the other way around. If patrons are using Facebook, then librarians need to use that as a channel for communication.

Stephen's PowerPoint presentation is here for your viewing pleasure as well.

Tuesday, June 17, 2008

SLA Day #1

Just when one thought that bibliographic control has changed, it might change some more. On Day 1 of SLA in Seattle, I went to a session given by Jose-Marie Griffiths called On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control. It offered a fascinating, multifaceted glimpse into the current situation of bibliographic control and cataloguing. What is intriguing about this working group is the fact that it comprises both the library world and the private sector. Led by a tri-membership of Google, the American Library Association, and the Library of Congress, the working group created a document which proposed five general recommendations: 1) increasing efficiency; 2) enhancing access; 3) positioning technology; 4) positioning the community for the future; and 5) strengthening the profession.

What is controversial about the proposal is the suspension of Resource Description and Access (RDA). Not only does the working group believe that RDA is too confusing and difficult to implement, it also believes that RDA requires much more testing. The report also proposes more continuing education in bibliographic control for professionals and students alike. Only by designing an LIS curriculum and building an evidence base for LIS research can the profession be strengthened for the future.

Although the session had a fairly sparse audience, I found it to be highly engaging and perhaps even ominous for the future of librarianship. Because the Library of Congress accepted the report with support (although unofficially), this could mean a schism in the progress of RDA, which is viewed as the successor to AACR2. Also, because this working group included the non-library world (i.e. Google and Microsoft), the future of bibliographic control won't be limited to librarians. Rather, it will involve input from the private sector, including publishers, search firms, and the corporate world. Is this a good thing? Time will tell. For better or for worse.

Thursday, June 12, 2008

B2B in a World of Controlled Vocabularies and Taxonomies

The e-readiness rankings have been released, and they reveal that the US and Hong Kong are the leaders in e-readiness. How do you measure it? According to the Economist Intelligence Unit, connectivity is one measure of e-readiness: the basic infrastructure is digital channels that are plentiful, fast, and reliable enough for a country's people and organizations to make the most of the Internet. But if individuals and businesses do not find the available channels useful in completing transactions, then the number of PCs or mobile phones in a country is a worthless measure.

Hence, the EIU based its findings on the opportunities that a country provides to businesses and consumers to complete transactions. Market analyst Forrester estimates that online retail sales in the US grew by 15% in 2006; US $44 billion was spent online in the third quarter, and the firm estimates that 2006 online sales in the Christmas holiday season alone reached US $27 billion. Another research firm, IDC, estimates that business-to-business (B2B) transaction volume in the US will reach US $650 billion by 2008, which amounts to two-thirds of the world's US $1 trillion B2B market by that time.

Even though there is concern that the great weight of the US in online activity takes away from the rest of the world, its online adoption also benefits other countries. China is one beneficiary of the growth of B2B volumes in the US, so much so that it has seen the creation of some sizeable and sophisticated B2B transaction service providers, including one of the world's largest online B2B marketplaces, Alibaba.

Over 15 million business and consumer customers in China use Alibaba's online platform. While most do not pay to use basic services, more than 100,000 businesses do. In fact, Yahoo! bought a 40% stake in Alibaba for US $1 billion in 2005. The Chinese firm is evolving into a comprehensive supplier of online business development resources for Chinese customers, many of whom would not be doing business online at all if not for Alibaba.

What does this mean for information professionals? A great deal. Look at the financial implications of B2B in the current telecommunications infrastructure. We're essentially running the online and digital economy on the bricks and mortar of outdated networks. We're in a good position to take advantage of this upcoming economy.

Thursday, June 05, 2008

Talis on Web 2.0, Semantic Web, and Web 3.0

I was honoured to have been interviewed by Richard Wallis of Talis. I was also quite humbled by the whole experience, as I learned just how far I've come in my understanding of the SemWeb and how much further I have to go. We had a good chat about Web 2.0, Semantic Web, and Web 3.0. Have a listen to the podcast. Any comments are welcome. For those who want a synopsis of what we discussed, here is my distilled version:

1. Why librarians? - Librarians have an important role to play in the SemWeb. Information organization skills, which librarians have, are relevant to the SemWeb architecture. Cataloguing, classification, indexing, metadata, taxonomies & ontologies -- these are the building blocks of LIS.

2. What will the SemWeb look like? - Think HDTV. I believe the SemWeb will be a seamless transition, one that will be led by innovators - companies and individuals that will pave the way with the infrastructure for it to happen, yet at the same time will not alienate those who don't want to encode their applications and pages with SemWeb standards. But like HDTV, those who fall behind will realize that they'll eventually need to convert...

3. Is this important right now? - Not immediately. The SemWeb might have minimal effect on the day-to-day work of librarians, but the same could be said for computer programmers and software engineers. Right now, we are all waiting for that killer application that will drive home the potentials of the SemWeb. So until that transpires, there is much speculation and skepticism.

4. What do librarians need to do? - Learn XML, join the blogosphere's discussion of the SemWeb, discuss with colleagues, pay attention to RDA, continue questioning the limitations of Web 2.0. Just because we don't see it yet, doesn't mean it should stop us from joining the discourse. Think string theory.

Wednesday, June 04, 2008

Easterlin Paradox of Information Overload

According to Wikipedia, the Easterlin Paradox is a key concept in happiness economics. Theorized by economist Richard Easterlin in the 1974 paper "Does Economic Growth Improve the Human Lot? Some Empirical Evidence," it proposes that, contrary to expectation, economic growth doesn't lead to greater happiness. It quickly caught fire, as Easterlin became famous beyond famous, and the paradox quickly became a social science classic, cited in academic journals and the popular media. As the New York Times says, the Easterlin Paradox tapped into a near-spiritual human instinct to believe that money can’t buy happiness. Although there have been attempts to debunk the Easterlin Paradox, I believe the concept applies quite well to Web 2.0 and the information overload it has presented to the current state of the Web.

As one information expert has put it, Web 2.0 is about searching, Web 3.0 will be about finding. Well said. That is exactly the problem with Web 2.0. There is a plethora of excellent free and very useful tools out there - blogs, wikis, RSS feeds, mashups - but at what point does it become too much? Recently, I noticed that my Google Reader has gotten out of hand. I just can't keep up anymore. I skim and I skim and I skim. I'm pulling in a lot of information, but am I really processing it? Am I really happy with the overabundance of rich content of Web 2.0? Not really. Are you?