Sunday, July 20, 2008

Web 3.0 and Web Parsing

Ever wondered how Web 3.0 and the Semantic Web (SemWeb) can read web pages in an automated, intelligent fashion? Take a look at how Website Parse Template (WPT) works. WPT is an open, XML-based format that describes the HTML structure of a website's pages. With that description, web crawlers can generate Semantic Web RDF data for those pages.
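As a rough illustration of the output side, the Python sketch below serializes concept/value pairs that a crawler might have extracted from one page into RDF N-Triples. The ontology namespace, the concept names, and the `to_ntriples` helper are hypothetical and are not taken from the WPT specification.

```python
# A minimal sketch: serialize facts extracted from one page as RDF N-Triples.
# ONTOLOGY_NS and the concept names are hypothetical examples, not WPT syntax.

ONTOLOGY_NS = "http://example.org/ontology#"  # hypothetical website ontology


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(page_url: str, facts: dict[str, str]) -> str:
    """Emit one triple per (concept, value) pair, with the page URL as subject."""
    lines = []
    for concept, value in facts.items():
        predicate = ONTOLOGY_NS + concept
        lines.append(f'<{page_url}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


print(to_ntriples(
    "http://example.com/products/42",
    {"productName": "Blue Widget", "price": "9.99"},
))
# <http://example.com/products/42> <http://example.org/ontology#productName> "Blue Widget" .
# <http://example.com/products/42> <http://example.org/ontology#price> "9.99" .
```

Using the page URL as the subject keeps the sketch short; a real crawler could just as well mint separate resources for the things a page describes.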

Website Parse Template consists of three main entities:

1) Ontologies - The content creator defines the concepts and relations used on the website.

2) Templates - The creator provides templates for groups of web pages that are similar in content category and structure. The publisher supplies the XPath or tag ID of each relevant HTML element and links it to a concept in the website ontology.

3) URLs - The creator provides URL patterns that collect a group of web pages and link them to a "Parse Template". In the URLs section, the publisher can also mark part of a URL as a concept and link it to the website ontology.
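To make the three entities concrete, here is a minimal Python sketch of how a crawler might apply them to a single page. It does not use the actual WPT XML syntax: the ontology concepts, template paths, URL pattern, and the `parse_page` helper are all hypothetical. Only ontology concepts are modeled, not relations. It also assumes the page is well-formed XHTML, because the standard-library ElementTree parser rejects most real-world HTML and supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# 1) Hypothetical website ontology: the concepts this site's pages describe.
ONTOLOGY_CONCEPTS = {"productId", "productName", "price"}

# 2) Hypothetical parse template for product pages: ontology concept -> element path.
PRODUCT_TEMPLATE = {
    "productName": ".//h1",
    "price": ".//span[@id='price']",  # a tag ID lookup expressed as XPath
}

# Every concept used by the template must exist in the ontology.
unknown = set(PRODUCT_TEMPLATE) - ONTOLOGY_CONCEPTS
if unknown:
    raise ValueError(f"template uses concepts missing from the ontology: {unknown}")

# 3) Hypothetical URL pattern assigning pages to the template. The named group
# marks part of the URL itself as an ontology concept.
URL_PATTERNS = [
    (re.compile(r"^https?://example\.com/products/(?P<productId>\d+)$"), PRODUCT_TEMPLATE),
]


def parse_page(url: str, xhtml: str) -> dict[str, str]:
    """Return concept -> value pairs for a page, or {} if no URL pattern matches."""
    for pattern, template in URL_PATTERNS:
        match = pattern.match(url)
        if match is None:
            continue
        facts = dict(match.groupdict())  # concepts taken from the URL
        root = ET.fromstring(xhtml)  # assumes well-formed XHTML
        for concept, path in template.items():
            element = root.find(path)
            if element is not None and element.text:
                facts[concept] = element.text.strip()
        return facts
    return {}


page = """<html><body>
  <h1>Blue Widget</h1>
  <span id="price">9.99</span>
</body></html>"""

print(parse_page("http://example.com/products/42", page))
# {'productId': '42', 'productName': 'Blue Widget', 'price': '9.99'}
```

The resulting concept/value pairs are exactly what a crawler would then turn into RDF statements about the page. For messy real-world HTML, an HTML-tolerant parser with full XPath support would take the place of ElementTree.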
