
Monday, December 22, 2014

Needlebase

Needlebase provides a point-and-click interface for extracting structured
information from web pages. As a user, you select elements on an example page
that contain the data you’re interested in, and the tool then uses the
patterns you’ve defined to pull out information from other pages on a site
with a similar structure. For example, you might want to extract product
names and prices from a shopping site. With the tool, you could find a single
product page, select the product name and price, and then the same elements
would be pulled for every other page it crawled from the site. It relies on
the fact that most web pages are generated by combining templates with
information retrieved from a database, and so have a very consistent
structure.

Once you’ve gathered the data, it offers some features that are a bit like
Google Refine’s for de-duplicating and cleaning up the data. All in all, it’s
a very powerful tool for turning web content into structured information,
with a very approachable interface.
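
To make the "select on one page, extract from the rest" idea concrete, here is a minimal sketch in Python. It is not how Needlebase itself works internally; it only illustrates the general technique, under the assumption that a template marks each field with a stable tag and `class` attribute. The function names (`learn_pattern`, `apply_pattern`), the sample pages, and the class names are all made up for illustration.

```python
from html.parser import HTMLParser


class _TextByElement(HTMLParser):
    """Record (tag, class, text) for every element that directly holds text.

    Simplification: void elements such as <br> are not special-cased, so the
    sample pages below avoid them.
    """

    def __init__(self):
        super().__init__()
        self._stack = []   # currently open elements as (tag, class) pairs
        self.items = []    # (tag, class, text) triples in document order

    def handle_starttag(self, tag, attrs):
        self._stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Close the nearest matching open element, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            tag, cls = self._stack[-1]
            self.items.append((tag, cls, text))


def scan(html):
    parser = _TextByElement()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_pattern(example_html, selected):
    """Turn the user's selections on one page into (tag, class) patterns.

    `selected` maps a field name to the text the user picked on that page.
    """
    pattern = {}
    for field, wanted in selected.items():
        for tag, cls, text in scan(example_html):
            if text == wanted:
                pattern[field] = (tag, cls)
                break
        else:
            raise ValueError(f"{wanted!r} not found on the example page")
    return pattern


def apply_pattern(html, pattern):
    """Pull the same fields out of another page built from the same template."""
    record = {field: None for field in pattern}
    for tag, cls, text in scan(html):
        for field, key in pattern.items():
            if key == (tag, cls) and record[field] is None:
                record[field] = text  # keep the first match per field
    return record


# Hypothetical product pages generated from one template.
EXAMPLE_PAGE = """
<html><body>
  <h1 class="product-name">Blue Widget</h1>
  <span class="price">$9.99</span>
</body></html>
"""

OTHER_PAGE = """
<html><body>
  <h1 class="product-name">Red Gadget</h1>
  <p class="blurb">Our best seller.</p>
  <span class="price">$24.50</span>
</body></html>
"""

pattern = learn_pattern(EXAMPLE_PAGE, {"name": "Blue Widget", "price": "$9.99"})
record = apply_pattern(OTHER_PAGE, pattern)
print(record)
```

The sketch works only because both pages share a template, which is the same property the passage says the tool relies on. A real extractor would need a more robust notion of "pattern" (for example, a full path from the document root) to cope with pages where the same tag and class appear more than once.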

from Big Data Glossary, by Pete Warden
