NAME
    Web::PageMeta - get page open-graph / meta data

SYNOPSIS
        use Web::PageMeta;
        my $page = Web::PageMeta->new(url => "https://www.apa.at/");
        say $page->title;
        say $page->image;

    Fetch previews and images asynchronously:

        use Web::PageMeta;
        my @urls = qw(
            https://www.apa.at/
            http://www.diepresse.at/
            https://metacpan.org/
            https://github.com/
        );
        my @page_views = map { Web::PageMeta->new( url => $_ ) }
                @urls;
        Future->wait_all( map { $_->fetch_image_data_ft } @page_views )->get;
        foreach my $pv (@page_views) {
            say 'title> '.$pv->title;
            say 'img_size> '.length($pv->image_data);
        }

        # alternatively, instead of Future->wait_all()
        use Future::Utils qw( fmap_void );
        fmap_void(
            sub { return $_[0]->fetch_image_data_ft },
            foreach    => [@page_views],
            concurrent => 3
        )->get;

DESCRIPTION
    Get open-graph (and other) web page meta data. Can be used in both
    normal and async code.

    If a download returns any HTTP status code other than 200, an
    HTTP::Exception is thrown.
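
    A minimal sketch of catching such a failure in blocking code. It
    assumes the thrown object provides the "code" and "status_message"
    methods that HTTP::Exception documents; the URL and the describe_error()
    helper are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util qw(blessed);
use Web::PageMeta;

# Hypothetical helper: turn whatever was thrown into a printable message.
sub describe_error {
    my ($err) = @_;

    # Duck-typed check: any object offering HTTP::Exception's accessors.
    if ( blessed($err) && $err->can('code') && $err->can('status_message') ) {
        return sprintf( 'HTTP %s %s', $err->code, $err->status_message );
    }
    return "unexpected error: $err";
}

# Example URL, assumed to answer with a non-200 status.
my $page = Web::PageMeta->new( url => 'https://example.com/no-such-page' );

my $title;
my $ok = eval { $title = $page->title; 1 };
if ($ok) {
    say $title // '(no title)';
}
else {
    warn describe_error($@), "\n";
}
```

    Checking the return value of eval, rather than testing $@ alone, keeps
    the error branch reliable even if $@ is reset during cleanup.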

ACCESSORS
  new
    Constructor; only "url" is required.

  url
    HTTP url to fetch data from.

  timeout
    In addition to the AnyEvent::HTTP timeout, the elapsed time is also
    checked while data is being downloaded, and the request dies once the
    limit is exceeded. Default is 5 minutes.

  max_size
    Will die when the document or image size is greater than this limit.
    Default 100MB.

  user_agent
    User-Agent header to use for http requests. Default is one from Chrome
    89.0.4389.90.

  extra_headers
    HashRef with extra http request headers.

  cookie_jar
    Accepts an optional HTTP::Cookies-compatible object that must provide
    a "get_cookies()" method. If set, HTTP cookie headers are sent with
    each request.
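
    The request options above can be combined in one constructor call. The
    following is only a sketch: the units are assumptions ("timeout" taken
    to be seconds, "max_size" taken to be bytes), and the User-Agent string
    and header values are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use Web::PageMeta;

my $page = Web::PageMeta->new(
    url           => 'https://www.apa.at/',
    timeout       => 30,                     # assumption: seconds
    max_size      => 5 * 1024 * 1024,        # assumption: bytes
    user_agent    => 'MyPreviewBot/1.0',     # hypothetical User-Agent
    extra_headers => { 'Accept-Language' => 'en' },
);

say $page->title;
```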

  title
    Returns title of the page.

  description
    Returns description of the page.

  canonical_url
    Returns the open-graph url. If not present, returns "url".

  image
    Returns image location of the page.

  image_data
    Returns image binary data of "image" link.

    Throws a 404 exception if there is no "image" link.

  page_meta
    Returns hash ref with all open-graph data.

  extra_scraper
    Web::Scraper::LibXML object used to fetch the image, title or
    description from a location other than the default one.

        use Web::Scraper::LibXML;
        use Web::PageMeta;
        my $escraper = scraper {
            process_first '.slider .camera_wrap div', 'image' => '@data-src';
        };
        my $wmeta = Web::PageMeta->new(
            url => 'https://www.meon.eu/',
            extra_scraper => $escraper,
        );

  page_body_hdr
    Returns array ref with page [$body,$headers]. Can be useful for
    post-processing or special/additional data extractions.

    Only "text/html" content-type is accepted for fetching.
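
    A short sketch of such post-processing. It assumes that $headers is a
    hash ref keyed by lower-cased header names, which is how AnyEvent::HTTP
    delivers response headers; the regular expression is a deliberately
    crude illustration, not a robust HTML parser.

```perl
use strict;
use warnings;
use feature 'say';
use Web::PageMeta;

my $page = Web::PageMeta->new( url => 'https://metacpan.org/' );
my ( $body, $headers ) = @{ $page->page_body_hdr };

say 'body_size> ' . length($body);

# Assumption: header names are lower-cased hash keys.
say 'content_type> ' . ( $headers->{'content-type'} // 'unknown' );

# Additional extraction: the lang attribute of the <html> tag, if any.
my ($lang) = $body =~ m{<html\b[^>]*\blang=["']([^"']+)["']}i;
say 'lang> ' . ( $lang // 'not declared' );
```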

  fetch_page_meta_ft
    Returns a future object for fetching page meta data. See "ASYNC USE".
    On done, the "page_meta" hash is returned.

  fetch_image_data_ft
    Returns a future object for fetching image data. See "ASYNC USE". On
    done, the "image_data" scalar is returned.

  fetch_page_body_hdr_ft
    Returns a future object for fetching page content and headers. See
    "ASYNC USE". On done, the "page_body_hdr" array ref is returned.

ASYNC USE
    To run multiple page meta data or image HTTP requests in parallel, or
    to use the module in async programs, call "fetch_page_meta_ft" or
    "fetch_image_data_ft", both of which return a Future object. See
    "SYNOPSIS" or t/02_async.t for sample use.
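
    When fetching several pages in parallel, one failing URL need not abort
    the rest. The sketch below inspects each future separately after
    Future->wait_all(). It assumes a failed download surfaces as a failed
    future; the summarise() helper is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Future;
use Web::PageMeta;

# Hypothetical helper: describe one finished future as a line of text.
sub summarise {
    my ( $url, $f ) = @_;
    if ( $f->is_done ) {
        my $meta = $f->get;    # the "page_meta" hash ref
        return "ok> $url (" . scalar( keys %$meta ) . " meta keys)";
    }
    return "failed> $url: " . ( $f->failure // 'cancelled' );
}

my @urls    = qw( https://metacpan.org/ https://github.com/ );
my @pages   = map { Web::PageMeta->new( url => $_ ) } @urls;
my @futures = map { $_->fetch_page_meta_ft } @pages;

# wait_all() completes once every future is ready, whether it is done
# or failed, so the results can be examined one by one afterwards.
Future->wait_all(@futures)->get;

say summarise( $urls[$_], $futures[$_] ) for 0 .. $#urls;
```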

SEE ALSO
    <https://ogp.me/>

AUTHOR
    Jozef Kutej, "<jkutej at cpan.org>"

LICENSE AND COPYRIGHT
    Copyright 2021 jkutej@cpan.org

    This program is free software; you can redistribute it and/or modify it
    under the terms of either: the GNU General Public License as published
    by the Free Software Foundation; or the Artistic License.

    See http://dev.perl.org/licenses/ for more information.