Ubuntu Manpage: URI::Normalize - Normalize URIs according to RFC 3986

name
version
synopsis
description
subroutines
caveats
author
copyright and license

Provided by: liburi-normalize-perl_0.002-2_all

NAME

       URI::Normalize - Normalize URIs according to RFC 3986

VERSION

       version 0.002

SYNOPSIS

           use URI;
           use URI::Normalize qw( normalize_uri remove_dot_segments );
           my $uri = URI->new('HTTPS://www.Example.com:443/../test/../foo/index.html');

           say normalize_uri($uri);       #> https://www.example.com/foo/index.html
           say remove_dot_segments($uri); #> HTTPS://www.Example.com:443/foo/index.html

DESCRIPTION

       Section 6 of RFC 3986 describes a process of URI normalization. This implements syntax-
       based normalization and may include some schema-based and protocol-based normalization.
       This includes implementing the "remove_dot_segments" algorithm described in Section 5.2.3
       of the RFC.

       This has a number of useful applications in allowing URIs to be compared with fewer false
       negatives. For example, all of the following URIs will normalize to the same value:

           HTTPS://www.example.com:443/../test/../foo/index.html
           https://WWW.EXAMPLE.COM/./foo/index.html
           https://www.example.com/%66%6f%6f/index.html
           https://www.example.com/foo/index.html

       That is, they will all be normalized into the last value.

SUBROUTINES

   normalize_uri
           $normal_uri = normalize_uri($uri);
           $normal_uri = normalize_uri($str);

       Given a URI object or a string, this routine basically just calls the "canonical" method
       and "remove_dot_segments" on the URI and returns the result as a URI object.

       The original URI is left unchanged.

   remove_dot_segments
           $clean_path_uri = remove_dot_segments($uri);
           $clean_path_uri = remove_dot_segments($str);

       Given a URI object or a string, this routine will remove dot segments (i.e, "."  and "..")
       from the path of the URI.

CAVEATS

       As RFC 3986 notes, normalization is a tool used to help identify whether one URI is
       equivalent to another. This does not, however, imply that the resources identified by two
       URIs that are different byte-for-byte but normalize to the same value will be the same.
       For example, the presence of "/./" in the path might be significant to an implementation,
       or using octet-encoding (e.g., "%3a") instead of the character represented might represent
       a different actual resource. So use normalization judiciously.

       This implementation of normalization is far from comprehensive. There are many
       normalizations you may wish to perform. In that case, you may want to look into
       URL::Normalize, which provides a more comprehensive list of normalizations, some of which
       go against the letter of RFC 3986, but can be valuable in certain applications.

       This implementation does not include the full gamut of what is keeping with the letter of
       RFC 3986, but might be expanded to include additional normalizations in the future.

       If you know of a normalization that could be implemented here: patches welcome.

AUTHOR

       Andrew Sterling Hanenkamp <hanenkamp@cpan.org>

COPYRIGHT AND LICENSE

       This software is copyright (c) 2016 by Qubling Software.

       This is free software; you can redistribute it and/or modify it under the same terms as
       the Perl 5 programming language system itself.