League Uri parser

Warning: The information you are reading may be obsolete; this post was published more than 2 years ago.

If you have ever worked a lot with URLs, you know that the first step before manipulating them is being able to correctly parse them into their components. In PHP, this is done with the parse_url function. But parse_url has many bugs and shortcomings, like:

These issues are known to PHP’s internals, who are trying to solve them by introducing a new URL parser, hopefully for PHP 7.2. In the meantime, if correctly parsing URLs is crucial for you, the PHP League is proud to announce the release of the League URI Parser.

The League URI Parser only works on PHP 7+ and requires the intl extension to correctly parse RFC 3987 URLs. Here’s a simple example of how this parser works.

As you can see, the parser returns a hash similar to the one parse_url returns, so switching between them in your application should be straightforward.

Although the returned hash is similar, there are some key differences between this parser and parse_url. The documentation compares the two parsers in more depth, but here are the main points:

  • The parser always returns an array containing all the URL components;
  • The parser makes a distinction between an empty component whose value is the empty string and an undefined one whose value is null;
  • In case of an error, the parser throws an InvalidArgumentException instead of just returning false.
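To make these differences concrete, here is a minimal sketch that emulates them on top of the built-in parse_url. The name parse_url_strict is hypothetical and is not part of the League URI Parser. The sketch only changes the shape of the result: it inherits every parsing quirk of parse_url, and the empty-versus-undefined distinction is only approximated, because parse_url itself decides whether an empty component is reported.

```php
<?php
/**
 * Hypothetical helper, NOT part of League URI Parser: wraps the built-in
 * parse_url() so that every component key is present and failures throw.
 */
function parse_url_strict(string $uri): array
{
    $components = parse_url($uri);
    if ($components === false) {
        // parse_url() signals a seriously malformed URL by returning false.
        throw new InvalidArgumentException(sprintf('Unable to parse "%s"', $uri));
    }

    // Keys that parse_url() omitted stay null, meaning "undefined".
    $undefined = [
        'scheme' => null, 'user' => null, 'pass' => null, 'host' => null,
        'port' => null, 'path' => null, 'query' => null, 'fragment' => null,
    ];

    return array_merge($undefined, $components);
}

$components = parse_url_strict('http://example.com/path');
var_dump($components['host']);   // string(11) "example.com"
var_dump($components['query']);  // NULL
```

Calling code can then read any of the eight keys without isset() checks and handle failures with a try/catch block.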

The League URI Parser also provides a method to validate any host string. This method can validate a:

  • Host as an IP string (IPv4 and IPv6)
  • Host as a registered name
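As a rough illustration of these two host categories, the sketch below classifies a host string using only PHP's filter extension. It is not the parser's own method: classify_host is a hypothetical name, the rules applied by FILTER_VALIDATE_DOMAIN are assumed to be close enough for illustration, and internationalized names are not handled.

```php
<?php
/**
 * Hypothetical helper, NOT the League URI Parser method.
 * Returns 'ipv4', 'ipv6', 'registered-name', or null for an invalid host.
 */
function classify_host(string $host): ?string
{
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return 'ipv4';
    }

    // Inside a URI, an IPv6 literal is wrapped in square brackets.
    if (strlen($host) > 2 && $host[0] === '[' && substr($host, -1) === ']') {
        $ip = substr($host, 1, -1);

        return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false
            ? 'ipv6'
            : null;
    }

    // FILTER_FLAG_HOSTNAME restricts labels to letters, digits and hyphens.
    if (filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false) {
        return 'registered-name';
    }

    return null;
}

var_dump(classify_host('192.168.0.1'));  // string(4) "ipv4"
var_dump(classify_host('[::1]'));        // string(4) "ipv6"
var_dump(classify_host('example.com'));  // string(15) "registered-name"
```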

Regardless of the parser you end up using for your next application, keep in mind that parsing and validating a URL are two different actions. For instance, if you go back to the first example provided, the returned URL is invalid according to the rules of the data URL scheme, yet it was parsed correctly.
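The distinction can be sketched as two separate steps. In the example below the component arrays are written by hand in the shape a parser returns, and is_valid_http is a hypothetical scheme-specific check; nothing here is provided by the League URI Parser, and the rule it applies (an http or https URL needs a non-empty host) is just one simple example of scheme validation.

```php
<?php
/**
 * Hypothetical scheme-specific validation, applied AFTER parsing.
 * For http(s), a URL without a host is unusable even though it parses.
 */
function is_valid_http(array $components): bool
{
    $scheme = strtolower((string) ($components['scheme'] ?? ''));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    return isset($components['host']) && $components['host'] !== '';
}

// Both arrays are well-formed parsing results; only the first is a valid http URL.
$withHost = ['scheme' => 'http', 'host' => 'example.com', 'path' => '/'];
$withoutHost = ['scheme' => 'http', 'host' => null, 'path' => '/only/a/path'];

var_dump(is_valid_http($withHost));     // bool(true)
var_dump(is_valid_http($withoutHost));  // bool(false)
```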

Final note

The League Uri Parser is an open source project released under an MIT License, so contributions are more than welcome and will be fully credited. These contributions can be anything from reporting an issue, to requesting or adding missing features, to simply fixing typos on the documentation website.

4 thoughts on “League Uri parser”

  1. This is the first time I’ve encountered the __invoke() magic method (I actually had to dig into the source code to see how $parser('http://foo.com?@bar.com/') worked)… Could someone explain what benefits this offers compared to $parser->parse('http://foo.com?@bar.com/')?

    • __invoke lets you call an object as if it were a closure. In the case of the Parser class:

      • This class does only one thing: parsing a URI.
      • It does not expose any setter or getter.

      The only responsibility of this class is parsing a URL. Moreover, this class does not implement any interface. To be honest, it’s a big function written as a class for easier maintenance and testing.

      So if it were a function, you would use it like so:

      $components = League\Uri\parse($my_uri);

      With the __invoke method you get nearly the same call:

      $components = (new League\Uri\Parser())($my_uri);

      A side effect is that you don’t need to remember the name of the main action method, and switching between this usage and a parse_url call is simpler, IMHO.

      $components = (new League\Uri\Parser())($my_uri);
      $components = parse_url($my_uri);

      I know some people do not like the __invoke method because it may be unclear to the user that the variable is actually an object when it is created like so

      $operation_on_uri = new League\Uri\Parser();

      but I’d argue that you then have a bigger problem, which is naming your variables correctly.
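As a small illustration of the callable behaviour described in this reply, here is a sketch with a made-up class. Trimmer is hypothetical and has nothing to do with the parser; it only shows that an object with __invoke can be called directly and passed wherever PHP expects a callable.

```php
<?php
// Hypothetical single-purpose class: its only public entry point is __invoke.
final class Trimmer
{
    public function __invoke(string $value): string
    {
        return trim($value);
    }
}

$trim = new Trimmer();

var_dump(is_callable($trim));          // bool(true)
var_dump($trim('  foo  '));            // string(3) "foo"

// The object can be handed to any function that accepts a callable.
$cleaned = array_map($trim, [' a ', 'b ']);
var_dump($cleaned === ['a', 'b']);     // bool(true)
```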

  2. The RFC defines much more relaxed rules for reg-name

    `reg-name = *( unreserved / pct-encoded / sub-delims )
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
    / "*" / "+" / "," / ";" / "="`

    so something like `mysql://tcp(localhost)/` or `mongodb://host1,host2,host3/` is valid according to the RFC

    The League URI parser will throw an exception on these; I think this should be addressed somewhere in the docs

    • Hi,

      Well, the parser is based on RFC3986 not RFC952. This is well explained in the documentation, but I agree it should be more transparent. I’ll make the information more visible since it is currently explained in another package.
