If you have ever worked extensively with URLs, you know that the first step before manipulating them is parsing them correctly into their different components. In PHP, this is done with the `parse_url` function. But `parse_url` has many bugs and shortcomings, such as:
- not being able to correctly parse a URL according to RFC 3986
- mangling URL components
- having various other bugs and quirks
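Two of these shortcomings are easy to reproduce with nothing but `parse_url` itself. The snippet below is a small sketch; the sample URLs are made up for illustration.

```php
<?php
// 1. A control character in the path is silently replaced by "_"
//    instead of being rejected or percent-encoded.
$mangled = parse_url("http://example.com/a\x01b");
var_dump($mangled['path']); // string(4) "/a_b"

// 2. On a URL it cannot handle (here, an empty host), parse_url() simply
//    returns false, with no hint about what went wrong.
$failed = parse_url('http:///example.com');
var_dump($failed); // bool(false)
```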
These issues are known to the PHP internals team, which is trying to solve them by introducing a new URL parser, hopefully for PHP 7.2. In the meantime, if parsing URLs correctly is crucial for you, the PHP League is proud to announce the release of the League URI Parser.
The League URI Parser only works on PHP 7+ and requires the intl extension to correctly parse RFC 3987 URLs. Here’s a simple example of how this parser works.
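A minimal sketch of that usage, assuming the package is installed through Composer and exposes a `League\Uri\Parser` class. The namespace and the autoload path are assumptions to check against the package documentation, and the URL is only an illustrative input.

```php
<?php
// Assumption: the parser class is League\Uri\Parser.
use League\Uri\Parser;

// Assumption: Composer's autoloader lives at this path in your project.
require 'vendor/autoload.php';

$parser = new Parser();

// The parser object is invokable, so it is called like a function.
$components = $parser('data://example.com:80/path?query#fragment');

// Dumps the URL components as an associative array.
var_export($components);
```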
As you can see, the parser returns a hash similar to the one returned by `parse_url`, so switching between them in your application should be straightforward.
Although the returned hash is similar, there are some key differences between this parser and `parse_url`. The documentation compares the two parsers in more depth, but here are the main points:
- The parser always returns an array containing all the URL components;
- The parser makes a distinction between an empty component, whose value is the empty string, and an undefined one, whose value is `null`;
- In case of an error, the parser throws an `InvalidArgumentException` instead of just returning `false`.
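To make these three points concrete, here is a rough sketch of a wrapper around `parse_url` that behaves the same way. The function name `parse_all_components` is hypothetical and this is not the library's implementation. Note also that `parse_url` itself only reports an empty query or fragment as `''` since PHP 8.0, so the empty/undefined distinction is only as good as the PHP version underneath.

```php
<?php
// Hypothetical helper, for illustration only: makes parse_url() follow
// the three rules listed above.
function parse_all_components(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        // Rule 3: fail loudly instead of returning false.
        throw new InvalidArgumentException("Unable to parse URL: $url");
    }

    // Rule 1: every component key is always present.
    // Rule 2: a component missing from the URL is null, not ''.
    // The path is the exception: a URL always has a path, possibly empty.
    $defaults = [
        'scheme' => null, 'user' => null, 'pass' => null, 'host' => null,
        'port' => null, 'path' => '', 'query' => null, 'fragment' => null,
    ];

    // Values found by parse_url() override the defaults.
    return array_merge($defaults, $parts);
}

$components = parse_all_components('http://example.com');
var_dump($components['host']);  // string(11) "example.com"
var_dump($components['query']); // NULL
```

Calling code can then rely on every key being present and only needs a `try`/`catch` for the failure case.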
The League URI Parser also provides a method to validate any host string. This method is capable of validating a:
- Host as an IP string (IPv4 and IPv6)
- Host as a registered name
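The two kinds of host can be told apart with PHP's built-in filters. The sketch below is a rough stand-in, not the library's method: the function name `looks_like_valid_host` is hypothetical, and its registered-name check follows the stricter hostname rules of `FILTER_FLAG_HOSTNAME`.

```php
<?php
// Hypothetical stand-in for a host validator, built on filter_var().
function looks_like_valid_host(string $host): bool
{
    // Inside a URL, an IPv6 literal is wrapped in square brackets: [::1]
    if (preg_match('/^\[(.+)\]$/', $host, $matches) === 1) {
        return filter_var($matches[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
    }

    // Dotted-decimal IPv4 address.
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return true;
    }

    // Registered name, checked against hostname rules
    // (letters, digits, hyphens and dots).
    return filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

var_dump(looks_like_valid_host('example.com')); // bool(true)
var_dump(looks_like_valid_host('[::1]'));       // bool(true)
var_dump(looks_like_valid_host('127.0.0.1'));   // bool(true)
```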
Regardless of the parser you end up using for your next application, do keep in mind that parsing and validating a URL are two different actions. For instance, if you go back to the first example provided, the parsed URL is invalid under the rules of the data URL scheme, yet parsing was correct.
Final note
The League URI Parser is an open source project released under an MIT License, so contributions are more than welcome and will be fully credited. These contributions can be anything from reporting an issue to requesting or adding missing features, or simply fixing typos on the documentation website.
That’s the first time I’ve encountered the `__invoke()` magic method (I actually had to dig into the source code to see how `$parser('http://foo.com?@bar.com/')` worked)… Could someone explain what benefits this offers compared to `$parser->parse('http://foo.com?@bar.com/')`?

`__invoke` lets you treat an object as a closure. In the case of the Parser class, the only responsibility of the class is parsing a URL. Moreover, this class does not implement any interface. To be honest, it’s a big function written as a class for better maintenance and testing.
So if it were a function, you would simply call it with the URL as its argument. Using the `__invoke` method, you get the same call style: `$parser('http://foo.com?@bar.com/')`. A side effect of this is that you don’t need to remember the main action method, and switching between this usage and a `parse_url` call is made simpler IMHO.

I know some people do not like the `__invoke` method because it may be unclear to the user that the variable is actually an object when it is called like a function, but I’d argue that you then have a bigger problem, which is naming your variables correctly.
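For anyone else meeting `__invoke` for the first time, here is a small self-contained sketch. The `UrlParser` class is hypothetical and simply wraps `parse_url`; it is not the League parser, it only shows the mechanics.

```php
<?php
// Hypothetical class, for illustration only.
final class UrlParser
{
    // The named action method.
    public function parse(string $url): array
    {
        $parts = parse_url($url);
        if ($parts === false) {
            throw new InvalidArgumentException("Unable to parse URL: $url");
        }

        return $parts;
    }

    // __invoke() makes the object callable; it just delegates to parse().
    public function __invoke(string $url): array
    {
        return $this->parse($url);
    }
}

$parser = new UrlParser();

// Both calls return the same array.
$viaMethod = $parser->parse('http://example.com/path?q=1');
$viaInvoke = $parser('http://example.com/path?q=1');

// An invokable object is accepted wherever a callable is expected.
$parsed = array_map($parser, ['http://a.example/', 'http://b.example/']);
```

The last line is where the callable form pays off: the object can be handed to `array_map` directly, without wrapping the method call in a closure.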
RFC 3986 defines far more relaxed rules for reg-name:
` reg-name = *( unreserved / pct-encoded / sub-delims )
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="`
so something like `mysql://tcp(localhost)/` or `mongodb://host1,host2,host3/` is valid according to the RFC.
The League URI Parser will throw an exception on these; I think this should be addressed somewhere in the docs.
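To check the claim, the grammar above can be transcribed into a regular expression. This is a rough sketch with a made-up function name, and it only tests the host part, not the whole URL.

```php
<?php
// Hypothetical helper: a direct transcription of the RFC 3986 reg-name rule.
function is_rfc3986_reg_name(string $host): bool
{
    // unreserved and sub-delims characters, or a pct-encoded triplet,
    // repeated zero or more times.
    $pattern = '/^(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})*\z/';

    return preg_match($pattern, $host) === 1;
}

var_dump(is_rfc3986_reg_name('tcp(localhost)'));    // bool(true)
var_dump(is_rfc3986_reg_name('host1,host2,host3')); // bool(true)
var_dump(is_rfc3986_reg_name('exa mple.com'));      // bool(false)
```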
Hi,
Well, the parser is based on RFC 3986, not RFC 952. This is well explained in the documentation, but I agree it should be more transparent. I’ll make the information more visible, since it is currently explained in another package.