CSV and PHP8.4+

Foreword

PHP8.4 is around the corner, and while most of the excitement has been about the new features that will land, like asymmetric visibility and property hooks, to name but a few, there is still the old stuff that works and that also benefits from the upgrade through better APIs, bug fixes and, sometimes, the deprecation of outdated behaviours.

As the creator and maintainer of league/csv, the CSV library for PHP, I try to stay up to date with what’s new in PHP with regard to CSV features. For instance, the package was able to update the end of line character way before the feature landed in PHP itself. But I bet that without looking at the package CHANGELOG and/or the repository commits, it would be hard for anyone to know when and how the switch in implementation occurred. Knowing what is coming to PHP before it gets released gives the package maintainer time to reduce the burden of adapting or changing how the package is used and sometimes, with luck, the change can even go unnoticed by the package consumer.

But sometimes we stumble upon an exception: a case where the introduced change has so many repercussions that handling it is much more difficult. CSV processing is changing without changing in PHP8.4+. Unless your CSV process does not interact at all with PHP’s CSV features, in which case you are free to disregard everything I am going to write about CSV and jump directly to the article’s conclusions, please bear with me for a while to uncover part of PHP’s uniqueness.

Where it all begins

First, a bit of history and context is needed to understand what will change in PHP8.4+ and its impact on your day-to-day developer life. In PHP, the CSV feature exposes 3 control characters: the delimiter, the enclosure and the obscure escape parameter. The escape parameter is an oddity which only exists in PHP. And while other languages do not have this singularity, PHP shot itself in the foot by implementing it with an implicit default value. This means that the value is used every time, by every developer, in every codebase, without their consent or knowledge. So we have a feature used by everyone even though no one really knows why it exists or how it really works. The icing on the cake, sort of, is that it is an unlimited source of bugs filed against the PHP codebase (or against packages using PHP’s CSV feature) because of its counter-intuitive behaviour with regard to CSV documents. The escape character parameter breaks how CSV documents are read and written, and it hinders interoperability with CSV documents written or read in other languages.

Let’s kill the escape parameter

For all these reasons, an RFC to kill the escape parameter was drafted a long time ago but was never fully enforced, partly because deprecating and removing the parameter would mean that a CSV document created in PHP would no longer be readable in … PHP. You would leave millions of unreadable documents in applications because of the blind pursuit of rightness. Since removing the parameter is critical for backward compatibility, the RFC proposed 5 steps to remove it cleanly while limiting the damage.

The first step was enacted in PHP7.4 by allowing the empty string as a possible value for the parameter. The empty string acts as an escape parameter silencer: when it is used, the escape mechanism no longer applies. It means that since PHP7.4 (and, in a backward compatible way, since PHP7.0.10 when using league/csv), there is a way to get rid of the escape parameter’s effect and produce a more interoperable CSV document. But this change went, for the most part, unnoticed by the majority of developers, who still use the default value of the escape parameter when creating or reading a CSV in PHP.
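To make the difference concrete, here is a minimal sketch, assuming PHP7.4 or later. The helper csvLine() is a hypothetical name introduced only for this illustration; it writes one record with fputcsv() into an in-memory stream and returns the resulting line, once with the historical backslash escape and once with the empty string.

```php
<?php

/**
 * Hypothetical helper: returns the CSV line produced by fputcsv()
 * for the given escape value.
 */
function csvLine(array $fields, string $escape): string
{
    $stream = fopen('php://memory', 'w+');
    fputcsv($stream, $fields, ',', '"', $escape);
    rewind($stream);
    $line = stream_get_contents($stream);
    fclose($stream);

    return $line;
}

// The first field contains: a, backslash, double quote, b
$fields = ['a\\"b', 'c'];

// Historical default: the double quote after the backslash is NOT doubled.
$legacy = csvLine($fields, '\\');   // "a\"b",c

// Empty string: the escape mechanism is off, the double quote is doubled.
$standard = csvLine($fields, '');   // "a\""b",c

// Reading the second form back with the empty string restores the record.
$roundTrip = str_getcsv(rtrim($standard, "\n"), ',', '"', '');
```

Only the second line follows the quote-doubling convention that most other CSV parsers expect, which is why the first form tends to be misread outside of PHP.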

To improve the situation somewhat, the second step of the RFC has been voted on and accepted, and will be enacted in PHP8.4 via the PHP8.4 deprecation RFC. In a nutshell, every time you use a CSV related function or method you should specify the escape parameter explicitly, preferably as the empty string. Not specifying the parameter, and thus relying on the function or method default value, will trigger a deprecation notice (updated on 2024-08-27). This means that the following code, taken from the php.net documentation website, will start emitting a deprecation notice in PHP8.4:

$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c=0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}

To avoid the deprecation notice you MUST explicitly use the empty string as the escape character parameter:

$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",", escape: '')) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c=0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}

I used the named parameter feature to clearly indicate the change; it is, of course, possible to do so without using named parameters.

While the example is about fgetcsv, the deprecation notices and how to resolve them are not limited to that function alone: every CSV related method or function that uses the escape parameter must be updated to set it explicitly in order to avoid the deprecation notice.
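As a rough illustration of what that looks like beyond fgetcsv, the sketch below passes the escape parameter explicitly to str_getcsv() and to the SplFileObject CSV methods. The sample data is made up for the example, and an in-memory SplTempFileObject is used so that no real file is needed.

```php
<?php

// str_getcsv(): the escape parameter is the fourth argument.
$fields = str_getcsv('foo,"bar ""baz"""', ',', '"', '');

// SplFileObject: pass the escape parameter to every CSV related method.
$file = new SplTempFileObject();
$file->fputcsv(['foo', 'bar "baz"'], ',', '"', '');

// setCsvControl() stores the control characters on the object itself.
$file->setCsvControl(',', '"', '');

$file->rewind();
$record = $file->fgetcsv(',', '"', '');
```

In both cases the record read back is the two fields foo and bar "baz", with the doubled quotes decoded as expected.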

In the case of league/csv, which is a wrapper around PHP’s CSV functionalities, no deprecation will be triggered when you execute a foreach using methods from the package, as the package always sets the escape parameter when calling PHP’s CSV features. Having said that, it has been best practice since PHP7.4 to write your code as follows:

use League\Csv\Reader;
use League\Csv\Writer;

$reader = Reader::createFromPath('path/to/document.csv');
$reader->setEscape(''); //set the escape parameter to use the empty string
foreach ($reader as $record) {
    // doing something here
}

$writer = Writer::createFromString();
$writer->setEscape(''); //set the escape parameter to use the empty string
// $collection is any iterable of records (arrays of fields) from your application
foreach ($collection as $record) {
    $writer->insertOne($record);
}

This simple update will make your CSV document more interoperable and future-proof. It has been possible since version 9.2 if you are using league/csv. But it comes with a caveat: if your CSV was generated with the default escape character, it might no longer be parsed correctly with the empty string. This means that the update must be applied on a case-by-case basis and, in some circumstances, you will need to re-encode your data as shown in the next example:

use League\Csv\Reader;
use League\Csv\Writer;

//the reader keeps the default escape character used to generate the old document
$csv = Reader::createFromPath('/path/to/file_with_escape_character.csv', 'r');
$writer = Writer::createFromPath('/path/to/file_without_escape_character.csv', 'w');
$writer->setEscape('');
//we keep the other control characters of the old document
$writer->setDelimiter($csv->getDelimiter());
$writer->setEnclosure($csv->getEnclosure());

$writer->insertAll($csv);

You should not forget to adapt your code according to your application rules and constraints.
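For codebases that call PHP’s native functions directly instead of league/csv, the same re-encoding could be sketched as follows. This assumes PHP8.0 or later and that the source document was written with the historical backslash escape; the function name reencodeWithoutEscape() and the sample data are hypothetical, and the in-memory streams stand in for your real files.

```php
<?php

/**
 * Hypothetical helper: copies every record from $input to $output,
 * reading with the legacy backslash escape and writing with the
 * escape mechanism disabled. Returns the number of records copied.
 *
 * @param resource $input  stream opened for reading
 * @param resource $output stream opened for writing
 */
function reencodeWithoutEscape($input, $output, string $delimiter = ',', string $enclosure = '"'): int
{
    $count = 0;
    while (($record = fgetcsv($input, null, $delimiter, $enclosure, '\\')) !== false) {
        if ($record === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        fputcsv($output, $record, $delimiter, $enclosure, '');
        ++$count;
    }

    return $count;
}

// Usage with in-memory streams
$input = fopen('php://memory', 'w+');
fwrite($input, "\"a\\\"b\",c\n"); // the legacy line: "a\"b",c
rewind($input);

$output = fopen('php://memory', 'w+');
$copied = reencodeWithoutEscape($input, $output);

rewind($output);
$result = stream_get_contents($output); // the re-encoded line: "a\""b",c
```

Note that the backslash stays in the data: PHP never strips the escape character when reading, so whether it should be kept or cleaned up afterwards depends on your application.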

A note about PHP deprecation notices

Now let’s pause a bit before continuing. What are PHP deprecation notices and what do they mean? Deprecation messages or notices are generated by PHP to tell the developer that, while the code that generated the notice currently works perfectly, it might no longer work in a future PHP version. It is a notification system that warns PHP developers about future changes in the language, to give them time to update their codebase and thus hopefully decrease the impact of backward compatibility breaks between consecutive major versions. This also means that you are still safe to use that piece of code in the current version and your app is definitely not broken.
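As a small sketch of how such notices can be collected in a local or staging environment without interrupting the script, the snippet below registers an error handler restricted to deprecation levels. The deprecation is triggered manually here, and the message and the legacy_call() name in it are made up for the illustration.

```php
<?php

$deprecations = [];

// Collect deprecation notices instead of displaying them.
set_error_handler(
    function (int $errno, string $errstr, string $errfile, int $errline) use (&$deprecations): bool {
        $deprecations[] = sprintf('%s (%s:%d)', $errstr, $errfile, $errline);

        return true; // the notice is handled, PHP's internal handler is skipped
    },
    E_DEPRECATED | E_USER_DEPRECATED
);

// Stand-in for code that relies on a deprecated behaviour (hypothetical message).
trigger_error('legacy_call() is deprecated', E_USER_DEPRECATED);

restore_error_handler();

// The script kept running; the notice is available for logging or review.
$collected = count($deprecations);
```

The script continues after the notice is raised, which is the point made above: a deprecation is a warning about the future, not a failure in the present.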

To update or not to update?

Now that you know the why, the how and the when, what will league/csv do? When it comes to coding, I always advocate for transparency and readability. I try hard not to hide things from the developer. Since I am also an active developer, I know that whenever something is hidden it becomes harder to debug or to fix because of the layer of indirection put in place by a half-baked workaround or polyfill.

So after a lot of thinking, and after discussing the issue with fellow developers, I have decided, for now, to do nothing with regard to the package codebase!! This means that in PHP8.4 the package will not try to hide the deprecation notices generated by the underlying PHP system.

Instead, I will focus on communicating the behaviour change. In the upcoming days and weeks I will update the league/csv documentation website. I will go over each example and update it to make it PHP8.4+ compatible where it makes sense. I will also update the README.md page of the repository to make sure the information is there for the developer to see. It is in that same spirit that I have written this blog post, so that everyone is aware of why their CSV code (with or without the use of my package) suddenly starts emitting deprecation notices.

For the future, when league/csv 10 is released, it is clear that its default value for the escape parameter will no longer be the current default value but the empty string. While this won’t make every deprecation notice disappear, it may reduce their number through a known and clearly explained BC break. It will also help (along with the deprecation notices) in migrating your old, broken CSV documents that depend on the escape parameter to a more standardised CSV format understood by more programming languages.

In conclusion

It is important, if somewhat annoying, to remember that PHP deprecation notices should never hit your production server: they should be visible in your local and/or staging environment, but definitely not in production. I do understand developers who will still feel uneasy about the number of deprecation warnings, even in their local environment, but once again, if you stumble upon one, try to fix it as soon as possible or at least schedule a moment to do so. Management tends to forget about it, but updating your code regularly should be part of your application life cycle (budgets and estimates) to avoid accumulating too much technical debt.

The main takeaway, even if you do not use my package or stay away from CSV as much as possible, is that whenever a new version of PHP is out, do remember to read in full the NEWS and UPGRADE files included with the release. They contain a lot of information that will never appear or be discussed in popular PHP blogs but is just as critical to keeping your application up to date and in good shape. Knowing what’s new in version XYZ of PHP is good, but knowing what will change is even better. What is true today for CSV will be true tomorrow for PHP streams, cURL, DOM or any other historical PHP feature getting revamped. Because PHP keeps evolving and because we are professionals, it is part of our job to always be up to date with what changes in the language. As a PHP package maintainer, I do my best to test my packages in advance and to follow PHP Internals to avoid any possible breakage with each new version of PHP, but as a user of my work, you also have to do your part. A package cannot always hide or fix the language’s shortcomings. It takes two to tango: sometimes the package already has the tools to fix your issue and you just need to read its documentation and apply them accordingly.

Last but not least

league/csv is an open source project under the MIT License, so contributions are more than welcome and will be fully credited. These contributions can be anything from supporting the development or the maintenance of the package via sponsorship to reporting an issue, requesting or adding missing features, or simply fixing a typo on the documentation website. Any contribution is welcome, as everyone in the PHP community benefits from having a strong CSV package.

3 thoughts on “CSV and PHP8.4+”

  1. This is amazing and insane 🙂
    This concept as simple as CSV had clear limitations since day 1 (having commas or line returns in a value). Your lib takes care of it. We’d think that would fix it forever, but no, there is a constant need of update.
    I’m baffled by the energy you put on this, for, again, just handling csvs, congratulations.

    I’m also amazed at how we can’t have a few more really universal formats. When exporting/importing a sheet, it seems likes we have only 2 opposed options:
    – dumb but broken simple things: csv, tsv (encoding is a surprise)
    – almost proprietary formats like xls(s), odf… pdf o_O with several possible internal versions and new kind of surprises

    How come we don’t all use a json of arrays of arrays of strings?
    – encoding (\uXXXX), commas, line returns are ok
    – no change of format, ever

    Can’t everyone implement that? Every language got a json_decode() no?

    I guess the reason is this 😉 https://xkcd.com/927/

  2. @Salagir columnar data encodes much more tightly than JSON does, due to having repetition of column names in JSON. As long as we have spreadsheets, we will probably have CSV.

  3. I think it would be beneficial if a new league/csv version is released sooner rather than later with the escape parameter set to an empty string by default – I only found out through this post about the weird escape parameter in PHP and did not know that this behavior is also “inherited” when using league/csv, so getting rid of it would be a step in the right direction (independent of how PHP handles this or how slowly they are deprecating and removing it).

    Making it a new major release would make it clear that something could break and would put the spotlight on possible existing problems with the escape parameter, and people who for some reason need it can still set it to a different value. I have changed my code to set an explicit empty escape parameter for now, but I would prefer to just upgrade to a new version instead.
