Q&A: Enforcing enclosure with League\Csv

It is well known that PHP’s fputcsv function does not allow enforcing the enclosure on every field. Let me show you, step by step, how to do so using League CSV and PHP’s stream filter features.

1 – Install League CSV

If you don’t already have it around, you can install the latest stable version using Composer.

$ composer require league/csv

2 – Choose a sequence to enforce the presence of the enclosure character

According to the PHP fputcsv source code, the enclosure is added only under the following circumstances:

  • one of the CSV control characters (delimiter, enclosure or escape character) is present in the field;
  • a newline, carriage return or tab character (\n, \r, \t) is present in the field;
  • the space character is present in the field.
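
These rules are easy to check by hand. The following is a minimal sketch, assuming the default control characters and an in-memory stream; csvLine is a hypothetical helper written for this illustration, not part of PHP or of the library:

```php
<?php
// Hypothetical helper: write one row with fputcsv and return the raw CSV line.
function csvLine(array $row): string
{
    $stream = fopen('php://memory', 'w+');
    fputcsv($stream, $row, ',', '"', '\\');
    rewind($stream);
    $line = stream_get_contents($stream);
    fclose($stream);

    return $line;
}

// Only the fields containing a space, a tab or the delimiter get enclosed.
echo csvLine(['foo', 'foo bar', "foo\tbar", 'foo,bar']);
```

The first field is written bare, while the three others are wrapped in double quotes.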

In my example I’m going to use the tab character followed by another exotic character, the unit separator (\x1f).

3 – Set up your CSV
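
A minimal sketch of this step, assuming a plain PHP stream in place of the Writer object so that it runs without the library. The variable names are illustrative, and the control characters are simply PHP’s defaults written out explicitly:

```php
<?php
// CSV control characters (PHP's defaults, stated explicitly).
$delimiter = ',';
$enclosure = '"';
$escape    = '\\';

// Sequence used to force the enclosure: a tab followed by the unit separator.
$sequence = "\t\x1f";

// Temporary stream standing in for the destination file.
$stream = fopen('php://temp', 'w+');
```

With the library, the same settings would be applied to a Writer instance instead of being kept in local variables.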

4 – Enforce the sequence on every CSV field

Using the formatting capabilities of the Writer object, you can easily add the sequence to each CSV field.

Every time you add a row with the Writer class, the $sequence sequence is prepended to each of its fields before the row is inserted.
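
The formatting step boils down to a callable that takes a row and returns the modified row. Here is a sketch under that assumption; prependSequence is a hypothetical name, and with the library a callable like this one would be registered as a formatter on the Writer (addFormatter in the versions I am aware of):

```php
<?php
// Hypothetical formatter: prepend the sequence to every field of a row.
function prependSequence(array $row, string $sequence = "\t\x1f"): array
{
    return array_map(
        static fn ($field): string => $sequence . $field,
        $row
    );
}

// Every field now starts with a tab, so fputcsv will enclose all of them.
$formatted = prependSequence(['john', 'doe', 42]);
```

Because every field now contains a tab, fputcsv encloses each of them, including numbers and empty strings.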

5 – Create a stream filter

This is where the magic happens. One of the overlooked features of the package is that it can use stream filters to alter CSV content. A stream filter can modify your CSV content after PHP’s fputcsv has formatted your row but before the content is actually saved to the file.

I’ve created a stream filter class by extending PHP’s php_user_filter class. You can view the source code via the following gist.

The main code is in the RemoveSequence::filter method. It removes the added sequence that enforces the enclosure character from the CSV content, after PHP’s fputcsv has run but before the content is saved to the file.
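
As an illustration of what such a filter method can look like, here is a simplified sketch. It is not the code from the gist: the enclosure and the sequence are hard-coded as assumptions, and it assumes each row reaches the filter in a single bucket:

```php
<?php
// Simplified sketch of a write filter that strips the enforcing sequence.
class RemoveSequence extends php_user_filter
{
    // Assumed values for this sketch: the default enclosure and "\t\x1f".
    private $enclosure = '"';
    private $sequence = "\t\x1f";

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            // Count the bytes received before altering the bucket content.
            $consumed += $bucket->datalen;
            // fputcsv wrote <enclosure><sequence><value><enclosure>:
            // drop the sequence and keep the enclosure.
            $bucket->data = str_replace(
                $this->enclosure . $this->sequence,
                $this->enclosure,
                $bucket->data
            );
            stream_bucket_append($out, $bucket);
        }

        return PSFS_PASS_ON;
    }
}
```

Searching for the enclosure followed by the sequence, rather than for the sequence alone, limits the replacement to the start of an enclosed field.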

6 – Attach the stream filter to the Writer object

The stream filter class exposes two static methods to help with this step:

  • registerStreamFilter: a static method that registers the class as an available stream filter;
  • createFilterName: a static method that generates the stream filter name to attach to the Writer object, based on the CSV control characters and the added sequence.
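
One possible way to implement these two helpers is to register the class under a wildcard name and to encode the settings in the part matched by the wildcard. The sketch below assumes that approach; the class name, the filter prefix and the hexadecimal encoding are illustrative choices of mine, not the ones used in the gist:

```php
<?php
// Sketch: a filter whose settings are carried by the filter name itself.
class NamedRemoveSequence extends php_user_filter
{
    const FILTER_PREFIX = 'demo.remove.sequence.named.';

    private $search = '';
    private $replace = '';

    // Register the class once, under a wildcard name.
    public static function registerStreamFilter(): void
    {
        $name = self::FILTER_PREFIX . '*';
        if (!in_array($name, stream_get_filters(), true)) {
            stream_filter_register($name, self::class);
        }
    }

    // Encode the enclosure and the sequence as hexadecimal in the filter name.
    public static function createFilterName(string $enclosure, string $sequence): string
    {
        return self::FILTER_PREFIX . bin2hex($enclosure) . '-' . bin2hex($sequence);
    }

    // Decode the settings when the filter is attached; refuse malformed names.
    public function onCreate(): bool
    {
        $suffix = substr($this->filtername, strlen(self::FILTER_PREFIX));
        if (!preg_match('/^((?:[0-9a-f]{2})+)-((?:[0-9a-f]{2})+)$/', $suffix, $parts)) {
            return false;
        }
        $this->replace = hex2bin($parts[1]);
        $this->search = $this->replace . hex2bin($parts[2]);

        return true;
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            $bucket->data = str_replace($this->search, $this->replace, $bucket->data);
            stream_bucket_append($out, $bucket);
        }

        return PSFS_PASS_ON;
    }
}

NamedRemoveSequence::registerStreamFilter();
$filterName = NamedRemoveSequence::createFilterName('"', "\t\x1f");
```

The generated name is the string you would then hand to the Writer when attaching the filter, or to stream_filter_append when working with a plain stream.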

To apply a valid and efficient filter you must choose a sequence that:

  • is unique;
  • does not contain any newline character;
  • does not contain any space or null byte character;
  • does not contain a CSV control character already in use.

This is the reason why I chose the tab character. The extra character is added to keep the sequence truly unique in case your data already contains a tab character. The enforcing sequence must be unique and also as short as possible.
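
Most of these rules can be checked in code before the sequence is used. A possible sketch follows, where assertValidSequence is a hypothetical helper. It tests for the literal space character rather than the \s class, because \s also matches the tab character used in the sequence. Uniqueness depends on your data, so it is not checked here:

```php
<?php
// Hypothetical helper: reject sequences that break the rules listed above.
function assertValidSequence(
    string $sequence,
    string $delimiter,
    string $enclosure,
    string $escape
): void {
    if ($sequence === '') {
        throw new InvalidArgumentException('The sequence can not be empty');
    }

    // Newline, carriage return, space or null byte.
    if (preg_match('/[\r\n\x20\x00]/', $sequence) === 1) {
        throw new InvalidArgumentException('The sequence contains invalid characters');
    }

    // CSV control characters already in use.
    foreach ([$delimiter, $enclosure, $escape] as $control) {
        if ($control !== '' && strpos($sequence, $control) !== false) {
            throw new InvalidArgumentException('The sequence contains a CSV control character');
        }
    }
}

// The sequence used in this post passes with the default control characters.
assertValidSequence("\t\x1f", ',', '"', '\\');
```

Note that the same sequence is rejected when the tab character is the delimiter, as in a TSV document.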

7 – Create your CSV

Now that all the pieces are in place, let’s create the CSV row.

Conclusion

Et voilà! You’ve created a CSV with the enclosure enforced on every one of its fields. Let’s recap it all:
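
The whole flow can be sketched end to end with plain PHP streams, which keeps the example runnable without the library. Everything named here (the filter name, the class, the closures, the sample rows) is an assumption made for illustration, and InvalidArgumentException stands in for the library’s own exception:

```php
<?php
// Recap sketch using native streams; all names are illustrative.
class RecapRemoveSequence extends php_user_filter
{
    // The search/replace pair is passed through stream_filter_append().
    public function onCreate(): bool
    {
        return isset($this->params['search'], $this->params['replace']);
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            $bucket->data = str_replace(
                $this->params['search'],
                $this->params['replace'],
                $bucket->data
            );
            stream_bucket_append($out, $bucket);
        }

        return PSFS_PASS_ON;
    }
}

$delimiter = ',';
$enclosure = '"';
$escape    = '\\';
$sequence  = "\t\x1f";

// Register once, then attach the filter to the stream being written.
stream_filter_register('demo.remove.sequence.recap', RecapRemoveSequence::class);
$stream = fopen('php://temp', 'w+');
stream_filter_append($stream, 'demo.remove.sequence.recap', STREAM_FILTER_WRITE, [
    'search'  => $enclosure . $sequence,
    'replace' => $enclosure,
]);

// Optional validation: only scalar or null fields are accepted.
$validator = static function (array $row): bool {
    foreach ($row as $field) {
        if ($field !== null && !is_scalar($field)) {
            return false;
        }
    }
    return true;
};

// Required formatting: prepend the sequence to every field.
$formatter = static fn (array $row): array => array_map(
    static fn ($field): string => $sequence . $field,
    $row
);

$rows = [
    ['john', 'doe', 'john.doe@example.com'],
    ['jane', 'doe', 42],
];
foreach ($rows as $row) {
    if (!$validator($row)) {
        throw new InvalidArgumentException('The row contains an invalid cell');
    }
    fputcsv($stream, $formatter($row), $delimiter, $enclosure, $escape);
}

rewind($stream);
$csv = stream_get_contents($stream);
echo $csv;
```

Every field of both rows, including the number 42, ends up wrapped in double quotes in the output.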

Footnotes

Of note, in the recap script I’ve added an extra validation step to illustrate another feature of the League\Csv\Writer object: validating a row against your own rules prior to its insertion. If a cell is invalid, a League\Csv\Exception\InvalidRowException exception is thrown.

As you may have guessed, the formatting and validation steps are optional. If you have already formatted and validated your data prior to insertion, you don’t need them.
Conversely, adding the sequence before insertion and removing it afterwards with a stream filter are both required.

Registering the stream filter can be done in the script itself or in the bootstrap script of your application. You do not need to register the stream filter before each use, but you are required to attach it to each League\Csv\Writer object that should use it.

This technique works even without the League CSV library, as long as you are able to manipulate your CSV as a stream on insertion.

With League CSV, there are some trade-offs when using this technique:

  • its usage is limited by how SplFileObject supports stream filters;
  • League\Csv\Writer insertion speed is reduced by the extra steps you are adding.

Last but not least

League\Csv is an open source project released under an MIT License, so contributions are more than welcome and will be fully credited. These contributions can be anything from reporting an issue, requesting or adding missing features, or simply improving or correcting typos on the documentation website.

2 thoughts on “Q&A: Enforcing enclosure with League\Csv”

  1. If you use php 5.4+ change self::CLASS to __CLASS__

    $sequence = "\t\x1f";
    will be invalidated by
    if (preg_match(',[\r\n\s],', $sequence)) {
        throw new InvalidArgumentException('The sequence contains invalid characters');
    }

    I used $sequence = $csv->getDelimiter()

  2. I know this is an old post, so I hope you’re still looking at the comments.

    How can you be so confident that your character sequence won’t be split across two buckets?

    It seems to me that if the CSV record being written is sufficiently large, the content could be split into multiple buckets. If one bucket ends with “\t” and the next one begins with “\x1f” your string replacement will fail. Is this why you say the prepended character sequence should be “as small as possible”?

    To avoid this, maybe you need to accumulate all the output into a buffer and do your string replacement on the complete output, or at least every pair of adjacent buckets (i.e. bucket 1 + bucket 2, bucket 2 + bucket 3, etc.).
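
The buffering idea raised in this comment can be sketched as a filter that holds back any trailing bytes that might be the start of the searched string and prepends them to the next chunk. This is only an illustration under assumed values (default enclosure, "\t\x1f" sequence, hypothetical class and method names), not the code from the gist:

```php
<?php
// Sketch: carry over a possible partial match from one chunk to the next.
class BufferedRemoveSequence extends php_user_filter
{
    // Assumed values: the enclosure followed by the sequence, and the enclosure.
    private $search = "\"\t\x1f";
    private $replace = '"';
    private $carry = '';

    public function filter($in, $out, &$consumed, $closing): int
    {
        // Start from what was held back during the previous call.
        $data = $this->carry;
        $this->carry = '';
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            $data .= $bucket->data;
        }
        $data = str_replace($this->search, $this->replace, $data);

        // Unless the stream is closing, keep back a possible partial match.
        if (!$closing) {
            $keep = $this->pendingLength($data);
            if ($keep > 0) {
                $this->carry = substr($data, -$keep);
                $data = (string) substr($data, 0, -$keep);
            }
        }

        if ($data === '') {
            return $closing ? PSFS_PASS_ON : PSFS_FEED_ME;
        }
        stream_bucket_append($out, stream_bucket_new($this->stream, $data));

        return PSFS_PASS_ON;
    }

    // Length of the longest suffix of $data that is a proper prefix of the search.
    private function pendingLength(string $data): int
    {
        $max = min(strlen($this->search) - 1, strlen($data));
        for ($length = $max; $length > 0; $length--) {
            if (substr($data, -$length) === substr($this->search, 0, $length)) {
                return $length;
            }
        }

        return 0;
    }
}
```

The trade-off is that at most two bytes may stay in the filter until the next write or until the stream is closed.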
