It is common knowledge that PHP’s fputcsv function does not allow enforcing the enclosure on every field. Using League CSV and PHP’s stream filter features, let me show you how to do so step by step.
1 – Install League CSV
If you don’t already have it, you can install the latest stable version using Composer.
$ composer require league/csv
2 – Choose a sequence to enforce the presence of the enclosure character
According to PHP fputcsv source code the enclosure is added only under the following circumstances:
- one of the CSV control characters is present in the field (delimiter, enclosure or escape character);
- a newline or tab character (\n, \r, \t) is present in the field;
- the space character is present in the field.
In my example I’m going to use the tab character and another exotic character.
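To see these rules in action, here is a small sketch that uses only PHP’s built-in functions. The csvLine helper is a name made up for this illustration, and the control characters are passed explicitly so the behaviour does not depend on defaults.

```php
<?php
// Hypothetical helper: write one row with fputcsv and return the raw CSV line.
function csvLine(array $fields): string
{
    $stream = fopen('php://memory', 'w+');
    // Explicit control characters: delimiter, enclosure, escape.
    fputcsv($stream, $fields, ',', '"', '\\');
    rewind($stream);
    $line = stream_get_contents($stream);
    fclose($stream);

    return $line;
}

echo csvLine(['foo', 'bar']);       // foo,bar
echo csvLine(['foo bar', 'baz']);   // "foo bar",baz
echo csvLine(["foo\tbar", 'a,b']);  // "foo<TAB>bar","a,b"
```

Only the fields that contain a space, a tab or the delimiter are enclosed; the plain ones are written as-is.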
3 – Set up your CSV
4 – Enforce the sequence on every CSV field
Using the formatting capabilities of the Writer object, you can easily add the sequence to each CSV field.
Every time you add a row with the Writer class, each of its fields is prepended with the $sequence sequence before the row is written.
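As a rough illustration, such a formatter can be written as a plain callable that receives a row and returns the modified row. The $forceEnclosure name is made up for this sketch, the sequence is assumed to be a tab followed by the \x1f control character, and registering the callable on the Writer is left out.

```php
<?php
// Assumed sequence: a tab followed by the "unit separator" control character.
$sequence = "\t\x1f";

// Hypothetical formatter: prefixes every field of a row with $sequence.
$forceEnclosure = function (array $row) use ($sequence): array {
    return array_map(function ($field) use ($sequence) {
        return $sequence . $field;
    }, $row);
};

$row = $forceEnclosure(['john', 'doe', 42]);
// Every field now starts with a tab, so fputcsv will enclose all of them.
```

Non-string values such as the integer above are converted to strings by the concatenation.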
5 – Create a stream filter
This is where the magic happens. One of the overlooked features of the package is that it can use stream filters to alter CSV content. A stream filter can modify your CSV content after PHP’s fputcsv has formatted your row but before the content is actually saved to the file.
I’ve created a stream filter class by extending PHP’s php_user_filter class. You can view the source code via the following gist.
The main code is in the RemoveSequence::filter method. It removes the added sequence that enforces the enclosure character from the CSV content, after PHP’s fputcsv has formatted the row but before the content is saved to the file.
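For readers who cannot open the gist, here is a minimal sketch of what such a filter could look like. It is not the original code: the sequence and enclosure are hardcoded as assumptions, the filter name sketch.remove_sequence is made up, and the filter is attached to a plain stream written with fputcsv rather than to a Writer object.

```php
<?php
// Minimal sketch of a sequence-removing filter; NOT the code from the gist.
class RemoveSequence extends php_user_filter
{
    // Assumed values, hardcoded to keep the sketch short.
    const SEQUENCE = "\t\x1f";
    const ENCLOSURE = '"';

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            // Turn `"<sequence>value"` back into `"value"`.
            $bucket->data = str_replace(
                self::ENCLOSURE . self::SEQUENCE,
                self::ENCLOSURE,
                $bucket->data
            );
            stream_bucket_append($out, $bucket);
        }

        return PSFS_PASS_ON;
    }
}

// Register the class once under a made-up filter name.
stream_filter_register('sketch.remove_sequence', 'RemoveSequence');

$stream = fopen('php://memory', 'w+');
$filter = stream_filter_append($stream, 'sketch.remove_sequence', STREAM_FILTER_WRITE);

// The fields already carry the sequence, so fputcsv encloses both of them.
fputcsv($stream, ["\t\x1fjohn", "\t\x1fdoe"], ',', '"', '\\');

stream_filter_remove($filter); // flush and detach before reading back
rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);

echo $csv; // "john","doe"
```

This sketch processes each bucket on its own, so it assumes the enclosure and the sequence always arrive in the same bucket.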
6 – Attach the stream filter to the Writer object
- registerStreamFilter: a static method that eases registering the class as a stream filter;
- createFilterName: a static method that eases generating the stream filter name to attach to the Writer object, according to the CSV control characters and the added sequence.
To apply a valid and efficient filter you are required to choose a sequence which:
- is unique;
- does not contain any newline character;
- does not contain any space or null byte character;
- does not contain an already used CSV control character.
This is the reason why I chose the tab character. The extra character is added in case your data already contains a tab character, to make your added sequence truly unique. The enforcing sequence must also be as short as possible.
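The rules that can be checked mechanically might be sketched as follows. The isValidSequence function and its default control characters are assumptions made for this illustration. Uniqueness depends on your data and is not checked here.

```php
<?php
// Hypothetical check of a candidate sequence against the rules listed above.
function isValidSequence(string $sequence, array $controls = [',', '"', '\\']): bool
{
    if ($sequence === '') {
        return false;
    }

    // Reject newlines, the space character and the null byte.
    // The character class is explicit on purpose: \s would also reject the tab.
    if (preg_match('/[\r\n \x00]/', $sequence) === 1) {
        return false;
    }

    // Reject any CSV control character already in use.
    foreach ($controls as $char) {
        if (strpos($sequence, $char) !== false) {
            return false;
        }
    }

    return true;
}

var_dump(isValidSequence("\t\x1f")); // bool(true)
var_dump(isValidSequence("\t "));    // bool(false)
```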
7 – Create your CSV
Now that all the pieces are in place, let’s create the CSV row.
Conclusion
Et voilà! You’ve created a CSV with enforced enclosure on every one of its fields. Let’s recap it all:
Footnotes
Of note, in the recap script I’ve added an extra validation step to illustrate another feature of the League\Csv\Writer object: validating a row prior to its insertion according to your own rules. If a cell is invalid, a League\Csv\Exception\InvalidRowException exception is thrown.
As you may have guessed, the formatting and the validation steps are optional. If you have already formatted and validated your data prior to the insertion, you don’t need them.
Conversely, the addition of the sequence before insertion and its subsequent subtraction using a stream filter is required.
Registering the stream filter can be done in the script or in the bootstrap script of your application. You should not register the stream filter prior to each usage, but you are required to attach it to the League\Csv\Writer object in order to use it.
This technique works even without the League CSV library, as long as you are able to manipulate your CSV as a stream on insertion.
In the case of League CSV there are some trade-offs when using this technique:
- its usage is limited by how SplFileObject supports stream filters;
- League\Csv\Writer insertion speed is slowed down because of the extra steps you are adding.
Last but not least
League\Csv is an open source project released under the MIT License, so contributions are more than welcome and will be fully credited. These contributions can be anything from reporting an issue, requesting or adding missing features, or simply fixing typos on the documentation website.
If you use php 5.4+ change self::CLASS to __CLASS__
—
$sequence = "\t\x1f";
will be invalidated by
if (preg_match(',[\r\n\s],', $sequence)) {
throw new InvalidArgumentException('The sequence contains invalid characters');
}
I used $sequence = $csv->getDelimiter()
I know this is an old post, so I hope you’re still looking at the comments.
How can you be so confident that your character sequence won’t be split across two buckets?
It seems to me that if the CSV record being written is sufficiently large, the content could be split into multiple buckets. If one bucket ends with “\t” and the next one begins with “\x1f” your string replacement will fail. Is this why you say the prepended character sequence should be “as small as possible”?
To avoid this, maybe you need to accumulate all the output into a buffer and do your string replacement on the complete output, or at least every pair of adjacent buckets (i.e. bucket 1 + bucket 2, bucket 2 + bucket 3, etc.).
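A rough sketch of that second idea follows: the filter holds back any trailing bytes that could be the start of the marker and prepends them to the next bucket. This is an untested-in-production illustration, not the filter from the post. The class name, the filter name and the hardcoded enclosure and sequence are all assumptions.

```php
<?php
// Sketch of a boundary-safe variant; hypothetical, not the post's filter.
class BufferedRemoveSequence extends php_user_filter
{
    const ENCLOSURE = '"';
    const NEEDLE = "\"\t\x1f"; // enclosure followed by the assumed sequence

    private $carry = ''; // bytes held back from the previous call

    public function filter($in, $out, &$consumed, $closing): int
    {
        $data = $this->carry;
        $this->carry = '';
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            $data .= $bucket->data;
        }

        if (!$closing) {
            // Keep a trailing partial match for the next call.
            list($data, $this->carry) = $this->splitTail($data);
        }
        if ($data === '') {
            return PSFS_FEED_ME; // nothing to emit yet
        }

        $data = str_replace(self::NEEDLE, self::ENCLOSURE, $data);
        stream_bucket_append($out, stream_bucket_new($this->stream, $data));

        return PSFS_PASS_ON;
    }

    // Split off the longest suffix that is a proper prefix of NEEDLE.
    private function splitTail(string $data): array
    {
        $max = min(strlen(self::NEEDLE) - 1, strlen($data));
        for ($len = $max; $len > 0; $len--) {
            $cut = strlen($data) - $len;
            if (substr($data, $cut) === substr(self::NEEDLE, 0, $len)) {
                return [substr($data, 0, $cut), substr($data, $cut)];
            }
        }

        return [$data, ''];
    }
}

stream_filter_register('sketch.buffered_remove', 'BufferedRemoveSequence');
$stream = fopen('php://memory', 'w+');
$filter = stream_filter_append($stream, 'sketch.buffered_remove', STREAM_FILTER_WRITE);

// Simulate a marker split across two writes: `"\t` then `\x1f...`.
fwrite($stream, "\"\t");
fwrite($stream, "\x1fjohn\",\"\t\x1fdoe\"\n");

stream_filter_remove($filter); // flushes whatever is still held back
rewind($stream);
$result = stream_get_contents($stream);
fclose($stream);
```

The held-back bytes are only written once the filter is flushed, so the sketch relies on the filter being removed (or the stream closed) before the output is read.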