CSV, JSON, collection and then some…

league\csv 9.17 has been released, and it comes with a feature I have wanted since v9 was released more than 5 years ago: a fully fledged JSON converter.

The issue

Currently, if you want to convert a CSV document into JSON, you can easily do this:

$document = Reader::createFromPath('/path/to/my/file.csv');
$document->setHeaderOffset(0);

echo json_encode($document, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_FORCE_OBJECT), PHP_EOL;

That’s straightforward: the Reader class implements the JsonSerializable interface, so you can use it directly with PHP’s json_encode function. The problem with this approach is that when your CSV contains not hundreds but millions of records, you will run out of memory and your script will simply fail.

The resolution

To resolve this known limitation, I have added a JsonConverter class to the package. The encoding logic is based on the following observation: a CSV document is just an iterable structure whose elements (rows) can be individually converted to JSON. So instead of converting all the rows of the CSV document at once, the JSON converter encodes the collection one row at a time, calling the json_encode function for each record individually.
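The principle can be illustrated with a few lines of plain PHP. The following is a simplified sketch of the idea, not the converter’s actual implementation: emit the list delimiters and separators yourself, and call json_encode on one record at a time, so only a single record needs to be held in memory.

```php
<?php

// Simplified sketch: stream-encode an iterable as a JSON list,
// one record at a time, instead of buffering the whole structure.
function streamJsonList(iterable $records): Generator
{
    $separator = '';
    yield '[';
    foreach ($records as $record) {
        // Only the current record is encoded and held in memory.
        yield $separator.json_encode($record, JSON_THROW_ON_ERROR);
        $separator = ',';
    }
    yield ']';
}

$rows = [
    ['firstname' => 'Ada', 'lastname' => 'Lovelace'],
    ['firstname' => 'Grace', 'lastname' => 'Hopper'],
];

// In real code you would fwrite() each fragment to a stream;
// here the fragments are collected just to show the final payload.
echo implode('', iterator_to_array(streamJsonList($rows), false)), PHP_EOL;
// [{"firstname":"Ada","lastname":"Lovelace"},{"firstname":"Grace","lastname":"Hopper"}]
```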

Yes, this makes the script slower, but you gain in memory usage and the script will never fail. This means that the previous example can be rewritten as follows:

$document = Reader::createFromPath('/path/to/my/file.csv');
$document->setHeaderOffset(0);

$converter = JsonConverter::create()
    ->withPrettyPrint()
    ->withUnescapedSlashes()
    ->withForceObject();

echo $converter->encode($document), PHP_EOL;

So far, nothing new seems to be happening: you saved memory and used a more expressive way to apply json_encode flags. But what’s all the fuss about?

It’s all about streams

The JsonConverter was developed on the shoulders of the package’s streaming capabilities, which means that, used right, it opens the door to very interesting and surprising things that I am going to show you. I may not cover them all, but I hope this will give you a sneak peek at what can be done with the converter, and also with parts of the library you may never have considered before.

So first things first: not only can the converter encode the CSV into JSON, it can also, to make the developer’s life easier, store the generated JSON directly in a file.

$document = Reader::createFromPath('/path/to/my/file.csv');
$document->setHeaderOffset(0);

$converter = JsonConverter::create();
$converter->save($document, '/path/to/my/file.json');

By changing a single line of code, we store the newly generated JSON in a file, and the best part is that it is done using the package’s streaming capabilities. You can give the JsonConverter::save method a path, a stream resource or an SplFileObject, and the data will be stored there.
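For instance, since a stream resource is accepted, nothing stops you from handing save an already opened stream (here a php://temp stream, purely for illustration):

```php
$document = Reader::createFromPath('/path/to/my/file.csv');
$document->setHeaderOffset(0);

// save() accepts an open stream resource as well as a file path.
$stream = fopen('php://temp', 'r+');
JsonConverter::create()->save($document, $stream);

rewind($stream);
echo stream_get_contents($stream), PHP_EOL;
```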

Imagine your CSV is stored in an AWS S3 bucket and you want your JSON to be stored there too:

$client = new Aws\S3\S3Client([/** options **/]);
$client->registerStreamWrapper();
$context = stream_context_create(['s3' => ['seekable' => true]]); //the Reader class requires the stream to be seekable
$document = Reader::createFromPath('s3://bucket/csv_file_key', $context);
$document->setHeaderOffset(0);

$converter = JsonConverter::create();
$converter->save($document, 's3://bucket/json_file_key');

And voilà! Since everything is streamed, only a fraction of the data ever resides on your server: everything is read from and stored in your S3 bucket, and, more importantly, the code is readable and maintainable!

But it does not end there. What if you want to download your JSON document? In that case, once again, you only need to change one line of code:

$client = new Aws\S3\S3Client([/** options **/]);
$client->registerStreamWrapper();
$context = stream_context_create(['s3' => ['seekable' => true]]);
$document = Reader::createFromPath('s3://bucket/key', $context);
$document->setHeaderOffset(0);

$converter = JsonConverter::create();
$converter->download($document, 'downloaded_file_name.json');
die;

You just encoded the CSV stored in your S3 bucket into a JSON document, created and downloaded on the fly when the script is called from an HTTP client such as your browser. Of course, if you are using a framework, chances are it ships its own file-download utilities, but I found this addition quite nice and easy to implement.

I do not care about CSV

All that is fine and great, but what if I do not use CSV documents? What’s in it for me? Well, you are in luck, because all of this works with any iterable structure, as long as each of its items can be handled by the json_encode function. Let’s remove the Reader class and use PDO instead.

$dbh = new PDO('mysql:dbname=testdb;host=127.0.0.1', 'dbuser', 'dbpass');
$sth = $dbh->prepare("SELECT firstname, lastname, email, password FROM users");
$sth->setFetchMode(PDO::FETCH_ASSOC);
$sth->execute();
$converter = JsonConverter::create();
$converter->download($sth, 'file.json');
die;

And there you go: you just created and published a JSON version of your table for everyone to see.

But wait, you did not want your users to see the encrypted password… because it is encrypted, right? Here again the converter has you covered: the JsonConverter::formatter method attaches a closure that lets you format each individual element as you see fit.

$dbh = new PDO('mysql:dbname=testdb;host=127.0.0.1', 'dbuser', 'dbpass');
$sth = $dbh->prepare("SELECT firstname, lastname, email, password FROM users");
$sth->setFetchMode(PDO::FETCH_ASSOC);
$sth->execute();
$converter = JsonConverter::create();
$converter->formatter(fn (array $row): string => $row['firstname']);
$converter->download($sth, 'file.json');
die;

The generated JSON will contain only a list of first names and nothing else.

In conclusion

I hope these examples have shown why I find this addition to the library exciting. Not only does it serve its original purpose for the CSV package, but its capabilities extend its usefulness beyond CSV manipulation.

While the converter is not fast, I believe the features it offers outweigh the relative loss in speed. Of course, this needs to be evaluated for each scenario; what is good for some use cases can be bad in a different context. And do not forget that other JSON stream encoders and parsers exist in the PHP landscape. But I still feel that this relatively small class delivers a lot of features worth checking out.

Addendum: following the publication of this blog post, I was asked whether the converter could be sped up by processing the iterable structure in chunks. It turns out that was a nice suggestion: using chunks provides a speed boost while retaining the low memory usage. So the next minor version of the package will add the ability to set the chunk size, which “fixes” the speed issue. In most cases you will not have to change a single line of code, as the converter will chunk the data every 500 records by default.
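The chunking idea can also be sketched in plain PHP (again a simplified illustration, not the package’s implementation): json_encode a batch of records at a time and strip the surrounding brackets, so the batches concatenate into one valid JSON list.

```php
<?php

// Simplified sketch: encode records in batches of $chunkSize,
// trading a little memory for far fewer json_encode calls.
function streamJsonListChunked(iterable $records, int $chunkSize = 500): Generator
{
    $separator = '';
    $buffer = [];
    yield '[';
    foreach ($records as $record) {
        $buffer[] = $record;
        if (count($buffer) === $chunkSize) {
            // Strip the outer [ and ] so the batches join into one list.
            yield $separator.substr(json_encode($buffer, JSON_THROW_ON_ERROR), 1, -1);
            $separator = ',';
            $buffer = [];
        }
    }
    if ([] !== $buffer) {
        yield $separator.substr(json_encode($buffer, JSON_THROW_ON_ERROR), 1, -1);
    }
    yield ']';
}

$rows = [['id' => 1], ['id' => 2], ['id' => 3]];
echo implode('', iterator_to_array(streamJsonListChunked($rows, 2), false)), PHP_EOL;
// [{"id":1},{"id":2},{"id":3}]
```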

Last but not least

league/csv is an open source project with an MIT license, so contributions are more than welcome and will be fully credited. Contributions can be anything: supporting the development or maintenance of the package via sponsorship, reporting an issue, requesting or adding missing features, or simply improving the documentation website or correcting typos on it. Any contribution is welcome, as everyone in the PHP community benefits from having a strong CSV package.
