Modernizing the league/csv API

Once upon a time

For those who do not know, the league/csv package has been around for more than 10 years now. For better or worse, it is the most downloaded package on Packagist if you need to handle and process CSV in PHP. Having said that, the last major release of the package happened during the PHP5 era. If I recall correctly, I only chose to drop support for PHP5 in the last weeks prior to the v9.0.0 release. So, in all fairness, its current API is a good representation of what PHP, its ecosystem and, most importantly, my knowledge were back then.

Fast forward to today where:

  • PHP5 and 7 have gone the way of the dodo;
  • developers have new expectations when it comes to a package's API;
  • new concepts and patterns have emerged in the community.

To improve DX and, more importantly, to try to take advantage of this new environment, I decided a couple of years back to reduce the number of PHP versions supported by the package. An immediate consequence of this decision was a significant drop in the maintenance cost of the package. It also opened the possibility of adding new syntactic features which do not break existing code but make for a largely improved DX.

I could list a lot of improvements and detail why and how they were made, but I hope you will find the answers to these questions while browsing the package documentation website. What I will do instead is show how the new API works via a code snippet.

Then and now

So let’s imagine we have a library.csv document which contains records of books, and a DTO called Book used to represent a specific record. We first load the CSV file using the code below:

<?php

use League\Csv\Reader;

$library = Reader::createFromPath(__DIR__ . '/library.csv');
$library->setHeaderOffset(0);

Once loaded, we want to access a single book that matches some specific constraints.

With league/csv 9.0.0, we would end up with code as follows:

<?php

use League\Csv\Statement;

function record2Book(array $record): Book {
    return new Book(
        new Asin($record['asin']),
        $record['author'],
        $record['title'],
        explode(' ', $record['tags']),
        (int) $record['rating'],
    );
}

$bookRecords = (new Statement())
    ->where(fn (array $row): bool => $row['author'] === 'Dan Brown')
    ->where(fn (array $row): bool => $row['rating'] === '5')
    ->orderBy(fn (array $r1, array $r2): int => strcmp($r1['title'], $r2['title']))
    ->process($library)
    ->fetchOne();

$book = record2Book($bookRecords);

The code is self-explanatory and does well what it is supposed to do. But nowadays such code can be seen as a bit clunky. I have no personal issue with it, but current developers prefer query builders that look and/or act like the ones you would find when using Doctrine or Laravel.

So, since version 9.9.0, I have been slowly but steadily upgrading the API, and now with the release of version 9.16.0 you can rewrite the snippet as follows:

<?php

use League\Csv\Statement;

$book = Statement::create()
    ->andWhere('author', '=', 'Dan Brown')
    ->andWhere('rating', '=', '5')
    ->orderByAsc('title', strcmp(...))
    ->process($library)
    ->firstAsObject(Book::class);

As you might imagine, a lot of changes have been brought to the package behind the scenes in order to do that. The snippet uses denormalization (converting an array into an object) by leveraging PHP’s Reflection feature. Constraints and ordering, on the other hand, are applied on an Iterator using PHP’s filtering and sorting capabilities. All of this has been added without any breaking changes. And, as you can imagine, it is possible to mix both notations if you feel like it.
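To give a rough idea of the mechanics described above, here is a minimal sketch in plain PHP. It is not the package's actual implementation: the BookSketch class, the denormalize() helper and the sample rows are all made up for the illustration, and the sketch assumes PHP 8.1 or later and a class whose constructor parameters are named after the record keys.

```php
<?php

declare(strict_types=1);

// Hypothetical DTO for this sketch only; simpler than the Book class used in the post.
final class BookSketch
{
    public function __construct(
        public readonly string $author,
        public readonly string $title,
        public readonly int $rating,
    ) {
    }
}

// Hypothetical helper: reads the constructor signature through Reflection,
// picks the record value named after each parameter and casts it to the
// declared built-in type before instantiating the class.
function denormalize(string $className, array $record): object
{
    $reflection = new ReflectionClass($className);
    $arguments = [];
    foreach ($reflection->getConstructor()->getParameters() as $parameter) {
        $value = $record[$parameter->getName()];
        $type = $parameter->getType();
        if ($type instanceof ReflectionNamedType && $type->isBuiltin()) {
            settype($value, $type->getName());
        }
        $arguments[] = $value;
    }

    return $reflection->newInstanceArgs($arguments);
}

// Made-up rows; every value is a string, as it would be when read from a CSV document.
$records = new ArrayIterator([
    ['author' => 'Dan Brown', 'title' => 'Title B', 'rating' => '5'],
    ['author' => 'Dan Brown', 'title' => 'Title A', 'rating' => '5'],
    ['author' => 'Dan Brown', 'title' => 'Title C', 'rating' => '4'],
    ['author' => 'Someone Else', 'title' => 'Title D', 'rating' => '5'],
]);

// Constraints: filter the Iterator lazily with a callback.
$filtered = new CallbackFilterIterator(
    $records,
    fn (array $row): bool => $row['author'] === 'Dan Brown' && $row['rating'] === '5'
);

// Ordering: materialize the remaining rows and sort them by title.
$sorted = iterator_to_array($filtered, false);
usort($sorted, fn (array $a, array $b): int => strcmp($a['title'], $b['title']));

// Denormalization: convert the first matching row into an object.
$book = denormalize(BookSketch::class, $sorted[0]);
```

Here $book ends up as a BookSketch instance whose rating is an integer, even though the row only contained strings. The package handles many more cases than this sketch does, so refer to its documentation for the real behaviour.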

Decoupling features

It might not be obvious, but all these new features are usable outside of CSV processing. For instance, let’s try multi-sorting an array. In PHP, you would use array_multisort, but the function signature and usage can be somewhat counterintuitive. With the package ordering feature, it is possible to reproduce some of the function's behaviour in a more readable and maintainable fashion.

The PHP documentation website gives us the following example of using array_multisort:

<?php
$data = [
    ['volume' => 67, 'edition' => 2],
    ['volume' => 86, 'edition' => 1],
    ['volume' => 85, 'edition' => 6],
    ['volume' => 98, 'edition' => 2],
    ['volume' => 86, 'edition' => 6],
    ['volume' => 67, 'edition' => 7],
];

$volume = array_column($data, 'volume');
$edition = array_column($data, 'edition');
array_multisort($volume, SORT_DESC, $edition, SORT_ASC, $data);
// $data is sorted by reference

You need to use array_column and pass the arrays in a convoluted way. Now let’s reproduce the same code using the league/csv ordering features:

<?php

use League\Csv\Query\Ordering;

$ordering = Ordering\MultiSort::all(
    Ordering\Column::sortOn('volume', SORT_DESC),
    Ordering\Column::sortOn('edition', SORT_ASC),
);

// you can do this
usort($data, $ordering);
// the result is identical to the one returned by array_multisort

// or you can do this
$orderedIterator = $ordering->sort($data);
// $orderedIterator is an ordered Iterator
// key association is maintained
// $data remains unchanged

The result can be identical but, most importantly, the DX is easier to grasp and we use fewer iterations to get the result.
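Conceptually, a multi-column sort boils down to chaining single-column comparators and returning the first result that is not a tie. The sketch below shows that idea in plain PHP. The compareOn() and compareAll() helpers are hypothetical names introduced for this illustration; they are not part of the package and do not claim to mirror how it is implemented.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: builds a comparator for a single key.
function compareOn(string $key, int $direction = SORT_ASC): Closure
{
    return static fn (array $a, array $b): int => $direction === SORT_DESC
        ? $b[$key] <=> $a[$key]
        : $a[$key] <=> $b[$key];
}

// Hypothetical helper: chains comparators; the first non-zero result wins.
function compareAll(callable ...$comparators): Closure
{
    return static function (array $a, array $b) use ($comparators): int {
        foreach ($comparators as $comparator) {
            $result = $comparator($a, $b);
            if ($result !== 0) {
                return $result;
            }
        }

        return 0; // the two records are tied on every key
    };
}

$data = [
    ['volume' => 67, 'edition' => 2],
    ['volume' => 86, 'edition' => 1],
    ['volume' => 85, 'edition' => 6],
    ['volume' => 98, 'edition' => 2],
    ['volume' => 86, 'edition' => 6],
    ['volume' => 67, 'edition' => 7],
];

// Sort by volume descending, then by edition ascending.
usort($data, compareAll(
    compareOn('volume', SORT_DESC),
    compareOn('edition', SORT_ASC),
));
```

After the call, the volumes read 98, 86, 86, 85, 67, 67 and, within equal volumes, the editions are in ascending order, which matches the array_multisort example.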

What’s next

With changes will inevitably come bugs and documentation issues. So if you are using the package, or considering using it, please do so and let me know if the new features do what they are expected to do. And do remember that the CSV name should not fool you: the package does more than handle CSV. It can handle any tabular data, be it a simple RDBMS table or a simple collection. Nothing but your imagination and your business requirements can stop you from using the package features outside of simply processing CSV documents.

Last but not least

league/csv is an open source project released under the MIT License, so contributions are more than welcome and will be fully credited. These contributions can be anything from supporting the development or the package maintenance via sponsorship to reporting an issue, requesting or adding missing features, or simply correcting a typo on the documentation website. Any contribution is welcome, as everyone in the PHP community will benefit from having a strong CSV package.