Developing Your Own CSV Uploader – Some Considerations
OK, so you’re building an application and you want to start pumping real user data through it. What do you do? Well, if you decide to develop your own CSV uploader, be sure to take these items into consideration.
You Probably Do Actually Need a CSV Uploader
First let’s start with the basics. Why do you need a comma-separated values (CSV) uploader?
Eventually, every system is going to need data. Data is what makes the world tick these days. You’re going to want data from other systems, 3rd parties, or your users. And it is likely that in each of those sources, CSV’s, and to a lesser extent other delimited files, are the de facto data exchange standard.
Why is that? While CSV has it quirks and issues, it is a lowest-common-denominator format for tabular data. It is easily readable and editable by a human with a text editor, and yet it is structured enough to be read by old and new programs alike. It is ubiquitous. Not to mention that it is easier to parse than something like an XLSX (Excel) file, so many systems just build CSV import/export functionality and call it good.
As you build your software application, if you need to collect tabular data from users, I can almost guarantee you that somebody will ask for or you will consider developing your own CSV uploader (or they might call it an Excel uploader). And while you could build a fancy form for users to start entering data, or you could integrate with another system via API where all the data naturally lives (both of these are probably on your roadmap already, I’d bet), the fact is that CSV uploads are just so convenient for data interchange and often are seen as a “step one” for getting real, usable data into an application that’s being built in Agile fashion.
Can I build my own CSV Uploader?
Sure. Definitely. People do it all the time. I’ve built many over my career for various clients.
The truth of it is that it’s doable. But it is harder than you think.
Have you ever tried making almond milk? You can do it yourself. You just soak some almonds in water for days, blend, and strain. Seems simple, and costs only $0.27/ounce. But you can buy almond milk from the store for $0.05/ounce, and you don’t have to wait for multiple days. To me, that’s a no brainer. And yet, there are people who make their own almond milk.
I have a friend (an experienced, effective developer) who thought he could build CSV upload functionality in one day. He estimated his story and went off to the races. Two weeks later he was still working on it, and he and his team were fixing edge cases for literally months.
Similarly, I was consulting for a large company (>$11B in revenues), helping to architect and build a mobile/web platform for use by their field agents. The CSV upload portion of the platform took 2-3 developers two months to build, and similar to my friend’s experience, edge cases and bugs were still being fixed for months afterward. And these were competent developers.
Why is Developing Your Own CSV Uploader So Hard?
There are three main reasons why building your own CSV uploader is difficult:
- Users are often the people providing the CSVs.
- There is no unambiguous standard of what a CSV actually is.
- Non-functional requirements are often an afterthought.
1. Users are often the people providing the CSVs
This may seem obvious once you hear it, but many users have expertise in areas other than data interchange formats. Even when I helped to build a CSV upload functionality primarily directed at accountants, who live and breath Excel, we still had problems with users naming their columns slightly different than what the system expected. And don’t get me started on the hours of time we spent tracking down an extra space at the end of one of the header names. Users can hide rows or columns in Excel and then upload. Or users will manually add content to a CSV file and getting a comma in the wrong spot. Or they’ll upload a PDF instead. Or they’ll leave a whole column blank because “everyone in the department knows” what the values should be in that field.
Long story short is that building a simple UX that offers guard-rails to a user and prevents or helps to correct these types of issues is difficult, especially while getting out of the way of their efficiency. Validation, mapping, and error handling is paramount but can add lots of time and testing onto what is supposed to be a simple task.
2. There is no unambiguous standard of what a CSV actually is
The general idea of having comma-separated values seems simple, but in actual implementation it is riddled with questions. In fact, there was a standard created, called RFC 4180, that attempted to remove some of the ambiguities (but wasn’t successful in removing all of them). Plus, there are many, many variations on CSVs that are floating around in the wild. Check out some CSV file quirks to see how complicated handling this data-interchange format can get and what every robust data uploader will need to consider!
3. Non-functional requirements are often an afterthought
Non-functional requirements such as audibility, performance, scalability, usability, reliability, availability, recoverability, etc are often the last items to be considered or estimated but are crucial in developing your own CSV uploader. CSV uploading is sometimes thought of as a “temporary” solution to get data into your app. I’m here to tell you that your temporary CSV import functionality will most likely stick around and be used throughout the lifetime of the product.
Once users have experienced the flexibility of CSV import/export they hardly want to go back. Even with direct system integrations there is often a need for “override” or “adjustment” data coming directly from users.
One day a user will try uploading a 50 MB, 500 MB, or even 5GB file. How will your system perform when tested to its limits?
You Don’t Need to Build Your Own
I’ve seen many CSV uploaders built, and all of them were underestimated. So if you do decide to start developing your own CSV uploader, please don’t underestimate it. We’ll have some helpful articles to navigate some of the treacherous and tortuous terrain.
Instead of developing your own importer, consider using a pre-built hosted solution like Delimited. With Delimited and just five minutes, you can have all of the above considerations handled for you.
Delimited provides the user with a guard-rails, structured, pleasant, resilient experience for uploading data. It absolves developers from having to worry about edge cases and ambiguous standards. It uses secure, scalable architecture to handle large files. And it’s flexible enough to support multiple destination systems and styles.
Give Delimited a try and then decide if you still want to build your own CSV importer.