CSV Processing


Overview

Helium provides two mechanisms for processing CSV files. At the heart of both methods, a CSV file is represented in Helium as a blob type and then processed to produce an object instance for each line in the CSV file.



Supported Formats

CSV files must have a single header line.

Currently, only plain CSV files are supported. Native Excel formats etc. are not supported. 

CSV files created or manipulated using Microsoft Excel might not be compatible as is with Helium. This is due to the carriage-return character Excel uses namely, '/r'. To replace all '/r' characters with the standard Unix newlines the tr utility can be used.

The following will replace the carriage-return character and write the result to a new file:

Unix newline
tr '\r' '\n' < helium-export.csv > helium-export.unix.csv

 There is also a case when Excel might add a hidden character to the start of the file known as a Byte-Order Mark. These BOM characters are sometimes used to denote the encoding of the file in order for it to be recognised, however, these are not in common use any more. Apart from Excel and other Microsoft applications, these BOM encodings are not really included in, specifically, CSV files. See this page on how to find and handle these characters: Re: Excel CSV files and importing data into Helium


Be sure to use CSV files that don't contain any unexpected carriage return characters.

General Behaviour

With both methods of CSV processing a line of the CSV file results in a DSL object instance. The following applies in this regard:

  • Column/header names for the CSV file must match the attribute names in the destination object. Case is ignored. A column firstname will, for example, match to attributes named both firstName and firstname.
  • Parsing is type-safe. For example, an object with a birth_date attribute that is of type date must contain values that can be converted to a date value using the acting user’s preferred Locale and Time Zone. If a value cannot be converted for a specific attribute in Helium an exception will be triggered which can then be handled using the exception handling mechanism in the DSL. Alternatively the Helium logging service can be used to inspect the nature of the exception in order to help with debugging.
  • Complex attributes are supported. For example the CSV file may contain a column header clinic.name This will automatically populate the name attribute of an object that is associated with the primary object through a simple relationship with the name clinic. Simple relationships are one-to-one and many-to-one relationships. Other relationships are not supported and will be ignored. The object instance that represents the named relationships will only be created if the value in the column is not null. So if none of the complex attributes accessed through a specific relationships is set to a non-null value, then that related object won’t be created.



Handling of Exceptional Cases

Any parsing errors related to CSV processing will result in the entire transaction involved being rolled back. To avoid this, the Helium exception handling mechanism can be used to appropriately handle such cases. In addition, the Helium logging service can be used to inspect the details of the exception to assist with debugging.

 Empty columns in the CSV file will be stored as null values on the related object attribute.



Standard CSV Processing

The standard CSV processing mechanism involves processing of an entire CSV file at once, resulting in a collection of object instances. This method makes use of the fromCsv built-in function.

For details on this method of CSV processing the following can be consulted:

Standard CSV Processing


The standard method of processing CSV files processes a whole CSV file at once using the fromCsv built-in function.

CSV Batch Processing

In addition to the standard method of CSV processing as mentioned above, Helium also provides a batch processing mechanism where each line in the CSV file is processed separately. The implication of this is that exception handling in Helium can be used to handle scenarios where parsing of a single line in the CSV file fails while still processing the remaining lines from the file.

This method of CSV processing makes use of the Mez:createBatch and fromCsvLine built-in function as well as the MezBatch and MezBatchItem built-in objects.

For details on this method of CSV processing the following can be consulted:

 Batch CSV Processing


Helium CSV batch processing processes a CSV file one line at a time using the Mez:createBatch and fromCsvLine built-in function and the MezBatch and MezBatchItem built-in objects.