Batch CSV Processing

Description

Whereas standard CSV processing in Helium processes an entire blob, representing a CSV file, at once to produce a collection of object instances, the batch CSV processing features takes a multi-step approach:

  • First the Mez:createBatch built-in function is used to create a batch from a blob representing a valid CSV file. 
  • The built-in MezBatch and MezBatchItem objects are used to store the created batch.
  • The created batch can then be processed item by item using the fromCsvLine built-in function.



MezBatch and MezBatchItem Built-in Objects

The following two objects are used to store batches that are created by the use of the Mez:createBatch built-in function. These objects are built-in objects meaning they are included, by Helium, as part of every Helium app:

@NotTracked
persistent object MezBatch {
 
	// User specified description of this batch
    string name;
 
	// String representing the headers for the CSV file associated with this batch
    string header;
}
@NotTracked
persistent object MezBatchItem {
    
	// Flag indicating whether this batch item has been processed with fromCsvLine
	bool processed;
 
	// A string representing an entire line of the CSV file associated with the batch
    string value;
    
	// Relationship to the built-in MezBatch object
    @ManyToOne
    MezBatch batch via batchItems;
}

These objects can be queried like any other object in Helium using for example the native SQL statements provided by the DSL or using selectors.


VERY IMPORTANT

These objects are persistent.
Creating them will persist to the database.

If you expect/do regular csv uploads on your application please take note that these MezBatch and MezBatchItem records will be added to you schema.

It is advisable that you clear these tables often to ensure your schema doesn't grow needlessly large.

Creating a Batch for Processing

To create a batch for processing the Mez:createBatch built-in function can be used:

object FileUpload {
	blob data;
}
MezBatch stockUpdateBatch = Mez:createBatch(fileUpload._id, fileUpload.data);

In this case fileUpload represents an object instance containing a blob attribute named data.

The first parameter of Mez:createBatch represents the name of the batch. This can be any value that can be implicitly converted to a string. Note that there is no validation on uniqueness for this value meaning more than one batch can have the same name. If, however, batch creation and batch processing are done separately it makes sense that the batch should be uniquely identifiable. For this purpose we use the unique id of the fileUpload object instance to ensure uniqueness. 

The second parameter represents the blob value that represents the CSV file to process. blob values can be populated using, amongst other methods, the file upload widget, or the inbound API.

The above command will create entries in the database that looks as follows:

 helium-app-1=# select * from mezbatch;
                 _id_                 |          _tstamp_          |                 name                 |        header         
--------------------------------------+----------------------------+--------------------------------------+-----------------------
 b6a3f674-a47d-4480-810d-ef9b06026b47 | 2018-12-06 15:03:31.313484 | 4731ab52-7b4d-424e-b9f1-072ac47b3e08 | stockName,level,price
  helium-app-1=# select * from mezbatchitem ;
                 _id_                 |          _tstamp_          | processed |               value               |               batch_fk               
--------------------------------------+----------------------------+-----------+-----------------------------------+--------------------------------------
 1e8a4c8b-79cc-490c-a133-122e869722db | 2018-12-06 15:03:31.313484 | f         | Groundnut,440,10                  | b6a3f674-a47d-4480-810d-ef9b06026b47
 2e90b632-1516-4d74-a952-ece3a54afab1 | 2018-12-06 15:03:31.313484 | f         | A-grade grain fertilizer,100,100  | b6a3f674-a47d-4480-810d-ef9b06026b47
 362bab77-efeb-4594-ad7f-5bc9e87f0da6 | 2018-12-06 15:03:31.313484 | f         | Rice,150,50                       | b6a3f674-a47d-4480-810d-ef9b06026b47
 36ec85de-f7fe-4086-b055-2b9762fbe5e5 | 2018-12-06 15:03:31.313484 | f         | Corn,200,100                      | b6a3f674-a47d-4480-810d-ef9b06026b47
 5bc7aa10-5ca0-4c6e-9cb3-6d7f900e6f36 | 2018-12-06 15:03:31.313484 | f         | Wheat,300,50                      | b6a3f674-a47d-4480-810d-ef9b06026b47
 7eea8146-d83c-405a-bfed-9907ac5a22d8 | 2018-12-06 15:03:31.313484 | f         | A-grade legume fertilizer,150,120 | b6a3f674-a47d-4480-810d-ef9b06026b47
 8b0a1efb-7758-4406-bfe2-f7260505a9f3 | 2018-12-06 15:03:31.313484 | f         | Beans,400,60                      | b6a3f674-a47d-4480-810d-ef9b06026b47
 971a9efc-f891-485f-9752-a6488512f92d | 2018-12-06 15:03:31.313484 | f         | Cowpea,100,50                     | b6a3f674-a47d-4480-810d-ef9b06026b47
 ca458b84-a63f-42e5-a662-cd3c8e40f5b5 | 2018-12-06 15:03:31.313484 | f         | B-grade legume fertilizer,300,50  | b6a3f674-a47d-4480-810d-ef9b06026b47
 f138856e-033b-49b1-8f24-194b6ccb5a5b | 2018-12-06 15:03:31.313484 | f         | B-grade grain fertilizer,200,40   | b6a3f674-a47d-4480-810d-ef9b06026b47




Processing an Existing Batch

Once a batch has been created, it can be processed line for line using the fromCsvLine built-in function:

// Get the batch items related to the batch being processed
MezBatchItem[] stockUpdateBatchItems = MezBatchItem:relationshipIn(batch, stockUpdateBatch);
 
// Results from CSV processing will be stored in this collection
StockUpdate[] stockUpdates;
 
// Iterate over the batch items and process one by one using fromCsvLine
foreach(MezBatchItem stockUpdateBatchItem: stockUpdateBatchItems) {
	StockUpdate stockUpdate = StockUpdate:fromCsvLine(stockUpdateBatch.header, stockUpdateBatchItem.value);
	stockUpdates.append(stockUpdate);
 
	// Once a batch item has been successfully processed it can be marked as such
	stockUpdateBatchItem.processed = true;
}

To retrieve the batch items associated with the batch being processed, we make use of the relationship between the two objects.

The first parameter of fromCsvLine represents a string representing the header of the CSV file being processed. This can be retrieved from the MezBatch object instance as shown.

The second parameter of fromCsvLine represents a string representing the current line of the CSV file being processed. This can be retrieved from the MezBatchItem object instance as shown.



Exception Handling and Debugging

Seeing as each line of a CSV file is processed separately we can make use of the Helium exception handling functionality to continue processing lines of the CSV file even if one or more lines fail. Additional follow-up tasks can then be created to manually process or address lines in the CSV file that could not be parsed.

The above example with exception handling included might look something like this:

// Get the batch items related to the batch being processed
MezBatchItem[] stockUpdateBatchItems = MezBatchItem:relationshipIn(batch, stockUpdateBatch);
 
// Results from CSV processing will be stored in this collection
StockUpdate[] stockUpdates;
 
// Iterate over the batch items and process one by one using fromCsvLine
foreach(MezBatchItem stockUpdateBatchItem: stockUpdateBatchItems) {
	try{
		StockUpdate stockUpdate = StockUpdate:fromCsvLine(stockUpdateBatch.header, stockUpdateBatchItem.value);
		stockUpdates.append(stockUpdate);
 
		// Once a batch item has been successfully processed it can be marked as such
		stockUpdateBatchItem.processed = true;
	}
	catch(exception) {
		// Handle the exception by logging it and creating a follow up task
		handleException(stockUpdateBatch, stockUpdateBatchItem, exception.toString);
	}
}

More details on how the use exception handling in the Helium DSL, please see this page.