Looping over input files


TM1 can load data from a number of data sources, including plain text files. I regularly get the question how we can pick up several files at once. For example, load timestamped files where the filename contains the information on the file contents. As in: "Actual_2018_01_C913.txt" where C913 is a company. As this has been done before I will focus on an approach that is a kind of a variant. We use the same looping concept below, so please feel free to inspect the code and use it for your projects.

The variant that I want to show, is how we can pick up the most recent file in a directory, based on a timestamp in the filename. The picture will show what I mean:

The timestamp is the important part. I am not (necessarily) interested in the 'Date modified' of each file. In reality these can match but it's not a guarantee. I am interested to find the file with the latest timestamp. Next to that, we will see interesting ways on how to compare the timestamps. In doing this, we will not use Java extensions or DOS commands - knowing that Turbo Integrator does not have a lot of capabilities to access the file system.


In the below piece of code, let's:

  1. define the folder over which we loop
  2. use the function WildcardFileSearch to pick the files based on a certain pattern
  3. retrieve the time that we can deduct from the timestamps in the filenames
  4. determine which is the most recent file, by comparing the times
  5. store the end result in a text file, but of course you can use this filename in a data load process

# Wim Gielis # https://www.wimgielis.com
##### # Pick up the most recent file based on timestamps # 17/03/2018 #####
vLocation = 'D:\Exported files\'; # i = NewDateFormatter('nl_BE', 'Etc/UTC', 'serial', 'long', 'datetime'); i = NewDateFormatter( 'nl_BE' ); sLastObject = ''; nTimeOfLastFile = 0; sLastFile = ''; While( 1 > 0 ); sFile = WildcardFileSearch( vLocation | 'Actual_???????? ??????.txt', sLastObject ); If( sFile @<> '' ); sTimestamp = Subst( sFile, 8, 15 ); # YYYYMMDD HHMMSS nTimestamp = ParseDate( sTimestamp, 'yyyyMMdd HHmmss', i ) + 21916; If( nTimestamp > nTimeOfLastFile ); nTimeOfLastFile = nTimestamp; sLastFile = vLocation | sFile; EndIf; sLastObject = sFile; Else; Break; EndIf; End; If( nTimeOfLastFile > 0 ); AsciiOutput( 'test.txt', 'The most recent file:', sLastFile, FormatDate( nTimeOfLastFile - 21916, 'dd/MM/yyyy ''at'' HH:mm:ss', i )); Else; AsciiOutput( 'test.txt', 'No matching files found.' ); EndIf;

The end result is a file like this (for now) but you can use it as input to your data load processes. Just use DataSourceNameForServer to set the data source dynamically in the Prolog tab of your process.

What I like the most in this code, are the date/times functions like NewDateFormatter, ParseDate and FormatDate. Indeed, they do the heavy lifting for us ! Supply a good time pattern and watch this thing work for you instead of messing around with Subst, Scan and what else.

Look here for time formats but you will be better off if you digest my earlier classic article on the subject.

The Break function and the way I implemented the loop with 1 and 0, is useful knowledge too. Steal the code before this article will get old and buried on the interwebz.

Good luck with using and tweaking the code!


Section contents

About Wim

Wim Gielis is a Business Intelligence consultant and Excel expert

Other links