How to split an input file into multiple output files based on the number of records, and generate output file names with a unique sequence, in Talend Open Studio?

 Hello Folks,


Requirement:

1. Split the input file into multiple output files based on the number of records. For example, if the input file contains 20,000 records, we need to generate multiple output files, each containing only 10,000 records.

2. Name the output file with a sequence number. For example, on day 1 the file is generated as FileName_001.csv.

On day 2, the file must be generated as FileName_002.csv.


There are multiple options to achieve this, but I will explain the 2 most optimized ones.


Option 1 :

This requirement is interesting because, as we know, we can already split the output file based on a number of rows (a standard feature of the tFileOutputDelimited component provided by Talend).







When the output files are generated this way, a sequence number is also added to each file name automatically.

But this option cannot be used in the following 2 cases:

1. When we need our own starting number: the sequence starts from 1, but we want it to start from 10, for example.

2. When the sequence must continue across runs: every day when the job is executed, the sequence starts from 1 again, not from the last number reached in yesterday's execution.

If your requirement excludes the above 2 points, you can choose option 1; otherwise go for option 2.


Option 2 : Talend Job Flow

tFileRowCount ---> OnSubjobOk ---> tLoop ---> Iterate ---> tFileInputDelimited ---> Main ---> tFlowToIterate ---> OnComponentOk ---> tJava ---> tFileInputDelimited ---> Main ---> tFileOutputDelimited ---> OnComponentOk ---> tRowGenerator ---> Main ---> tFileOutputDelimited.







1. File Row Count: counts the number of records present in the input file.


2. Loop: iterates based on the splitting condition. Set To as ((Integer)globalMap.get("tFileRowCount_1_COUNT")); this sets the termination point for the loop.
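The arithmetic behind the loop can be sketched in plain Java, outside Talend. This is only an illustration and it assumes the loop's Step equals the number of rows per output file, which this post does not state explicitly (the video covers the exact settings). SplitPlan and plan are hypothetical names, not Talend APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend: shows how a total row count
// and a rows-per-file limit translate into loop iterations.
class SplitPlan {

    // Returns one {rowsToSkip, rowsToRead} pair per output file.
    static List<int[]> plan(int totalRows, int rowsPerFile) {
        if (rowsPerFile <= 0) {
            throw new IllegalArgumentException("rowsPerFile must be positive");
        }
        List<int[]> chunks = new ArrayList<>();
        // Assumption: the loop advances by rowsPerFile on every iteration.
        for (int skip = 0; skip < totalRows; skip += rowsPerFile) {
            // The last chunk may be smaller than rowsPerFile.
            int rowsToRead = Math.min(rowsPerFile, totalRows - skip);
            chunks.add(new int[] {skip, rowsToRead});
        }
        return chunks;
    }

    public static void main(String[] args) {
        // The example from the requirement: 20,000 records, 10,000 per file.
        for (int[] chunk : plan(20000, 10000)) {
            System.out.println("skip=" + chunk[0] + " read=" + chunk[1]);
        }
    }
}
```

With the numbers from the requirement this gives two iterations, one per output file. A total that is not an exact multiple of the limit simply produces a smaller last file.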









3. Sequence File: use tFileInputDelimited to extract the sequence number used for naming the output file.

Note: Create a sequence text file containing your initial value; I selected "0001".
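To make the idea concrete, here is a minimal plain-Java sketch of what this step amounts to: read the stored value from the sequence file and use it in the output file name. It is not the Talend configuration itself. The class and method names, the temporary file, and the "FileName" prefix are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Talend.
class SequenceFile {

    // Reads the stored sequence value, e.g. "0001", ignoring a trailing newline.
    static String readSequence(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    }

    // Builds the output file name from a base name and the sequence value.
    static String outputName(String baseName, String sequence) {
        return baseName + "_" + sequence + ".csv";
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real sequence file here.
        Path seqFile = Files.createTempFile("sequence", ".txt");
        Files.write(seqFile, "0001\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(outputName("FileName", readSequence(seqFile)));

        Files.delete(seqFile);
    }
}
```

Keeping the value as text rather than as a number preserves the leading zeros, so the file name keeps a fixed width.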










Note**: The function used in the tRowGenerator component is below (here we assume a 4-digit sequence).

Integer.parseInt((String)globalMap.get("Sequence"))==9999?"0000":String.format("%04d",(Integer.parseInt((String)globalMap.get("Sequence"))+1))
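The same expression can be tried outside Talend as a small Java method. In this sketch the globalMap lookup is replaced by a plain String parameter, and SequenceIncrement and next are hypothetical names used only for illustration.

```java
// Stand-alone version of the tRowGenerator expression above.
class SequenceIncrement {

    // Returns the next 4-digit sequence value, wrapping from "9999" to "0000".
    static String next(String current) {
        int value = Integer.parseInt(current);
        return value == 9999 ? "0000" : String.format("%04d", value + 1);
    }

    public static void main(String[] args) {
        System.out.println(next("0001")); // prints 0002
        System.out.println(next("0099")); // prints 0100
        System.out.println(next("9999")); // prints 0000
    }
}
```

The "%04d" format pads the number with leading zeros to 4 digits. For a different width, both the format and the 9999 wrap-around limit would need to change together.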

************************************************************************

The remaining steps are explained in detail in this video.

👉 https://www.youtube.com/watch?v=fFBbq-r_PEc&t=1s

You can join my Telegram channel for more updates 👉 https://t.me/beingtalenddev
 or

my YouTube channel, where you will find more useful content

 👉 https://www.youtube.com/channel/UCt7L-WavPD5q10JU1RRz7Dg

Thank you very much! 🙋
