reading and parsing a TSV file then manipulating it for saving as CSV efficiently

Question

My source data is in a TSV file  6 columns and greater than 2 million rows   Here s what I m trying to accomplish    I need to read the data in 3 of the columns  3  4  5  in this source file The fifth column is an integer  I need to use this integer value to duplicate a row entry with using the data in the third and fourth columns  by the number of integer times   I want to write the output of  2 to an output file in CSV format    Below is what I came up with     My question  is this an efficient way to do it   It seems like it might be intensive when attempted on 2 million rows   First  I made a sample tab separate file to work with  and called it  sample txt    It s basic and only has four rows   Row1 Column1    Row1-Column2    Row1-Column3    Row1-Column4    2   Row1-Column6 Row2 Column1    Row2-Column2    Row2-Column3    Row2-Column4    3   Row2-Column6 Row3 Column1    Row3-Column2    Row3-Column3    Row3-Column4    1   Row3-Column6 Row4 Column1    Row4-Column2    Row4-Column3    Row4-Column4    2   Row4-Column6   then I have this code   import csv   with open  sample txt   r   as tsv      AoA    line strip   split   t   for line in tsv   for a in AoA      count   int a 4       while count  gt  0          with open  sample new csv    a   newline     as csvfile              csvwriter   csv writer csvfile  delimiter                  csvwriter writerow  a 2   a 3            count   count - 1

User · Accepted Answer

You should use the csv module to read the tab-separated value file  Do not read it into memory in one go  Each row you read has all the information you need to write rows to the output CSV file  after all  Keep the output file open throughout   import csv  with open  sample txt   newline     as tsvin  open  new csv    w   newline     as csvout      tsvin   csv reader tsvin  delimiter   t       csvout   csv writer csvout       for row in tsvin          count   int row 4           if count  gt  0              csvout writerows  row 2 4  for   in range count      or  using the itertools module to do the repeating with itertools repeat     from itertools import repeat import csv  with open  sample txt   newline     as tsvin  open  new csv    w   newline     as csvout      tsvin   csv reader tsvin  delimiter   t       csvout   csv writer csvout       for row in tsvin          count   int row 4           if count  gt  0              csvout writerows repeat row 2 4   count

[python] reading and parsing a TSV file, then manipulating it for saving as CSV (efficiently)

Examples related to python

Examples related to file

Examples related to csv

Examples related to tab-delimited-text

[python] reading and parsing a TSV file, then manipulating it for saving as CSV (*efficiently*)

Examples related to python

Examples related to file

Examples related to csv

Examples related to tab-delimited-text

[python] reading and parsing a TSV file, then manipulating it for saving as CSV (efficiently)