[Python-talk] How would one do something like this?

Lloyd Kvam python at venix.com
Wed Dec 10 12:24:01 EST 2008


On Wed, 2008-12-10 at 11:54 -0500, bruce.labitt at autoliv.com wrote:
> Given a large text file of say 10M lines, how could one go to the 
> 4,099,998th line and extract data for the next n lines? 

I think the key would be to reduce your memory footprint.

The itertools module has an islice function that slices any iterator,
including an open file.

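from itertools import islice

start = 4099998 - 1     # islice counts from 0, so this is the 4,099,998th line
length = 1000           # number of lines to extract (the "n" in your question)
with open('tmp.txt') as inf:
    chunk = islice(inf, start, start + length)
    for line in chunk:
        process(line)   # process() stands in for whatever you do with each line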

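This processes one line at a time, which keeps your memory footprint
small.  Note that islice still reads and discards every line before the
starting point; it does not seek, so it saves memory rather than the time
needed to reach that line.  In general, I've found iterators and the
itertools module to be very good for processing logfiles.  Effectively
the iterators feed each other much like Unix pipes.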
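
To make the pipe analogy concrete, here is a minimal sketch of generators
feeding each other.  The helper names (stripped, as_floats, total_of_slice)
are made up for illustration, and the sample data stands in for an open
file; it assumes each line holds a single number.

```python
from itertools import islice

def stripped(lines):
    """Yield each line without surrounding whitespace."""
    for line in lines:
        yield line.strip()

def as_floats(fields):
    """Yield each non-empty field converted to float."""
    for field in fields:
        if field:
            yield float(field)

def total_of_slice(lines, start, length):
    """Sum `length` values beginning at 0-based index `start`."""
    chunk = islice(lines, start, start + length)
    # Each stage pulls one item at a time from the stage before it.
    return sum(as_floats(stripped(chunk)))

# Stand-in for an open file: any iterable of lines works.
sample = ("%d\n" % i for i in range(100))
total = total_of_slice(sample, 10, 5)   # items at indexes 10 through 14
```

Nothing is read until sum() asks for a value, so only one line is held in
memory at any moment.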

Perhaps the for loop should really be
results = [process(line) for line in chunk]

giving you a list of results with one item for each line.
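
Putting the pieces together, one possible sketch follows.  The function
name read_slice, its 1-based first_line argument, and the use of float as
the default process step are my assumptions, not anything from your code;
the temporary file is only there so the example can be run as-is.

```python
import os
import tempfile
from itertools import islice

def read_slice(path, first_line, count, process=float):
    """Return process(line) for `count` lines starting at 1-based `first_line`."""
    start = first_line - 1              # islice counts from 0
    with open(path) as inf:
        chunk = islice(inf, start, start + count)
        # Build the list while the file is still open.
        return [process(line) for line in chunk]

# Build a small stand-in file holding the numbers 1 through 20.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for i in range(1, 21):
        out.write("%d\n" % i)

values = read_slice(path, 5, 3)         # lines 5, 6 and 7
os.remove(path)
```

For your file the call would look something like
read_slice('tmp.txt', 4099998, n).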

> 
> If one assumes the file can be read completely in one swoop, then one 
> could do
> 
> inFile = open('tmp.txt', 'r')
> lines = inFile.readlines()              # read whole file all at once
> inFile.close()                          # close inFile
> 
> startline = 4099998 - 1                 # 0-based index of the 4,099,998th line
> x = [0.0] * n                           # room for the n extracted values
> for ii in range(0, n):
>   x[ii] = float(lines[startline+ii].strip())  # etc, etc...
> 
> ???
> 
> Any other approaches?  What to do as the file size increases, say to be 
> large enough it will not fit in RAM?
> The dataset I am extracting is a subsection of the file.
> 
> I have been reading line by line, but it is excruciatingly slow... 
> especially when the file is 200M lines!
> (5GB file!)
> 
> Thanks
> Bruce
> 
> 
-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug


