[Python-talk] How would one do something like this?

bruce.labitt at autoliv.com bruce.labitt at autoliv.com
Wed Dec 10 15:00:01 EST 2008


Lloyd Kvam <python at venix.com> wrote on 12/10/2008 12:24:01 PM:

> 
> On Wed, 2008-12-10 at 11:54 -0500, bruce.labitt at autoliv.com wrote:
> > Given a large text file of say 10M lines, how could one go to the 
> > 4,099,998th line and extract data for the next n lines? 
> 
> I think the key would be to reduce your memory footprint.
> 

Exactly what I need.  (I only have 8G... :) )

> itertools has an islice function.
> 
> from itertools import islice
> inf = open('tmp.txt')
> start = 4099998
> length = 1000
> chunk = islice(inf, start, start+length, 1)
> for line in chunk:
>     process(line)
> 

Very slick!  This greatly reduced the time to read in the file!  Ahem, once
the indexing (start point) was fixed, that is.  I just tried it on a 2M-line
file to check it out: a few minutes compared to >30 minutes.
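
On the start point: islice counts from 0, so the Nth line of a file sits at
index N-1.  Here is a minimal sketch of that adjustment.  The helper name
take_lines and the ten-line sample are made up for illustration, and this is
only one way a start-point slip can happen, not a record of the actual fix.

```python
from itertools import islice

def take_lines(lines, first_line, count):
    """Return `count` lines starting at the 1-based line number `first_line`.

    `lines` is any iterable of lines, e.g. an open file object.
    (Hypothetical helper, for illustration only.)
    """
    start = first_line - 1  # islice is 0-based, line numbers are 1-based
    return list(islice(lines, start, start + count))

# Small stand-in for a big file: ten numbered lines.
sample = ["line %d\n" % n for n in range(1, 11)]

# Ask for 3 lines starting at the 4th line.
picked = take_lines(iter(sample), 4, 3)
```

With a real file, pass the open file object as `lines`; only the requested
lines are kept in memory, although the lines before the start point still
have to be read and skipped.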


> This processes one line at a time minimizing your memory footprint.  In
> general, I've found the iterator / itertools to be very good for
> processing logfiles.  Effectively the iterators feed each other much
> like Unix pipes.
> 
> perhaps the for loop should really be
> results = [process(line) for line in chunk]
> 
> giving you a list of results with one item for each line
> 
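
To make the "iterators feed each other like Unix pipes" idea concrete, here
is a rough sketch.  The stage names strip_newlines and only_matching and the
sample log lines are invented for illustration; they are not from the thread.

```python
from itertools import islice

def strip_newlines(lines):
    """Stage 1: drop the trailing newline from each line."""
    for line in lines:
        yield line.rstrip("\n")

def only_matching(lines, word):
    """Stage 2: pass through only the lines containing `word`."""
    for line in lines:
        if word in line:
            yield line

# Stand-in for an open log file (illustrative data).
log = [
    "INFO start\n",
    "ERROR disk full\n",
    "INFO retry\n",
    "ERROR timeout\n",
    "INFO done\n",
]

# Roughly: select a window of lines | strip newlines | grep ERROR
window = islice(iter(log), 1, 5)
errors = list(only_matching(strip_newlines(window), "ERROR"))
```

Each stage pulls one line at a time from the stage before it, so nothing
beyond the current line is held in memory until the final list() collects
the results.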

Thanks a lot!!!

Bruce
