[Python-talk] How would one do something like this?
bruce.labitt at autoliv.com
Wed Dec 10 15:00:01 EST 2008
Lloyd Kvam <python at venix.com> wrote on 12/10/2008 12:24:01 PM:
>
> On Wed, 2008-12-10 at 11:54 -0500, bruce.labitt at autoliv.com wrote:
> > Given a large text file of say 10M lines, how could one go to the
> > 4,099,998th line and extract data for the next n lines?
>
> I think the key would be to reduce your memory footprint.
>
Exactly what I need. (I only have 8G... :) )
> itertools has an islice function.
>
> from itertools import islice
> inf = open('tmp.txt')
> start = 4099998
> length = 1000
> chunk = islice(inf, start, start+length, 1)
> for line in chunk:
>     process(line)
>
Very slick! This greatly reduced the time to read in the file! Ahem, once
the indexing (start point) was fixed, that is. I just tried it on a 2M
line file to check it out: a few minutes compared to >30 minutes.
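The start-point fix mentioned above is presumably the usual off-by-one: islice counts from zero, so the 4,099,998th line sits at index 4099997. A minimal sketch of that conversion, assuming 1-based line numbers as input; the helper name read_lines is hypothetical and not from the thread:

```python
from itertools import islice


def read_lines(path, first_line, count):
    """Return up to `count` lines starting at 1-based line `first_line`.

    read_lines is a hypothetical helper used for illustration only.
    """
    start = first_line - 1  # islice indices are zero-based
    with open(path) as inf:
        # islice still reads and discards the skipped lines one at a time,
        # but only the requested slice is kept in memory.
        return list(islice(inf, start, start + count))
```

For the original question this would be called as read_lines('tmp.txt', 4099998, n). If the slice runs past the end of the file, the result is simply shorter than count.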
> This processes one line at a time minimizing your memory footprint. In
> general, I've found the iterator / itertools to be very good for
> processing logfiles. Effectively the iterators feed each other much
> like Unix pipes.
>
> perhaps the for loop should really be
> results = [process(line) for line in chunk]
>
> giving you a list of results with one item for each line
>
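The Unix-pipe comparison above can be sketched with chained generators. This is only an illustration under assumptions: strip_newlines, non_blank, and extract are hypothetical stand-ins for whatever process() actually does. Each stage pulls one line at a time from the stage before it, so no intermediate list is built:

```python
from itertools import islice


def strip_newlines(lines):
    # Stage 1 (hypothetical): drop the trailing newline from each line.
    for line in lines:
        yield line.rstrip("\n")


def non_blank(lines):
    # Stage 2 (hypothetical): pass through only lines with visible content.
    for line in lines:
        if line.strip():
            yield line


def extract(path, start, length):
    # start is a zero-based index, as in the islice call quoted above.
    with open(path) as inf:
        chunk = islice(inf, start, start + length)
        # Roughly: islice | strip_newlines | non_blank
        for line in non_blank(strip_newlines(chunk)):
            yield line
```

Because extract is itself a generator, the file stays open only while the caller is iterating; wrapping the call in list() gives the same kind of result list as the list comprehension suggested above.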
Thanks a lot!!!
Bruce