[Python-talk] How would one do something like this?
Lloyd Kvam
python at venix.com
Wed Dec 10 12:24:01 EST 2008
On Wed, 2008-12-10 at 11:54 -0500, bruce.labitt at autoliv.com wrote:
> Given a large text file of say 10M lines, how could one go to the
> 4,099,998th line and extract data for the next n lines?
I think the key would be to reduce your memory footprint.
itertools has an islice function that takes a range of items from any
iterator, including an open file, without building a list.
    from itertools import islice

    inf = open('tmp.txt')
    start = 4099998 - 1   # islice counts from 0, so line 4,099,998 is index 4099997
    length = 1000
    chunk = islice(inf, start, start + length)
    for line in chunk:
        process(line)
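This processes one line at a time, minimizing your memory footprint.
islice still has to read through the skipped lines to find the start,
but it never holds more than one of them in memory. In general, I've
found iterators and itertools to be very good for processing logfiles.
Effectively the iterators feed each other much like Unix pipes.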
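
To make the pipe analogy concrete, here is a minimal sketch of small
generators chained together. The stage names (strip_lines, non_blank,
to_floats, pipeline) are made up for illustration, and it assumes each
non-blank line holds a single number, as the quoted example suggests.

```python
from itertools import islice


def strip_lines(lines):
    """Yield each line with surrounding whitespace removed."""
    for line in lines:
        yield line.strip()


def non_blank(lines):
    """Drop empty lines, roughly like `grep .` in a shell pipeline."""
    for line in lines:
        if line:
            yield line


def to_floats(lines):
    """Convert each remaining line to a float."""
    for line in lines:
        yield float(line)


def pipeline(lines, start, length):
    """Chain the stages; nothing is read until the result is consumed."""
    window = islice(lines, start, start + length)  # start is 0-based
    return to_floats(non_blank(strip_lines(window)))
```

Passing an open file as lines, for example
pipeline(open('tmp.txt'), 4099997, 1000), gives an iterator that pulls
one line at a time through every stage, so only the current line is held
in memory.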
Perhaps the for loop over chunk should really be

    results = [process(line) for line in chunk]

giving you a list of results with one item for each line.
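
Putting the slice and the list comprehension together, a self-contained
version might look like the sketch below. The function name
extract_values and its 1-based first_line argument are my own choices
for illustration, and float() stands in for whatever process() really
needs to do.

```python
from itertools import islice


def extract_values(path, first_line, count):
    """Return up to `count` floats, starting at 1-based line `first_line`."""
    with open(path) as inf:
        # islice counts from 0, so line N sits at index N - 1
        chunk = islice(inf, first_line - 1, first_line - 1 + count)
        # The list is built before the with block closes the file
        return [float(line.strip()) for line in chunk]
```

A call such as extract_values('tmp.txt', 4099998, 1000) would keep only
the 1000 converted values in memory, and returns fewer if the file ends
early.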
>
> If one assumes the file can be read completely in one swoop, then one
> could do
>
> inFile = open('tmp.txt','r')
> lines = inFile.readlines() # read whole file all at once
> inFile.close() # close inFile
>
> startline = 4099998-1
> for ii in range(0, n):
>     x[ii] = float(lines[startline+ii].strip()) # etc, etc...
>
> ???
>
> Any other approaches? What to do as the file size increases, say to be
> large enough it will not fit in RAM?
> The dataset I am extracting is a subsection of the file.
>
> I have been reading line by line, but it is excruciatingly slow...
> especially when the file is 200M lines!
> (5GB file!)
>
> Thanks
> Bruce
--
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug