Improve runtime of python code with indexing and string concatenation
I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
What I do is basically very simple:
elements
is a list ofstr
(each <= 7 characters) that I eventually write at specific positions into a tab-separated file.indices
is a list ofint
giving the positions of each of theelements
in the file- If there is no element at a certain position, an empty string is inserted
I finally write the string into a text file using aiofiles
.
So far, I tried to use a generator to create the data on the fly, as well as to use numpy for faster indexing, but with no success. Any idea how to make this code run faster would be great. Here is a reproducible example with timing:
import numpy as np
import timeit
n = 1_000_000 # total number of items
k = 500_000 # number of elements to insert
elements = ['ade/gua'] * k # elements to insert, <= 7 unicode characters
indices = list(range(0, n, 2)) # indices where to insert, sorted
assert len(elements) == len(indices)
# This is where I started
def baseline():
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
# Generate values on the fly
def generator():
def f():
it = iter(enumerate(indices))
i, ix = next(it)
for j in range(n):
if j == ix:
yield elements[i]
try:
i, ix = next(it)
except:
pass
else:
yield ''
s = 't'.join(f()) # iterating though generation seem too costly
return s
# Harness numpy
indices_np = np.array(indices) # indices could also be numpy array
def numpy():
data = np.full(n, '', dtype='<U7')
data[indices_np] = elements # this is faster with numpy
s = 't'.join(data) # much slower. array2string or savetxt does not help
return s
assert baseline() == generator() == numpy()
timeit.timeit(baseline, number=10) # 0.8463204780127853
timeit.timeit(generator, number=10) # 2.048296730965376 -> great job
timeit.timeit(numpy, number=10) # 4.486689139157534 -> life sucks
Edit 1
To address some of the points raised in the comments:
I write the string
aiofiles.open(filename, mode='w') as file
andfile.write()
Indices can generally not be expressed as a range
Indices can assumed to be always sorted at no extra cost.
ASCII characters are sufficient
Edit 2
Based on the answer of Mad Physicist, I tried the following code, with no success.
def buffer_plumbing():
m = len(elements) # total number of data points to insert
k = 7 # each element is 7 bytes long, only ascii
total_bytes = n - 1 + m * 7 # total number of bytes for the buffer
# find out the number of preceeding gaps for each element
gap = np.empty_like(indices_np)
gap[0] = indices_np[0] # that many gaps a the beginning
np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])
gap[1:] -= 1 # subtract one to get the gaps (except for the first)
# pre-allocate a large enough byte buffer
s = np.full(total_bytes , 't', dtype='S1')
# write element into the buffer
start = 0
for i, (g, e) in enumerate(zip(gap, elements)):
start += g
s[start: start + k].view(dtype=('S', k))[:] = e
start += k + 1
return s.tostring().decode('utf-8')
timeit.timeit(buffer_plumbing, number=10) # 26.82
python numpy optimization
|
show 4 more comments
I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
What I do is basically very simple:
elements
is a list ofstr
(each <= 7 characters) that I eventually write at specific positions into a tab-separated file.indices
is a list ofint
giving the positions of each of theelements
in the file- If there is no element at a certain position, an empty string is inserted
I finally write the string into a text file using aiofiles
.
So far, I tried to use a generator to create the data on the fly, as well as to use numpy for faster indexing, but with no success. Any idea how to make this code run faster would be great. Here is a reproducible example with timing:
import numpy as np
import timeit
n = 1_000_000 # total number of items
k = 500_000 # number of elements to insert
elements = ['ade/gua'] * k # elements to insert, <= 7 unicode characters
indices = list(range(0, n, 2)) # indices where to insert, sorted
assert len(elements) == len(indices)
# This is where I started
def baseline():
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
# Generate values on the fly
def generator():
def f():
it = iter(enumerate(indices))
i, ix = next(it)
for j in range(n):
if j == ix:
yield elements[i]
try:
i, ix = next(it)
except:
pass
else:
yield ''
s = 't'.join(f()) # iterating though generation seem too costly
return s
# Harness numpy
indices_np = np.array(indices) # indices could also be numpy array
def numpy():
data = np.full(n, '', dtype='<U7')
data[indices_np] = elements # this is faster with numpy
s = 't'.join(data) # much slower. array2string or savetxt does not help
return s
assert baseline() == generator() == numpy()
timeit.timeit(baseline, number=10) # 0.8463204780127853
timeit.timeit(generator, number=10) # 2.048296730965376 -> great job
timeit.timeit(numpy, number=10) # 4.486689139157534 -> life sucks
Edit 1
To address some of the points raised in the comments:
I write the string
aiofiles.open(filename, mode='w') as file
andfile.write()
Indices can generally not be expressed as a range
Indices can assumed to be always sorted at no extra cost.
ASCII characters are sufficient
Edit 2
Based on the answer of Mad Physicist, I tried the following code, with no success.
def buffer_plumbing():
m = len(elements) # total number of data points to insert
k = 7 # each element is 7 bytes long, only ascii
total_bytes = n - 1 + m * 7 # total number of bytes for the buffer
# find out the number of preceeding gaps for each element
gap = np.empty_like(indices_np)
gap[0] = indices_np[0] # that many gaps a the beginning
np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])
gap[1:] -= 1 # subtract one to get the gaps (except for the first)
# pre-allocate a large enough byte buffer
s = np.full(total_bytes , 't', dtype='S1')
# write element into the buffer
start = 0
for i, (g, e) in enumerate(zip(gap, elements)):
start += g
s[start: start + k].view(dtype=('S', k))[:] = e
start += k + 1
return s.tostring().decode('utf-8')
timeit.timeit(buffer_plumbing, number=10) # 26.82
python numpy optimization
I have to ask, what does 'ade/cyt' mean?
– Mad Physicist
Nov 15 '18 at 13:32
You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?
– Mad Physicist
Nov 15 '18 at 13:36
@MadPhysicistade/cyt
refered to nucleotides (as was biologically wrong, btw). I write it to a text file.
– dmy
Nov 15 '18 at 13:40
Why don't you replace the'ade/gua'
and other strings with integers. For example 1 for'ade/gua'
, 2 for'ade/cyt'
, etc... And withdata = np.zeros(n, dtype=np.int64)
you dodata[ix] = elements[i]
. Or what is your expected output? How many combinations do you expect?
– Scotty1-
Nov 15 '18 at 14:02
1
Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.
– Mad Physicist
Nov 16 '18 at 3:41
|
show 4 more comments
I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
What I do is basically very simple:
elements
is a list ofstr
(each <= 7 characters) that I eventually write at specific positions into a tab-separated file.indices
is a list ofint
giving the positions of each of theelements
in the file- If there is no element at a certain position, an empty string is inserted
I finally write the string into a text file using aiofiles
.
So far, I tried to use a generator to create the data on the fly, as well as to use numpy for faster indexing, but with no success. Any idea how to make this code run faster would be great. Here is a reproducible example with timing:
import numpy as np
import timeit
n = 1_000_000 # total number of items
k = 500_000 # number of elements to insert
elements = ['ade/gua'] * k # elements to insert, <= 7 unicode characters
indices = list(range(0, n, 2)) # indices where to insert, sorted
assert len(elements) == len(indices)
# This is where I started
def baseline():
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
# Generate values on the fly
def generator():
def f():
it = iter(enumerate(indices))
i, ix = next(it)
for j in range(n):
if j == ix:
yield elements[i]
try:
i, ix = next(it)
except:
pass
else:
yield ''
s = 't'.join(f()) # iterating though generation seem too costly
return s
# Harness numpy
indices_np = np.array(indices) # indices could also be numpy array
def numpy():
data = np.full(n, '', dtype='<U7')
data[indices_np] = elements # this is faster with numpy
s = 't'.join(data) # much slower. array2string or savetxt does not help
return s
assert baseline() == generator() == numpy()
timeit.timeit(baseline, number=10) # 0.8463204780127853
timeit.timeit(generator, number=10) # 2.048296730965376 -> great job
timeit.timeit(numpy, number=10) # 4.486689139157534 -> life sucks
Edit 1
To address some of the points raised in the comments:
I write the string
aiofiles.open(filename, mode='w') as file
andfile.write()
Indices can generally not be expressed as a range
Indices can assumed to be always sorted at no extra cost.
ASCII characters are sufficient
Edit 2
Based on the answer of Mad Physicist, I tried the following code, with no success.
def buffer_plumbing():
m = len(elements) # total number of data points to insert
k = 7 # each element is 7 bytes long, only ascii
total_bytes = n - 1 + m * 7 # total number of bytes for the buffer
# find out the number of preceeding gaps for each element
gap = np.empty_like(indices_np)
gap[0] = indices_np[0] # that many gaps a the beginning
np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])
gap[1:] -= 1 # subtract one to get the gaps (except for the first)
# pre-allocate a large enough byte buffer
s = np.full(total_bytes , 't', dtype='S1')
# write element into the buffer
start = 0
for i, (g, e) in enumerate(zip(gap, elements)):
start += g
s[start: start + k].view(dtype=('S', k))[:] = e
start += k + 1
return s.tostring().decode('utf-8')
timeit.timeit(buffer_plumbing, number=10) # 26.82
python numpy optimization
I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
What I do is basically very simple:
elements
is a list ofstr
(each <= 7 characters) that I eventually write at specific positions into a tab-separated file.indices
is a list ofint
giving the positions of each of theelements
in the file- If there is no element at a certain position, an empty string is inserted
I finally write the string into a text file using aiofiles
.
So far, I tried to use a generator to create the data on the fly, as well as to use numpy for faster indexing, but with no success. Any idea how to make this code run faster would be great. Here is a reproducible example with timing:
import numpy as np
import timeit
n = 1_000_000 # total number of items
k = 500_000 # number of elements to insert
elements = ['ade/gua'] * k # elements to insert, <= 7 unicode characters
indices = list(range(0, n, 2)) # indices where to insert, sorted
assert len(elements) == len(indices)
# This is where I started
def baseline():
data = [''] * n
for i, ix in enumerate(indices):
data[ix] = elements[i]
s = 't'.join(data)
return s
# Generate values on the fly
def generator():
def f():
it = iter(enumerate(indices))
i, ix = next(it)
for j in range(n):
if j == ix:
yield elements[i]
try:
i, ix = next(it)
except:
pass
else:
yield ''
s = 't'.join(f()) # iterating though generation seem too costly
return s
# Harness numpy
indices_np = np.array(indices) # indices could also be numpy array
def numpy():
data = np.full(n, '', dtype='<U7')
data[indices_np] = elements # this is faster with numpy
s = 't'.join(data) # much slower. array2string or savetxt does not help
return s
assert baseline() == generator() == numpy()
timeit.timeit(baseline, number=10) # 0.8463204780127853
timeit.timeit(generator, number=10) # 2.048296730965376 -> great job
timeit.timeit(numpy, number=10) # 4.486689139157534 -> life sucks
Edit 1
To address some of the points raised in the comments:
I write the string
aiofiles.open(filename, mode='w') as file
andfile.write()
Indices can generally not be expressed as a range
Indices can assumed to be always sorted at no extra cost.
ASCII characters are sufficient
Edit 2
Based on the answer of Mad Physicist, I tried the following code, with no success.
def buffer_plumbing():
m = len(elements) # total number of data points to insert
k = 7 # each element is 7 bytes long, only ascii
total_bytes = n - 1 + m * 7 # total number of bytes for the buffer
# find out the number of preceeding gaps for each element
gap = np.empty_like(indices_np)
gap[0] = indices_np[0] # that many gaps a the beginning
np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])
gap[1:] -= 1 # subtract one to get the gaps (except for the first)
# pre-allocate a large enough byte buffer
s = np.full(total_bytes , 't', dtype='S1')
# write element into the buffer
start = 0
for i, (g, e) in enumerate(zip(gap, elements)):
start += g
s[start: start + k].view(dtype=('S', k))[:] = e
start += k + 1
return s.tostring().decode('utf-8')
timeit.timeit(buffer_plumbing, number=10) # 26.82
python numpy optimization
python numpy optimization
edited Nov 16 '18 at 12:02
dmy
asked Nov 15 '18 at 13:25
dmydmy
132
132
I have to ask, what does 'ade/cyt' mean?
– Mad Physicist
Nov 15 '18 at 13:32
You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?
– Mad Physicist
Nov 15 '18 at 13:36
@MadPhysicistade/cyt
refered to nucleotides (as was biologically wrong, btw). I write it to a text file.
– dmy
Nov 15 '18 at 13:40
Why don't you replace the'ade/gua'
and other strings with integers. For example 1 for'ade/gua'
, 2 for'ade/cyt'
, etc... And withdata = np.zeros(n, dtype=np.int64)
you dodata[ix] = elements[i]
. Or what is your expected output? How many combinations do you expect?
– Scotty1-
Nov 15 '18 at 14:02
1
Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.
– Mad Physicist
Nov 16 '18 at 3:41
|
show 4 more comments
I have to ask, what does 'ade/cyt' mean?
– Mad Physicist
Nov 15 '18 at 13:32
You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?
– Mad Physicist
Nov 15 '18 at 13:36
@MadPhysicistade/cyt
refered to nucleotides (as was biologically wrong, btw). I write it to a text file.
– dmy
Nov 15 '18 at 13:40
Why don't you replace the'ade/gua'
and other strings with integers. For example 1 for'ade/gua'
, 2 for'ade/cyt'
, etc... And withdata = np.zeros(n, dtype=np.int64)
you dodata[ix] = elements[i]
. Or what is your expected output? How many combinations do you expect?
– Scotty1-
Nov 15 '18 at 14:02
1
Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.
– Mad Physicist
Nov 16 '18 at 3:41
I have to ask, what does 'ade/cyt' mean?
– Mad Physicist
Nov 15 '18 at 13:32
I have to ask, what does 'ade/cyt' mean?
– Mad Physicist
Nov 15 '18 at 13:32
You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?
– Mad Physicist
Nov 15 '18 at 13:36
You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?
– Mad Physicist
Nov 15 '18 at 13:36
@MadPhysicist
ade/cyt
refered to nucleotides (as was biologically wrong, btw). I write it to a text file.– dmy
Nov 15 '18 at 13:40
@MadPhysicist
ade/cyt
refered to nucleotides (as was biologically wrong, btw). I write it to a text file.– dmy
Nov 15 '18 at 13:40
Why don't you replace the
'ade/gua'
and other strings with integers. For example 1 for 'ade/gua'
, 2 for 'ade/cyt'
, etc... And with data = np.zeros(n, dtype=np.int64)
you do data[ix] = elements[i]
. Or what is your expected output? How many combinations do you expect?– Scotty1-
Nov 15 '18 at 14:02
Why don't you replace the
'ade/gua'
and other strings with integers. For example 1 for 'ade/gua'
, 2 for 'ade/cyt'
, etc... And with data = np.zeros(n, dtype=np.int64)
you do data[ix] = elements[i]
. Or what is your expected output? How many combinations do you expect?– Scotty1-
Nov 15 '18 at 14:02
1
1
Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.
– Mad Physicist
Nov 16 '18 at 3:41
Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.
– Mad Physicist
Nov 16 '18 at 3:41
|
show 4 more comments
1 Answer
1
active
oldest
votes
You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes
(or ndarray.tostring
) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes
object by using ndarray.tofile
directly.
If you have elements
in-hand, you know that the total length of your line will be the combined length of the elements
and n-1
tab separators. The start of an element in the full string is therefore it's index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
elements = np.array(elements, dtype='S7')
order = np.argsort(indices)
elements = elements[order]
indices = indices[order]
lengths = lengths[order]
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
There are lots of possible optimizations to play with here, such as the possibility of allocating s
using np.empty
and only filling in the required elements with tabs. This will be left as an excise for the reader.
Another possibility is to avoid converting elements
to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for
loop as
for i, l, o in zip(cumulative, lengths, order):
s[i:i + l].view(dtype=('S', l))[:] = elements[o]
You can dump the result into a bytes
object with
s = s.tobytes()
OR
s = s.tostring()
You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes
, you can just write to the file directly:
s.tofile(f)
That will save you some memory and processing time.
In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff
of successive indices to tell you how many tabs to insert:
indices = np.asanyarray(indices)
order = np.argsort(indices)
indices = indices[order]
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, o in zip(tabs, order):
f.write('t' * t)
f.write(elements[o])
f.write('t' * (n - indices[-1] - 1))
This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.
In both cases, having elements
and indices
sorted into ascending order by index will speed things up dramatically. The first case reduces to
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
And the second becomes just
indices = np.asanyarray(indices)
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, e in zip(tabs, elements):
f.write('t' * t)
f.write(e)
f.write('t' * (n - indices[-1] - 1))
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls tofile.write
.
– dmy
Nov 16 '18 at 9:41
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
|
show 1 more comment
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320519%2fimprove-runtime-of-python-code-with-indexing-and-string-concatenation%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes
(or ndarray.tostring
) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes
object by using ndarray.tofile
directly.
If you have elements
in-hand, you know that the total length of your line will be the combined length of the elements
and n-1
tab separators. The start of an element in the full string is therefore it's index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
elements = np.array(elements, dtype='S7')
order = np.argsort(indices)
elements = elements[order]
indices = indices[order]
lengths = lengths[order]
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
There are lots of possible optimizations to play with here, such as the possibility of allocating s
using np.empty
and only filling in the required elements with tabs. This will be left as an excise for the reader.
Another possibility is to avoid converting elements
to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for
loop as
for i, l, o in zip(cumulative, lengths, order):
s[i:i + l].view(dtype=('S', l))[:] = elements[o]
You can dump the result into a bytes
object with
s = s.tobytes()
OR
s = s.tostring()
You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes
, you can just write to the file directly:
s.tofile(f)
That will save you some memory and processing time.
In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff
of successive indices to tell you how many tabs to insert:
indices = np.asanyarray(indices)
order = np.argsort(indices)
indices = indices[order]
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, o in zip(tabs, order):
f.write('t' * t)
f.write(elements[o])
f.write('t' * (n - indices[-1] - 1))
This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.
In both cases, having elements
and indices
sorted into ascending order by index will speed things up dramatically. The first case reduces to
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
And the second becomes just
indices = np.asanyarray(indices)
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, e in zip(tabs, elements):
f.write('t' * t)
f.write(e)
f.write('t' * (n - indices[-1] - 1))
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls tofile.write
.
– dmy
Nov 16 '18 at 9:41
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
|
show 1 more comment
You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes
(or ndarray.tostring
) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes
object by using ndarray.tofile
directly.
If you have elements
in-hand, you know that the total length of your line will be the combined length of the elements
and n-1
tab separators. The start of an element in the full string is therefore it's index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
elements = np.array(elements, dtype='S7')
order = np.argsort(indices)
elements = elements[order]
indices = indices[order]
lengths = lengths[order]
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
There are lots of possible optimizations to play with here, such as the possibility of allocating s
using np.empty
and only filling in the required elements with tabs. This will be left as an excise for the reader.
Another possibility is to avoid converting elements
to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for
loop as
for i, l, o in zip(cumulative, lengths, order):
s[i:i + l].view(dtype=('S', l))[:] = elements[o]
You can dump the result into a bytes
object with
s = s.tobytes()
OR
s = s.tostring()
You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes
, you can just write to the file directly:
s.tofile(f)
That will save you some memory and processing time.
In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff
of successive indices to tell you how many tabs to insert:
indices = np.asanyarray(indices)
order = np.argsort(indices)
indices = indices[order]
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, o in zip(tabs, order):
f.write('t' * t)
f.write(elements[o])
f.write('t' * (n - indices[-1] - 1))
This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.
In both cases, having elements
and indices
sorted into ascending order by index will speed things up dramatically. The first case reduces to
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
And the second becomes just
indices = np.asanyarray(indices)
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, e in zip(tabs, elements):
f.write('t' * t)
f.write(e)
f.write('t' * (n - indices[-1] - 1))
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls tofile.write
.
– dmy
Nov 16 '18 at 9:41
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
|
show 1 more comment
You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes
(or ndarray.tostring
) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes
object by using ndarray.tofile
directly.
If you have elements
in-hand, you know that the total length of your line will be the combined length of the elements
and n-1
tab separators. The start of an element in the full string is therefore it's index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
elements = np.array(elements, dtype='S7')
order = np.argsort(indices)
elements = elements[order]
indices = indices[order]
lengths = lengths[order]
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
There are lots of possible optimizations to play with here, such as the possibility of allocating s
using np.empty
and only filling in the required elements with tabs. This will be left as an excise for the reader.
Another possibility is to avoid converting elements
to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for
loop as
for i, l, o in zip(cumulative, lengths, order):
s[i:i + l].view(dtype=('S', l))[:] = elements[o]
You can dump the result into a bytes
object with
s = s.tobytes()
OR
s = s.tostring()
You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes
, you can just write to the file directly:
s.tofile(f)
That will save you some memory and processing time.
In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff
of successive indices to tell you how many tabs to insert:
indices = np.asanyarray(indices)
order = np.argsort(indices)
indices = indices[order]
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, o in zip(tabs, order):
f.write('t' * t)
f.write(elements[o])
f.write('t' * (n - indices[-1] - 1))
This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.
In both cases, having elements
and indices
sorted into ascending order by index will speed things up dramatically. The first case reduces to
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
And the second becomes just
indices = np.asanyarray(indices)
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, e in zip(tabs, elements):
f.write('t' * t)
f.write(e)
f.write('t' * (n - indices[-1] - 1))
You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes
(or ndarray.tostring
) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes
object by using ndarray.tofile
directly.
If you have elements
in-hand, you know that the total length of your line will be the combined length of the elements
and n-1
tab separators. The start of an element in the full string is therefore it's index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
elements = np.array(elements, dtype='S7')
order = np.argsort(indices)
elements = elements[order]
indices = indices[order]
lengths = lengths[order]
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
There are lots of possible optimizations to play with here, such as the possibility of allocating s
using np.empty
and only filling in the required elements with tabs. This will be left as an excise for the reader.
Another possibility is to avoid converting elements
to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for
loop as
for i, l, o in zip(cumulative, lengths, order):
s[i:i + l].view(dtype=('S', l))[:] = elements[o]
You can dump the result into a bytes
object with
s = s.tobytes()
OR
s = s.tostring()
You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes
, you can just write to the file directly:
s.tofile(f)
That will save you some memory and processing time.
In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff
of successive indices to tell you how many tabs to insert:
indices = np.asanyarray(indices)
order = np.argsort(indices)
indices = indices[order]
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, o in zip(tabs, order):
f.write('t' * t)
f.write(elements[o])
f.write('t' * (n - indices[-1] - 1))
This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.
In both cases, having elements
and indices
sorted into ascending order by index will speed things up dramatically. The first case reduces to
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += lengths
s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
s[i:i + l].view(dtype=('S', l))[:] = e
And the second becomes just
indices = np.asanyarray(indices)
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
tabs[1:] -= 1
for t, e in zip(tabs, elements):
f.write('t' * t)
f.write(e)
f.write('t' * (n - indices[-1] - 1))
answered Nov 16 '18 at 5:57
Mad PhysicistMad Physicist
38.1k1677109
38.1k1677109
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls tofile.write
.
– dmy
Nov 16 '18 at 9:41
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
|
show 1 more comment
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls tofile.write
.
– dmy
Nov 16 '18 at 9:41
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to
file.write
.– dmy
Nov 16 '18 at 9:41
Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to
file.write
.– dmy
Nov 16 '18 at 9:41
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
I tried everything now and its just getting worse. Seems like time to surrender...
– dmy
Nov 16 '18 at 12:03
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.
– Mad Physicist
Nov 16 '18 at 13:54
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
I'm using asyncio for this reason so I made the CPU to be my bottleneck.
– dmy
Nov 16 '18 at 14:04
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.
– Mad Physicist
Nov 16 '18 at 14:07
|
show 1 more comment
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320519%2fimprove-runtime-of-python-code-with-indexing-and-string-concatenation%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I have to ask, what does 'ade/cyt' mean?
– Mad Physicist
Nov 15 '18 at 13:32
You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?
– Mad Physicist
Nov 15 '18 at 13:36
@MadPhysicist
ade/cyt
refered to nucleotides (as was biologically wrong, btw). I write it to a text file.– dmy
Nov 15 '18 at 13:40
Why don't you replace the
'ade/gua'
and other strings with integers. For example 1 for'ade/gua'
, 2 for'ade/cyt'
, etc... And withdata = np.zeros(n, dtype=np.int64)
you dodata[ix] = elements[i]
. Or what is your expected output? How many combinations do you expect?– Scotty1-
Nov 15 '18 at 14:02
1
Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.
– Mad Physicist
Nov 16 '18 at 3:41