Calculate Pi algorithm from a tutorial using OpenMP

up vote
1
down vote

favorite

I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:

static long num_steps = 100000;
double step;
void main ()

 int i;
 double x, pi
 double sum = 0.0;
 step = 1.0 / (double)num_steps;

 for(i = 0; i < num_steps; i++)
 
 x = (I + 0.5) * step;
 sum = sum + 4.0 / (1.0 + x*x);
 

 pi = step * sum;

I can not use, up to this point, #pragma parallel for. I can only use:

#pragma omp parallel 
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();

My implementation looks like this :

#define NUM_STEPS 800

int main(int argc, char **argv)

 int num_steps = NUM_STEPS;
 int i;
 double x;
 double pi;
 double step = 1.0 / (double)num_steps;

 double sum[num_steps];

 for(i = 0; i < num_steps; i++)
 
 sum[i] = 0;
 

 omp_set_num_threads(num_steps);
 #pragma omp parallel
 
 x = (omp_get_thread_num() + 0.5) * step;
 sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
 

 double totalSum = 0;

 for(i = 0; i < num_steps; i++)
 
 totalSum += sum[i];
 

 pi = step * totalSum;

 printf("Pi: %.5f", pi);

Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?

Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.

asked Nov 11 at 1:01

Vector Sigma

1084

Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52

@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57

This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11

@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11

add a comment |

up vote
1
down vote

favorite

I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:

static long num_steps = 100000;
double step;
void main ()

 int i;
 double x, pi
 double sum = 0.0;
 step = 1.0 / (double)num_steps;

 for(i = 0; i < num_steps; i++)
 
 x = (I + 0.5) * step;
 sum = sum + 4.0 / (1.0 + x*x);
 

 pi = step * sum;

I can not use, up to this point, #pragma parallel for. I can only use:

#pragma omp parallel 
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();

My implementation looks like this :

#define NUM_STEPS 800

int main(int argc, char **argv)

 int num_steps = NUM_STEPS;
 int i;
 double x;
 double pi;
 double step = 1.0 / (double)num_steps;

 double sum[num_steps];

 for(i = 0; i < num_steps; i++)
 
 sum[i] = 0;
 

 omp_set_num_threads(num_steps);
 #pragma omp parallel
 
 x = (omp_get_thread_num() + 0.5) * step;
 sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
 

 double totalSum = 0;

 for(i = 0; i < num_steps; i++)
 
 totalSum += sum[i];
 

 pi = step * totalSum;

 printf("Pi: %.5f", pi);

Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.

asked Nov 11 at 1:01

Vector Sigma

1084

Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52

@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57

This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11

@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11

add a comment |

up vote
1
down vote

favorite

I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:

static long num_steps = 100000;
double step;
void main ()

 int i;
 double x, pi
 double sum = 0.0;
 step = 1.0 / (double)num_steps;

 for(i = 0; i < num_steps; i++)
 
 x = (I + 0.5) * step;
 sum = sum + 4.0 / (1.0 + x*x);
 

 pi = step * sum;

I can not use, up to this point, #pragma parallel for. I can only use:

#pragma omp parallel 
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();

My implementation looks like this :

#define NUM_STEPS 800

int main(int argc, char **argv)

 int num_steps = NUM_STEPS;
 int i;
 double x;
 double pi;
 double step = 1.0 / (double)num_steps;

 double sum[num_steps];

 for(i = 0; i < num_steps; i++)
 
 sum[i] = 0;
 

 omp_set_num_threads(num_steps);
 #pragma omp parallel
 
 x = (omp_get_thread_num() + 0.5) * step;
 sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
 

 double totalSum = 0;

 for(i = 0; i < num_steps; i++)
 
 totalSum += sum[i];
 

 pi = step * totalSum;

 printf("Pi: %.5f", pi);

Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.

asked Nov 11 at 1:01

Vector Sigma

1084

I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:

static long num_steps = 100000;
double step;
void main ()

 int i;
 double x, pi
 double sum = 0.0;
 step = 1.0 / (double)num_steps;

 for(i = 0; i < num_steps; i++)
 
 x = (I + 0.5) * step;
 sum = sum + 4.0 / (1.0 + x*x);
 

 pi = step * sum;

I can not use, up to this point, #pragma parallel for. I can only use:

#pragma omp parallel 
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();

My implementation looks like this :

#define NUM_STEPS 800

int main(int argc, char **argv)

 int num_steps = NUM_STEPS;
 int i;
 double x;
 double pi;
 double step = 1.0 / (double)num_steps;

 double sum[num_steps];

 for(i = 0; i < num_steps; i++)
 
 sum[i] = 0;
 

 omp_set_num_threads(num_steps);
 #pragma omp parallel
 
 x = (omp_get_thread_num() + 0.5) * step;
 sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
 

 double totalSum = 0;

 for(i = 0; i < num_steps; i++)
 
 totalSum += sum[i];
 

 pi = step * totalSum;

 printf("Pi: %.5f", pi);

Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.

c parallel-processing openmp

asked Nov 11 at 1:01

Vector Sigma

1084

asked Nov 11 at 1:01

Vector Sigma

1084

asked Nov 11 at 1:01

Vector Sigma

1084

asked Nov 11 at 1:01

Vector Sigma

1084

asked Nov 11 at 1:01

Vector Sigma

1084

Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52

@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57

This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11

@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11

add a comment |

Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52

@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57

This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11

@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11

Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52

@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57

This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11

@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11

add a comment |

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.

The most simple way is to do something like:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) 
 // ...

This way the work is split across all threads regardless of the number of threads.

If there were 3 threads and 9 steps:

Thread 0 would do steps 0, 3, 6

Thread 1 would do steps 1, 4, 7

Thread 2 would do steps 2, 5, 8

This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.

In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) 
 // ...

Now the 3 threads performing 9 steps looks like:

Thread 0 does steps 0, 1, 2

Thread 1 does steps 3, 4, 5

Thread 2 does steps 6, 7, 8

This approach is actually what is most likely happening when #pragma omp for is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.

So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.

Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244922%2fcalculate-pi-algorithm-from-a-tutorial-using-openmp%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.

The most simple way is to do something like:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) 
 // ...

This way the work is split across all threads regardless of the number of threads.

If there were 3 threads and 9 steps:

Thread 0 would do steps 0, 3, 6

Thread 1 would do steps 1, 4, 7

Thread 2 would do steps 2, 5, 8

This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.

In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) 
 // ...

Now the 3 threads performing 9 steps looks like:

Thread 0 does steps 0, 1, 2

Thread 1 does steps 3, 4, 5

Thread 2 does steps 6, 7, 8

So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.

Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09

add a comment |

up vote
1
down vote

accepted

You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.

The most simple way is to do something like:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) 
 // ...

This way the work is split across all threads regardless of the number of threads.

If there were 3 threads and 9 steps:

Thread 0 would do steps 0, 3, 6

Thread 1 would do steps 1, 4, 7

Thread 2 would do steps 2, 5, 8

This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.

In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) 
 // ...

Now the 3 threads performing 9 steps looks like:

Thread 0 does steps 0, 1, 2

Thread 1 does steps 3, 4, 5

Thread 2 does steps 6, 7, 8

So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.

Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09

add a comment |

up vote
1
down vote

accepted

You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.

The most simple way is to do something like:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) 
 // ...

This way the work is split across all threads regardless of the number of threads.

If there were 3 threads and 9 steps:

Thread 0 would do steps 0, 3, 6

Thread 1 would do steps 1, 4, 7

Thread 2 would do steps 2, 5, 8

This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.

In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) 
 // ...

Now the 3 threads performing 9 steps looks like:

Thread 0 does steps 0, 1, 2

Thread 1 does steps 3, 4, 5

Thread 2 does steps 6, 7, 8

So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.

Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.

The most simple way is to do something like:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) 
 // ...

This way the work is split across all threads regardless of the number of threads.

If there were 3 threads and 9 steps:

Thread 0 would do steps 0, 3, 6

Thread 1 would do steps 1, 4, 7

Thread 2 would do steps 2, 5, 8

This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.

In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:

int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) 
 // ...

Now the 3 threads performing 9 steps looks like:

Thread 0 does steps 0, 1, 2

Thread 1 does steps 3, 4, 5

Thread 2 does steps 6, 7, 8

So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.

Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

answered Nov 11 at 5:53

Increasingly Idiotic

2,1282829

Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09

add a comment |

Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09

Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

ElCd6G 4qPVXzb jKhcKhpC4vzGv

搜尋此網誌

Odtnhj