Setting the Number of Reducers in a MapReduce job which is in an Oozie Workflow





















I have a five node cluster, three nodes of which contain DataNodes and TaskTrackers.



I've imported around 10 million rows from Oracle via Sqoop and process them with MapReduce in an Oozie workflow.



The MapReduce job takes about 30 minutes and is only using one reducer.



Edit: If I run the MapReduce code on its own, outside Oozie, job.setNumReduceTasks(4) correctly launches 4 reducers.



I have tried the following methods to manually set the number of reducers to four, with no success:



In Oozie, I set the following property in the <configuration> tag of the map-reduce node:



<property><name>mapred.reduce.tasks</name><value>4</value></property>


In the main method of the MapReduce Java code:



Configuration conf = new Configuration();
Job job = new Job(conf, "10 million rows");
...
job.setNumReduceTasks(4);


I also tried:



Configuration conf = new Configuration();
Job job = new Job(conf, "10 million rows");
...
conf.set("mapred.reduce.tasks", "4");
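One possible reason this last attempt has no effect, independent of Oozie: as far as I know, Job takes a copy of the Configuration it is given, so a conf.set(...) issued after new Job(conf, ...) never reaches the job. The sketch below only models that behaviour. SketchConf and SketchJob are hypothetical stand-ins, not Hadoop classes, and the fallback of 1 reducer is an assumption based on the stock default for mapred.reduce.tasks.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopySketch {

    /** Hypothetical stand-in for a key/value configuration object. */
    static class SketchConf {
        private final Map<String, String> props = new HashMap<>();

        SketchConf() {}

        /** Copy constructor: takes a snapshot of the other configuration. */
        SketchConf(SketchConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key, String defaultValue) {
            return props.getOrDefault(key, defaultValue);
        }
    }

    /** Hypothetical stand-in for a job that copies its configuration on construction. */
    static class SketchJob {
        private final SketchConf conf;

        SketchJob(SketchConf conf) {
            this.conf = new SketchConf(conf); // later changes to the caller's conf are not seen
        }

        int numReduceTasks() {
            // Assumed default of 1 when the property is absent.
            return Integer.parseInt(conf.get("mapred.reduce.tasks", "1"));
        }
    }

    /** Property set AFTER the job is created: the job keeps its old snapshot. */
    static int reducersWhenSetAfter() {
        SketchConf conf = new SketchConf();
        SketchJob job = new SketchJob(conf);
        conf.set("mapred.reduce.tasks", "4");
        return job.numReduceTasks();
    }

    /** Property set BEFORE the job is created: the snapshot includes it. */
    static int reducersWhenSetBefore() {
        SketchConf conf = new SketchConf();
        conf.set("mapred.reduce.tasks", "4");
        SketchJob job = new SketchJob(conf);
        return job.numReduceTasks();
    }

    public static void main(String[] args) {
        System.out.println("set after job creation:  " + reducersWhenSetAfter());
        System.out.println("set before job creation: " + reducersWhenSetBefore());
    }
}
```

In a real driver the equivalent would be to set the property before constructing the Job, or to go through job.getConfiguration().set(...) afterwards. This does not by itself explain the different behaviour under Oozie.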


My map function looks similar to this:



public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
    CustomObj customObj = new CustomObj(key.toString());
    context.write(new Text(customObj.getId()), customObj);
}



I think there are something like 80,000 different values for the ID.
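With roughly 80,000 distinct IDs, key skew is unlikely to be what keeps the job on a single reducer. Here is a rough sketch, assuming the default hash partitioning rule (hashCode & Integer.MAX_VALUE) % numReduceTasks. It uses String.hashCode on made-up IDs of the form id-N as a stand-in for Text.hashCode, so the exact counts are illustrative only.

```java
public class HashPartitionSketch {

    /** Same arithmetic as the default hash partitioning rule, applied to a String key. */
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many of the made-up keys id-0 .. id-(n-1) land in each partition. */
    static int[] countPerPartition(int distinctKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int i = 0; i < distinctKeys; i++) {
            counts[partitionFor("id-" + i, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPartition(80_000, 4);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p] + " keys");
        }
    }
}
```

If all four partitions receive keys here but the job still launches only one reduce task, the partitioner is not the limiting factor. The number of reduce tasks is fixed by the job configuration before any key is partitioned.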



My Reduce function looks similar to this:



public void reduce(Text key, Iterable<CustomObj> vals, Context context)
        throws IOException, InterruptedException {
    OtherCustomObj otherCustomObj = new OtherCustomObj();
    ...
    context.write(null, otherCustomObj);
}



The custom object emitted by the Mapper implements WritableComparable, but the other custom object emitted by the Reducer does not.



Here are the file system counters, job counters, and Map-Reduce framework counters from the job log, which show that only one reduce task was launched.



 map 100% reduce 100%
Job complete: job_201401131546_0425
Counters: 32
File System Counters
FILE: Number of bytes read=1370377216
FILE: Number of bytes written=2057213222
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=556345690
HDFS: Number of bytes written=166938092
HDFS: Number of read operations=18
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Job Counters
Launched map tasks=11
Launched reduce tasks=1
Data-local map tasks=11
Total time spent by all maps in occupied slots (ms)=1268296
Total time spent by all reduces in occupied slots (ms)=709774
Total time spent by all maps waiting after reserving slots (ms)=0
Total time spent by all reduces waiting after reserving slots (ms)=0
Map-Reduce Framework
Map input records=9440000
Map output records=9440000
Map output bytes=666308476
Input split bytes=1422
Combine input records=0
Combine output records=0
Reduce input groups=80000
Reduce shuffle bytes=685188530
Reduce input records=9440000
Reduce output records=2612760
Spilled Records=28320000
CPU time spent (ms)=1849500
Physical memory (bytes) snapshot=3581157376
Virtual memory (bytes) snapshot=15008251904
Total committed heap usage (bytes)=2848063488


Edit: I modified the MapReduce job to add a custom partitioner, a sort comparator, and a grouping comparator. For some reason, the job now launches two reducers (when scheduled via Oozie), but not four.



I set the mapred.tasktracker.map.tasks.maximum property to 20 on each TaskTracker (and on the JobTracker) and restarted them, but it made no difference.

































  • Manually make the custom partitioner produce 4 partitions: in its implementation, split the IDs into 4 groups based on some condition. This is just to test whether 4 partitions/reducers actually run.
    – Mr.Chowdary
    May 21 '15 at 4:34










  • What Hadoop version are you using? Check whether the property you are using to set the number of reducers is valid for that version.
    – Preeti Khurana
    Sep 30 '16 at 6:07














hadoop mapreduce reducers














edited Jan 24 '14 at 19:11

























asked Jan 22 '14 at 20:09









Matthew Moisen
























1 Answer













Just as a starting point, what is the value of the following property in mapred-site.xml?



<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>

























  • That property is not set in the mapred-site.xml on any node in my cluster.
    – Matthew Moisen
    Jan 24 '14 at 1:54










  • Then you will likely get either 1 or 2 reducers (depending on the default number of reducers for that version). Consider using rsync to push a mapred-site.xml out to the slave nodes.
    – javadba
    Jan 24 '14 at 2:07











  • OK. Before following your instructions, I tested the MR code by itself and was able to launch 4 reducers. Next, I added a custom partitioner, a sort comparator, and a grouping comparator to my MapReduce code and scheduled it via Oozie, which increased the number of reducers to 2. Finally, I followed your instructions, rsyncing and verifying that each TaskTracker (and the JobTracker) had a mapred.tasktracker.map.tasks.maximum of 20, but the Oozie workflow still launches only two reducers.
    – Matthew Moisen
    Jan 24 '14 at 19:14










  • It looks like you have already added the correct setting to Oozie, so at this point I have no further suggestions.
    – javadba
    Jan 24 '14 at 19:53










  • Can you post your custom partitioner code?
    – Ravindra babu
    Dec 24 '15 at 5:58










answered Jan 24 '14 at 1:06









javadba
