As Praveen mentions above, when using the basic FileInputFormat
classes, the number of mappers is just the number of input splits that constitute the data. The number of reducers is controlled by mapred.reduce.tasks,
specified in the way you have it: -D mapred.reduce.tasks=10
would specify 10 reducers. Note that the space after -D
is required; if you omit the space, the configuration property is passed along to the relevant JVM, not to Hadoop.
Are you specifying 0
because there is no reduce work to do? In that case, if you're having trouble with the run-time parameter, you can also set the value directly in code. Given a JobConf
instance job, call
job.setNumReduceTasks(0);
inside, say, your implementation of Tool.run. That should produce output directly from the mappers. If your job actually produces no output whatsoever (because you're using the framework just for side effects like network calls or image processing, or because the results are entirely accounted for in Counter values), you can disable output by also calling
job.setOutputFormat(NullOutputFormat.class);
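
Putting those two calls together, a minimal sketch of a map-only job might look like the following. It assumes the old org.apache.hadoop.mapred API (since you are working with a JobConf). The class name MapOnlyJob, the configure helper, the job name, and the use of IdentityMapper are placeholders of mine, not anything from your job, so swap in your own mapper and paths.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical example class; not part of Hadoop or of the original question.
public class MapOnlyJob extends Configured implements Tool {

    // Builds the job configuration. Pass output == null to discard all output.
    static JobConf configure(Configuration conf, String input, String output) {
        JobConf job = new JobConf(conf, MapOnlyJob.class);
        job.setJobName("map-only-example");        // placeholder name
        job.setMapperClass(IdentityMapper.class);  // placeholder; use your own mapper
        FileInputFormat.setInputPaths(job, new Path(input));

        // No reduce phase: mapper output is written straight to the output format.
        job.setNumReduceTasks(0);

        if (output == null) {
            // The job exists only for side effects or counters, so write nothing.
            job.setOutputFormat(NullOutputFormat.class);
        } else {
            FileOutputFormat.setOutputPath(job, new Path(output));
        }
        return job;
    }

    @Override
    public int run(String[] args) throws Exception {
        // args[0] = input path, optional args[1] = output path
        String output = args.length > 1 ? args[1] : null;
        JobClient.runJob(configure(getConf(), args[0], output));
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options such as -D before calling run().
        System.exit(ToolRunner.run(new Configuration(), new MapOnlyJob(), args));
    }
}
```

Because the reducer count is set explicitly in code here, it overrides whatever mapred.reduce.tasks value arrives on the command line. If you would rather keep it configurable at run time, leave out the setNumReduceTasks call.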