lys-0071 [83]
3 years ago

Modify the WordCount program so that it outputs the word count for each distinct word in each file. The output of this DocWordCount program should be of the form ‘word#####filename count’, where ‘#####’ serves as the delimiter between word and filename and a tab serves as the delimiter between filename and count. Submit your source code in a file named DocWordCount.java.

Explanation: Consider two simple files, file1.txt and file2.txt:
$ echo "Hadoop is yellow Hadoop" > file1.txt
$ echo "yellow Hadoop is an elephant" > file2.txt
Running ‘DocWordCount.java’ on these two files will give output similar to that below, where ##### is a delimiter.

Output of DocWordCount.java

yellow#####file2.txt 1
Hadoop#####file2.txt 1
is#####file2.txt 1
elephant#####file2.txt 1
yellow#####file1.txt 1
Hadoop#####file1.txt 2
is#####file1.txt 1
an#####file2.txt 1
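The expected counts above can be reproduced without a cluster. The following is a plain-Java sketch, not Hadoop code (`DocWordCountSketch` and `countPerFile` are hypothetical names), that treats each file's text as input, emits `word#####filename` once per word as the map step would, and sums those emissions per key as the reduce step would. It reuses the `\\s*\\b\\s*` split pattern from the WordCount code in this question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class DocWordCountSketch {
    // Same word-splitting pattern as the WordCount mapper in the question.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    /** Counts each distinct word per file, keyed as "word#####filename". */
    public static Map<String, Integer> countPerFile(Map<String, String> files) {
        // TreeMap keeps keys sorted, loosely mirroring the sorted keys a reducer sees.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : WORD_BOUNDARY.split(file.getValue())) {
                if (word.isEmpty()) {
                    continue;
                }
                // "Map": emit (word#####filename, 1); "reduce": sum the 1s per key.
                counts.merge(word + "#####" + file.getKey(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "Hadoop is yellow Hadoop");
        files.put("file2.txt", "yellow Hadoop is an elephant");
        for (Map.Entry<String, Integer> e : countPerFile(files).entrySet()) {
            // Key and count separated by a tab, as in the required output format.
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running `main` prints one line per (word, file) pair, for example `Hadoop#####file1.txt` followed by a tab and `2`.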

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "wordcount");
        job.setJarByClass(this.getClass());

        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Type parameters: input key, input value, output key, output value.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        // Splits a line at word boundaries, dropping the surrounding whitespace.
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        @Override
        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {
            String line = lineText.toString();
            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                context.write(new Text(word), one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
1 answer:
stepladder [879] 3 years ago

Answer and Explanation:

package PackageDemo;

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class DocWordCount {

    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        Job j = Job.getInstance(c, "docwordcount");
        j.setJarByClass(DocWordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        // Same word-splitting pattern as the original WordCount.
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        @Override
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            // Name of the file this split comes from, e.g. "file1.txt".
            String fileName = ((FileSplit) con.getInputSplit()).getPath().getName();
            for (String word : WORD_BOUNDARY.split(value.toString())) {
                if (word.isEmpty()) {
                    continue;
                }
                // Key is "word#####filename"; the default TextOutputFormat
                // writes a tab between the key and the count.
                con.write(new Text(word + "#####" + fileName), ONE);
            }
        }
    }

    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
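In DocWordCount, the only change from WordCount is in the map step: each key gains the name of the file the line came from. The plain-Java sketch below (no Hadoop; `MapStepSketch`, `mapLine`, and the example HDFS path are hypothetical) imitates what the mapper emits for a single input line. It strips directories by keeping the text after the last `/`, which is meant to approximate what Hadoop's `Path.getName()` returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MapStepSketch {
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    /** Returns the keys the mapper would emit (each paired with 1) for one line. */
    public static List<String> mapLine(String splitPath, String line) {
        // Keep only the last path component, e.g. ".../input/file1.txt" -> "file1.txt".
        String fileName = splitPath.substring(splitPath.lastIndexOf('/') + 1);
        List<String> keys = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (!word.isEmpty()) {
                keys.add(word + "#####" + fileName);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical HDFS location of the sample file1.txt.
        String path = "hdfs://namenode/user/demo/input/file1.txt";
        for (String key : mapLine(path, "Hadoop is yellow Hadoop")) {
            System.out.println(key + "\t1");
        }
    }
}
```

On a real cluster the mapper gets the path from its input split, for example `((FileSplit) context.getInputSplit()).getPath()`, and the reducer then sums the 1s for each `word#####filename` key.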

You might be interested in
____ software is rights-protection software used to control the use of a work.
jeka57 [31]
Cyclos is a project of STRO, a leading organisation on monetary innovations. Cyclos offers a complete online payment system with additional modules such as e-commerce and communication tools. The Cyclos platform permits institutions such as local banks and MFIs to offer banking services that can stimulate local trade and development. Cyclos is also used by many organizations and
3 years ago
Given the int variables x, y, and z, write a fragment of code that assigns the smallest of x, y, and z to another int variable m
Marysya12 [62]

Answer:

// here is code in C++.
#include <iostream>

using namespace std;

// main function
int main()
{
    // variables
    int x = 5, y = 2, z = 9;
    int m;

    // find the smallest value and assign it to m
    if (x < y && x < z)
        m = x;      // x is smallest
    else if (y < z)
        m = y;      // y is smallest
    else
        m = z;      // z is smallest

    // print the smallest
    cout << "smallest value is:" << m << endl;
    return 0;
}

Explanation:

Declare and initialize the variables x = 5, y = 2 and z = 9. If x is less than both y and z, assign x to m. Otherwise, if y is less than z, y is the smallest, so assign y to m. Otherwise z is the smallest, so assign z to m.

Output:

smallest value is:2
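Since the Hadoop code on this page is Java, here is the same fragment sketched in Java for comparison. `Math.min` is a standard JDK method; the class `SmallestOfThree` and its `smallest` method are only an illustrative wrapper. Nesting two `Math.min` calls gives the smallest of three values without writing out each comparison, and ties are handled automatically.

```java
public class SmallestOfThree {
    /** Returns the smallest of x, y and z. */
    public static int smallest(int x, int y, int z) {
        // Math.min returns the smaller of two ints; nesting it covers three values.
        int m = Math.min(x, Math.min(y, z));
        return m;
    }

    public static void main(String[] args) {
        int x = 5, y = 2, z = 9;
        System.out.println("smallest value is:" + smallest(x, y, z));
    }
}
```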

3 years ago
Why is code tracing important when debugging?
larisa [96]

It helps reveal the flow of execution of your program, including the results of intermediate evaluations. In other words, you can see what your program is doing and why it makes the decisions it makes.

If something unexpected happens, the trace shows you the sequence of events that led to it.
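A simple way to get such a trace is to record the relevant values at each step. The Java sketch below (the class `TraceDemo` and its `sumWithTrace` method are illustrative, not part of the original answer) logs the loop index, the current element, and the running total on every iteration, so if the final sum were wrong, the trace would show the step where it went off course.

```java
import java.util.ArrayList;
import java.util.List;

public class TraceDemo {
    /** Sums the array, appending one trace line per iteration to trace. */
    public static int sumWithTrace(int[] values, List<String> trace) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            // Record the intermediate state after each step.
            trace.add("i=" + i + " value=" + values[i] + " sum=" + sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        int total = sumWithTrace(new int[] {4, 7, 1}, trace);
        trace.forEach(System.out::println);
        System.out.println("total=" + total);
    }
}
```

Collecting the trace in a list rather than printing directly keeps it easy to inspect or compare in a test.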

3 years ago
In what way can a costume be deleted?
alex41 [277]

Answer:

Clicking the "X" button towards the upper-right of a costume's icon in the Costume Pane will delete that costume.

Explanation:

2 years ago
Assume that name has been declared suitably for storing names (like "Amy", "Fritz" and "Moustafa"). Assume also that stdin is a
aniked [119]

Answer:

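import java.util.Scanner;

public class Greetings {
    public static void main(String[] args) {
        // Assumed setup: stdin is a Scanner reading standard input.
        Scanner stdin = new Scanner(System.in);
        String name;

        System.out.println("Enter Name");
        name = stdin.next();                  // read one name, e.g. Amy
        System.out.println("Greetings " + name);
    }
}

Explanation:

Here stdin is a Scanner object associated with standard input, and its next() method reads the name the user types into name. The value read is then printed on its own line with System.out.println, appended to "Greetings ".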

3 years ago
Other questions:
  • Sizing handles are used in Microsoft® Word® to _____.
  • Which best describes a resource each student could use to find information
  • Referential integrity states that:______.
  • If the speakers are not working on a laptop, what could be the problem besides the speakers?
  • PLEASE HELP!!!!
  • Which of these is an example of gene flow?
  • What will happen when you drag and drop a worksheet tab into another workbook WITHOUT holding the Ctrl key down?
  • What does this mean??
  • Jason works for a restaurant that serves only organic, local produce. What
  • Read each of the following statements about Computer Science and explain why you think that statement is true.