1answer.
Ask question
Login Signup
Ask question
All categories
  • English
  • Mathematics
  • Social Studies
  • Business
  • History
  • Health
  • Geography
  • Biology
  • Physics
  • Chemistry
  • Computers and Technology
  • Arts
  • World Languages
  • Spanish
  • French
  • German
  • Advanced Placement (AP)
  • SAT
  • Medicine
  • Law
  • Engineering
lys-0071 [83]
3 years ago
6

Modify the WordCount program so it outputs the wordcount for each distinct word in each file. So the output of this DocWordCount

program should be of the form ‘word#####filename count’, where ‘#####’ serves as a delimiter between word and filename and tab serves as a delimiter between filename and count. Submit your source code in a file named DocWordCount.java.
Explanation: Consider two simple files file1.txt and file2.txt. $ echo "Hadoop is yellow Hadoop" > file1.txt $ echo "yellow Hadoop is an elephant" > file2.txt Running ‘DocWordCount.java’ on these two files will give an output similar to that below, where ##### is a delimiter.

Output of DocWordCount.java

yellow#####file2.txt 1

Hadoop#####file2.txt 1

is#####file2.txt 1

elephant#####file2.txt 1

yellow#####file1.txt 1

Hadoop#####file1.txt 2

is#####file1.txt 1

an#####file2.txt 1

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;


public class WordCount extends Configured implements Tool {

private static final Logger LOG = Logger .getLogger( WordCount.class);

public static void main( String[] args) throws Exception {
int res = ToolRunner .run( new WordCount(), args);
System .exit(res);
}

public int run( String[] args) throws Exception {
Job job = Job .getInstance(getConf(), " wordcount ");
job.setJarByClass( this .getClass());

FileInputFormat.addInputPaths(job, args[0]);
FileOutputFormat.setOutputPath(job, new Path(args[ 1]));
job.setMapperClass( Map .class);
job.setReducerClass( Reduce .class);
job.setOutputKeyClass( Text .class);
job.setOutputValueClass( IntWritable .class);

return job.waitForCompletion( true) ? 0 : 1;
}

public static class Map extends Mapper {
private final static IntWritable one = new IntWritable( 1);
private Text word = new Text();

private static final Pattern WORD_BOUNDARY = Pattern .compile("\\s*\\b\\s*");

public void map( LongWritable offset, Text lineText, Context context)
throws IOException, InterruptedException {

String line = lineText.toString();
Text currentWord = new Text();

for ( String word : WORD_BOUNDARY .split(line)) {
if (word.isEmpty()) {
continue;
}
currentWord = new Text(word);
context.write(currentWord,one);
}
}
}

public static class Reduce extends Reducer {
@Override
public void reduce( Text word, Iterable counts, Context context)
throws IOException, InterruptedException {
int sum = 0;
for ( IntWritable count : counts) {
sum += count.get();
}
context.write(word, new IntWritable(sum));
}
}
}
Computers and Technology
1 answer:
stepladder [879]3 years ago
8 0

Answer and Explanation:

package PackageDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

public static void main(String [] args) throws Exception

{

Configuration c=new Configuration();

String[] files=new GenericOptionsParser(c,args).getRemainingArgs();

Path input=new Path(files[0]);

Path output=new Path(files[1]);

Job j=new Job(c,"wordcount");

j.setJarByClass(WordCount.class);

j.setMapperClass(MapForWordCount.class);

j.setReducerClass(ReduceForWordCount.class);

j.setOutputKeyClass(Text.class);

j.setOutputValueClass(IntWritable.class);

FileInputFormat.addInputPath(j, input);

FileOutputFormat.setOutputPath(j, output);

System.exit(j.waitForCompletion(true)?0:1);

}

public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable>{

public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException

{

String line = value.toString();

String[] words=line.split(",");

for(String word: words )

{

Text outputKey = new Text(word.toUpperCase().trim());

IntWritable outputValue = new IntWritable(1);

con.write(outputKey, outputValue);

}

}

}

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>

{

public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException

{

int sum = 0;

for(IntWritable value : values)

{

sum += value.get();

}

con.write(word, new IntWritable(sum));

}

}

}

You might be interested in
What department is cyber security
Paul [167]

Answer:

United States Department of Homeland Security

The National Cyber Security Division (NCSD) is a division of the Office of Cyber Security & Communications, within the United States Department of Homeland Security's Cybersecurity and Infrastructure Security Agency.

<em>Give</em><em> </em><em>me</em><em> </em><em>brainliest</em><em> </em>

8 0
3 years ago
Read 2 more answers
Who all likes fortnite
Debora [2.8K]

Answer:

my brother

Explanation:

5 0
3 years ago
Read 2 more answers
Analyze the following code. Which of the following statements is correct?
matrenka [14]

Answer:

that is complex wow

Explanation:

8 0
3 years ago
Question :
BlackZzzverrR [31]

Answer:

D

Explanation:

Search engine optimization (SEO)

4 0
3 years ago
he function below takes one parameter: a list of strings (string_list). Complete the function to return a new list containing on
Aleks04 [339]

Answer:

def select_short_strings(string_list):

   new_list = []

   

   for s in string_list:

       if len(s) < 20:

           new_list.append(s)

   return new_list

   

lst = ["apple", "I am learning Python and it is fun!", "I love programming, it is easy", "orange"]

print(select_short_strings(lst))

Explanation:

- Create a function called <em>select_short_strings</em> that takes one argument <em>string_list</em>

Inside the function:

- Initialize an empty list to hold the strings that are less than 20

- Inside the loop, check the strings inside <em>string_list</em> has a length that is smaller than 20. If found one, put it to the <em>new_list</em>.

- When the loop is done, return the <em>new_list</em>

- Create a list to check and call the function

4 0
3 years ago
Read 2 more answers
Other questions:
  • You have a network that needs 29 subnets while maximizing the number of host addresses available on each subnet. How many bits m
    13·1 answer
  • What types of automation device might require specialist installer training?
    5·1 answer
  • In Microsoft Word, when you highlight existing text you want to replace, you are in what
    11·1 answer
  • The frequencies licensed by telecommunication firms to provide wireless service is known as _____.
    13·1 answer
  • How is the internet made?
    7·1 answer
  • A simulation system is a technology that enables you to take over a customer’s screen, mouse, or other connected device in order
    13·1 answer
  • 3. Describe at least 3 nonprice competition strategies a company could use to convince customers that its product is better than
    5·2 answers
  • ARGENT !!20 POINTS <br> А ________ translates commands from a computer to draw lines on paper.
    10·2 answers
  • The ______ Works on a single variable or constant. *​
    8·1 answer
  • An e-commerce client is moving from on-premise, legacy systems to a cloud-based platform. During the transition, the client is a
    6·1 answer
Add answer
Login
Not registered? Fast signup
Signup
Login Signup
Ask question!