1answer.
lys-0071 [83]
3 years ago
6

Modify the WordCount program so that it outputs the word count for each distinct word in each file. The output of this DocWordCount program should be of the form ‘word#####filename count’, where ‘#####’ serves as a delimiter between word and filename and a tab serves as a delimiter between filename and count. Submit your source code in a file named DocWordCount.java.

Explanation: Consider two simple files, file1.txt and file2.txt.

$ echo "Hadoop is yellow Hadoop" > file1.txt
$ echo "yellow Hadoop is an elephant" > file2.txt

Running ‘DocWordCount.java’ on these two files will give output similar to that below, where ##### is a delimiter.

Output of DocWordCount.java

yellow#####file2.txt 1
Hadoop#####file2.txt 1
is#####file2.txt 1
elephant#####file2.txt 1
yellow#####file1.txt 1
Hadoop#####file1.txt 2
is#####file1.txt 1
an#####file2.txt 1
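The expected counts can be checked without a cluster. Below is a minimal plain-Java sketch that mimics the map and reduce steps in memory for the two example files. The class name DocWordCountSketch and the method countWords are made up for illustration and are not part of the assignment; the sketch also assumes simple whitespace splitting rather than the WORD_BOUNDARY pattern used in the Hadoop code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the assignment: counts words per file in memory.
public class DocWordCountSketch {
    static final String DELIMITER = "#####";

    // files maps a file name to its text; the result maps "word#####filename" to a count.
    public static Map<String, Integer> countWords(Map<String, String> files) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Assumption: words are separated by whitespace only.
            for (String word : file.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // "Map" step builds the composite key, "reduce" step sums the ones.
                counts.merge(word + DELIMITER + file.getKey(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "Hadoop is yellow Hadoop");
        files.put("file2.txt", "yellow Hadoop is an elephant");
        // A tab separates the key from the count, as the question requires.
        countWords(files).forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

The line order will differ from a real Hadoop run, where keys are sorted and partitioned by the framework; only the key/count pairs are comparable.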

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;


public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "wordcount");
        job.setJarByClass(this.getClass());

        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Input: (byte offset, line of text). Output: (word, 1).
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {

            String line = lineText.toString();
            Text currentWord = new Text();

            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                currentWord = new Text(word);
                context.write(currentWord, one);
            }
        }
    }

    // Input: (word, [1, 1, ...]). Output: (word, total count).
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
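To see why the mapper skips empty strings, here is a small standalone sketch that applies the same WORD_BOUNDARY pattern to a single line, outside Hadoop. The class name TokenizeSketch and the method tokens are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: shows the tokens the mapper would emit for one line.
public class TokenizeSketch {
    // Same pattern as in the WordCount mapper.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    public static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            // The pattern can match with zero width, so split may return
            // empty strings between words; drop them as the mapper does.
            if (word.isEmpty()) {
                continue;
            }
            result.add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hadoop is yellow Hadoop"));
    }
}
```

Because the pattern splits at every word boundary, punctuation next to a word appears to come out as its own token, which may or may not be what the assignment intends.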
1 answer:
stepladder [879]
3 years ago
8 0

Answer and Explanation:

package PackageDemo;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class DocWordCount {

    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        // files[0] may be a directory that holds all input files.
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        // Job.getInstance replaces the deprecated Job constructor.
        Job j = Job.getInstance(c, "docwordcount");
        j.setJarByClass(DocWordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");
        private static final String DELIMITER = "#####";

        @Override
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
            // Assumes file-based input (the default TextInputFormat), so the split is a FileSplit.
            String fileName = ((FileSplit) con.getInputSplit()).getPath().getName();
            for (String word : WORD_BOUNDARY.split(value.toString())) {
                if (word.isEmpty()) {
                    continue;
                }
                // Key is word#####filename; case is kept so "Hadoop" matches the expected output.
                con.write(new Text(word + DELIMITER + fileName), ONE);
            }
        }
    }

    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            // The default text output format separates key and count with a tab.
            con.write(word, new IntWritable(sum));
        }
    }
}
