lys-0071 [83]
3 years ago
6

Modify the WordCount program so it outputs the word count for each distinct word in each file. So the output of this DocWordCount program should be of the form ‘word#####filename count’, where ‘#####’ serves as a delimiter between word and filename and a tab serves as a delimiter between filename and count. Submit your source code in a file named DocWordCount.java.

Explanation: Consider two simple files, file1.txt and file2.txt.

$ echo "Hadoop is yellow Hadoop" > file1.txt
$ echo "yellow Hadoop is an elephant" > file2.txt

Running ‘DocWordCount.java’ on these two files will give an output similar to that below, where ##### is a delimiter.

Output of DocWordCount.java

yellow#####file2.txt 1

Hadoop#####file2.txt 1

is#####file2.txt 1

elephant#####file2.txt 1

yellow#####file1.txt 1

Hadoop#####file1.txt 2

is#####file1.txt 1

an#####file2.txt 1

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;


public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), " wordcount ");
        job.setJarByClass(this.getClass());

        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {

            String line = lineText.toString();
            Text currentWord = new Text();

            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                currentWord = new Text(word);
                context.write(currentWord, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
Computers and Technology
1 answer:
stepladder [879]3 years ago
8 0

Answer and Explanation:

The key emitted by the mapper has to be word#####filename rather than the bare word. The file name comes from the input split; the reducer sums the counts per key, and the default text output format separates key and count with a tab.

package PackageDemo;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class DocWordCount {

    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);   // directory (or file) holding the input documents
        Path output = new Path(files[1]);  // must not exist yet

        Job j = Job.getInstance(c, "docwordcount");
        j.setJarByClass(DocWordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {

        // Same word-boundary pattern as the initial WordCount code.
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
        {
            // Name of the file this line was read from, e.g. file1.txt.
            String fileName = ((FileSplit) con.getInputSplit()).getPath().getName();

            for (String word : WORD_BOUNDARY.split(value.toString()))
            {
                if (word.isEmpty())
                {
                    continue;
                }
                // Key is word#####filename, so counts stay separate per file.
                con.write(new Text(word + "#####" + fileName), ONE);
            }
        }
    }

    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
        {
            // Sum the ones emitted for this word#####filename key.
            int sum = 0;
            for (IntWritable value : values)
            {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
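
The counting that the question asks for can be checked without a Hadoop cluster. The following is only a rough sketch in plain Java, not part of the original answer: the class name DocWordCountSketch and the method countWords are made up for illustration, and it assumes the same word-boundary pattern as the initial WordCount code. It keeps one counter per word#####filename key, which is what the map and reduce steps have to achieve together.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop class: simulates map + reduce in memory.
class DocWordCountSketch {
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");
    static final String DELIMITER = "#####";

    // Adds the word counts of one file to `counts`, keyed by word#####filename.
    static void countWords(String fileName, String contents, Map<String, Integer> counts) {
        for (String line : contents.split("\n")) {
            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                // "map" emits (key, 1); "reduce" is the running sum kept by merge().
                counts.merge(word + DELIMITER + fileName, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        countWords("file1.txt", "Hadoop is yellow Hadoop", counts);
        countWords("file2.txt", "yellow Hadoop is an elephant", counts);
        // Tab between key and count, as in the required output format.
        counts.forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

Run on the two example files from the question, this prints eight lines, including Hadoop#####file1.txt with a count of 2. The line order differs from a real job, where it depends on how Hadoop sorts and partitions the keys.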

You might be interested in
She wants to sort the list based on the names of the first four gas in ascending order. Which command group does she navigate to
zubka84 [21]

Best method to sort data:

The usual tool for sorting data like this is an Excel sheet. Suppose we have four quarters of sales data for each salesman.

1. Open MS Excel.

2. Create a blank sheet.

3. Enter the list of salesmen and their quarterly sales in the cells, starting from A1.

4. Select the whole worksheet and paste in the data.

5. Once the data is copied, create a custom list, open the advanced options dialog box, and select the required field to sort by.

This is a simple method for sorting the list based on the names of the first four gas in ascending order.

7 0
3 years ago
Read 2 more answers
You and your friend who lives far away want to fairly and randomly select which of the two of you will travel to the other’s hom
Phantasy [73]

Answer:

c. your friend can hash all possible options and discover your secret.

Explanation:

SHA-256 is a member of SHA-2, a family of hash functions designed by the NSA as the successor to SHA-1. A hash function condenses data of any size into a fixed-size digest and is one-way: the input cannot feasibly be recovered from the digest alone. However, hash functions are also deterministic, which means that any computer in the world computes the same digest for the same input. Because there are only a few possible options here, your friend can hash each one and compare the results with the digest you sent.
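
The attack described in answer c can be sketched in a few lines of Java. This is a minimal illustration under assumptions: the option set {"heads", "tails"} is hypothetical (the question text above is cut off), and the class and method names are made up. It uses only the standard java.security.MessageDigest API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: recovers a hashed choice by trying every option.
class CommitmentGuess {
    // Returns the hex-encoded SHA-256 digest of the input string.
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Hashes each candidate and returns the one matching the published digest, or null.
    static String recover(String publishedHash, String[] candidates) {
        for (String candidate : candidates) {
            if (sha256Hex(candidate).equals(publishedHash)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] options = {"heads", "tails"};   // assumed option set
        String published = sha256Hex("tails");   // the digest you send to your friend
        System.out.println(recover(published, options));   // prints "tails"
    }
}
```

The usual remedy is to hash the choice together with a long random value that is revealed only afterwards, so the set of possible inputs is too large to enumerate.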

6 0
3 years ago
JAVA
butalik [34]

Answer:

import java.util.*;

public class MyClass {

   public static void main(String args[]) {

       Scanner input = new Scanner(System.in);

       System.out.print("Input a word: ");

       String userinput = input.nextLine();

       for(int i =0;i<userinput.length();i+=2) {

           System.out.print(userinput.charAt(i));

       }

   }

}

Explanation:

This line prompts user for input

       System.out.print("Input a word: ");

This declares a string variable named userinput and also gets input from the user

       String userinput = input.nextLine();

The following iterates through every other character of userinput from the first using iteration variable i and i is incremented by 2

       for(int i =0;i<userinput.length();i+=2) {

This prints characters at i-th position

           System.out.print(userinput.charAt(i));

5 0
3 years ago
Collaborative filtering is
vazorg [7]

Answer:

A: used by ISP's to filter out email SPAM

C: a way to help an individual focus on best choices when deciding what to watch or buy.

Explanation:

Collaborative filtering uses a community-based approach to filter spam. It works by collecting feedback from numerous email users around the world: users flag which emails are spam and which are legitimate, and those flags are used to filter mail for everyone.

Collaborative filtering is also one of the most efficient techniques for building a system that recommends the best choices to a user based on information from a large number of other users.
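
As a rough illustration of the recommendation case, here is a very small user-based sketch in Java. Everything in it is an assumption made for the example: the class name, the overlap-count similarity measure, and the sample users and titles. Real systems use far more data and better similarity measures.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy recommender: suggests what the most similar user liked.
class CollaborativeFilterSketch {
    // Similarity = number of items both users liked (a deliberately crude measure).
    static int overlap(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return shared.size();
    }

    // Returns items liked by the most similar other user that the target has not liked yet.
    static Set<String> recommend(String target, Map<String, Set<String>> likes) {
        Set<String> mine = likes.get(target);
        String nearest = null;
        int best = 0;   // require at least one shared item
        for (Map.Entry<String, Set<String>> entry : likes.entrySet()) {
            if (entry.getKey().equals(target)) {
                continue;
            }
            int score = overlap(mine, entry.getValue());
            if (score > best) {
                best = score;
                nearest = entry.getKey();
            }
        }
        Set<String> result = new TreeSet<>();
        if (nearest != null) {
            result.addAll(likes.get(nearest));
            result.removeAll(mine);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = new LinkedHashMap<>();
        likes.put("ann", new HashSet<>(Arrays.asList("Alien", "Brazil")));
        likes.put("bob", new HashSet<>(Arrays.asList("Alien", "Brazil", "Casablanca")));
        likes.put("cat", new HashSet<>(Arrays.asList("Dune")));
        System.out.println(recommend("ann", likes));   // prints [Casablanca]
    }
}
```

Here bob shares two liked titles with ann, so the one title bob liked that ann has not seen is recommended to her.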

4 0
3 years ago
List five ways in which the type declaration system of a language such as Java or C differs from the data definition language us
zhannawk [14.2K]

Answer:

Hi Sevanah! Below are the five main differences between a type declarative language and a data definition language:

A data definition language:

1. define data structure

2. define the column attributes of a table

3. no further classifications

4. The basic commands in a Data Definition language are CREATE, DROP, RENAME, ALTER

5. Scope of variables in data definition languages is limited  

A type declarative language:

1. manipulate the data itself

2. uses functions to add and update the rows of a table

3. further classified into procedural and non-procedural languages

4. The basic commands are INSERT, UPDATE and MERGE

5. Scope of variables in type declarative languages is varied

Explanation:

A data definition language is used to define data structures. It uses statements such as CREATE TABLE and ALTER TABLE to create and alter the database schema so that it can hold rows of information. A type declarative language such as Java or C is used to manipulate the data itself, for example to insert, update and delete rows in the database. Whereas data definition languages define the column attributes of a table, a type declaration language uses functions to add and update the rows of a table. A data definition language does not have any further classification, whereas a type declarative language can be further classified into procedural and non-procedural languages. The basic commands in a data definition language are CREATE, DROP, RENAME and ALTER, whereas the basic commands in a type declarative language are INSERT, UPDATE and MERGE. There is little or no scope of variables in a data definition language, whereas the scope of variables in type declarative languages is varied.

8 0
3 years ago