Sidana [21]
2 years ago
12

The goal of this task is to perform a web crawl on a URL string provided by the user. From the crawl, you will need to parse out all of the images on that web page and return a JSON array of strings that represent the URLs of all images on the page. [Jsoup](https://jsoup.org/) is a great basic library for crawling and is already included as a Maven dependency in this project; however, you are welcome to use whatever library you would like.
## Required Functionality
We expect your submission to be able to achieve the following goals:
- Build a web crawler that can find all images on the web page(s) that it crawls.
- Crawl sub-pages to find more images.
- Implement multi-threading so that the crawl can be performed on multiple pages at a time.
- Keep your crawl within the same domain as the input URL.
- Avoid re-crawling any pages that have already been visited.
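
The last two requirements (staying in the same domain and never re-crawling a page) can be sketched with the standard library alone. This is only a minimal illustration under stated assumptions: the class name `CrawlScope` and its methods are hypothetical and not part of the provided project, and "same domain" is taken to mean "same host", which is one possible reading.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: decides whether a discovered link should be crawled. */
class CrawlScope {
    private final String rootHost;
    // Thread-safe set, so several worker threads can share one instance.
    private final Set<String> visited = ConcurrentHashMap.newKeySet();

    CrawlScope(String startUrl) {
        this.rootHost = hostOf(startUrl);
    }

    /** Returns the lower-cased host of a URL, or "" if it has none or cannot be parsed. */
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    /** Drops the fragment so that page#a and page#b count as one page. */
    static String normalize(String url) {
        int hash = url.indexOf('#');
        return hash >= 0 ? url.substring(0, hash) : url;
    }

    /** True only the first time a URL on the start URL's host is offered. */
    boolean shouldCrawl(String url) {
        String clean = normalize(url);
        if (rootHost.isEmpty() || !rootHost.equals(hostOf(clean))) {
            return false;
        }
        return visited.add(clean); // add() returns false for a URL seen before
    }
}
```

Because `Set.add` reports whether the element was new, the check and the insert happen in one atomic step, which avoids two threads both deciding to crawl the same page.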
## Extra Functionality
No individual point below is explicitly required, but we recommend trying to achieve the following goals as well:
- Make your crawler "friendly" - try not to get banned from the site by performing too many crawls.
- Try to detect what images might be considered logos.
- Show off your front-end dev skills with Javascript, HTML, and/or CSS to make the site look more engaging.
- Any other way you feel you can show off your strengths as a developer.
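
Two of the optional goals above lend themselves to a short sketch. Both classes below are hypothetical illustrations, not part of the provided project: `PoliteThrottle` spaces requests out by a fixed minimum delay (the delay value is an assumption to tune per site), and `LogoHeuristic` guesses at logos purely from naming, which will produce both false positives and misses.

```java
import java.util.Locale;

/** Hypothetical throttle: enforces a minimum gap between consecutive requests. */
class PoliteThrottle {
    private final long minDelayMillis;
    private long nextAllowed = 0L;

    PoliteThrottle(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /** Blocks the caller until the minimum gap since the previous call has passed. */
    synchronized void acquire() {
        long now = System.currentTimeMillis();
        if (now < nextAllowed) {
            try {
                Thread.sleep(nextAllowed - now);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            now = System.currentTimeMillis();
        }
        nextAllowed = now + minDelayMillis;
    }
}

/** Hypothetical heuristic: flags images whose URL or alt text hints at a logo. */
class LogoHeuristic {
    static boolean looksLikeLogo(String imageUrl, String altText) {
        String url = imageUrl.toLowerCase(Locale.ROOT);
        String alt = altText == null ? "" : altText.toLowerCase(Locale.ROOT);
        return url.contains("logo") || url.contains("favicon")
                || url.endsWith(".ico") || alt.contains("logo");
    }
}
```

Each worker thread would call `acquire()` before fetching a page, so the threads share one request budget instead of each hitting the site at full speed.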
**PLEASE do not send us a submission with only a basic JSoup crawl and only a couple lines of code.** This is your chance to prove what you could contribute to our team.
Your project will be due exactly 48 hours after you receive this project. To submit, zip up your project (`imagefinder.zip`) and email it back to me. **Please include a list of URLs that you used to test in your submissions.** You should place them in the attached `test-links.txt` file found in the root of this project.
## Structure
The ImageFinder servlet is found in `src/main/java/com/eulerity/hackathon/imagefinder/ImageFinder.java`. This is the only provided Java class. Feel free to add more classes or packages as you see fit.
The main landing page for this project can be found in `src/main/webapp/index.html`. This page contains more instructions and serves as the starting page for the web application. You may edit this page as much as it suits you, and/or add other pages.
Finally, in the root directory of this project, you will find the `pom.xml`. This contains the project configuration details used by maven to build the project. If you want/need to use outside dependencies, you should add them to this file.
## Running the Project
Here we will detail how to setup and run this project so you may get started, as well as the requirements needed to do so.
### Requirements
Before beginning, make sure you have the following installed and ready to use:
- Maven 3.5 or higher
- Java 8
### Setup
To start, open a terminal window and navigate to the root directory `imagefinder`, wherever you unzipped it. To build the project, run the command:
>`mvn package`
If all goes well you should see output that ends with "BUILD SUCCESS". When you build your project, Maven should build it in the `target` directory. To clear this, you may run the command:
>`mvn clean`
To run the project, use the following command to start the server:
>`mvn clean test package jetty:run`
You should see a line at the bottom that says "Started Jetty Server". Now, if you enter `localhost:8080` into your browser, you should see the `index.html` welcome page! If all has gone well to this point, you're ready to begin!
## Submission
When you are finished working on the project, before zipping up and emailing back your submission, **PLEASE RUN ONE LAST `mvn clean` COMMAND TO REMOVE ANY UNNECESSARY FILES FROM YOUR SUBMISSION**. Please also make sure to add the URLs you used to test your project to the `test-links.txt` file. After doing these things, you may zip up the root directory (`imagefinder`) and email it back to us.
## Final Notes
- If you feel you need more time to work, you are free to ask for it.
- If you are having any trouble, especially with the setup, please reach out and we will try to answer as soon as we can.
- The ideas listed above on how to expand the project are great starting points, but feel free to add in your own ideas as well.
- Try to follow some good-practice principles when working on your code, such as meaningful and clean variable/method names and other good coding practices.
- The code we have provided is to allow you to hit the ground running. You are free to use whatever web service you would like (as long as you use Java 8 and it is runnable from the command line).
- We look forward to seeing what you can do, so good luck and have fun!
Computers and Technology
1 answer:
spin [16.1K]2 years ago
4 0

The goal of this task is to perform a web crawl starting from a URL string provided by the user. That URL seeds the list of URLs to be visited.

### What is a multithreaded web crawler?

A multithreaded web crawler uses several threads. It should be able to crawl all the pages of a given website, report back any 2XX and 4XX links, take in the domain name from the command line, and avoid cyclic traversal of links.

Here are the basic steps to build a crawler:

Step 1: Add one or several URLs to the list of URLs to be visited.

Step 2: Pop a link from the URLs to be visited and add it to the list of visited URLs.

Step 3: Fetch the page's content and scrape the data you are interested in with the ScrapingBot API.
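
The three steps can be sketched in Java roughly as follows. This is an illustration under several assumptions, not the expected solution: `MiniCrawler` is a hypothetical class, the `fetcher` function stands in for a real HTTP fetch (with Jsoup or the ScrapingBot API), and the regular expressions are a crude substitute for a real HTML parser. The sketch also marks a URL as visited when it is queued rather than when it is popped, which prevents the same link from being queued twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniCrawler {
    // Crude patterns for illustration only; a real crawler should use an HTML parser.
    private static final Pattern IMG =
            Pattern.compile("<img\\b[^>]*\\ssrc=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern LINK =
            Pattern.compile("<a\\b[^>]*\\shref=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the first capture group of every match of pattern in html. */
    static List<String> extract(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    /** Crawls level by level; fetcher maps a URL to its HTML (assumed never null). */
    static Set<String> crawl(String startUrl, Function<String, String> fetcher, int threads) {
        Set<String> visited = new HashSet<>();   // only touched by the calling thread
        Set<String> images = new TreeSet<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(startUrl);                  // Step 1: seed the URLs to be visited
        visited.add(startUrl);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            while (!frontier.isEmpty()) {
                // Step 2: take every queued URL and fetch the pages in parallel.
                List<Callable<String>> tasks = new ArrayList<>();
                for (String url : frontier) {
                    tasks.add(() -> fetcher.apply(url));
                }
                List<String> next = new ArrayList<>();
                for (Future<String> page : pool.invokeAll(tasks)) {
                    String html = page.get();
                    images.addAll(extract(IMG, html));   // Step 3: scrape the data
                    for (String link : extract(LINK, html)) {
                        if (visited.add(link)) {         // skip links already seen
                            next.add(link);
                        }
                    }
                }
                frontier = next;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e);
        } finally {
            pool.shutdown();
        }
        return images;
    }
}
```

For a quick local experiment, the fetcher can be a lookup in an in-memory `Map<String, String>` of URL to HTML, passed as `site::get`. A real crawl would additionally resolve relative links against the page URL and filter out links that leave the start domain.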


You might be interested in
Many contemporary languages allow two kinds of comments: one in which delimiters are used on both ends (multiple-line comments),
m_a_m_a [10]

Answer and Explanation:

Multiple-line comments:

Advantage: If we need to comment out a whole region of a program, we can do it with one pair of delimiters.

Disadvantage: Multiple-line comments reduce reliability. It is easy to accidentally leave off the closing delimiter, which extends the comment to the end of the next comment and effectively removes code from the program.

Single-line comments:

Advantage: There is no closing delimiter to forget, because the comment ends at the end of the line.

Disadvantage: They put more burden on the programmer, because the delimiter must be repeated on every line of a block of comments.

3 0
3 years ago
Kevin is working in the Tasks folder of his Outlook account. Part of his computer screen is shown below.
klasskru [66]

Answer:

D

Explanation:

7 0
3 years ago
What are the impacts of ICT in your everyday life? Describe them briefly.
telo118 [61]

Answer:

ICT is a broad subject and an evolving concept. It covers any product that will store, retrieve, manipulate, transmit, or receive information electronically in a digital form.

Explanation:

HOW WE USE ICT IN OUR DAILY LIFE

COMMUNICATION

JOB OPPORTUNITIES

EDUCATION

SOCIALIZING

POSITIVE IMPACT OF ICT IN OUR DAILY LIFE

1. Easy access to information:

I use ICT to access the information I need for everyday schooling, because the Internet is much faster than searching in a school library. Even when a research deadline is close, I can finish quickly with the help of ICT.

2. Education: distance learning and online tutorials. New ways of learning, e.g. interactive multimedia and virtual reality.

3. Free sharing of photos, videos, and messages.

5 0
2 years ago
What do customers use to access the internet, usually for a monthly fee?
kati45 [8]
To search for things like answers, order things on Amazon, or watch videos on YouTube.
4 0
3 years ago
1. Assume that word is a variable of type String that has been assigned a value . Assume furthermore that this value always cont
s2008m [1.1K]

Answer:

1. word = "George slew the dragon"

startIndex = word.find('dr')

endIndex = startIndex + 4

drWord = word[startIndex:endIndex]

2. sentence = "Broccoli is delicious."

sentence_list = sentence.split(" ")

firstWord = sentence_list[0]

Explanation:

The above snippet is written in Python 3.

1. word is initialized to a sentence.

Then we find the first occurrence of 'dr' in the sentence, which is assigned to startIndex.

We then add 4 to startIndex and assign the result to endIndex. 4 is added because we need a substring of length 4.

We then use string slicing to create a substring from startIndex to endIndex, which is assigned to drWord.

2. A string is assigned to sentence. Then we split the sentence on spaces using sentence.split(" "). The built-in split method returns a list. Lists use zero-based indexing, so firstWord = sentence_list[0] gets the first element.

4 0
4 years ago