NARA [144]
3 years ago
7

You are building a predictive solution based on web server log data. The data is collected in a comma-separated values (CSV) format that always includes the following fields:

  • date: string
  • time: string
  • client_ip: string
  • server_ip: string
  • url_stem: string
  • url_query: string
  • client_bytes: integer
  • server_bytes: integer

You want to load the data into a DataFrame for analysis. You must load the data in the correct format while minimizing the processing overhead on the Spark cluster. What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into a DataFrame.
b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.
c. Read the data from the CSV file into a DataFrame, inferring the schema.
d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, inferring the schema.
Computers and Technology
1 answer:
juin [17]3 years ago
4 0

Answer:

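See explanation. Option (b), defining a schema and reading the CSV file with it, is the choice that loads the columns with the correct types without an extra inference pass over the data.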

Explanation:

The data is collected in a comma-separated values (CSV) format that always includes the following fields:

  • date: string
  • time: string
  • client_ip: string
  • server_ip: string
  • url_stem: string
  • url_query: string
  • client_bytes: integer
  • server_bytes: integer

What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into a DataFrame.

# import the csv module and pandas
# (note: this snippet uses pandas rather than a Spark RDD; it only illustrates the parse-then-load idea)
import csv
import pandas as pd

# open the CSV file
with open(r"C:\Users\uname\Downloads\abc.csv", newline="") as csv_file:
    # read the CSV file, splitting each line on commas
    csv_reader = csv.reader(csv_file, delimiter=',')
    # load the parsed rows into a pandas DataFrame while the file is still open
    df = pd.DataFrame(list(csv_reader))

df.head()
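Option (a) can also be sketched in plain Python, assuming two made-up log lines in the field order given in the question. Splitting text yields only strings, so the two byte counters have to be cast by hand, which is the extra work a declared schema avoids. `FIELDS`, `parse_line`, and the sample rows are hypothetical names and data used only for illustration.

```python
import csv

# Field order taken from the question; the sample rows below are invented.
FIELDS = ["date", "time", "client_ip", "server_ip",
          "url_stem", "url_query", "client_bytes", "server_bytes"]

def parse_line(line):
    """Split one CSV line and cast the two integer columns by hand."""
    values = next(csv.reader([line]))
    row = dict(zip(FIELDS, values))
    # Everything is a string after splitting, so cast explicitly.
    row["client_bytes"] = int(row["client_bytes"])
    row["server_bytes"] = int(row["server_bytes"])
    return row

sample_lines = [
    "2017-01-01,00:00:01,10.0.0.1,10.0.0.2,/index.html,id=1,120,2048",
    "2017-01-01,00:00:02,10.0.0.3,10.0.0.2,/about.html,,98,1024",
]
rows = [parse_line(line) for line in sample_lines]
```

The per-line parsing and casting here is work the application has to perform itself before a DataFrame can be built.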

b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.

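from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql import SparkSession

# get (or create) the Spark session used to read the file
spark = SparkSession.builder.getOrCreate()

# date and time are declared as strings, as the question specifies
newSchema = StructType([
    StructField("date", StringType(), True),
    StructField("time", StringType(), True),
    StructField("client_ip", StringType(), True),
    StructField("server_ip", StringType(), True),
    StructField("url_stem", StringType(), True),
    StructField("url_query", StringType(), True),
    StructField("client_bytes", IntegerType(), True),
    StructField("server_bytes", IntegerType(), True),
])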
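The same schema can be written more compactly as a DDL-formatted string, which Spark's CSV reader also accepts for its `schema` argument. The sketch below is a minimal illustration under assumptions: `COLUMNS`, `SCHEMA_DDL`, and `load_logs` are hypothetical names, `spark` is an existing SparkSession, and `path` is a placeholder for the log file location. The header setting simply follows the read call used in this answer.

```python
# Column names and types as stated in the question.
COLUMNS = [
    ("date", "STRING"), ("time", "STRING"),
    ("client_ip", "STRING"), ("server_ip", "STRING"),
    ("url_stem", "STRING"), ("url_query", "STRING"),
    ("client_bytes", "INT"), ("server_bytes", "INT"),
]

# DDL-formatted schema string. Names are backtick-quoted so that
# words such as `date` are always treated as column names.
SCHEMA_DDL = ", ".join(f"`{name}` {dtype}" for name, dtype in COLUMNS)

def load_logs(spark, path):
    """Read the CSV with the declared schema (hypothetical helper).

    No schema inference is requested, so Spark does not need an
    extra pass over the data to work out column types.
    """
    return spark.read.csv(path, schema=SCHEMA_DDL, header=True)
```

Building the string does not require Spark; only `load_logs` needs a running session.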

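c. Read the data from the CSV file into a DataFrame, inferring the schema.

# Note: this call applies the schema defined in (b). To infer the schema instead,
# drop schema= and pass inferSchema="true", which makes Spark read the data an extra time.
abc_DF = spark.read.load(r'C:\Users\uname\Downloads\new_abc.csv', format="csv", header="true", sep=',', schema=newSchema)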
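Why inference adds overhead can be shown with a toy example. This is not Spark's actual algorithm, only a sketch of the general idea: a column's type cannot be known until its values have been examined, so the data is read once to decide types and again to load it. `infer_column_type` and the sample columns are hypothetical.

```python
def infer_column_type(values):
    """Toy inference: 'int' only if every value parses as an integer."""
    for value in values:  # may need to look at every value in the column
        try:
            int(value)
        except ValueError:
            return "string"
    return "int"

# Invented sample columns.
client_bytes = ["120", "98", "4301"]
url_stem = ["/index.html", "/about.html", "/404"]

inferred = {
    "client_bytes": infer_column_type(client_bytes),
    "url_stem": infer_column_type(url_stem),
}
```

With a declared schema this scan is unnecessary, because the types are supplied up front.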

d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, inferring the schema.

import pandas as pd

# read the tab-delimited file; pandas infers the column types
Df2 = pd.read_csv('new_abc.csv', delimiter="\t")

print('Contents of Dataframe : ')
print(Df2)
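The conversion step in option (d) is itself a full rewrite of the data. A minimal sketch using only the standard library, with an invented sample row and a hypothetical helper name `csv_to_tsv`:

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Rewrite comma-separated text as tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)  # every row is read and written again
    return out.getvalue()

# Invented sample row in the field order from the question.
sample = "2017-01-01,00:00:01,10.0.0.1,10.0.0.2,/index.html,id=1,120,2048\n"
converted = csv_to_tsv(sample)
```

Changing the delimiter adds no type information, so the schema would still have to be inferred or declared afterwards.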
