1answer.
NARA [144]
3 years ago
7

You are building a predictive solution based on web server log data. The data is collected in a comma-separated values (CSV) format that always includes the following fields:

  • date: string
  • time: string
  • client_ip: string
  • server_ip: string
  • url_stem: string
  • url_query: string
  • client_bytes: integer
  • server_bytes: integer

You want to load the data into a DataFrame for analysis. You must load the data in the correct format while minimizing the processing overhead on the Spark cluster. What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into a DataFrame.
b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.
c. Read the data from the CSV file into a DataFrame, inferring the schema.
d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, inferring the schema.
Computers and Technology
1 answer:
juin [17] 3 years ago
4 0

Answer:

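b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema. An explicit schema gives every column the correct type without the extra pass over the data that schema inference requires, while splitting lines of text in an RDD leaves every field as a string. See the explanation.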

Explanation:

The data is collected in a comma-separated values (CSV) format that always includes the following fields:

  • date: string
  • time: string
  • client_ip: string
  • server_ip: string
  • url_stem: string
  • url_query: string
  • client_bytes: integer
  • server_bytes: integer

What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into a DataFrame.

# import the csv module and pandas
import csv
import pandas as pd

# open the csv file
with open(r"C:\Users\uname\Downloads\abc.csv", newline="") as csv_file:
    # read the csv file, splitting each line on commas
    csv_reader = csv.reader(csv_file, delimiter=',')
    # load the parsed rows into a pandas DataFrame (every field is still a string)
    df = pd.DataFrame(list(csv_reader))

df.head()

b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.

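from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql import SparkSession

# create (or reuse) the Spark session used to read the file
spark = SparkSession.builder.getOrCreate()

# field names and types as given in the question; the third argument marks the field nullable
newSchema = StructType([
    StructField("date", StringType(), True),
    StructField("time", StringType(), True),
    StructField("client_ip", StringType(), True),
    StructField("server_ip", StringType(), True),
    StructField("url_stem", StringType(), True),
    StructField("url_query", StringType(), True),
    StructField("client_bytes", IntegerType(), True),
    StructField("server_bytes", IntegerType(), True)
])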

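c. Read the data from the CSV file into a DataFrame, inferring the schema.

# this call applies the schema defined under option b; to infer the schema instead,
# drop schema= and pass inferSchema="true", which costs an extra pass over the data
abc_DF = spark.read.load(r'C:\Users\uname\Downloads\new_abc.csv', format="csv", header="true", sep=',', schema=newSchema)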

d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, inferring the schema.

import pandas as pd

# read the tab-delimited file; pandas infers the column types
Df2 = pd.read_csv('new_abc.csv', delimiter="\t")
print('Contents of Dataframe : ')
print(Df2)
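
Why a predefined schema is cheaper than inference can be illustrated without a cluster. The following is a plain-Python sketch, not Spark code: the sample rows are made up, and the names SAMPLE, SCHEMA, load_with_schema and infer_types are hypothetical. With a schema, each row is converted once as it is read; without one, every value has to be scanned first just to guess the column types.

```python
import csv
import io

# Made-up sample rows in the field order given in the question.
SAMPLE = (
    "2020-01-01,00:00:01,10.0.0.5,10.0.0.1,/index.html,id=1,512,2048\n"
    "2020-01-01,00:00:02,10.0.0.6,10.0.0.1,/about.html,,256,1024\n"
)

# Explicit schema: (column name, converter) pairs matching the question's types.
SCHEMA = [
    ("date", str), ("time", str), ("client_ip", str), ("server_ip", str),
    ("url_stem", str), ("url_query", str),
    ("client_bytes", int), ("server_bytes", int),
]

def load_with_schema(text):
    """Single pass: convert each field with its declared type."""
    rows = csv.reader(io.StringIO(text))
    return [
        {name: cast(value) for (name, cast), value in zip(SCHEMA, row)}
        for row in rows
    ]

def infer_types(text):
    """Extra pass: scan every value to guess int vs. str for each column."""
    rows = list(csv.reader(io.StringIO(text)))
    guesses = []
    for column in zip(*rows):
        is_int = all(value.lstrip("-").isdigit() for value in column)
        guesses.append(int if is_int else str)
    return guesses

records = load_with_schema(SAMPLE)
print(records[0]["client_bytes"] + records[1]["client_bytes"])
print([t.__name__ for t in infer_types(SAMPLE)])
```

The inference function has to read the whole input before a single row can be typed, which is the overhead the question asks you to avoid.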

You might be interested in
Communication is used to satisfy instrumental goals, which means ________ .
Ann [662]
Communication comes in many types and therefore has many goals and purposes. One purpose of communication is to satisfy instrumental goals. Instrumental goals focus on convincing others to act in an appropriate way. They apply most in situations where someone has to deal with others.
3 0
3 years ago
Hellooo, everyone, absolutely everyone, wants to delete this question because it has nothing to do with school subjects, but
Andreas93 [3]

Answer:

HAHAHA XD BUT YES I AM

7 0
3 years ago
1. find the network address for 172.22.49.252/17
mariarad [96]

Answer:

1. The network address for 172.22.49.252/17 is 172.22.0.0/17.

2. The last valid assignable host address of 172.22.4.129/26 is 172.22.4.190.

3. The first and last host address of 192.167.25.25/16 is 192.167.0.1 and 192.167.255.254.

4. The broadcast address of 10.75.96.0/20 is 10.75.111.255

Explanation:

Subnetting is the process of dividing a network's address space by adjusting the subnet mask. For example, the IP address "172.22.49.252/17" is a class B address that borrows one extra bit from the third octet, which changes its subnet mask from "255.255.0.0" to "255.255.128.0". With this mask, the subnet holds 32,766 usable host addresses, and the address belongs to the network "172.22.0.0/17".
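
The four results above can be checked with Python's standard ipaddress module. This is a minimal sketch; the helper name subnet_summary is hypothetical, and the first-host and last-host arithmetic assumes an ordinary subnet (prefix length of /30 or shorter), where the network and broadcast addresses are not assignable.

```python
import ipaddress

def subnet_summary(cidr):
    """Return network, broadcast, first host and last host for an IPv4 CIDR."""
    # ip_interface accepts a host address with a prefix length and
    # exposes the network that contains it.
    net = ipaddress.ip_interface(cidr).network
    return {
        "network": str(net),
        "broadcast": str(net.broadcast_address),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "usable_hosts": net.num_addresses - 2,
    }

for cidr in ("172.22.49.252/17", "172.22.4.129/26",
             "192.167.25.25/16", "10.75.96.0/20"):
    print(cidr, subnet_summary(cidr))
```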

7 0
3 years ago
What will happen if you type pseudocode in another language's programming environment and try to run the
wariber [46]

Answer:

You will get a compile error.

Explanation:

7 0
3 years ago
. A Worker in Microworkers can also be an Employe
bija089 [108]

Answer:

A Worker in Microworkers can also be an Employer: After reaching $25 in earnings. After placing an initial deposit of $10, and launching a valid campaign. If success rate is maintained at 75% before launching a campaign. After creating a separate Employer account.

Explanation:

6 0
3 years ago