1answer.
NARA [144]
3 years ago
7

You are building a predictive solution based on web server log data. The data is collected in a comma-separated values (CSV) format that always includes the following fields:

  • date: string
  • time: string
  • client_ip: string
  • server_ip: string
  • url_stem: string
  • url_query: string
  • client_bytes: integer
  • server_bytes: integer

You want to load the data into a DataFrame for analysis. You must load the data in the correct format while minimizing the processing overhead on the Spark cluster. What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma delimiter and load the RDD into a DataFrame.
b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.
c. Read the data from the CSV file into a DataFrame, inferring the schema.
d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, inferring the schema.
Computers and Technology
1 answer:
juin [17]3 years ago
4 0

Answer:

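Option b: define a schema for the data, then read the data from the CSV file into a DataFrame using that schema. An explicit schema applies the correct types without the extra pass over the data that schema inference requires. See the explanation below.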

Explanation:

The data is collected in a comma-separated values (CSV) format that always includes the following fields:

  • date: string
  • time: string
  • client_ip: string
  • server_ip: string
  • url_stem: string
  • url_query: string
  • client_bytes: integer
  • server_bytes: integer

What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma delimiter and load the RDD into a DataFrame.

# Note: this snippet uses the csv module and pandas rather than a Spark RDD
import csv

import pandas as pd

# open the CSV file
with open(r"C:\Users\uname\Downloads\abc.csv", newline="") as csv_file:
    # read the rows, splitting each line on commas
    csv_reader = csv.reader(csv_file, delimiter=',')
    # load the rows into a pandas DataFrame (every value stays a string)
    df = pd.DataFrame(list(csv_reader))

df.head()

b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.

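from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# create (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()

# date and time are declared as strings, as stated in the question
newSchema = StructType([
    StructField("date", StringType(), True),
    StructField("time", StringType(), True),
    StructField("client_ip", StringType(), True),
    StructField("server_ip", StringType(), True),
    StructField("url_stem", StringType(), True),
    StructField("url_query", StringType(), True),
    StructField("client_bytes", IntegerType(), True),
    StructField("server_bytes", IntegerType(), True),
])

# read the CSV file using the schema, so Spark does not need to infer the column types
abc_DF = spark.read.load(r'C:\Users\uname\Downloads\new_abc.csv', format="csv", header="true", sep=',', schema=newSchema)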

c. Read the data from the CSV file into a DataFrame, inferring the schema.

# inferSchema makes Spark scan the data an extra time to guess the column types
abc_DF = spark.read.load(r'C:\Users\uname\Downloads\new_abc.csv', format="csv", header="true", sep=',', inferSchema="true")

d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, inferring the schema.

# Note: this snippet uses pandas, which infers the column types on its own
import pandas as pd

# read the tab-delimited file
df2 = pd.read_csv('new_abc.csv', delimiter="\t")

print('Contents of DataFrame:')
print(df2)
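
The difference between options b and c can be sketched without a cluster. The following is a plain-Python analogy, not Spark code and not a description of Spark's internals: the sample rows, `SCHEMA`, `read_with_schema`, and `read_with_inference` are all hypothetical names made up for illustration. It shows that a declared schema lets each row be typed in a single pass, while inference has to look at the data first and then convert it.

```python
import csv
import io

# Hypothetical sample rows in the field order from the question (no header row).
SAMPLE = (
    "2019-01-01,00:00:01,10.0.0.1,10.0.0.9,/index.html,id=1,512,2048\n"
    "2019-01-01,00:00:02,10.0.0.2,10.0.0.9,/about.html,,256,1024\n"
)

# Explicit schema: (field name, converter) pairs known before any data is read.
SCHEMA = [
    ("date", str), ("time", str), ("client_ip", str), ("server_ip", str),
    ("url_stem", str), ("url_query", str),
    ("client_bytes", int), ("server_bytes", int),
]


def read_with_schema(text):
    """Single pass: convert each value with its declared type as rows stream by."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        rows.append({name: cast(value) for (name, cast), value in zip(SCHEMA, raw)})
    return rows


def looks_like_int(value):
    """Return True if the string can be parsed as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def read_with_inference(text):
    """Two passes: scan every row to guess column types, then convert."""
    raw_rows = list(csv.reader(io.StringIO(text)))  # pass 1: read everything
    n_cols = len(raw_rows[0])
    casts = [
        int if all(looks_like_int(row[i]) for row in raw_rows) else str
        for i in range(n_cols)
    ]
    # pass 2: apply the guessed types (columns are unnamed without a header)
    return [[cast(v) for cast, v in zip(casts, row)] for row in raw_rows]


typed = read_with_schema(SAMPLE)
inferred = read_with_inference(SAMPLE)
print(typed[0]["client_bytes"] + typed[0]["server_bytes"])  # 2560
print(inferred[1][6:])  # [256, 1024]
```

Both functions end up with integer byte counts, but only the schema version knows the column names and types up front and touches each row once.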

You might be interested in
Write the code to call a function named send_variable and that expects a single int parameter. Suppose a variable called x refer
NemiM [27]

Answer:

#include <iostream>
using namespace std;

// function definition
void send_variable(int num) {
    cout << "The Number is " << num << endl;
}

// main function begins here
int main()
{
    int x = 15; // declares an int variable and assigns 15
    // calls the function send_variable
    send_variable(x);
    return 0;
}

Explanation:

Using the C++ programming language, we define a function called send_variable and call it from the main function. The function simply displays the value of the int variable passed to it.

6 0
3 years ago
__________ delivers a comprehensive and accurate graphical overview of key performance indicators, often using a single screen.
Irina18 [472]
The answer is a digital dashboard.

In its simplest form, a digital dashboard (or business dashboard) provides a graphical representation of the KPIs, measures, and metrics a company uses to monitor the performance of departments, individuals, teams, or the entire company. Dashboards track progress toward business objectives and help people make effective data-driven decisions.



6 0
4 years ago
You have been asked to configure a client-side virtualization solution with three guest oss. Each one needs internet access. How
ArbitrLikvidat [17]

The most cost-effective way to configure a client-side virtualization solution is by using one (1) physical NIC, three (3) virtual NICs, and one (1) virtual switch.

What is virtualization?

Virtualization refers to the creation of an abstraction layer over computer hardware through software, so that resources such as the operating system (OS), storage devices, and servers can be used by end users.

In this scenario, the most cost-effective way to configure a client-side virtualization solution is by using one (1) physical network interface card (NIC), three (3) virtual network interface cards (NICs), and one (1) virtual switch.

Read more on virtualization here: brainly.com/question/14229248


4 0
2 years ago
Which shortcut key aligns to the center of a page
riadik2000 [5.3K]

Answer:

To make text centered, select and highlight the text first, then hold down Ctrl (the control key) on the keyboard and press E. To make text right aligned, select and highlight the text first, then hold down Ctrl (the control key) on the keyboard and then press R.

Explanation:

4 0
3 years ago
A(n) _____ chart is drawn on the same worksheet as the data.
svet-max [94.6K]
A is the answer because it makes more sense.
5 0
3 years ago
Other questions:
  • Cable television systems originated with the invention of a particular component. What was this component called?​
    9·1 answer
  • A _________ provides multiple ports for connecting nodes and is aware of the exact address or identity of all the nodes attached
    15·2 answers
  • Suppose that you have been running an unknown sorting algorithm. Out of curiosity, you once stopped the algorithm when it was pa
    8·1 answer
  • A circuit contains four resistors connected in series. R1 is 100 , R2 is 200 , R3 is 240 , and R4 is 600 . What is the total cir
    13·1 answer
  • Let f be the following function: int f(char *s, char *t){char *p1, *p2;for(p1 = s, p2 = t; *p1 != '\0' && *p2 != '\0'; p1
    6·1 answer
  • Which of the following payment types require you to pay upfront?
    9·1 answer
  • Python
    7·1 answer
  • Always place the smallest dimensions ____ the object, with ____ the object.
    9·1 answer
  • Choose all items that represent essential features of excellent navigation menu design.
    12·2 answers
  • The Cisco ____ model does not describe how communications take place. Rather, it focuses on how best to design a network, especi
    9·1 answer