What is data visualization ?
Answer
Data visualization refers to the graphical or visual representation of information and data using visual elements such as charts, graphs, and maps.
Name the Python library generally used for data visualization.
Answer
The Python library generally used for data visualization is the Matplotlib library.
Is Pyplot a Python library ? What is it ?
Answer
Pyplot is not a separate library on its own; it is an interface (sub-module) of the matplotlib library. It is a collection of methods within matplotlib that allows users to construct 2D plots easily and interactively.
Name the function you will use to create a horizontal bar chart.
Answer
The barh() function is used to create a horizontal bar chart.
Which argument will you provide to change the following in a line chart ?
(i) width of the line
(ii) color of the line
Answer
(i) For changing the width of the line, we use the linewidth argument with the plot() function as: <matplotlib.pyplot>.plot(<data1> [, <data2>], linewidth = <width>)
(ii) For changing the color of the line, we use the color argument with the plot() function as: <matplotlib.pyplot>.plot(<data1> [, <data2>], color = <color code>)
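A minimal sketch of both arguments in use. The sample data and the non-interactive Agg backend (chosen so the code runs without a display) are assumptions, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]        # hypothetical sample data
sales = [10, 14, 9, 17, 20]   # hypothetical sample data

# linewidth is given in points; color accepts a name or a code such as 'r'
(line,) = plt.plot(days, sales, linewidth=3, color="red")
# In an interactive session, plt.show() would display the chart.
```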
What is a marker ? How can you change the marker type and color in a plot ?
Answer
The data points being plotted on a graph/chart are called markers. To change the marker type and color, we use the following additional optional arguments in the plot() function: marker = <valid marker type>, markeredgecolor = <valid color>.
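As an illustration, the marker arguments might be used as follows. The data values are made up, and markersize is an extra argument added here only to make the markers easier to see:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]   # hypothetical sample data
y = [2, 4, 1, 3]   # hypothetical sample data

# 'd' is the diamond marker; markeredgecolor colors the marker outline
(line,) = plt.plot(x, y, marker="d", markersize=8, markeredgecolor="red")
```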
Using which function of Pyplot can you plot histograms ?
Answer
With Pyplot, a histogram is created using the hist() function.
Are bar charts and histograms the same ?
Answer
No, bar charts and histograms are not the same. A bar chart or bar graph presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. A histogram, on the other hand, provides a visual interpretation of numerical data by indicating the number of data points that lie within each range of values.
Name various types of histogram plots that you can create using Pyplot.
Answer
The types of histogram plots that we can create using Pyplot are cumulative histogram, step type histogram, stacked bar type histogram, horizontal histogram.
What is a frequency polygon ?
Answer
A frequency polygon is a type of frequency distribution graph. In a frequency polygon, the number of observations is marked with a single point at the midpoint of an interval.
What is the use of box plot ?
Answer
The box plot is used to show the range and the middle half of the ranked data.
Using which function of Pyplot, can you create box plots ?
Answer
Pyplot module's boxplot() function allows us to create box plots.
Which library is imported to draw charts in Python ?
- csv
- matplotlib
- numpy
- pandas
Answer
matplotlib
Reason — The library imported to draw charts in Python is Matplotlib.
PyPlot is an interface of Python's ............... library.
- seaborn
- plotly
- ggplot
- matplotlib
Answer
matplotlib
Reason — PyPlot is an interface of Python's Matplotlib library that allows users to construct 2D plots easily and interactively.
For 2D plotting using a Python library, which library interface is often used ?
- seaborn
- plotly
- matplotlib
- matplotlib.pyplot
Answer
matplotlib.pyplot
Reason — The matplotlib.pyplot interface is commonly used for 2D plotting in Python using the Matplotlib library.
Which of the following is not a valid chart type ?
- histogram
- statistical
- pie
- box
Answer
statistical
Reason — Pie charts, histograms, and box plots are all valid chart types used for data visualization (Pyplot creates them with pie(), hist(), and boxplot()). "Statistical" is not a chart type.
Which of the following is not a valid plotting function of Pyplot ?
- plot()
- bar()
- line()
- pie()
Answer
line()
Reason — The line() function is not a valid plotting function of Pyplot. In Matplotlib's Pyplot module, the correct function for creating line plots is plot().
Which of the following plotting functions does not plot multiple data series ?
- plot()
- bar()
- pie()
- barh()
Answer
pie()
Reason — The pie() function in Matplotlib's Pyplot module can plot only one data sequence. On the other hand, functions like plot(), bar(), and barh() can plot multiple data series in a single chart.
The plot which tells the trend between two graphed variables is the ............... graph/chart.
- line
- scatter
- bar
- pie
Answer
line
Reason — A line chart or line graph is a type of chart which displays information as a series of data points called "markers" connected by straight line segments.
The plot which tells the correlation between two variables which may not be directly related is ............... graph/chart.
- line
- scatter
- bar
- pie
Answer
scatter
Reason — A scatter chart is a graph of plotted points on two axes that show the relationship between two sets of data.
A ............... is a summarization tool for discrete or continuous data.
- quartile
- histogram
- mean
- median
Answer
histogram
Reason — A histogram is a summarization tool for discrete or continuous data. A histogram provides a visual interpretation of numerical data by indicating the number of data points that lie within a range of values.
A visual representation of the statistical five number summary of a given dataset is known as ............... .
- histogram
- frequency distribution
- box plot
- frequency polygon
Answer
box plot
Reason — A box plot provides a visual representation of the statistical five-number summary of a given dataset. It includes the highest and lowest numbers, the median, and the upper and lower quartiles.
Which of the following functions is used to create a line chart ?
- line()
- plot()
- chart()
- plotline()
Answer
plot()
Reason — The plot() function is used to create a line chart or line graph.
Which of the following function will produce a bar chart ?
- plot()
- bar()
- plotbar()
- barh()
Answer
bar(), barh()
Reason — With Pyplot, a bar chart is created using the bar() and barh() functions.
Which of the following function will create a vertical bar chart ?
- plot()
- bar()
- plotbar()
- barh()
Answer
bar()
Reason — The bar() function will create a vertical bar chart.
Which of the following function will create a horizontal bar chart ?
- plot()
- bar()
- plotbar()
- barh()
Answer
barh()
Reason — The barh() function will create a horizontal bar chart.
To specify the style of line as dashed, which argument of plot() needs to be set ?
- line
- width
- style
- linestyle
Answer
linestyle
Reason — To specify the style of the line as dashed in Matplotlib's plot() function, we need to set the linestyle argument to 'dashed'.
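A short sketch of a dashed line, assuming made-up data and a non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]       # hypothetical sample data
y = [5, 7, 6, 9]       # hypothetical sample data

# linestyle='dashed' can also be written with the shorthand '--'
(dashed_line,) = plt.plot(x, y, linestyle="dashed")
```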
The data points plotted on a graph are called ............... .
- points
- pointers
- marks
- markers
Answer
markers
Reason — The data points being plotted on a graph/chart are called markers.
A ............... graph is a type of chart which displays information as a series of data points connected by straight line segments.
- line
- bar
- pie
- box plot
Answer
line
Reason — A line chart or line graph is a type of chart which displays information as a series of data points called "markers" connected by straight line segments.
To create scatter charts using plot(), which argument is skipped ?
- marker
- linestyle
- markeredgecolor
- linewidth
Answer
linestyle
Reason — When creating scatter charts using Matplotlib's plot() function, the linestyle argument is skipped so that only the markers are drawn, without lines connecting them.
In scatter(), which argument is used to specify the size of data points ?
- size
- s
- marker
- markersize
Answer
s
Reason — The s argument in scatter() is used to specify the size of data points.
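A small sketch of the s argument, using hypothetical height and weight values and a non-interactive Agg backend. Passing a list gives each point its own size:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

heights = [150, 160, 170, 180]   # hypothetical sample data
weights = [50, 58, 69, 80]       # hypothetical sample data
sizes = [20, 60, 120, 200]       # marker areas, in points squared

points = plt.scatter(heights, weights, s=sizes)
```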
Which argument of bar() lets you set the thickness of bar ?
- thick
- thickness
- width
- barwidth
Answer
width
Reason — The width argument allows us to control the thickness of the bars in a bar chart created using the bar() function in Matplotlib.
To change the width of bars in a bar chart, which of the following arguments with a float value is used ?
- hwidth
- width
- breath
- barwidth
Answer
width
Reason — The width argument with a float value is used to change the width of bars in a bar chart created using the bar() function in Matplotlib.
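The width argument could be used as in this sketch. The city names and values are invented for illustration, and the Agg backend is assumed so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

cities = ["Delhi", "Mumbai", "Pune"]   # hypothetical categories
values = [30, 20, 7]                   # hypothetical values

# width is a float; a smaller value gives thinner bars
bars = plt.bar(cities, values, width=0.4)
```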
Which function lets you set the title of the plot ?
- title()
- plottitle()
- graphtitle()
- all of these
Answer
title()
Reason — The title() function sets the title of the plot in Matplotlib.
The command used to give a heading to a graph is ............... .
- plt.show()
- plt.plot()
- plt.xlabel()
- plt.title()
Answer
plt.title()
Reason — The plt.title() command is used in Matplotlib's Pyplot module to give a heading or title to a graph.
Which function would you use to set the limits for x-axis of the plot ?
- limits()
- xlimits()
- xlim()
- lim()
Answer
xlim()
Reason — The xlim() function is used to set the limits for the x-axis of a plot in Matplotlib.
Which function is used to show legends ?
- display()
- show()
- legend()
- legends()
Answer
legend()
Reason — The legend() function in Matplotlib's Pyplot module is used to show legends in a plot.
Which argument must be set with plotting functions for legend() to display the legends ?
- data
- label
- name
- sequence
Answer
label
Reason — The label argument must be set with plotting functions for the legend() function to display the legends correctly.
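A minimal sketch of label and legend() working together. The team names and scores are hypothetical, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

matches = [1, 2, 3, 4]          # hypothetical sample data
team_a = [120, 150, 135, 160]   # hypothetical sample data
team_b = [110, 140, 155, 150]   # hypothetical sample data

# Each series gets a label; legend() collects these labels
plt.plot(matches, team_a, label="Team A")
plt.plot(matches, team_b, label="Team B")
legend = plt.legend(loc="upper left")
```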
Which function is used to create a histogram ?
- histo()
- histogram()
- hist()
- histtype
Answer
hist()
Reason — The hist() function is used to create a histogram.
Which argument in hist() is used to create a stacked bar type histogram ?
- histt
- histtype
- type
- barstacked
Answer
histtype
Reason — The histtype argument in Matplotlib's hist() function is used to create a stacked bar type histogram. Setting histtype = 'barstacked' creates a histogram where bars for each bin are stacked on top of each other, representing different categories or subgroups within the data.
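A sketch of a stacked histogram for two made-up data sets (Agg backend assumed). With stacking, the last row of the returned counts holds the combined height of each bin:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

class_a = [35, 42, 48, 55, 61]   # hypothetical marks
class_b = [38, 44, 52, 67]       # hypothetical marks

# Passing a list of sequences plots them as separate, stacked series
counts, edges, _ = plt.hist([class_a, class_b], bins=4, histtype="barstacked")
```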
Which of the following functions can plot only one data series ?
- plot()
- bar()
- boxplot()
- pie()
Answer
pie()
Reason — The pie() function in Matplotlib's Pyplot module can plot only one data series. On the other hand, functions like plot(), bar(), and boxplot() can plot multiple data series in a single chart.
Which argument must be provided to create wedges out of a pie chart ?
- label
- autopct
- explode
- wedge
Answer
explode
Reason — The explode argument is used in pie charts to visually separate one or more wedges from the rest of the pie chart.
Which argument should be set to display percentage share of each pie on a pie chart ?
- label
- autopct
- explode
- wedge
Answer
autopct
Reason — To view the percentage share in a pie chart, we need to add the autopct argument with a format string, such as "%1.1f%%".
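The explode and autopct arguments can be combined, as in this sketch with invented budget shares (Agg backend assumed). When autopct is given, pie() also returns the percentage text objects:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

shares = [50, 30, 20]                  # hypothetical values
labels = ["Rent", "Food", "Savings"]   # hypothetical categories

# explode pulls the first wedge out; autopct prints each share as a percentage
wedges, texts, autotexts = plt.pie(
    shares, labels=labels, explode=[0.1, 0, 0], autopct="%1.1f%%"
)
```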
Which function creates a box plot ?
- box()
- plot()
- boxplot()
- showbox()
Answer
boxplot()
Reason — The boxplot() function is used to create box plots in Matplotlib.
Which argument of boxplot() is used to create a filled boxplot ?
- fill
- box
- patch_artist
- patch
Answer
patch_artist
Reason — The patch_artist argument in the boxplot() function is used to create a filled box plot. When set to True, it fills the boxes of the box plot with a color, making them more visually distinct.
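A brief sketch of a filled box plot, assuming made-up marks and the Agg backend. With patch_artist=True the box is a patch object, so its fill color can be changed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

marks = [35, 42, 48, 55, 61, 67, 70, 74, 88]   # hypothetical sample data

result = plt.boxplot(marks, patch_artist=True)
# The filled box can be recolored (the color choice here is arbitrary)
result["boxes"][0].set_facecolor("lightblue")
```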
A histogram is a plot that shows the underlying frequency distribution of a set of continuous data.
Pyplot interface is a collection of methods within matplotlib library of Python.
Pyplot's plot() function is used to create line charts.
Pyplot's barh() function is used to create horizontal bar charts.
Pyplot's scatter() function is used to create scatter charts.
Pyplot's hist() function is used to create histograms.
The data points plotted on a graph are called markers.
The linewidth argument of plot() specifies the width for the line.
The linestyle argument of plot() specifies the style of the line.
The width argument of bar() specifies the bar width.
The xticks() function is used to specify ticks for x-axis.
To save a plot, savefig() function is used.
The orientation argument of hist() is set to create a horizontal histogram.
The showmeans argument shows the arithmetic mean on a boxplot.
The notch argument in a boxplot() creates a notched boxplot.
The loc argument of legend() provides the location of legend.
Using Python Matplotlib, a histogram can be used to count how many values fall into each interval. (line plot / bar graph / histogram)
PyPlot is a sub-library of matplotlib library.
Answer
True
Reason — PyPlot is a sub-library of Matplotlib library. It allows users to construct 2D plots easily and interactively. Pyplot essentially reproduces plotting functions and behavior of MATLAB.
Statement import pyplot.matplotlib is a valid statement for working on pyplot functions.
Answer
False
Reason — The statement import matplotlib.pyplot is the valid and commonly used way to import the PyPlot submodule from the Matplotlib library.
By default, pie chart is printed in elliptical or oval shape.
Answer
True
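Reason — In older Matplotlib versions, a pie chart is drawn in an elliptical or oval shape by default, because the axes are not square. (Recent Matplotlib releases, from version 3.0 onward, set an equal aspect ratio for pie charts, so they appear circular by default.)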
The default shape of pie chart cannot be changed from oval.
Answer
False
Reason — By default, a pie chart is printed in an elliptical or oval shape. It can be changed to a circle by using the axis() function of Pyplot and passing the 'equal' argument to it.
A line chart can be plotted using pyplot library's line() function.
Answer
False
Reason — In Matplotlib's Pyplot library, the function used to plot a line chart is plot().
A line chart can be plotted using pyplot library's plot() function.
Answer
True
Reason — In Matplotlib's Pyplot library, the function used to plot a line chart is plot().
A bar chart can be plotted using pyplot library's bar() function.
Answer
True
Reason — In Matplotlib's Pyplot library, the function used to plot a bar chart is bar().
A bar chart can be plotted using pyplot library's barh() function.
Answer
True
Reason — The barh() function in Matplotlib's Pyplot library is used to create horizontal bar charts.
It is not possible to plot multiple series of values in the same bar graph.
Answer
False
Reason — It is possible to plot multiple series of values in the same bar graph using Matplotlib's Pyplot library because the bar() and barh() functions support handling multiple datasets.
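One common way to draw multiple series in one bar graph is to call bar() once per series and shift each set of bars by half a bar width. This is a sketch with hypothetical quarterly figures (Agg backend assumed), not the only possible approach:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

quarters = np.arange(4)            # bar positions 0, 1, 2, 3
sales_2022 = [20, 35, 30, 40]      # hypothetical sample data
sales_2023 = [25, 32, 34, 45]      # hypothetical sample data
width = 0.35

# Shift each series left or right so the bars sit side by side
plt.bar(quarters - width / 2, sales_2022, width=width, label="2022")
plt.bar(quarters + width / 2, sales_2023, width=width, label="2023")
plt.xticks(quarters, ["Q1", "Q2", "Q3", "Q4"])
plt.legend()
```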
A standard marker of representing a non-number data in Python libraries is NaN.
Answer
True
Reason — A standard marker for representing missing or non-number data in Python libraries is NaN (Not a Number).
If the linestyle argument is missing along with markerstyle-string in a plot(), a scatter type chart gets created.
Answer
True
Reason — When a marker style string (such as 'o' or 'r+') is given to the plot() function without a linestyle, only the markers are drawn. The points are plotted without connecting lines, so the resulting chart resembles a scatter plot.
The bar() function can also create horizontal bar charts.
Answer
False
Reason — The bar() function in Matplotlib's Pyplot library can create vertical bar charts, while the barh() function creates horizontal bar charts.
The pie() function can plot multiple data series.
Answer
False
Reason — The pie() function in Matplotlib's Pyplot module can plot only one data series.
The plot is always as per the data series being plotted irrespective of the xlim().
Answer
False
Reason — The plot appearance can be affected by the data series being plotted, but it can also be influenced by functions such as xlim(), which determine the range of values shown on the x-axis.
Frequency polygon is created from histogram.
Answer
True
Reason — When a frequency polygon is drawn manually, it is based on the data that would be used to create a histogram.
What is not true about Data Visualization ?
(a) Graphical representation of information and data.
(b) Helps users in analyzing a large amount of data in a simpler way.
(c) Data Visualization makes complex data more accessible, understandable, and usable.
(d) No library needs to be imported to create charts in Python language.
Answer
No library needs to be imported to create charts in Python language.
Reason — To create charts and visualizations in Python, we need to import libraries such as Matplotlib.
Assertion. The matplotlib library of Python is used for data visualization.
Reason. The PyPlot interface of matplotlib library is used for 2D plotting.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true and R is the correct explanation of A.
Explanation
The Matplotlib library is used for data visualization in Python, providing a variety of tools and functionalities for creating different types of plots, charts, and graphs. The Pyplot module, which is a collection of methods within the Matplotlib library, allows users to construct 2D plots easily and interactively.
Assertion. A scatter chart simply plots the data points on a chart to show the trend in the data.
Reason. A line chart connects the plotted data points with a line.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true but R is not the correct explanation of A.
Explanation
The scatter chart is a graph of plotted points on two axes that show the relationship between two sets of data. On the other hand, a line chart, or line graph, is a type of chart that displays information as a series of data points called 'markers' connected by straight line segments.
Assertion. Both scatter() and plot() functions of PyPlot can create scatter charts.
Reason. The plot() function can create line charts as well as scatter charts.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true and R is the correct explanation of A.
Explanation
Both the scatter() and plot() functions in Pyplot can create scatter charts. The plot() function in Pyplot can create both line charts and scatter charts. When a marker style string (such as 'o') is given without a linestyle, the plot() function will create a scatter chart.
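A small sketch of plot() producing a scatter-type chart, using invented data and the Agg backend. The format string "o" names a marker but no line style, so no connecting line is drawn:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]      # hypothetical sample data
y = [3, 7, 4, 8, 6]      # hypothetical sample data

# Marker style string only: points are drawn, lines are not
(markers_only,) = plt.plot(x, y, "o")
```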
Assertion. For the same sets of data, you can create various charts using plot(), scatter(), pie(), bar() and barh().
Reason. All the data sets of a plot(), scatter(), bar() cannot be used by pie() ; it will work with only a single set of data.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
A is false but R is true.
Explanation
We can create various charts using plot(), scatter(), bar(), and barh() for the same datasets, but not using pie(). The pie() function specifically works with a single set of data, whereas the other functions can handle multiple datasets or series.
Assertion. Five-point statistical summary of a data set can be visually represented.
Reason. The boxplot() function can plot the highest and lowest numbers of a data range, its median along with the upper and lower quartiles.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true and R is the correct explanation of A.
Explanation
The five-point statistical summary of a dataset can be visually represented through a box plot. A box plot is used to display the range and middle half of ranked data. It uses five important numbers from the data range: the extremes (highest and lowest numbers), the median, and the upper and lower quartiles, comprising the five-number statistical summary.
Assertion. Line graph is a tool for comparison and is created by plotting a series of several points and connecting them with a straight line.
Reason. You should never use a line chart when the chart is in a continuous data set.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
A is true but R is false.
Explanation
A line graph is a tool for comparison, created by plotting a series of data points called 'markers' and connecting them with straight lines. This makes it easier to compare different data points and observe patterns. Line charts are suitable for continuous data sets, displaying information as a series of data and not restricted to discontinuous data sets.
Name the library of which the PyPlot is an interface.
Answer
PyPlot is an interface provided by the Matplotlib library.
Write the statement to import PyPlot in your script.
Answer
The statement to import PyPlot in our script is as follows:
import matplotlib.pyplot as plt
Name the functions to create the following :
(a) line chart
(b) bar chart
(c) horizontal bar chart
(d) histogram
(e) scatter chart
(f) boxplot
(g) pie chart
Answer
(a) line chart: plot() function
(b) bar chart: bar() function
(c) horizontal bar chart: barh() function
(d) histogram: hist() function
(e) scatter chart: scatter() function
(f) boxplot: boxplot() function
(g) pie chart: pie() function
What is a line chart ?
Answer
A line chart, or line graph, is a type of chart that displays information as a series of data points called 'markers' connected by straight line segments.
What is a scatter chart ? How is it different from line chart ?
Answer
The scatter chart is a graph of plotted points on two axes that show the relationship between two sets of data. With a scatter plot, a mark or marker (usually a dot or small circle), represents a single data point. With one mark (point) for every data point a visual distribution of the data can be seen. Depending on how tightly the points cluster together, we may be able to discern a clear trend in the data.
The difference is that in a scatter chart the individual data points are not connected with a line; the overall pattern of the points expresses the trend. In a line chart, consecutive data points are joined by straight line segments.
What is the utility of pie chart ?
Answer
A pie chart is used to show parts in relation to the whole, often representing percentage shares and numerical proportions.
What is a bar chart ? How is it useful as compared to the line chart ?
Answer
A bar graph / bar chart is a graphical display of data using bars of different heights.
Compared to a line chart, which connects data points with lines, a bar chart is useful for comparing discrete categories rather than showing continuous trends over time. Bar charts are effective for highlighting differences in values between categories and are particularly useful when dealing with categorical data or comparing data across different groups or time periods.
What is a histogram ? What is its usage/utility ?
Answer
A histogram is a summarization tool for discrete or continuous data. A histogram provides a visual interpretation of numerical data by indicating the number of data points that lie within a range of values (called "bins"). Histograms are a great way to show results of continuous data, such as: weight, height, how much time, and so forth.
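The idea of counting data points per bin can be sketched with NumPy's histogram() function, which computes the counts without drawing anything. The heights and bin edges below are hypothetical:

```python
import numpy as np

heights = [148, 152, 155, 158, 161, 163, 166, 171, 174, 179]  # hypothetical, in cm
bin_edges = [140, 150, 160, 170, 180]                          # four bins of width 10

# counts[i] is the number of values falling in the i-th bin
counts, edges = np.histogram(heights, bins=bin_edges)
print(counts)
```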
What is a boxplot ? Which situations are more appropriate for boxplot ?
Answer
A boxplot is a graphical representation of the distribution of a dataset through five summary statistics: the extremes (the highest and the lowest numbers), the median, and the upper and lower quartiles.
Box plots are suitable for visualizing the spread of data, identifying outliers, comparing data distribution between different groups or categories, and assessing symmetry in a dataset.
What is a frequency polygon ? What is its utility ?
Answer
A frequency polygon is a type of frequency distribution graph. In a frequency polygon, the number of observations is marked with a single point at the midpoint of an interval. A straight line then connects each set of points. Frequency polygons make it easy to compare two or more distributions on the same set of axes.
Name the function to label axes.
Answer
The functions to label axes in a plot using Matplotlib's Pyplot library are xlabel() for the x-axis and ylabel() for the y-axis.
Name the function to give title to a plot.
Answer
The function title() in Matplotlib's Pyplot library is used to add a title to a plot.
Name the function to set figure size of a plot.
Answer
The figure() function in Matplotlib's Pyplot library is used to set the figure size of a plot, through its figsize argument.
Name the function to set limits for the axes.
Answer
The functions to set limits for the axes in a plot using Matplotlib's Pyplot library are xlim() for the x-axis and ylim() for the y-axis.
Name the function to show legends on a plot.
Answer
The legend() function in Matplotlib's Pyplot library is used to display a legend on the plot.
Name the function to add ticks on axes.
Answer
The functions to add ticks on axes in a plot using Matplotlib's Pyplot library are xticks() for the x-axis and yticks() for the y-axis.
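The customization functions named in the answers above (figure(), title(), xlabel(), ylabel(), xlim(), ylim() and xticks()) can be used together, as in this sketch. The data, label text and figure size are invented, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))             # width and height in inches
plt.plot([1, 2, 3, 4], [10, 20, 15, 25])     # hypothetical sample data
plt.title("Weekly sales")
plt.xlabel("Week")
plt.ylabel("Units sold")
plt.xlim(0, 5)                               # visible x range
plt.ylim(0, 30)                              # visible y range
plt.xticks([1, 2, 3, 4])                     # tick positions on the x-axis
ax = plt.gca()                               # current axes, used only for inspection
```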
What is the significance of data visualization ?
Answer
Patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized more easily with data visualization techniques or tools such as line charts, bar charts, pie charts, histograms and scatter charts. Thus, with data visualization tools, information can be processed efficiently and better decisions can be made.
How does Python support data visualization ?
Answer
Python supports data visualization by providing useful libraries for it. The most commonly used data visualization library is matplotlib. Matplotlib is a Python library, also sometimes known as the plotting library. It offers a very extensive range of 2D plot types and output formats, with complete 2D support along with limited 3D graphics support. It is useful for producing publication-quality figures in interactive environments across platforms, and it can also be used for animations. Many other Python libraries can be used for data visualization, but matplotlib is very popular for 2D plotting.
What is the use of matplotlib and pyplot ?
Answer
For data visualization in Python, the matplotlib library's Pyplot interface is used. Matplotlib is a Python library that provides interfaces and functionality for 2D graphics, similar to MATLAB's. It offers a quick way to visualize data in Python and can create publication-quality figures in many formats. The Matplotlib library offers various named collections of methods. Pyplot, as one such interface, enables users to construct 2D plots easily and interactively.
What are the popular ways of plotting data ?
Answer
The popular ways of plotting data include line charts, bar charts, histograms, scatter plots, pie charts, box plots.
Compare bar() and barh() functions.
Answer
bar() function | barh() function |
---|---|
This function is used to create vertical bar charts. | This function is used to create horizontal bar charts. |
In a vertical bar chart, the bars are plotted along the vertical axis (y-axis) with their lengths representing the values being plotted. | In a horizontal bar chart, the bars are plotted along the horizontal axis (x-axis) with their lengths representing the values being plotted. |
The first sequence given in the bar() forms the x-axis and the second sequence values are plotted on y-axis. | The first sequence given in the barh() forms the y-axis and the second sequence values are plotted on x-axis. |
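
The comparison in the table can be tried out with a sketch like the following, which draws the same hypothetical data with bar() and barh() side by side (Agg backend assumed). For bar() the values become bar heights; for barh() they become bar widths:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

subjects = ["Maths", "Physics", "English"]   # hypothetical categories
marks = [72, 85, 64]                         # hypothetical values

plt.subplot(1, 2, 1)
vbars = plt.bar(subjects, marks)     # categories on the x-axis, values on the y-axis

plt.subplot(1, 2, 2)
hbars = plt.barh(subjects, marks)    # categories on the y-axis, values on the x-axis
```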
What is the role of legends in a graph/chart ?
Answer
In a chart/graph, there may be multiple datasets plotted. To distinguish among various datasets plotted in the same chart, legends are used. Legends can be different colors/patterns assigned to different specific datasets. The legends are shown in a corner of a chart/graph.
What will happen if you use legend() without providing any label for the data series being plotted ?
Answer
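Using the legend() function without providing labels does not produce a useful legend: Matplotlib finds no labelled data series, issues a warning to that effect, and shows an empty legend or none at all. Viewers then cannot tell which data series is which, so a label should be given in each plotting call.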
What do you understand by xlimit and ylimit ? How are these linked to data being plotted ?
Answer
The xlimit and ylimit determine which data values are visible on the x-axis and y-axis in a plot or chart respectively. Only the data values that fall within these limits will be plotted. If no data value maps to the specified x-limits or y-limits, nothing will show on the plot for that particular axis range.
When should you use
(i) a line chart
(ii) a bar chart
(iii) a scatter chart
(iv) pie chart
(v) boxplot ?
Answer
(i) Line Chart — Use a line chart to show trends or changes over time. It's suitable for displaying continuous data series and highlighting patterns or fluctuations.
(ii) Bar Chart — Use a bar chart to compare categories or groups. It's effective for displaying discrete data and showing differences or relationships between items.
(iii) Scatter Chart — Use a scatter chart to visualize relationships between two variables. It's helpful for identifying correlations or trends in data points.
(iv) Pie Chart — Use a pie chart to represent parts of a whole. It's useful for showing the proportion or distribution of different categories within a dataset.
(v) Boxplot — The box plot is used to show the range and the middle half of ranked data while identifying outliers or variability.
A list namely temp contains average temperatures for seven days of last week. You want to see how the temperature changed in last seven days. Which chart type will you plot for the same and why ?
Answer
A line chart is the suitable choice for visualizing how the temperature changed over the last seven days. The line chart shows trends over time and displays continuous data, making it ideal for representing temperature values. The chart's ability to connect data points allows viewers to easily observe temperature trends and understand variations across the seven-day period.
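A sketch of such a chart follows. The question does not give the contents of temp, so the seven values below are hypothetical, as are the axis labels (Agg backend assumed):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

temp = [31.5, 32.0, 30.8, 29.6, 30.2, 33.1, 34.0]   # hypothetical temperatures
days = list(range(1, 8))                            # day 1 to day 7

(line,) = plt.plot(days, temp, marker="o")
plt.xlabel("Day")
plt.ylabel("Average temperature")
```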
What is histogram ? How do you create histograms in Python ?
Answer
A histogram is a summarization tool for discrete or continuous data, providing a visual interpretation of numerical data by showing the number of data points that fall within a specified range of values.
The hist() function of the Pyplot module is used to create and plot a histogram from a given sequence of numbers. The syntax for using the hist() function in Pyplot is as follows:
matplotlib.pyplot.hist(x, bins = None, cumulative = False, histtype = 'bar', align = 'mid', orientation = 'vertical', )
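A minimal sketch of hist() with made-up ages and five bins (Agg backend assumed). Besides drawing the chart, the function returns the bin counts and bin edges:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

ages = [12, 13, 13, 14, 14, 14, 15, 15, 16, 17]   # hypothetical sample data

# Five equal-width bins between the smallest and largest value
counts, edges, patches = plt.hist(ages, bins=5)
print(counts)   # number of values in each bin
```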
What are various types of histograms that can be created through hist() function ?
Answer
The hist() function in Matplotlib's Pyplot module allows creating various types of histograms. These include the default bar histogram (histtype='bar'), step histogram (histtype='step'), stepfilled histogram (histtype='stepfilled'), and barstacked histogram (histtype='barstacked').
When should you create histograms and when should you create bar charts to present data visually ?
Answer
Histograms are great for displaying specific ranges of values and are ideal for visualizing the results of continuous data, such as the ages of students in a class. Bar charts, on the other hand, are effective for comparing categorical or discrete data across different categories or groups, such as comparing the sales performance of different products.
What is cumulative histogram ? How do you create it using PyPlot ?
Answer
A cumulative histogram is a graphical representation in which each bin displays the count of data points within that bin as well as the counts of all smaller bins. The final bin in this histogram indicates the total number of data points in the dataset.
In Matplotlib's hist() function, we can create a cumulative histogram by setting the cumulative parameter to True. The syntax is as follows: matplotlib.pyplot.hist(x, bins = None, histtype='barstacked', cumulative=True)
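The running-total behaviour can be checked with a short sketch that plots a plain and a cumulative histogram of the same data and compares the counts. The values list is hypothetical, and histtype = 'step' is used here only so that both outlines stay visible on one chart.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, invented for this illustration.
values = [12, 15, 17, 21, 22, 25, 28, 31, 33, 36, 38, 44]

# Plain histogram: each bin counts only its own values.
plain_counts, edges, _ = plt.hist(values, bins = 4, histtype = 'step', label = 'plain')

# Cumulative histogram: each bin adds the counts of all smaller bins.
cumulative_counts, _, _ = plt.hist(values, bins = 4, cumulative = True,
                                   histtype = 'step', label = 'cumulative')

print(plain_counts)
print(cumulative_counts)   # the last entry equals len(values)
plt.legend()
```

Calling plt.show() afterwards displays both outlines. The cumulative one never decreases from left to right.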
What is frequency polygon ? How do you create it ?
Answer
A frequency polygon is a type of frequency distribution graph. In a frequency polygon, the number of observations is marked with a single point at the midpoint of an interval, and a straight line then connects each set of points.
We can create frequency polygon in following two ways:
- Drawing Frequency Polygon Manually
- Creating Frequency Polygon through a Line Chart
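The second approach listed above can be sketched as follows, assuming the bin counts are first computed with NumPy's histogram() function and then plotted at the bin midpoints as an ordinary line chart. The data is the French-fries weight set used elsewhere in this chapter, and bins = 5 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]

# Count how many values fall in each of 5 equal-width intervals.
counts, edges = np.histogram(weights, bins = 5)

# The midpoint of each interval is the average of its two edges.
midpoints = (edges[:-1] + edges[1:]) / 2

# Joining (midpoint, count) pairs with straight lines gives the frequency polygon.
plt.plot(midpoints, counts, marker = 'o')
plt.xlabel('Weight (grams)')
plt.ylabel('Frequency')
plt.title('Frequency Polygon')
```

Calling plt.show() afterwards displays the polygon. Zero-count points are often added at both ends so that the line closes on the x-axis.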
What is 5 point summary ?
Answer
The five-point summary is a descriptive statistics tool that provides a concise summary of the distribution of a dataset. It consists of five important numbers of a data range:
- the minimum range value
- the maximum range value
- the upper quartile
- the lower quartile
- the median
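The five numbers can be computed directly with NumPy, as in the sketch below. It uses the ordered data set from the boxplot questions of this chapter and assumes NumPy's default (linear interpolation) method for quartiles. Other textbooks and tools may use slightly different quartile conventions.

```python
import numpy as np

data = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83]

minimum = np.min(data)
maximum = np.max(data)

# The 25th, 50th and 75th percentiles are the lower quartile, median and upper quartile.
lower_quartile, median, upper_quartile = np.percentile(data, [25, 50, 75])

print(minimum)          # 63
print(lower_quartile)   # 69.0
print(median)           # 73.0
print(upper_quartile)   # 79.0
print(maximum)          # 83
```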
What is Boxplot ? How do you create it in Pyplot ?
Answer
A boxplot is a visual representation of the statistical five number summary of a given data set, including the extremes (the highest and the lowest numbers), the median, the upper and lower quartiles.
With Pyplot, a boxplot is created using the boxplot() function. The syntax is as follows: matplotlib.pyplot.boxplot(x, notch = None, vert = None, meanline = None, showmeans = None, showbox = None,)
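As a small sketch of what boxplot() produces, the snippet below draws a boxplot and reads the median and mean back from the dictionary of drawn lines that the function returns. The data is the ordered set used in the boxplot questions of this chapter, and the variable names are chosen only for illustration.

```python
import matplotlib.pyplot as plt

data = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83]

# boxplot() returns a dictionary of the drawn parts (boxes, medians, whiskers, ...).
parts = plt.boxplot(data, showmeans = True)

# For a vertical boxplot, the y-coordinate of the median line is the median itself.
median_value = parts['medians'][0].get_ydata()[0]

# With showmeans = True, the mean is drawn as a single marker.
mean_value = parts['means'][0].get_ydata()[0]

print(median_value)
print(mean_value)
```

Calling plt.show() afterwards displays the boxplot.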
Execute the following codes and find out what happens ? (Libraries have been imported already ; plt is the alias name for matplotlib.pyplot)
A = np.arange(2, 20, 2)
B = np.log(A)
plt.plot(A, B)
Will this code produce error ? Why/Why not ?
Answer
Executing the provided code will not produce an error. It will generate a plot of the logarithm of A against A itself.
The line A = np.arange(2, 20, 2) creates an array A using NumPy's arange() function. It starts from 2, increments by 2, and stops before 20, because arange() excludes the stop value. This results in the array [2, 4, 6, 8, 10, 12, 14, 16, 18]. Next, the line B = np.log(A) calculates the natural logarithm of each element in array A using NumPy's log() function and stores the results in array B. Finally, plt.plot(A, B) plots the values in array A along the x-axis and the corresponding values in array B along the y-axis using Matplotlib's plot() function.
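The contents of A can be confirmed with a quick check. The sketch below prints the array and contrasts it with a second call whose stop value is raised to 21, added here only to show that the stop value itself is never included.

```python
import numpy as np

A = np.arange(2, 20, 2)
print(A)                     # [ 2  4  6  8 10 12 14 16 18]

# The stop value is excluded, so 20 appears only when stop is larger than 20.
A_with_20 = np.arange(2, 21, 2)
print(A_with_20)             # [ 2  4  6  8 10 12 14 16 18 20]

# B has one logarithm per element of A, so both arrays have the same length.
B = np.log(A)
print(len(A), len(B))        # 9 9
```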
Execute the following codes and find out what happens ? (Libraries have been imported already ; plt is the alias name for matplotlib.pyplot)
A = np.arange(2, 20, 2)
B = np.log(A)
plt.bar(A, B)
Will this code produce error ? Why/Why not ?
Answer
Executing the provided code will not produce an error. However, the resulting plot might not be as expected, because a bar chart draws a separate bar at each value of A instead of a continuous curve, so the logarithmic trend is less obvious than in a line chart.
The line A = np.arange(2, 20, 2) creates an array A using NumPy's arange() function. It starts from 2, increments by 2, and stops before 20, because arange() excludes the stop value. This results in the array [2, 4, 6, 8, 10, 12, 14, 16, 18]. Next, the line B = np.log(A) calculates the natural logarithm of each element in array A using NumPy's log() function and stores the results in array B. Finally, plt.bar(A, B) creates a bar plot using Matplotlib's bar() function. It plots the values in array A along the x-axis and the corresponding values in array B along the y-axis.
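To see how bar() places the bars, the container it returns can be inspected. This is only an illustrative sketch. It assumes Matplotlib's default bar width of 0.8 and default centre alignment, so each bar is centred on its value of A.

```python
import numpy as np
import matplotlib.pyplot as plt

A = np.arange(2, 20, 2)
B = np.log(A)

# bar() returns a container holding one rectangle per data point.
bars = plt.bar(A, B)
first_bar = bars[0]

print(len(bars))                 # one bar for each of the 9 values in A
print(first_bar.get_x())         # left edge of the first bar
print(first_bar.get_width())     # width of the bar
print(first_bar.get_height())    # height equals log(2)
```

Calling plt.show() afterwards displays the bar chart.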
Execute the following codes and find out what happens ? (Libraries have been imported already ; plt is the alias name for matplotlib.pyplot)
X = np.arange(1, 18, 2.655)
B = np.log(X)
plt.scatter(X, Y)
Will this code produce error ? Why/Why not ?
Answer
The code will produce an error because the variable Y is not defined.
The corrected code is:
X = np.arange(1, 18, 2.655)
B = np.log(X)
plt.scatter(X, B)
The line X = np.arange(1, 18, 2.655) creates an array X using NumPy's arange() function. It starts from 1, increments by 2.655, and generates values less than 18. The resulting array will look like [1., 3.655, 6.31, 8.965, 11.62, 14.275, 16.93]. Next, the line B = np.log(X) calculates the natural logarithm of each element in array X using NumPy's log() function. Finally, the line plt.scatter(X, Y) attempts to use Matplotlib's scatter() function to create a scatter plot. However, Y is not defined in the code, leading to a NameError.
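The error can be reproduced safely by wrapping the faulty call in a try block, as in the sketch below. The try/except wrapper and the message variable are additions for illustration and are not part of the original question.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.arange(1, 18, 2.655)
B = np.log(X)

try:
    # Y was never assigned, so Python raises NameError before scatter() runs.
    plt.scatter(X, Y)
except NameError as err:
    message = str(err)
    print('NameError:', message)

# Using the array that actually exists draws the scatter plot without error.
plt.scatter(X, B)
```

Calling plt.show() afterwards displays the seven plotted points.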
Write the output from the given python code :
import matplotlib.pyplot as plt
Months = ['Dec', 'Jan', 'Feb', 'Mar']
Attendance = [70, 90, 75, 95]
plt.bar(Months, Attendance)
plt.show()
Answer
This code snippet uses Matplotlib to create a bar chart. The list Months contains the names of the months ['Dec', 'Jan', 'Feb', 'Mar'], while the list Attendance holds the corresponding attendance values [70, 90, 75, 95]. The plt.bar() function is then used to create a bar plot, where each bar represents a month and its height corresponds to the attendance value. Finally, plt.show() is called to display the plot.
Write a program to add titles for the X-axis, Y-axis and for the whole chart in below code.
import matplotlib.pyplot as plt
Months = ['Dec', 'Jan', 'Feb', 'Mar']
Attendance = [70, 90, 75, 95]
plt.bar(Months, Attendance)
plt.show()
Answer
import matplotlib.pyplot as plt
Months = ['Dec', 'Jan', 'Feb', 'Mar']
Attendance = [70, 90, 75, 95]
plt.bar(Months, Attendance)
plt.xlabel('Months')
plt.ylabel('Attendance')
plt.title('Attendance Report')
plt.show()
plt.plot(A, B) produces (A and B are the sequences same as created in question 1) chart as :
Write code to produce charts as shown below:
Answer
import numpy as np
import matplotlib.pyplot as plt
A = np.arange(2, 20, 2)
B = np.log(A)
C = np.log(A) * 1.2
plt.plot(A, B)
plt.plot(A, C)
plt.show()
import numpy as np
import matplotlib.pyplot as plt
A = np.arange(2, 20, 2)
B = np.log(A)
C = np.log(A) * (-1.2)
plt.plot(A, B)
plt.plot(A, C)
plt.show()
Write suitable Python code to create 'Favourite Hobby' Bar Chart as shown below :
Also give suitable python statement to save this chart.
Answer
import matplotlib.pyplot as plt
hobbies = ['Dance', 'Music', 'Painting', 'Playing Sports']
people_count = [300, 400, 100, 500]
plt.bar(hobbies, people_count)
plt.xlabel('Hobbies')
plt.ylabel('Number of People')
plt.title('Favourite Hobby')
plt.savefig('favourite_hobby_chart.png')
plt.show()
Consider the following graph. Write the Python code to plot it. Also add the Title, label for X and Y axis.
Using the following data for plotting the graph
smarks = [10, 40, 30, 60, 55]
sname = ["Sahil", "Deepak", "Anil", "Ravi", "Riti"]
Answer
import matplotlib.pyplot as plt
smarks = [10, 40, 30, 60, 55]
sname = ["Sahil", "Deepak", "Anil", "Ravi", "Riti"]
plt.plot(sname, smarks)
plt.xlabel('Student Name')
plt.ylabel('Marks Scored')
plt.title('Marks Secured by Students in Term-1')
plt.show()
Given a data frame df1 as shown below :
| | 1990 | 2000 | 2010 |
|---|---|---|---|
| a | 52 | 340 | 890 |
| b | 64 | 480 | 560 |
| c | 78 | 688 | 1102 |
| d | 94 | 766 | 889 |
Write code to create a scatter chart from the 1990 and 2010 columns of dataframe df1.
Answer
import pandas as pd
import matplotlib.pyplot as plt
data = {'1990': [52, 64, 78, 94],
'2000': [340, 480, 688, 766],
'2010': [890, 560, 1102, 889]}
df1 = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
plt.scatter(df1['1990'], df1['2010'])
plt.show()
Given a data frame df1 as shown below :
| | 1990 | 2000 | 2010 |
|---|---|---|---|
| a | 52 | 340 | 890 |
| b | 64 | 480 | 560 |
| c | 78 | 688 | 1102 |
| d | 94 | 766 | 889 |
Write code to create a line chart from the 1990 and 2000 columns of dataframe df1.
Answer
import pandas as pd
import matplotlib.pyplot as plt
data = {'1990': [52, 64, 78, 94],
'2000': [340, 480, 688, 766],
'2010': [890, 560, 1102, 889]}
df1 = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
plt.plot(df1['1990'], df1['2000'])
plt.show()
Given a data frame df1 as shown below :
| | 1990 | 2000 | 2010 |
|---|---|---|---|
| a | 52 | 340 | 890 |
| b | 64 | 480 | 560 |
| c | 78 | 688 | 1102 |
| d | 94 | 766 | 889 |
Write code to create a bar chart plotting the three columns of dataframe df1.
Answer
import pandas as pd
import matplotlib.pyplot as plt
data = {'1990': [52, 64, 78, 94],
'2000': [340, 480, 688, 766],
'2010': [890, 560, 1102, 889]}
df1 = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
df1.plot(kind = 'bar')
plt.show()
The score of four teams in 5 IPL matches is available to you. Write a program to plot these in a bar chart.
Answer
import matplotlib.pyplot as plt
import numpy as np
Matches = ['Match 1', 'Match 2', 'Match 3', 'Match 4', 'Match 5']
Team_A = [150, 160, 170, 180, 190]
Team_B = [140, 150, 160, 170, 180]
Team_C = [130, 140, 150, 160, 170]
Team_D = [120, 130, 140, 150, 160]
X = np.arange(len(Matches))
plt.bar(X, Team_A, width = 0.15, label = 'Team A')
plt.bar(X + 0.15, Team_B, width = 0.15, label = 'Team B')
plt.bar(X + 0.30, Team_C, width = 0.15, label = 'Team C')
plt.bar(X + 0.45, Team_D, width = 0.15, label = 'Team D')
plt.xticks(X + 0.225, Matches)
plt.xlabel('Matches')
plt.ylabel('Scores')
plt.title('IPL Scores')
plt.legend()
plt.show()
The score of a team in 5 IPL matches is available to you. Write a program to create a pie chart from this data, showing the last match's performance as a wedge.
Answer
import matplotlib.pyplot as plt
Matches = ['Match 1', 'Match 2', 'Match 3', 'Match 4', 'Match 5']
Team = [150, 160, 170, 180, 190]
expl = [0, 0, 0, 0, 0.2]
plt.pie(Team, labels = Matches, explode = expl)
plt.title('Team A Scores')
plt.show()
The prices of a stock for 3 months are given. Write a program to show the variations in prices for each month by 3 lines on same line chart. Make sure to add legends and labels. Show grid also.
Answer
import matplotlib.pyplot as plt
months = ['January', 'February', 'March']
prices_stock_A = [100, 120, 110]
prices_stock_B = [90, 110, 100]
prices_stock_C = [95, 115, 105]
plt.plot(months, prices_stock_A, label='Stock A', marker='o')
plt.plot(months, prices_stock_B, label='Stock B', marker='s')
plt.plot(months, prices_stock_C, label='Stock C', marker='^')
plt.xlabel('Months')
plt.ylabel('Prices')
plt.title('Stock Prices Variation')
plt.legend()
plt.grid(True)
plt.show()
A distribution data stores about 1000 random number. Write a program to create a scatter chart from this data with varying point sizes.
Answer
import numpy as np
import matplotlib.pyplot as plt
X = np.random.randint(1, 100, size = (1000,))
Y = np.random.randint(1, 100, size = (1000,))
sizes = np.random.randint(10, 100, size = (1000,))
plt.scatter(X, Y, s = sizes, color = 'r')
plt.show()
Navya has started an online business. A list stores the number of orders in last 6 months. Write a program to plot this data on a horizontal bar chart.
Answer
import matplotlib.pyplot as plt
orders = [150, 200, 180, 250, 300, 220]
months = ['January', 'February', 'March', 'April', 'May', 'June']
plt.barh(months, orders)
plt.xlabel('Number of Orders')
plt.ylabel('Month')
plt.title('Number of Orders in Last 6 Months')
plt.show()
Given the following set of data :
Weight measurements for 16 small orders of French-fries (in grams).
78 72 69 81 63 67 65 75
79 74 71 83 71 79 80 69
Create a simple histogram from the above data.
Answer
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
plt.hist(weights)
plt.title('Weight Distribution of French Fries Orders')
plt.show()
Given the following set of data :
Weight measurements for 16 small orders of French-fries (in grams).
78 72 69 81 63 67 65 75
79 74 71 83 71 79 80 69
Create a horizontal histogram from the above data.
Answer
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
plt.hist(weights, orientation = 'horizontal')
plt.title('Weight Distribution of French Fries Orders')
plt.show()
Given the following set of data :
Weight measurements for 16 small orders of French-fries (in grams).
78 72 69 81 63 67 65 75
79 74 71 83 71 79 80 69
Create a step type of histogram from the above data.
Answer
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
plt.hist(weights, histtype = 'step')
plt.title('Weight Distribution of French Fries Orders')
plt.show()
Given the following set of data :
Weight measurements for 16 small orders of French-fries (in grams).
78 72 69 81 63 67 65 75
79 74 71 83 71 79 80 69
Create a cumulative histogram from the above data.
Answer
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
plt.hist(weights, cumulative = True)
plt.title('Weight Distribution of French Fries Orders')
plt.show()
Create an ndarray containing 16 values and then plot this array along with dataset of previous question in same histogram, normal histograms.
Answer
import numpy as np
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
random_array = np.arange(16)
plt.hist(weights)
plt.hist(random_array)
plt.title('Normal Histograms')
plt.show()
Create an ndarray containing 16 values and then plot this array along with dataset of previous question in same histogram, cumulative histograms.
Answer
import numpy as np
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
random_array = np.arange(16)
plt.hist(weights, cumulative = True)
plt.hist(random_array, cumulative = True)
plt.title('Cumulative Histograms')
plt.show()
Create an ndarray containing 16 values and then plot this array along with dataset of previous question in same histogram, horizontal histograms.
Answer
import numpy as np
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
random_array = np.arange(16)
plt.hist(weights, orientation = 'horizontal')
plt.hist(random_array, orientation = 'horizontal')
plt.title('Horizontal Histograms')
plt.show()
Out of above plotted histograms, which ones can be used for creating frequency polygons ? Can you draw frequency polygons from all the above histograms ?
Answer
None of the histograms plotted above can be used for creating frequency polygons, so we cannot draw frequency polygons from all of them. To construct a frequency polygon, we need a step-type histogram. A frequency polygon is constructed by connecting the midpoints of the tops of the bars of a histogram, and a step-type histogram provides a clear outline for drawing these connections.
Create/draw frequency polygon from the data used in above questions.
Answer
import numpy as np
import matplotlib.pyplot as plt
weights = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
plt.figure(figsize = (10, 5))
n, edges, p = plt.hist(weights, bins = 40, histtype = 'step')
m = 0.5 * (edges[1:] + edges[:-1])
m = m.tolist()
l = len(m)
m.insert(0, m[0] - 10)
m.append(m[l-1] + 10)
n = n.tolist()
n.insert(0, 0)
n.append(0)
plt.plot(m, n, '-^')
plt.show()
From the following ordered set of data :
63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
Create a horizontal boxplot.
Answer
import matplotlib.pyplot as plt
data = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83]
plt.boxplot(data, vert = False)
plt.show()
From the following ordered set of data :
63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
Create a vertical boxplot.
Answer
import matplotlib.pyplot as plt
data = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83]
plt.boxplot(data)
plt.show()
From the following ordered set of data :
63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
Show means in the boxplot.
Answer
import matplotlib.pyplot as plt
data = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83]
plt.boxplot(data, showmeans = True)
plt.show()
From the following ordered set of data :
63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
Create boxplot without the box.
Answer
import matplotlib.pyplot as plt
data = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83]
plt.boxplot(data, showbox = False)
plt.show()
Sina has created ordered set of data from the number of new customers registered on his online service centre in last 20 months.
Write a program to plot this data on a filled boxplot with means shown.
Answer
import matplotlib.pyplot as plt
data = [100, 120, 95, 110, 105, 130, 115, 125, 135, 140, 120, 110, 105, 115, 130, 125, 110, 115, 120, 135]
plt.boxplot(data, patch_artist = True, showmeans = True)
plt.title('Number of New Customers Registered in Last 20 Months')
plt.show()