Identify data required to be maintained to perform the following services:
- Declare exam results and print e-certificates.
- Register participants in an exhibition and issue biometric ID cards.
- Search for an image using a search engine.
- Book an OPD appointment with a hospital in a specific department.
Answer
Declare exam results and print e-certificates:
Data required: student name, class, roll number, exam scores, course information, subject information, certificate template.
Register participants in an exhibition and issue biometric ID cards:
Data required: participant name, mobile number, email, organization, biometric data (such as a fingerprint or photograph), ID card template.
Search for an image using a search engine:
Data required: image tags, image URL.
Book an OPD appointment with a hospital in a specific department:
Data required: patient name, mobile number, age, gender, address, medical history, department name.
A school with 500 students wants to identify beneficiaries of the merit-cum-means scholarship: students achieving more than 75% for two consecutive years and having a family income of less than 5 lakh per annum. Briefly describe the data processing steps to be taken by the school to prepare the list of beneficiaries.
Answer
Data processing steps:
- Collect student information including name, address, academic qualifications, mark sheets for the last two consecutive years, family income certificate, phone number, and email.
- Verify the accuracy and completeness of the collected data. Check if the student has achieved more than 75% in their academics for two consecutive years. Verify the family income of each student to ensure it is less than 5 lakh per annum.
- Identify students who meet the eligibility criteria of achieving more than 75% for two consecutive years and having a family income less than 5 lakh per annum and make a list.
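The verification and selection steps above can be sketched in a few lines of Python. This is only an illustration: the student records, the field names (`pct_year1`, `pct_year2`, `family_income`) and the helper `is_eligible` are hypothetical and not part of the original answer.

```python
# Hypothetical student records; names and values are illustrative only.
students = [
    {"name": "Student A", "pct_year1": 82.0, "pct_year2": 78.5, "family_income": 350_000},
    {"name": "Student B", "pct_year1": 91.0, "pct_year2": 74.0, "family_income": 200_000},
    {"name": "Student C", "pct_year1": 88.0, "pct_year2": 90.0, "family_income": 650_000},
    {"name": "Student D", "pct_year1": 76.0, "pct_year2": 80.0, "family_income": 480_000},
]

MIN_PERCENTAGE = 75.0   # a student must score MORE than this in both years
MAX_INCOME = 500_000    # family income must be LESS than 5 lakh per annum


def is_eligible(student):
    """Return True only if the student meets every scholarship criterion."""
    return (
        student["pct_year1"] > MIN_PERCENTAGE
        and student["pct_year2"] > MIN_PERCENTAGE
        and student["family_income"] < MAX_INCOME
    )


# Build the list of beneficiaries from the verified records.
beneficiaries = [s["name"] for s in students if is_eligible(s)]
print(beneficiaries)
```

With this sample data, Student B is excluded for scoring below the cut-off in the second year and Student C for exceeding the income limit.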
A bank 'xyz' wants to know about its popularity among the residents of a city 'ABC' on the basis of number of bank accounts each family has and the average monthly account balance of each person. Briefly describe the steps to be taken for collecting data and what results can be checked through processing of the collected data.
Answer
- Conduct a survey of households in city 'ABC' to gather data on the number of bank accounts each family holds and the average monthly account balance per person.
- Ensure the accuracy and completeness of the collected data by cross-checking and validating responses.
The results that can be checked through processing of the collected data are as follows:
- Average number of bank accounts per family for bank 'xyz'.
- Average monthly account balance per person for bank 'xyz' customers.
- Popularity index of bank 'xyz' compared to competitor banks based on these metrics.
- Demographic profiles of bank xyz's account holders, including age distribution, income brackets, and other relevant characteristics.
- Trends and patterns in account ownership and account balances across different demographic segments.
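As a rough sketch of how the first few results could be computed, the snippet below processes a handful of made-up survey responses. The record layout (`xyz_accounts`, `other_accounts`, `xyz_balances`) and the use of account share as a simple popularity measure are assumptions for illustration, not something the question prescribes.

```python
from statistics import mean

# Hypothetical survey responses, one entry per family (all values are made up).
# "xyz_accounts" / "other_accounts": accounts held with bank 'xyz' / with other banks.
# "xyz_balances": average monthly balance of each family member's 'xyz' account.
survey = [
    {"xyz_accounts": 2, "other_accounts": 1, "xyz_balances": [12000, 8000]},
    {"xyz_accounts": 0, "other_accounts": 3, "xyz_balances": []},
    {"xyz_accounts": 1, "other_accounts": 0, "xyz_balances": [25000]},
    {"xyz_accounts": 1, "other_accounts": 2, "xyz_balances": [15000]},
]

# Average number of 'xyz' accounts per family.
avg_xyz_accounts = mean([family["xyz_accounts"] for family in survey])

# Average monthly balance per person holding an 'xyz' account.
all_balances = [b for family in survey for b in family["xyz_balances"]]
avg_balance = mean(all_balances)

# Assumed popularity measure: share of all surveyed accounts held with 'xyz'.
total_xyz = sum(family["xyz_accounts"] for family in survey)
total_all = total_xyz + sum(family["other_accounts"] for family in survey)
xyz_share = total_xyz / total_all

print(avg_xyz_accounts, avg_balance, xyz_share)
```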
Identify type of data being collected/generated in the following scenarios:
- Recording a video
- Marking attendance by teacher
- Writing tweets
- Filling an application form online
Answer
- Recording a video — This generates unstructured data.
- Marking attendance by teacher — This generates structured data.
- Writing tweets — This generates unstructured data.
- Filling an application form online — This generates structured data.
Consider the temperature (in Celsius) of 7 days of a week as 34, 34, 27, 28, 27, 34, 34. Identify the appropriate statistical technique to be used to calculate the following:
- Find the average temperature.
- Find the temperature range of that week.
- Find the standard deviation of the temperature.
Answer
- Find the average temperature: Mean
- Find the temperature range of that week: Range = Maximum Temperature - Minimum Temperature
- Find the standard deviation of the temperature: Standard deviation
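For the seven readings given in the question, the three measures can be worked out with Python's standard `statistics` module. The sketch assumes the week is treated as the whole population, so it uses `pstdev`; using the sample formula (`stdev`) instead would give a slightly larger value.

```python
from statistics import mean, pstdev

# Temperatures (in Celsius) for the 7 days, as given in the question.
temperatures = [34, 34, 27, 28, 27, 34, 34]

average = mean(temperatures)                         # sum of values / number of values
temp_range = max(temperatures) - min(temperatures)   # maximum - minimum
std_dev = pstdev(temperatures)                       # population standard deviation (assumption)

print(round(average, 2), temp_range, round(std_dev, 2))  # prints: 31.14 7 3.31
```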
A school teacher wants to analyse results. Identify the appropriate statistical technique to be used along with its justification for the following cases:
Teacher wants to compare performance in terms of division secured by students in Class XII A and Class XII B, where the strength of each class is the same.
Teacher has conducted five unit tests for that class in months July to November and wants to compare the class performance in these five months.
Answer
Standard deviation is the statistical technique used for comparing the performance of students in Class XII A and Class XII B. It takes all division scores into account and measures how far they are spread around the mean. A smaller standard deviation means the data are less spread out, while a larger standard deviation means the data are more spread out. Because both classes have the same strength, the teacher can compare the two standard deviations directly to assess the difference in performance between the classes and the overall variability in division scores.
To compare class performance in the five unit tests from July to November, calculate the mean of the students' marks for each month and compare the means across successive months. If the mean increases, the class performance is improving, while a decrease indicates a lack of improvement.
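The month-by-month comparison of means could look like the following sketch. The marks are invented for five hypothetical students, and the wording of the trend labels is an assumption made for the example.

```python
from statistics import mean

# Hypothetical marks (out of 25) of the same five students in each unit test.
unit_test_marks = {
    "July": [12, 15, 18, 10, 20],
    "August": [14, 15, 19, 12, 20],
    "September": [13, 16, 18, 11, 19],
    "October": [16, 18, 20, 14, 22],
    "November": [17, 19, 21, 15, 23],
}

# Mean marks of the class for each month.
monthly_means = {month: mean(marks) for month, marks in unit_test_marks.items()}

# Compare each month with the one before it.
months = list(monthly_means)
trends = []
for previous, current in zip(months, months[1:]):
    change = monthly_means[current] - monthly_means[previous]
    trends.append((current, "improved" if change > 0 else "did not improve"))

for month, trend in trends:
    print(f"{month}: mean {monthly_means[month]} ({trend} over the previous month)")
```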
Suppose annual day of your school is to be celebrated. The school has decided to felicitate those parents of the students studying in classes XI and XII, who are the alumni of the same school. In this context, answer the following questions:
Which statistical technique should be used to find out the number of students both of whose parents are alumni of this school?
How varied are the ages of the parents of the students of that school?
Answer
The statistical technique that should be used to find out the number of students both of whose parents are alumni of the school is the mode. Finding the mode involves counting how often each value occurs, so the same tally shows how many students in classes XI and XII fit this criterion. The mode itself represents the most frequently occurring value.
To determine the variation in the age of parents of students at the school, the appropriate statistical technique is standard deviation.
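Both answers can be illustrated with a small sketch. The data below are invented: each student is assumed to be recorded with the number of alumni parents (0, 1 or 2), and the ages are treated as a sample, which is why `stdev` is used rather than `pstdev`.

```python
from collections import Counter
from statistics import mode, stdev

# Hypothetical records: number of alumni parents for each of ten students.
alumni_parents = [0, 2, 1, 0, 2, 0, 1, 2, 0, 0]

# Hypothetical ages of a sample of parents.
parent_ages = [42, 45, 39, 50, 47, 44, 41, 48]

# The frequency tally used to find the mode also gives the required count.
frequency = Counter(alumni_parents)
both_alumni = frequency[2]            # students with both parents as alumni
most_common = mode(alumni_parents)    # most frequently occurring value

# Spread of the parents' ages (sample standard deviation).
age_spread = stdev(parent_ages)

print(both_alumni, most_common, round(age_spread, 2))
```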
For the annual day celebrations, the teacher is looking for an anchor in a class of 42 students. The teacher would make selection of an anchor on the basis of singing skill, writing skill, as well as monitoring skill.
Which mode of data collection should be used?
How would you represent the skill of students as data?
Answer
The appropriate mode of data collection for selecting an anchor based on singing skill, writing skill, and monitoring skill would be direct observation and evaluation by the teacher. This can involve conducting auditions or assessments where each student demonstrates their skills in singing, writing, and monitoring.
The skills of students can be represented as structured data in categorical form using a scale of 1 to 5 with 1 being the lowest and 5 being the highest. Create a structured data format with columns for Singing Proficiency, Writing Proficiency, and Monitoring Proficiency. Each student is then categorized based on their proficiency level in these areas.
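One possible way to hold such ratings is a list of records with one column per skill, as sketched below. The roll numbers and ratings are hypothetical, and choosing the student with the highest total is only an assumed selection rule; the teacher could weight the skills differently.

```python
# Hypothetical ratings on a 1-5 scale (1 = lowest, 5 = highest).
ratings = [
    {"RollNo": 1, "Singing": 4, "Writing": 3, "Monitoring": 5},
    {"RollNo": 2, "Singing": 5, "Writing": 4, "Monitoring": 2},
    {"RollNo": 3, "Singing": 3, "Writing": 5, "Monitoring": 5},
]

SKILLS = ("Singing", "Writing", "Monitoring")


def total_score(record):
    """Sum the three proficiency ratings of one student."""
    return sum(record[skill] for skill in SKILLS)


# Assumed selection rule: pick the student with the highest total rating.
anchor = max(ratings, key=total_score)
print(anchor["RollNo"], total_score(anchor))
```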
Differentiate between structured and unstructured data giving one example.
Answer
Structured Data | Unstructured Data |
---|---|
Data that is organized and can be recorded in a well-defined format is called structured data. | Data that is not organized and does not have a defined format is called unstructured data. |
An example of structured data is a spreadsheet containing columns for item names, purchase prices, and sale prices. | An example of unstructured data is a web page consisting of text as well as multimedia content (images, graphics, audio/video). |
The principal of a school wants to do the following analysis on the basis of food items procured and sold in the canteen:
- Compare the purchase and sale price of fruit juice and biscuits.
- Compare sales of fruit juice, biscuits and samosa.
- Variation in sale price of fruit juices of different companies for same quantity (in ml).
Create an appropriate dataset for these items (fruit juice, biscuits, samosa) by listing their purchase price and sale price. Apply basic statistical techniques to make the comparisons.
Answer
Sample Dataset for Fruit Juice, Biscuits and Samosa
Product | Company | Sale price per unit (rupees) | Purchase price per unit (rupees) | Units Sold |
---|---|---|---|---|
Fruit Juice (250 ml) | Real | 40 | 35 | 10 |
Fruit Juice (250 ml) | Tropicana | 45 | 30 | 15 |
Samosa | Samosa Party | 30 | 27 | 20 |
Samosa | Chaayos | 36 | 30 | 25 |
Samosa | Chai Point | 33 | 29 | 15 |
Biscuit | Parle | 10 | 6 | 30 |
Biscuit | Britannia | 15 | 12 | 20 |
Biscuit | Sunfeast | 20 | 17 | 15 |
Compare the purchase and sale price of fruit juice and biscuits:
For Fruit Juice:
Mean Purchase Price = (35 + 30) / 2 = 32.5
Mean Sale Price = (40 + 45) / 2 = 42.5
Difference = Mean Sale Price - Mean Purchase Price = 42.5 - 32.5 = 10
The mean purchase price of fruit juice is 10 rupees less than the mean sales price of fruit juice.
For Biscuits:
Mean Purchase Price = (6 + 12 + 17) / 3 = 11.67
Mean Sale Price = (10 + 15 + 20) / 3 = 15
Difference = Mean Sale Price - Mean Purchase Price = 15 - 11.67 = 3.33
The mean purchase price of biscuits is 3.33 rupees less than the mean sales price of biscuits.
Compare sales of fruit juice, biscuits and samosa:
For Fruit Juice:
Total units sold = 10 (Real) + 15 (Tropicana) = 25 units
For Biscuits:
Total units sold = 30 (Parle) + 20 (Britannia) + 15 (Sunfeast) = 65 units
For Samosas:
Total units sold = 20 (Samosa Party) + 25 (Chaayos) + 15 (Chai Point) = 60 units
Comparing the total units sold:
Biscuits had the highest sales with 65 units sold.
Samosas had the second-highest sales with 60 units sold.
Fruit juice had the lowest sales with 25 units sold.
Variation in sale price of fruit juices of different companies for same quantity (in ml):
Sale price of Real Fruit Juice (250 ml): 40
Sale price of Tropicana Fruit Juice (250 ml): 45
Difference = Sale price of Tropicana - Sale price of Real
Difference = 45 - 40 = 5
The sale price of Tropicana fruit juice is 5 rupees higher than the sale price of Real fruit juice for the same quantity (250 ml).
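The three comparisons above can be reproduced from the sample dataset with a short Python sketch. The tuple layout and the helper names (`rows_for`, `mean_prices`, `total_units`) are choices made for this example only.

```python
from statistics import mean

# Sample dataset from the table: (product, company, sale price, purchase price, units sold)
dataset = [
    ("Fruit Juice", "Real", 40, 35, 10),
    ("Fruit Juice", "Tropicana", 45, 30, 15),
    ("Samosa", "Samosa Party", 30, 27, 20),
    ("Samosa", "Chaayos", 36, 30, 25),
    ("Samosa", "Chai Point", 33, 29, 15),
    ("Biscuit", "Parle", 10, 6, 30),
    ("Biscuit", "Britannia", 15, 12, 20),
    ("Biscuit", "Sunfeast", 20, 17, 15),
]


def rows_for(product):
    """Return all rows of the dataset for one product."""
    return [row for row in dataset if row[0] == product]


def mean_prices(product):
    """Return (mean sale price, mean purchase price) for a product."""
    rows = rows_for(product)
    return mean([r[2] for r in rows]), mean([r[3] for r in rows])


def total_units(product):
    """Return the total units sold for a product."""
    return sum(r[4] for r in rows_for(product))


# 1. Purchase price versus sale price (difference of means).
for product in ("Fruit Juice", "Biscuit"):
    sale, purchase = mean_prices(product)
    print(product, round(sale, 2), round(purchase, 2), round(sale - purchase, 2))

# 2. Sales comparison by total units sold.
units = {p: total_units(p) for p in ("Fruit Juice", "Biscuit", "Samosa")}
print(units)

# 3. Variation in juice sale price across companies (range = maximum - minimum).
juice_prices = [r[2] for r in rows_for("Fruit Juice")]
juice_range = max(juice_prices) - min(juice_prices)
print(juice_range)
```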