1. What is the primary goal of Big Data analytics?
- To store data in a compact format.
- To generate insights from small datasets.
- To process and analyze large volumes of structured and unstructured data.
- To visualize data in 3D.
2. Which of the following is a key feature of machine learning?
- The ability to automatically improve with experience.
- The ability to create visual representations of data.
- The ability to make decisions based on pre-programmed rules.
- The ability to interpret data as images.
3. What type of data processing involves transforming raw data into meaningful insights?
- Data cleansing
- Data visualization
- Data engineering
- Data analysis
4. Which algorithm is commonly used for supervised learning in machine learning?
- K-means clustering
- Decision Trees
- Apriori Algorithm
- Naive Bayes
5. What is Big Data?
- Large volumes of data that cannot be processed by traditional data processing tools.
- Data stored in small databases.
- Structured data stored in a relational database.
- Data that is processed by manual methods.
6. Which of the following is a machine learning technique used for unsupervised learning?
- Linear Regression
- K-means clustering
- Logistic Regression
- Random Forest
7. What does a decision tree model represent in machine learning?
- A flowchart of decisions and their possible consequences.
- A collection of unsorted data points.
- A method for clustering data.
- A graph of relationships between different classes.
8. Which of these techniques is used for handling missing data in a dataset?
- Imputation
- Clustering
- Normalization
- Classification
9. What is feature selection in the context of Big Data analytics?
- The process of scaling data for analysis.
- The process of storing data in a smaller format.
- The process of cleaning data by removing missing values.
- The process of selecting a subset of relevant features from a large dataset.
10. Which of the following is a major challenge in Big Data analytics?
- Lack of storage capacity
- Inconsistent data formats and structures
- Limited computational power
- Availability of small datasets
11. Which machine learning technique is used to predict continuous numeric values from historical data?
- Regression
- Clustering
- Classification
- Association
12. What is the purpose of cross-validation in machine learning?
- To increase the size of the dataset.
- To evaluate the performance of a model on different subsets of data.
- To optimize the storage of data.
- To reduce the dimensionality of data.
13. Which of the following is a technique used for dimensionality reduction in machine learning?
- Principal Component Analysis (PCA)
- Decision Trees
- Random Forests
- K-Nearest Neighbors
14. What is the primary goal of clustering in Big Data analytics?
- To group similar data points together.
- To predict future trends in data.
- To transform data into visual representations.
- To store data in structured formats.
15. Which type of machine learning algorithm is used for classification problems?
- Linear Regression
- K-Nearest Neighbors
- K-means clustering
- Decision Trees
16. What is Big Data analytics primarily used for in business?
- To create small datasets for easy analysis.
- To make sense of large amounts of unstructured data and generate insights.
- To optimize database performance.
- To summarize data using basic statistics.
17. Which of the following is an example of a supervised learning algorithm?
- K-means clustering
- Random Forest
- DBSCAN
- Apriori Algorithm
18. What is the main advantage of using a Random Forest model in machine learning?
- It helps in regression and classification problems by combining multiple decision trees.
- It performs well on small datasets.
- It is used primarily for clustering problems.
- It is faster than decision trees for training.
19. What is the purpose of the 'k' in k-means clustering?
- It defines the number of clusters to divide the dataset into.
- It is used to scale the features of the data.
- It defines the number of nearest neighbors to use.
- It is used to evaluate model accuracy.
20. What is the purpose of using ensemble methods like bagging and boosting?
- To simplify the data storage process.
- To reduce the computational complexity of models.
- To preprocess data before analysis.
- To combine the predictions of multiple models to improve accuracy.
21. What does the term "Big Data" primarily refer to in the context of analytics?
- Data that is too large or complex for traditional data-processing techniques to handle.
- Data stored in a compressed file format.
- Data that can be processed on a personal computer.
- Data that is available in real-time.
22. Which of the following is a popular framework for processing large-scale data in Big Data analytics?
- TensorFlow
- Apache Spark
- NLTK
- OpenCV
23. Which of the following describes the process of data normalization?
- Scaling data to a specific range to ensure it is comparable.
- Converting categorical data into numerical values.
- Reducing the dimensionality of the data.
- Splitting data into training and test sets.
24. What is a neural network used for in machine learning?
- To optimize machine learning algorithms.
- To store data in a database.
- To model complex relationships and make predictions based on data.
- To process unstructured data only.
25. What is the purpose of dimensionality reduction in machine learning?
- To reduce the number of input features in a dataset while retaining important information.
- To increase the size of a dataset.
- To remove noise from a dataset.
- To group similar data points together.
26. What does the term 'bias' refer to in machine learning?
- A measure of a model's complexity.
- A method of improving model performance.
- A technique for preprocessing data.
- An error introduced by the model's assumptions.
27. What is the purpose of a confusion matrix in machine learning?
- To evaluate the performance of a classification model.
- To calculate the training time of a model.
- To improve the accuracy of the dataset.
- To visualize the distribution of data points.
28. What is the primary advantage of using deep learning over traditional machine learning techniques in Big Data analytics?
- Deep learning models are simpler and faster.
- Deep learning can automatically extract features from large datasets without manual feature engineering.
- Deep learning is not suitable for unstructured data.
- Deep learning models require less data.
29. What does the term 'scalability' mean in the context of Big Data processing?
- The ability of a system to handle an increasing amount of work or data.
- The ability to store data in smaller units.
- The ability to visualize complex data.
- The ability to reduce the data processing time.
30. Which of the following is an example of an unsupervised learning technique in machine learning?
- Support vector machines
- Linear regression
- K-means clustering
- Decision trees