SE-AIS Final Sheet
Title of test: SE-AIS Final Sheet
Description: Advanced Topics in Information Systems




Part 1: Multiple choice. Each group below lists one shared set of answer options, followed by the question statements that use that set.

Options: Data mining | Web content mining | Web structure mining | Web usage mining
- It is the procedure for finding patterns and necessary details in huge amounts of data collected from various sources over a period of time.
- It is the process of finding patterns, models, or knowledge in the contents of a web page.
- It is the process of recognizing the underlying correlations among web pages and other online objects.
- It is the process of mining browsing patterns from the usage information of customers.

Options: Discriminative classifiers | Generative classifiers | Customer information | Build a function from an input set to a class label
- How would they know that the things they are buying are of good quality and whether they serve them well?

Options: Training phase | Data preprocessing (filtering) | Classification | J48 decision tree
- In this phase, the crawler module collects data through the 10 given information items.
- Remove outliers and extreme values using Weka.
- Determine the most significant attribute.
- It is a discriminative classifier.

Options: Naive Bayes | Multilayer Perceptron | Python SciPy | print(dataset.describe())
- It is a simple probabilistic classifier that predicts the probabilities of class labels.
- It is an implementation of a neural network with two nonlinear activation functions, each of which maps weighted inputs to the output of each neuron.
- It is the most useful package for machine learning in Python.
- Show that all of the numerical values have the same scale (statistical summary).

Options: print(dataset.groupby('class').size()) | Data visualization: univariate plots | Data visualization: multivariate plots | Principal component analysis (PCA)
- Show that each class has the same number of instances (class distribution).
- Box, whisker, and histogram plots.
- Scatter plot matrix.
- It is used to reduce dimensionality.

Options: Principal component analysis (PCA) | SMOTEBoost technique | Equal-width binning | Normalization
- Reduces much of the dimensionality of the data set.
- Can be used on imbalanced data.
- Can be used to search dynamically for the optimal width and number of bins for the target class.
- Can be used when a data set consists of attributes with different scales and units.
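The dataset-summary calls quizzed above fit together in a few lines. A minimal sketch, assuming the classic iris CSV; the URL and column names here are illustrative, not given by the sheet itself:

```python
from pandas import read_csv

# assumed location of the iris dataset used by the common tutorial these items follow
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)

print(dataset.shape)                    # number of instances (rows) and attributes (columns)
print(dataset.head(20))                 # eyeball the first 20 rows
print(dataset.describe())               # count, mean, min, max, and percentiles
print(dataset.groupby('class').size())  # class distribution
```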
Options: K-Nearest Neighbors (K-NN) | Euclidean distance | Recommender systems | An example of recommender systems
- Can be used to learn each instance from the other known outputs of the data set.
- Can be used as the similarity metric amongst neighbors.
- The basic models for these are content-based and collaborative filtering.
- YouTube uses it to decide which video to play next on autoplay.

Options: Simple recommenders | Content-based recommenders | Collaborative-filtering recommenders | The first step of the recommendation function
- They are based on popularity or average audience rating.
- They are based on a particular item's metadata.
- They are based on the preference that a user would give an item, based on past ratings and preferences of other users.
- Get the index.

Options: The second step of the recommendation function | The third step of the recommendation function | The fourth step of the recommendation function | Null-based Completeness (NBC)
- Get the list of cosine similarity scores (see the sketch after this group).
- Sort the list of tuples.
- Get the top 10 elements of this list.
- It measures the values that are missing, normally presented as nulls.
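The four recommendation-function steps above chain into one short function. A hedged sketch, assuming a precomputed cosine_sim matrix, an indices Series mapping titles to row positions, and a metadata DataFrame (all three names are assumptions of this example):

```python
def get_recommendations(title, cosine_sim, indices, metadata):
    idx = indices[title]                           # step 1: get the index
    sim_scores = list(enumerate(cosine_sim[idx]))  # step 2: list of cosine similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)  # step 3: sort the tuples
    sim_scores = sim_scores[1:11]                  # step 4: top 10 (position 0 is the movie itself)
    movie_indices = [i[0] for i in sim_scores]
    return metadata['title'].iloc[movie_indices]   # titles of the most similar movies
```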
Options: Tuple-based Completeness (TBC) | Schema-based Completeness (SBC) | Population-based Completeness (PBC) | Integrated database
- It measures the tuples or records that are missing.
- It measures the missing schema elements, such as attributes and entities.
- It measures the individuals missing from the datasets under measure.
- Can be used where the participants must agree on a unified view of the data structure that is transparent to all.

Options: Reinforcement Learning | Markov Decision Process (MDP) | Optimal value functions and policy | Dynamic programming
- It is the problem of studying an agent in an environment; the agent has to interact with the environment in order to maximize some cumulative reward.
- It is a collection of states, actions, transition probabilities, rewards, and a discount factor: (S, A, P, R, γ).
- It is a function that gives the maximum value at each state among all policies.
- It is mainly an optimization over plain recursion.
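The MDP items above can be made concrete with value iteration, the textbook dynamic-programming method for computing the optimal value function. The two-state MDP below is invented purely for illustration:

```python
import numpy as np

# Invented toy MDP (S, A, P, R, gamma) with 2 states and 2 actions.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.9, 0.1], [0.2, 0.8]]])  # P[s, a, s']: transition probabilities
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                # R[s, a]: immediate rewards
gamma = 0.9                               # discount factor

# Value iteration: V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) * V(s') ]
V = np.zeros(2)
for _ in range(200):
    V = np.max(R + gamma * (P @ V), axis=1)
print(V)  # the maximum value at each state among all policies
```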
Options: Cooperation between online shops dealing with cross-border trade | Petri net | Funnel analysis | Process mining
- Can be used to address the problem that a lack of delivery of goods to distant places, relatively long delivery times, and the high cost of providing the purchased product hinder the further development of e-commerce.
- It is one of several mathematical modeling languages for the description of distributed systems; it is a class of discrete-event dynamic system.
- It is the mapping and analysis of a series of events that lead towards a defined goal, such as completing a sign-up or making a purchase (understanding user behavior).
- It is a set of techniques for obtaining knowledge of, and extracting insights from, processes by analyzing the event data generated during the execution of the process.

Options: Process discovery | Conformance checking | Throughput analysis (bottleneck detection) | pm4py library
- It is converting an event log into a process model.
- It is investigating the differences between the model and what happens in real life.
- It is accounting for the intensity of events' execution (measured by the time spent to complete a particular event).
- It is used for process discovery algorithms in Python.

Options: Alpha Miner | Heuristics Miner | Inductive Miner | An experience economy
- It is the algorithm that scans the traces (sequences in the event log) for ordering relations and builds the footprint matrix.
- It is an improvement of the Alpha Miner algorithm that acts on the directly-follows graph; its output can be converted into a Petri net.
- It is an improvement of both the Alpha Miner and the Heuristics Miner.
- They need to create memorable events, "the experience", and that is what customers are willing to pay for.

Options: Personalization | The model that employs process mining, recommender systems, and big data analysis | Behavioral data profile | Demographic data profile
- It is the process of delivering to each customer the right product, in the right place, and at the right time.
- Can be used to address the breakthrough concerning the way enormous volumes of customer data can be gathered and analyzed.
- Purchase history (products or services purchased by the customer); products viewed but not purchased; products added to the cart but eventually abandoned; products being searched for.
- Age, gender, residential area (address), education, occupation.

Options: Social profile | Social media profile | Lifestyle data profile | Family details profile
- Interests (movies, music, books, hobbies), friends.
- Activities on social media (e.g., Facebook likes and dislikes, Twitter followings, etc.).
- Type of property owned, pets.
- Marital status, children.

Options: Device-related data profile | Psychographics profile | Personal wishes profile | Contextual data profile
- E.g., smartphone brand.
- Religious and political views.
- Expectations and interests expressed directly by the customer.
- E.g., the customer's location-related data, such as current weather or social events being held.

Options: Visual atmospheric dimension | Aural atmospheric dimension | Olfactory atmospheric dimension | Tactile atmospheric dimension
- Color, lighting level, appearance of objects (size and shape).
- Volume, pitch, tempo, and style of sounds.
- Nature and intensity of sound.
- Temperature, texture, and contact.

Options: Taste atmospheric dimension | Aural atmospheric dimension | Olfactory atmospheric dimension | Tactile atmospheric dimension
- Nature and intensity of taste sensations.

Options: Taste atmospheric dimension | Historical data in repeatedly visited areas | Olfactory atmospheric dimension | Tactile atmospheric dimension
- It can solve the problems of complicated calculations for image processing and the inability to monitor all places due to a limited image.

Part 2: True/False. Each item pairs a statement with a claimed term; answer TRUE or FALSE.

- "It is the procedure for finding patterns and necessary details in huge amounts of data collected from various sources over a period of time" means data mining. TRUE / FALSE
- "It is the process of finding patterns, models, or knowledge in the contents of a web page" means web content mining. TRUE / FALSE
- "It is the process of recognizing the underlying correlations among web pages and other online objects" means web structure mining. TRUE / FALSE
- "It is the process of mining browsing patterns from the usage information of customers" means web usage mining. TRUE / FALSE
- "Build a function from an input set to a class label" means web usage mining. TRUE / FALSE
- "Build a model of a joint probability and predict the class label of an input instance using Bayes' rule" means generative classifiers. TRUE / FALSE
- "It refers to the personal data of the customers; commodity information refers to product features such as price and amount left; and server information refers to the cookies/logs generated by a user session" means generative classifiers. TRUE / FALSE
- "This problem can be solved using data collection, preprocessing, and then classification" means customer information. TRUE / FALSE
- "In this phase, the crawler module collects data through the 10 given information items" means the training phase. TRUE / FALSE
- "Remove outliers and extreme values using Weka" means data preprocessing (filtering). TRUE / FALSE
- "Determine the most significant attribute" means data preprocessing (filtering). TRUE / FALSE
- "It is a discriminative classifier" means classification. TRUE / FALSE
- "It is a simple probabilistic classifier that predicts the probabilities of class labels" means Naive Bayes. TRUE / FALSE
- "It is an implementation of a neural network with two nonlinear activation functions, each of which maps weighted inputs to the output of each neuron" means Naive Bayes. TRUE / FALSE
- "It is the most useful package for machine learning in Python" means Multilayer Perceptron. TRUE / FALSE
- "Show that all of the numerical values have the same scale (statistical summary)" means Python SciPy. TRUE / FALSE
- "Show that each class has the same number of instances (class distribution)" means print(dataset.describe()). TRUE / FALSE
- "Box, whisker, and histogram plots" means data visualization: univariate plots. TRUE / FALSE
- "Scatter plot matrix" means data visualization: univariate plots. TRUE / FALSE
- "It is used to reduce dimensionality" means principal component analysis (PCA). TRUE / FALSE
- "Reduces much of the dimensionality of the data set" means principal component analysis (PCA). TRUE / FALSE
- "Can be used on imbalanced data" means principal component analysis (PCA). TRUE / FALSE
- "Can be used to search dynamically for the optimal width and number of bins for the target class" means the SMOTEBoost technique. TRUE / FALSE
- "Can be used when a data set consists of attributes with different scales and units" means normalization. TRUE / FALSE
- "Can be used to learn each instance from the other known outputs of the data set" means K-Nearest Neighbors (K-NN). TRUE / FALSE
- "Can be used as the similarity metric amongst neighbors" means Euclidean distance. TRUE / FALSE
- "The basic models for these are content-based and collaborative filtering" means Euclidean distance. TRUE / FALSE
- "YouTube uses it to decide which video to play next on autoplay" means recommender systems. TRUE / FALSE
- "They are based on popularity or average audience rating" means an example of recommender systems. TRUE / FALSE
- "They are based on a particular item's metadata" means content-based recommenders. TRUE / FALSE
- "They are based on the preference that a user would give an item, based on past ratings and preferences of other users" means content-based recommenders. TRUE / FALSE
- "Get the index" means collaborative-filtering recommenders. TRUE / FALSE
- "Get the list of cosine similarity scores" means the first step of the recommendation function. TRUE / FALSE
- "Sort the list of tuples" means the third step of the recommendation function. TRUE / FALSE
- "Get the top 10 elements of this list" means the third step of the recommendation function. TRUE / FALSE
- "It measures the values that are missing, normally presented as nulls" means the fourth step of the recommendation function. TRUE / FALSE
- "It measures the tuples or records that are missing" means Tuple-based Completeness (TBC). TRUE / FALSE
- "It measures the missing schema elements, such as attributes and entities" means Schema-based Completeness (SBC). TRUE / FALSE
- "It measures the individuals missing from the datasets under measure" means Schema-based Completeness (SBC). TRUE / FALSE
- "Can be used where the participants must agree on a unified view of the data structure that is transparent to all" means Population-based Completeness (PBC). TRUE / FALSE
- "It is the problem of studying an agent in an environment; the agent has to interact with the environment in order to maximize some cumulative reward" means reinforcement learning. TRUE / FALSE
- "It is a collection of states, actions, transition probabilities, rewards, and a discount factor: (S, A, P, R, γ)" means reinforcement learning. TRUE / FALSE
- "It is a function that gives the maximum value at each state among all policies" means optimal value functions and policy. TRUE / FALSE
- "It is mainly an optimization over plain recursion" means dynamic programming. TRUE / FALSE
- "Can be used to address the problem that a lack of delivery of goods to distant places, relatively long delivery times, and the high cost of providing the purchased product hinder the further development of e-commerce" means dynamic programming. TRUE / FALSE
- "It is one of several mathematical modeling languages for the description of distributed systems; it is a class of discrete-event dynamic system" means cooperation between online shops dealing with cross-border trade. TRUE / FALSE
- "It is the mapping and analysis of a series of events that lead towards a defined goal, such as completing a sign-up or making a purchase (understanding user behavior)" means Petri net. TRUE / FALSE
- "It is a set of techniques for obtaining knowledge of, and extracting insights from, processes by analyzing the event data generated during the execution of the process" means funnel analysis. TRUE / FALSE
- "It is converting an event log into a process model" means process discovery. TRUE / FALSE
- "It is investigating the differences between the model and what happens in real life" means conformance checking. TRUE / FALSE
- "It is accounting for the intensity of events' execution (measured by the time spent to complete a particular event)" means throughput analysis (bottleneck detection). TRUE / FALSE
- "It is used for process discovery algorithms in Python" means the pm4py library. TRUE / FALSE
- "It is the algorithm that scans the traces (sequences in the event log) for ordering relations and builds the footprint matrix" means Alpha Miner. TRUE / FALSE
- "It is an improvement of the Alpha Miner algorithm that acts on the directly-follows graph; its output can be converted into a Petri net" means Heuristics Miner. TRUE / FALSE
- "It is an improvement of both the Alpha Miner and the Heuristics Miner" means Inductive Miner. TRUE / FALSE
- "They need to create memorable events, 'the experience', and that is what customers are willing to pay for" means Inductive Miner. TRUE / FALSE
- "It is the process of delivering to each customer the right product, in the right place, and at the right time" means personalization. TRUE / FALSE
- "Can be used to address the breakthrough concerning the way enormous volumes of customer data can be gathered and analyzed" means the model that employs process mining, recommender systems, and big data analysis. TRUE / FALSE
- "Purchase history (products or services purchased by the customer); products viewed but not purchased; products added to the cart but eventually abandoned; products being searched for" means a behavioral data profile. TRUE / FALSE
- "Age, gender, residential area (address), education, occupation" means a demographic data profile. TRUE / FALSE
- "Interests (movies, music, books, hobbies), friends" means a demographic data profile. TRUE / FALSE
- "Activities on social media (e.g., Facebook likes and dislikes, Twitter followings, etc.)" means a social media profile. TRUE / FALSE
- "Type of property owned, pets" means a lifestyle data profile. TRUE / FALSE
- "Marital status, children" means a family details profile. TRUE / FALSE
- "E.g., smartphone brand" means a device-related data profile. TRUE / FALSE
- "Religious and political views" means a psychographics profile. TRUE / FALSE
- "Expectations and interests expressed directly by the customer" means a personal wishes profile. TRUE / FALSE
- "E.g., the customer's location-related data, such as current weather or social events being held" means a personal wishes profile. TRUE / FALSE
- "Color, lighting level, appearance of objects (size and shape)" means a contextual data profile. TRUE / FALSE
- "Volume, pitch, tempo, and style of sounds" means the aural atmospheric dimension. TRUE / FALSE
- "Nature and intensity of sound" means the aural atmospheric dimension. TRUE / FALSE
- "Temperature, texture, and contact" means the olfactory atmospheric dimension. TRUE / FALSE
- "Nature and intensity of taste sensations" means the taste atmospheric dimension. TRUE / FALSE
- "It can solve the problems of complicated calculations for image processing and the inability to monitor all places due to a limited image" means historical data in repeatedly visited areas. TRUE / FALSE

Part 3: Multiple choice (continued).

Options: print(dataset.shape) | print(dataset.head(20)) | print(dataset.describe()) | print(dataset.groupby('class').size())
- To get a quick idea of how many instances (rows) and how many attributes (columns) the data contains, using the shape property.
- A good idea to actually eyeball your data.
- To include the count, mean, min, and max values, as well as some percentiles.
- To look at the number of instances (rows) that belong to each class.

Options: X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1) | Support Vector Machines (SVM) | LR and LDA | KNN, CART, NB, and SVM
- To create a validation dataset (see the sketch after this group).
- One of the algorithms to classify iris flowers is:
- They are simple linear algorithms.
- They are simple nonlinear algorithms.
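A sketch tying the iris-classification commands above together: split off a validation set, fit an SVM, and evaluate its predictions. It assumes the dataset DataFrame from the earlier loading sketch, with four numeric columns followed by the class column:

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

array = dataset.values
X, y = array[:, 0:4], array[:, 4]       # features and class label
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)

model = SVC(gamma='auto')               # fit an SVM on the training split
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)

print(accuracy_score(Y_validation, predictions))      # evaluate predictions
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))
```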
Options: pyplot.boxplot(results, labels=names) | SVC(gamma='auto'), model.fit(X_train, Y_train), model.predict(X_validation) | print(confusion_matrix(Y_validation, predictions)) | Decide on the metric or score to rate movies on
- To compare algorithms.
- To make predictions on the validation dataset, you can use:
- To evaluate predictions.
- The first step of the movie-based recommender system is:

Options: Calculate the score for every movie | Sort the movies based on the score and output the top results | metadata['vote_count'].quantile(0.90) | metadata.copy().loc[metadata['vote_count'] >= m]
- The second step of the movie-based recommender system is:
- To calculate the minimum number of votes required to be in the chart.
- To filter out all qualified movies into a new DataFrame.

Options: return (v/(v+m) * R) + (m/(m+v) * C) | q_movies.apply(weighted_rating, axis=1) | q_movies.sort_values('score', ascending=False) | q_movies[['title', 'vote_count', 'vote_average', 'score']].head(20)
- To compute the weighted rating of each movie based on the IMDB formula (see the sketch after this group).
- To define a new feature, score, and calculate its value with weighted_rating().
- To sort movies based on the score calculated above.
- To print the top movies.
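The weighted-rating snippets above assemble as follows. A hedged sketch, assuming a metadata DataFrame with title, vote_count, and vote_average columns, as in the well-known movies-metadata tutorial these items follow:

```python
C = metadata['vote_average'].mean()        # mean vote across all movies
m = metadata['vote_count'].quantile(0.90)  # minimum votes required to be in the chart
q_movies = metadata.copy().loc[metadata['vote_count'] >= m]  # qualified movies only

def weighted_rating(x, m=m, C=C):
    """IMDB-style weighted rating from the items above."""
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

q_movies['score'] = q_movies.apply(weighted_rating, axis=1)  # new 'score' feature
q_movies = q_movies.sort_values('score', ascending=False)    # sort by score
print(q_movies[['title', 'vote_count', 'vote_average', 'score']].head(20))  # top movies
```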
Options: metadata['overview'].fillna('') | tfidf.fit_transform(metadata['overview']) | linear_kernel(tfidf_matrix, tfidf_matrix) | list(enumerate(cosine_sim[idx]))
- To replace NaN with an empty string.
- To construct the required TF-IDF matrix by fitting and transforming the data.
- To compute the cosine similarity.
- To get the pairwise similarity scores of all movies with that movie.

Options: from ast import literal_eval | metadata[feature].apply(clean_data) | from sklearn.feature_extraction.text | from sklearn.metrics.pairwise
- To parse the stringified features into their corresponding Python objects, you can use:
- To convert all strings to lower case and strip names of spaces.
- To import CountVectorizer, you can use:
- To import cosine_similarity, you can use:

Options: Case ID, Event, and Timestamp | alpha_miner, inductive_miner, heuristics_miner, and dfg_discovery | petrinet, process_tree, heuristics_net, and dfg | pn_visualizer.apply(net, initial_marking, final_marking, parameters=parameters, variant=pn_visualizer.Variants.FREQUENCY, log=log)
- To carry out process discovery, the dataset must contain the following (see the sketch after this group).
- From pm4py.algo.discovery, you can import:
- From pm4py.visualization, you can import:
- To add information about frequency to the visualization.

Options: dfg_discovery.apply(log, variant=dfg_discovery.Variants.PERFORMANCE) | Does not guarantee that the discovered model will be sound | It is an improvement of both the Alpha Miner and the Heuristics Miner | inductive_miner.apply(log)
- To create the graph.
- One of the characteristics of the Alpha Miner algorithm is:
- Inductive Miner.
- To create a Petri net from scratch.
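The pm4py calls quizzed above fit into one short discovery-and-visualization flow. A hedged sketch using the older pm4py module layout that the questions' import paths reflect (later pm4py releases reorganized these modules); the XES file name is an assumption:

```python
from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.algo.discovery.alpha import algorithm as alpha_miner
from pm4py.visualization.petrinet import visualizer as pn_visualizer

# assumed event log file; each trace carries Case ID, Event, and Timestamp
log = xes_importer.apply('running-example.xes')

# process discovery: the Alpha Miner converts the event log into a Petri net
net, initial_marking, final_marking = alpha_miner.apply(log)

# decorate the net with event frequencies, as in the call quoted above
parameters = {pn_visualizer.Variants.FREQUENCY.value.Parameters.FORMAT: "png"}
gviz = pn_visualizer.apply(net, initial_marking, final_marking,
                           parameters=parameters,
                           variant=pn_visualizer.Variants.FREQUENCY, log=log)
pn_visualizer.view(gviz)
```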
Options: Latent Dirichlet Allocation | Assign a topic randomly to each word in every document | Iterate through each word in all the documents | Proportion of the assignments to topic "t" over all documents for this word
- It is one of the most popular methods for topic modelling.
- The first step of how LDA works is:
- The second step of how LDA works is:
- The third step of how LDA works is:

Options: Reassign a new topic to this word, the one for which P(t|d) * P(w|t) is maximum | Repeat the above two steps until the topic assignments become stable | You will use the Gensim library | gensim.models.ldamodel.LdaModel(data, num_topics=2, id2word=mapping, passes=15)
- The fourth step of how LDA works is:
- The fifth step of how LDA works is:
- To use LdaModel (see the sketch after this group).
- To train the LDA model.
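A minimal sketch of the Gensim training call quoted above, including the topic-distribution call quizzed next; the two toy documents are invented for illustration:

```python
from gensim import corpora
from gensim.models.ldamodel import LdaModel

# two invented toy documents, already tokenized
docs = [['data', 'mining', 'web', 'patterns', 'mining'],
        ['movie', 'rating', 'recommender', 'score', 'movie']]

mapping = corpora.Dictionary(docs)              # word <-> id mapping (id2word)
data = [mapping.doc2bow(doc) for doc in docs]   # bag-of-words corpus

# train the LDA model, as in the call quoted above
ldamodel = LdaModel(data, num_topics=2, id2word=mapping, passes=15)

# topic distribution for the first document, e.g. [(0, 0.87), (1, 0.13)]
print(ldamodel.get_document_topics(data[0]))
```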
Options: print(ldamodel.get_document_topics(data[0])), which outputs [(0, 0.8676003), (1, 0.13239971)] | number of title words in the sentence / number of words in the document title | Σ sim(Si, Sj) / max(Σ sim(Si, Sj)) | number of numerical data items in the sentence / length of the sentence
- To distribute topics for the first document.
- For the title feature, you can use an extraction formula such as:
- For sentence-to-sentence similarity, you can use an extraction formula such as:
- For numerical data, you can use an extraction formula such as:

Options: number of temporal information items in the sentence / length of the sentence | number of words occurring in the sentence / number of words occurring in the longest sentence | number of proper nouns in the sentence / length of the sentence | number of nouns and verbs in the sentence / length of the sentence
- For the temporal feature, you can use an extraction formula such as:
- For the length of a sentence, you can use an extraction formula such as:
- For proper nouns, you can use an extraction formula such as:
- For the number of nouns and verbs, you can use an extraction formula such as:

Options: number of frequent terms in the sentence / max(number of frequent terms) | Latent Dirichlet Allocation (LDA) | Most documents will contain only a relatively small number of topics | An extractive summary
- For frequent semantic terms, you can use an extraction formula such as (see the sketch after this group).
- It is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
- The LDA approach assumes that:
- It comprises the original sentences selected from the input document.
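Two of the feature formulas above, written out as code. This is illustrative only; naive whitespace tokenization is an assumption of the sketch:

```python
def title_feature(sentence, title):
    # number of title words in the sentence / number of words in the document title
    title_words = title.lower().split()
    sentence_words = sentence.lower().split()
    return sum(w in title_words for w in sentence_words) / len(title_words)

def length_feature(sentence, longest_sentence):
    # number of words in the sentence / number of words in the longest sentence
    return len(sentence.split()) / len(longest_sentence.split())

title = "Web usage mining"
s1 = "Web usage mining extracts browsing patterns from customer logs."
s2 = "It helps a lot."
print(title_feature(s1, title))  # 1.0: all three title words appear in s1
print(length_feature(s2, s1))    # 4 / 9
```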
Options: An abstractive summary | It means a string value cannot be updated | It deals with a large amount of text and brings it into a presentable format | binascii.b2a_uu(text)
- It contains sentences that have to be reconstructed using deep natural language analysis.
- In text processing: string immutability.
- In text processing: reformatting paragraphs.
- In text processing: converting binary to ASCII.

Options: They have a file which has multiple lines, and those lines become individual elements | It extracts the unique words present in the file | Extraction is achieved from a text file by using a regular expression | Data objects can represent a dictionary data type or even a data object containing the JSON data
- In text processing: strings as files.
- In text processing: filter duplicate words.
- In text processing: extract URL from text.
- In text processing: pretty print numbers.

Options: It is a directed graph consisting of a set of nodes and a set of transition functions | Splitting up a larger body of text into smaller lines or words, or even creating words for a non-English language | They can safely be ignored without sacrificing the meaning of the sentence | In WordNet, words that denote the same concept and are interchangeable in many contexts are grouped into unordered sets (synsets)
- In text processing: text processing state machine.
- In text processing: tokenization.
- In text processing: remove stopwords.
- In text processing: synonyms and antonyms.
Options: It provides the feature of finding words that may have been misspelled and also suggests possible corrections | You can use it as a reference for getting the meaning of words | It is a group presenting multiple collections of text documents | It is an essential feature of text processing in which we tag words into grammatical categories
- In text processing: pyspellchecker.
- In text processing: the WordNet interface.
- In text processing: corpora access.
- In text processing: tagging words.

Options: It is the process of grouping similar words together based on the nature of the word | Grouping the text as groups of words rather than individual words | Some English words occur together more frequently | It is a format for delivering regularly changing web content
- In text processing: chunking.
- In text processing: chunk classification.
- In text processing: bigrams.
- In text processing: reading an RSS feed.

Options: It is about analyzing the general opinion of the audience | It means cleaning up anything messy by transforming it | It involves generating a summary from a large body of text which somewhat describes the context of that text | It comes up when two or more words have a common root
- In text processing: sentiment.
- In text processing: text munging.
- In text processing: text summarization.
- In text processing: stemming algorithms (see the sketch after this group).
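Several of the text-processing items above can be demonstrated in a few lines of NLTK. A small illustration; the one-time NLTK data downloads in the comment are assumed to have been run:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer

# one-time data downloads: nltk.download('punkt'), nltk.download('stopwords'),
# nltk.download('wordnet'), nltk.download('averaged_perceptron_tagger')

text = "The movies were entertaining and the audiences loved them."
tokens = word_tokenize(text)                     # tokenization
content = [w for w in tokens
           if w.lower() not in stopwords.words('english')]  # remove stopwords
print(nltk.pos_tag(content))                     # tagging words into grammatical categories
print(wordnet.synsets('movie')[0].definition())  # WordNet as a reference for word meanings
print([PorterStemmer().stem(w) for w in content])  # stemming words to a common root
```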
Part 4: True/False (continued).

- "To get a quick idea of how many instances (rows) and how many attributes (columns) the data contains, using the shape property" means print(dataset.shape). TRUE / FALSE
- "A good idea to actually eyeball your data" means print(dataset.head(20)). TRUE / FALSE
- "To include the count, mean, min, and max values, as well as some percentiles" means print(dataset.describe()). TRUE / FALSE
- "To look at the number of instances (rows) that belong to each class" means print(dataset.groupby('class').size()). TRUE / FALSE
- "To create a validation dataset" means X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1). TRUE / FALSE
- "One of the algorithms to classify iris flowers" means X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1). TRUE / FALSE
- "They are simple linear algorithms" means Support Vector Machines (SVM). TRUE / FALSE
- "They are simple nonlinear algorithms" means KNN, CART, NB, and SVM. TRUE / FALSE
- "To compare algorithms" means KNN, CART, NB, and SVM. TRUE / FALSE
- "To make predictions on the validation dataset, you can use" means SVC(gamma='auto'), model.fit(X_train, Y_train), model.predict(X_validation). TRUE / FALSE
- "To evaluate predictions" means SVC(gamma='auto'), model.fit(X_train, Y_train), model.predict(X_validation). TRUE / FALSE
- "The first step of the movie-based recommender system" means print(confusion_matrix(Y_validation, predictions)). TRUE / FALSE
- "The second step of the movie-based recommender system" means deciding on the metric or score to rate movies on. TRUE / FALSE
- "The third step of the movie-based recommender system" means sorting the movies based on the score and outputting the top results. TRUE / FALSE
- "To calculate the minimum number of votes required to be in the chart" means sorting the movies based on the score and outputting the top results. TRUE / FALSE
- "To filter out all qualified movies into a new DataFrame" means metadata.copy().loc[metadata['vote_count'] >= m]. TRUE / FALSE
- "To compute the weighted rating of each movie based on the IMDB formula" means metadata.copy().loc[metadata['vote_count'] >= m]. TRUE / FALSE
- "To define a new feature, score, and calculate its value with weighted_rating()" means return (v/(v+m) * R) + (m/(m+v) * C). TRUE / FALSE
- "To sort movies based on the score calculated above" means q_movies.apply(weighted_rating, axis=1). TRUE / FALSE
- "To print the top movies" means q_movies[['title', 'vote_count', 'vote_average', 'score']].head(20). TRUE / FALSE
- "To replace NaN with an empty string" means metadata['overview'].fillna(''). TRUE / FALSE
- "To construct the required TF-IDF matrix by fitting and transforming the data" means metadata['overview'].fillna(''). TRUE / FALSE
- "To compute the cosine similarity" means linear_kernel(tfidf_matrix, tfidf_matrix). TRUE / FALSE
- "To get the pairwise similarity scores of all movies with that movie" means linear_kernel(tfidf_matrix, tfidf_matrix). TRUE / FALSE
- "To parse the stringified features into their corresponding Python objects, you can use" means list(enumerate(cosine_sim[idx])). TRUE / FALSE
- "To convert all strings to lower case and strip names of spaces" means metadata[feature].apply(clean_data). TRUE / FALSE
- "To import CountVectorizer, you can use" means from sklearn.feature_extraction.text. TRUE / FALSE
- "To import cosine_similarity, you can use" means from sklearn.feature_extraction.text. TRUE / FALSE
- "To carry out process discovery, the dataset must contain the following" means Case ID, Event, and Timestamp. TRUE / FALSE
- "From pm4py.algo.discovery, you can import" means Case ID, Event, and Timestamp. TRUE / FALSE
- "From pm4py.visualization, you can import" means alpha_miner, inductive_miner, heuristics_miner, and dfg_discovery. TRUE / FALSE
- "To add information about frequency to the visualization" means pn_visualizer.apply(net, initial_marking, final_marking, parameters=parameters, variant=pn_visualizer.Variants.FREQUENCY, log=log). TRUE / FALSE
- "To create the graph" means dfg_discovery.apply(log, variant=dfg_discovery.Variants.PERFORMANCE). TRUE / FALSE
- "One of the characteristics of the Alpha Miner algorithm" means dfg_discovery.apply(log, variant=dfg_discovery.Variants.PERFORMANCE). TRUE / FALSE
- "Inductive Miner" means it is an improvement of both the Alpha Miner and the Heuristics Miner. TRUE / FALSE
- "To create a Petri net from scratch" means it is an improvement of both the Alpha Miner and the Heuristics Miner. TRUE / FALSE
- "It is one of the most popular methods for topic modelling" means inductive_miner.apply(log). TRUE / FALSE
- "The first step of how LDA works" means assigning a topic randomly to each word in every document. TRUE / FALSE
- "The second step of how LDA works" means iterating through each word in all the documents. TRUE / FALSE
- "The third step of how LDA works" means iterating through each word in all the documents. TRUE / FALSE
- "The fourth step of how LDA works" means reassigning a new topic to this word, the one for which P(t|d) * P(w|t) is maximum. TRUE / FALSE
- "The fifth step of how LDA works" means reassigning a new topic to this word, the one for which P(t|d) * P(w|t) is maximum. TRUE / FALSE
- "To use LdaModel" means you will use the Gensim library. TRUE / FALSE
- "To train the LDA model" means gensim.models.ldamodel.LdaModel(data, num_topics=2, id2word=mapping, passes=15). TRUE / FALSE
- "To distribute topics for the first document" means print(ldamodel.get_document_topics(data[0])), which outputs [(0, 0.8676003), (1, 0.13239971)]. TRUE / FALSE
- "For the title feature, you can use an extraction formula such as" means print(ldamodel.get_document_topics(data[0])), which outputs [(0, 0.8676003), (1, 0.13239971)]. TRUE / FALSE
- "For sentence-to-sentence similarity, you can use an extraction formula such as" means number of title words in the sentence / number of words in the document title. TRUE / FALSE
- "For numerical data, you can use an extraction formula such as" means number of numerical data items in the sentence / length of the sentence. TRUE / FALSE
- "For the temporal feature, you can use an extraction formula such as" means number of numerical data items in the sentence / length of the sentence. TRUE / FALSE
- "For the length of a sentence, you can use an extraction formula such as" means number of temporal information items in the sentence / length of the sentence. TRUE / FALSE
- "For proper nouns, you can use an extraction formula such as" means number of proper nouns in the sentence / length of the sentence. TRUE / FALSE
- "For the number of nouns and verbs, you can use an extraction formula such as" means number of nouns and verbs in the sentence / length of the sentence. TRUE / FALSE
- "For frequent semantic terms, you can use an extraction formula such as" means number of frequent terms in the sentence / max(number of frequent terms). TRUE / FALSE
- "It is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar" means Latent Dirichlet Allocation (LDA). TRUE / FALSE
- "The LDA approach assumes that" means Latent Dirichlet Allocation (LDA). TRUE / FALSE
- "It comprises the original sentences selected from the input document" means most documents will contain only a relatively small number of topics. TRUE / FALSE
- "It contains sentences that have to be reconstructed using deep natural language analysis" means an abstractive summary. TRUE / FALSE
- "In text processing: string immutability" means an abstractive summary. TRUE / FALSE
- "In text processing: reformatting paragraphs" means it deals with a large amount of text and brings it into a presentable format. TRUE / FALSE
- "In text processing: converting binary to ASCII" means it deals with a large amount of text and brings it into a presentable format. TRUE / FALSE
- "In text processing: strings as files" means they have a file which has multiple lines, and those lines become individual elements. TRUE / FALSE
- "In text processing: filter duplicate words" means it extracts the unique words present in the file. TRUE / FALSE
- "In text processing: extract URL from text" means it extracts the unique words present in the file. TRUE / FALSE
- "In text processing: pretty print numbers" means extraction is achieved from a text file by using a regular expression. TRUE / FALSE
- "In text processing: text processing state machine" means data objects can represent a dictionary data type or even a data object containing the JSON data. TRUE / FALSE
- "In text processing: tokenization" means it is a directed graph consisting of a set of nodes and a set of transition functions. TRUE / FALSE
- "In text processing: remove stopwords" means splitting up a larger body of text into smaller lines or words, or even creating words for a non-English language. TRUE / FALSE
- "In text processing: synonyms and antonyms" means they can safely be ignored without sacrificing the meaning of the sentence. TRUE / FALSE
- "In text processing: pyspellchecker" means it provides the feature of finding words that may have been misspelled and also suggests possible corrections. TRUE / FALSE
- "In text processing: the WordNet interface" means it provides the feature of finding words that may have been misspelled and also suggests possible corrections. TRUE / FALSE
- "In text processing: corpora access" means you can use it as a reference for getting the meaning of words. TRUE / FALSE
- "In text processing: tagging words" means it is an essential feature of text processing in which we tag words into grammatical categories. TRUE / FALSE
- "In text processing: chunking" means it is the process of grouping similar words together based on the nature of the word. TRUE / FALSE
- "In text processing: chunk classification" means it is the process of grouping similar words together based on the nature of the word. TRUE / FALSE
- "In text processing: bigrams" means grouping the text as groups of words rather than individual words. TRUE / FALSE
- "In text processing: reading an RSS feed" means some English words occur together more frequently. TRUE / FALSE
- "In text processing: sentiment" means it is about analyzing the general opinion of the audience. TRUE / FALSE
- "In text processing: text munging" means it is about analyzing the general opinion of the audience. TRUE / FALSE
- "In text processing: text summarization" means it means cleaning up anything messy by transforming it. TRUE / FALSE
- "In text processing: stemming algorithms" means it involves generating a summary from a large body of text which somewhat describes the context of that text. TRUE / FALSE