Question 1

Text mining is important to competitive advantage because knowledge is power, and knowledge is derived from text data sources.

Accepted Answer

Text mining allows organizations to extract valuable insights and knowledge from large amounts of text data sources such as customer feedback, social media, and website content. These insights can provide a competitive advantage by enabling organizations to identify trends, patterns, and opportunities that can inform business decisions and improve overall performance. Therefore, the statement that knowledge is power and that knowledge is derived from text data sources is true, making text mining important for gaining a competitive advantage.

Question 2

Web pages consisting of unstructured textual data coded in HTML or XML, hyperlink information, and logs of visitors' interactions provide rich data for effective and efficient knowledge discovery.

Accepted Answer

False

Question 3

Amazon.com leverages Web usage history dynamically and recognizes the user by reading a cookie written by a Web site on the visitor's computer.

Accepted Answer

Amazon.com does use web usage history and cookies to recognize users and personalize their experience on the website.

Question 4

The quality of search results is impossible to measure accurately using strictly quantitative measures such as click-through rate, abandonment, and search frequency. Additional quantitative and qualitative measures are required.

Accepted Answer

The statement is true. While click-through rate, abandonment, and search frequency are valuable metrics to consider when evaluating search results, they do not provide a comprehensive view of search quality. Other measures such as relevance, accuracy, and user satisfaction must also be taken into account.

Question 5

The benefits of text mining are greatest in areas where very large amounts of textual data are being generated, such as law, academic research, finance, and medicine.

Accepted Answer

Text mining is most beneficial in areas where there is a large amount of textual data, such as law, academic research, finance, and medicine. Therefore, the statement is true.

Question 6

Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers.

Accepted Answer

Text mining can be used to analyze customer feedback and conversations that take place in call centers to identify patterns and insights that can help companies make more effective cross-selling and up-selling recommendations to customers.

Question 7

The purpose and processes of text mining are different from those of data mining because with text mining the inputs to the process are data files such as Word documents, PDF files, text excerpts, and XML files.

Accepted Answer

Text mining and data mining both aim to discover patterns and extract useful information from data; the main difference lies in the type of data they process, not in the fundamental purpose or processes. Text mining specifically deals with unstructured textual data, while data mining can handle various data types, including structured and unstructured data.

Question 8

Unstructured data has a predetermined format. It is usually organized into records as categorical, ordinal, and continuous variables and stored in databases.

Accepted Answer

Unstructured data does not have a predetermined format and is not organized into records. It often includes text, images, videos, and other types of data that do not fit neatly into a database structure.

Question 9

Two advantages associated with the implementation of NLP are word sense disambiguation and syntactic ambiguity.

Accepted Answer

Word sense disambiguation is an advantage of NLP, but syntactic ambiguity is a challenge rather than an advantage.

Question 10

Stemming is the process of reducing inflected words to their base or root form.

Accepted Answer

Stemming is indeed the process of reducing inflected words to their base or root form. This is commonly used in natural language processing and text analysis to improve the accuracy of searches and text classification.
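The idea in this answer can be illustrated with a deliberately small sketch. The suffix list and the `naive_stem` helper below are assumptions made for illustration only; they are not the Porter algorithm or any other production stemmer, which handle many more cases.

```python
# Toy suffix-stripping stemmer: an illustrative assumption, not a real
# stemming algorithm such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, intentionally tiny list


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


# All four inflected forms reduce to the same base form.
stems = [naive_stem(w) for w in ["walking", "walked", "walks", "walk"]]
print(stems)
```

Because every inflected form maps to one stem, a search for "walk" can also match documents that only contain "walking" or "walked".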

Question 11

The main categories of knowledge extraction methods are recall, search, and signaling.

Accepted Answer

The answer of The main categories of knowledge extraction methods...

Question 12

Customer experience management applications gather and report direct feedback from site visitors by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.

Accepted Answer

The answer of Customer experience management applications gather and report...

Question 13

DARPA and MITRE teamed up to develop capabilities to automatically filter text-based information sources to generate actionable information in a timely manner.

Accepted Answer

The answer of DARPA and MITRE teamed up to develop...

Question 14

By applying a learning algorithm to parsed text, researchers from Stanford University's NLP lab have developed methods that can automatically identify the concepts and relationships between those concepts in the text.

Accepted Answer

The answer of By applying a learning algorithm to parsed...

Question 15

Web crawlers are Web content mining tools that are used to read through the content of a Web site automatically.

Accepted Answer

The answer of Web crawlers are Web content mining tools...

Question 16

The goal of natural language processing (NLP) is syntax-driven text manipulation.

Accepted Answer

The answer of The goal of natural language processing (NLP)is...

Question 17

The corpus for the text mining process consists of organized ASCII text files and structured data.

Accepted Answer

The answer of The corpus for the text mining process...

Question 18

Stop words, such as a, am, the, and was, are words that are filtered out prior to or after processing of natural language data.

Accepted Answer

The answer of Stop words,such as a,am,the,and was,are words that...
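The filtering that Question 18 describes can be sketched in a few lines. The stop list below holds only the four words named in the question, and `remove_stop_words` is a hypothetical helper; real stop lists are much longer and vary by tool and language.

```python
# Stop list limited to the examples named in the question (an assumption;
# practical stop lists contain hundreds of words).
STOP_WORDS = {"a", "am", "the", "was"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


filtered = remove_stop_words("The report was a success")
print(filtered)
```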

Question 19

A vast majority of all business data are captured and stored in structured text documents.

Accepted Answer

The answer of A vast majority of all business data...

Question 20

Compared to polygraphs for deception-detection, text-based deception detection has the advantages of being nonintrusive and widely applicable to textual data and transcriptions of voice recordings.

Accepted Answer

The answer of Compared to polygraphs for deception-detection,text-based deception detection...

Question 21

Which of the following is not one of the three main areas of Web mining?

A) Web search mining
B) Web content mining
C) Web structure mining
D) Web usage mining

Accepted Answer

The answer of Which of the following is not one...

Question 22

The two main approaches to text classification are ________ and ________.

A) knowledge engineering; machine learning
B) categorization; clustering
C) association; trend analysis
D) knowledge extraction;

Accepted Answer

The answer of The two main approaches to text classification...

Question 23

All of the following are popular application areas of text mining except:

A) information extraction
B) document summarization
C) question answering
D) data structuring

Accepted Answer

The answer of All of the following are popular application...

Question 24

________ is a branch of the field of linguistics and a part of natural language processing that studies the internal structure of words.

A) Morphology
B) Corpus
C) Stemming
D) Polysemes

Accepted Answer

The answer of ________ is a branch of the field...

Question 25

Using ________ as a rich source of knowledge and a strategic weapon, Kodak not only survives but excels in its market segment defined by innovation and constant change.

A) visualization
B) deception detection
C) patent analysis
D) semantic cues

Accepted Answer

The answer of Using ________ as a rich source of...

Question 26

Which of the following correctly defines a text mining term?

A) Tagging is the number of times a word is found in a specific document.
B) A token is an uncategorized block of text in a sentence.
C) Rooting is the process of reducing inflected words to their base form.
D) A term is a single word or multiword phrase extracted directly from the corpus by means of NLP methods.

Accepted Answer

The answer of Which of the following correctly defines a...

Question 27

At a very high level, the text mining process consists of each of the following tasks except:

A) Create log frequencies.
B) Establish the corpus.
C) Create the term-document matrix.
D) Extract the knowledge.

Accepted Answer

The answer of At a very high level,the text mining...

Question 28

A simple keyword-based search engine suffers from several deficiencies, which include all of the following except:

A) A topic of any breadth can easily contain hundreds or thousands of documents.
B) Many documents that are highly relevant to a topic may not contain the exact keywords defining them.
C) Web mining can identify authoritative Web pages.
D) Many of the search results are marginally or not relevant to the topic.

Accepted Answer

The answer of A simple keyword-based search engine suffers from...

Question 29

Forward-thinking companies like Ask.com, Scholastic, and St. John Health System are actively using Web mining systems to answer important questions of "Who?" "Why?" and "How?" The benefits of integrating these systems:

A) are measured qualitatively in terms of customer satisfaction, but not measured using financial or other quantitative measures.
B) can be significant in terms of incremental financial growth and increasing customer loyalty and satisfaction.
C) have not yet outweighed the costs of the Web mining systems and analysis.
D) can be infinitely measurable.

Accepted Answer

The answer of Forward-thinking companies like Ask.com,Scholastic,and St.John Health System...

Question 30

It has been shown that the bag-of-words method may not produce good enough information content for text mining tasks. More advanced techniques such as ________ are needed.

A) classification
B) natural language processing
C) evidence-based processing
D) symbolic processing

Accepted Answer

The answer of It has been shown that the bag-of-word...

Question 31

A ________ is one or more Web pages that provide a collection of links to authoritative pages, reference sites, or a resource list on a specific topic.

A) hub
B) hyperlink-induced topic search
C) spoke
D) community

Accepted Answer

The answer of A ________ is one or more Web...
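The relationship between pages that link out and the authoritative pages they point to, as described in Question 31, can be sketched with the mutually reinforcing scoring idea behind hyperlink-induced topic search (HITS). The link graph, page names, and iteration count below are made up for illustration, and the `hits` function is a simplified sketch rather than a reference implementation.

```python
def hits(links: dict[str, list[str]], iterations: int = 20):
    """Return (hub, authority) scores for a directed link graph.

    `links` maps each page to the pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking to this page.
        auth = {
            n: sum(hub[src] for src, targets in links.items() if n in targets)
            for n in nodes
        }
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the pages this page links to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical graph: "portal" links to three pages, "blog" links to one.
graph = {"portal": ["a", "b", "c"], "blog": ["a"]}
hub_scores, auth_scores = hits(graph)
print(max(hub_scores, key=hub_scores.get))    # page with the highest hub score
print(max(auth_scores, key=auth_scores.get))  # page with the highest authority score
```

In this toy graph the page that links out to the most well-cited pages receives the highest hub score, and the page cited by both linkers receives the highest authority score.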

Question 32

A vast majority of business data are stored in text documents that are ________.

A) mostly quantitative
B) virtually unstructured
C) semi-structured
D) highly structured

Accepted Answer

The answer of A vast majority of business data are...

Question 33

Why will computers probably not be able to understand natural language the same way and with the same accuracy that humans do?

A) A true understanding of meaning requires extensive knowledge of a topic beyond what is in the words, sentences, and paragraphs.
B) The natural human language is too specific.
C) The part of speech depends only on the definition and not on the context within which it is used.
D) All of the above.

Accepted Answer

The answer of Why will computers probably not be able...

Question 34

Text mining is the semi-automated process of extracting ________ from large amounts of unstructured data sources.

A) patterns
B) useful information
C) knowledge
D) all of the above

Accepted Answer

The answer of Text mining is the semi-automated process of...

Question 35

Commercial software tools include all of the following except:

A) GATE
B) IBM Intelligent Miner Data Mining Suite
C) SAS Text Miner
D) SPSS Text Mining

Accepted Answer

The answer of Commercial software tools include all of the...

Question 36

Which of the following refers to developing useful information from the links included in the Web documents?

A) Web content mining
B) Web subject mining
C) Web structure mining
D) Web matter mining

Accepted Answer

The answer of Which of the following refers to developing...

Question 37

In ________, the problem is to group an unlabelled collection of objects, such as documents, customer comments, and Web pages, into meaningful groups without any prior knowledge.

A) search recall
B) classification
C) clustering
D) grouping

Accepted Answer

The answer of In ________,the problem is to group an...

Question 38

When registered users revisit Amazon.com, they are greeted by name. This task involves recognizing the user by ________.

A) pattern discovery
B) association
C) text mining
D) reading a cookie

Accepted Answer

The answer of When registered users revisit Amazon.com,they are greeted...

Question 39

All of the following are types of data generated through Web page visits except:

A) data stored in server access logs, referrer logs, agent logs, and client-side cookies
B) user profiles
C) hyperlink analysis
D) metadata, such as page attributes, content attributes, and usage data

Accepted Answer

The answer of All of the following are types of...

Question 40

Why does the Web pose great challenges for effective and efficient knowledge discovery?

A) The Web search engines are indexed-based.
B) The Web is too dynamic.
C) The Web is too specific to a domain.
D) The Web infrastructure contains hyperlink information.

Accepted Answer

The answer of Why does the Web pose great challenges...

Question 41

________ is the grouping of similar documents without having a predefined set of categories.

Accepted Answer

The answer of ________ is the grouping of similar documents...

Question 42

________ mining is the extraction of useful information from data generated through Web page visits and transactions.

Accepted Answer

The answer of ________ mining is the extraction of useful...

Question 43

The ________ model, which is one where multiple sources of data describing the same population are integrated to increase the depth and richness of the resulting analysis, forms the framework of the Web site optimization ecosystem.

Accepted Answer

The answer of The ________ model,which is one where multiple...

Question 44

________ is the semi-automated process of extracting patterns from large amounts of unstructured data sources.

Accepted Answer

The answer of ________ is the semi-automated process of extracting...

Question 45

________ words or noise words are words that are filtered out prior to or after processing of natural language data.

Accepted Answer

The answer of ________ words or noise words are words...

Question 46

In linguistics, a(n) ________ is a large and structured set of texts prepared for the purpose of conducting knowledge discovery.

Accepted Answer

The answer of In linguistics,a(n)________ is a large and structured...

Question 47

At a very high level, the first of three consecutive tasks in the text mining process is to establish the ________, which is a list of organized documents.

Accepted Answer

The answer of At a very high level,the first of...

Question 48

________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using textual data sources, such as customer feedback in Web postings and the detection of unfavorable rumors.

Accepted Answer

The answer of ________ is a technique used to detect...

Question 49

The term "stop-words" is used in text mining to ________ commonly used words.

Accepted Answer

The answer of The term "stop-words" are used by text...

Question 50

In the text mining process, the output of task two is a flat file called a ________ where the cells are populated with the term frequencies.

Accepted Answer

The answer of In the text mining process,the output of...
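A flat file whose cells hold term frequencies, as Question 50 describes, can be sketched as follows. The two sample documents and the `build_tdm` helper are invented for illustration, and the layout (one row per document, one column per term) is an assumption; some tools use the transposed orientation.

```python
from collections import Counter


def build_tdm(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (terms, matrix) where matrix[i][j] counts terms[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    # Column order: every distinct term across the corpus, sorted alphabetically.
    terms = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        matrix.append([counts[term] for term in terms])
    return terms, matrix


# Hypothetical two-document corpus.
terms, matrix = build_tdm(["text mining finds patterns", "web mining finds links"])
print(terms)
for row in matrix:
    print(row)
```

Even with two short documents, several cells are zero; on a real corpus the matrix is typically very sparse, which is why reducing its dimensionality (Question 64) matters.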

Question 51

________ applications focus on "who and how" questions by gathering and reporting direct feedback from site visitors, by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.

Accepted Answer

The answer of ________ applications focus on "who and how"...

Question 52

________ is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases, where the data are organized in records structured by categorical, ordinal, or continuous variables.

Accepted Answer

The answer of ________ is the process of identifying valid,novel,potentially...

Question 53

One of the main approaches to text classification is ________, in which an expert's knowledge is encoded into the system either declaratively or in the form of procedural classification rules.

Accepted Answer

The answer of One of the main approaches to text...

Question 54

A(n) ________ is one or more Web pages that provide a collection of links to authoritative pages.

Accepted Answer

The answer of A(n)________ is one or more Web pages...

Question 55

________ is the discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools.

Accepted Answer

The answer of ________ is the discovery and analysis of...

Question 56

________ is the process of reducing inflected words to their base or root form.

Accepted Answer

The answer of ________ is the process of reducing inflected...

Question 57

Analysis of the information collected by Web servers can help better understand user behavior. Analysis of this data is called ________ analysis.

Accepted Answer

The answer of Analysis of the information collected by Web...

Question 58

________ is an important component of text mining and is a subfield of artificial intelligence and computational linguistics. It studies the problem of understanding the natural human language.

Accepted Answer

The answer of ________ is an important component of text...

Question 59

Web analytics, CEM, and VOC applications form the foundation of the Web site ________ ecosystem that supports the online business's ability to positively influence desired outcomes.

Accepted Answer

The answer of Web analytics,CEM,and VOC applications form the foundation...

Question 60

________ mining is the process of extracting useful information from the links embedded in Web documents.

Accepted Answer

The answer of ________ mining is the process of extracting...

Question 61

What are three of the challenges for effective and efficient knowledge discovery posed by the Web?

Accepted Answer

The answer of What are three of the challenges for...

Question 62

What is the primary purpose of text mining within the context of knowledge discovery?

Accepted Answer

The answer of What is the primary purpose of text...

Question 63

Compare and contrast text mining and data mining.

Accepted Answer

The answer of Compare and contrast text mining and data...

Question 64

List two options for managing or reducing the dimensionality (size) of the term-document matrix (TDM).

Accepted Answer

The answer of List two options for managing or reducing...

Deck 7: Text and Web Mining