Deck 5: Text and Web Mining

Question
Two advantages associated with the implementation of NLP are word sense disambiguation and syntactic ambiguity.
Question
A vast majority of business data is captured and stored in text documents that are structured.
Question
The main purpose of establishing the corpus is to collect all of the documents related to the context being studied.
Question
Stemming is the process of reducing inflected words to their base or root form.
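As a rough illustration of the idea, the sketch below strips a few common English suffixes. It is a toy under stated assumptions, not the Porter algorithm or any production stemmer; the names `SUFFIXES` and `naive_stem` are hypothetical and chosen only for this example.

```python
# Toy suffix-stripping stemmer (illustrative only; real stemmers use many more rules).
SUFFIXES = ("ing", "ed", "es", "s")  # assumed suffix list, checked in this order


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of the word."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


words = ["connect", "connected", "connecting", "connects"]
stems = [naive_stem(w) for w in words]
print(stems)  # all four inflected forms reduce to the same root
```

A rule list this short over-strips and under-strips many words, which is why practical systems rely on established stemming or lemmatization algorithms.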
Question
By applying a learning algorithm to parsed text, researchers from Stanford University's NLP lab have developed methods that can automatically identify the concepts and relationships between those concepts in the text.
Question
DARPA and MITRE teamed up to develop capabilities to automatically filter text-based information sources to generate actionable information in a timely manner.
Question
Amazon.com leverages Web usage history dynamically and recognizes the user by reading a cookie written by a Web site on the visitor's computer.
Question
Customer experience management applications gather and report direct feedback from site visitors by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.
Question
Text mining is important to competitive advantage because knowledge is power, and knowledge is derived from text data sources.
Question
Web crawlers are Web content mining tools that are used to read through the content of a Web site automatically.
Question
The purpose and processes of text mining are different from those of data mining because with text mining the inputs to the process are data files such as Word documents, PDF files, text excerpts, and XML files.
Question
The main categories of knowledge extraction methods are recall, search, and signaling.
Question
The goal of natural language processing (NLP) is syntax-driven text manipulation.
Question
The quality of search results is impossible to measure accurately using strictly quantitative measures such as click-through rate, abandonment, and search frequency. Additional quantitative and qualitative measures are required.
Question
The benefits of text mining are greatest in areas where very large amounts of textual data are being generated, such as law, academic research, finance, and medicine.
Question
Unstructured data has a predetermined format. It is usually organized into records as categorical, ordinal, and continuous variables and stored in databases.
Question
Web pages consisting of unstructured textual data coded in HTML and logs of visitors' interactions provide rich data that can easily provide effective and efficient knowledge discovery.
Question
Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers.
Question
Stop words, such as a, am, the, and was, are words that are filtered out prior to or after processing of natural language data.
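A minimal sketch of stop-word filtering before counting term frequencies, assuming whitespace tokenization and a stop list containing only the four words named in the card. The names `STOP_WORDS` and `term_frequencies` are hypothetical; real stop lists are much longer and usually domain-specific.

```python
from collections import Counter

# Assumed stop list: just the four examples from the card.
STOP_WORDS = {"a", "am", "the", "was"}


def term_frequencies(text: str) -> Counter:
    """Lowercase, split on whitespace, drop stop words, and count what remains."""
    tokens = text.lower().split()
    return Counter(token for token in tokens if token not in STOP_WORDS)


counts = term_frequencies("The report was a summary of the report")
print(counts)  # stop words such as "the" and "was" no longer appear as terms
```

Here the filter runs before counting; the same stop list could instead be applied afterward by dropping those keys from the counts.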
Question
Compared to polygraphs for deception-detection, text-based deception detection has the advantages of being nonintrusive and widely applicable to textual data and transcriptions of voice recordings.
Question
In ________, the problem is to group an unlabelled collection of objects, such as documents, customer comments, and Web pages, into meaningful groups without any prior knowledge.

A) search recall
B) classification
C) clustering
D) grouping
Question
Which of the following correctly defines a text mining term?

A) Tagging is the number of times a word is found in a specific document.
B) A token is an uncategorized block of text in a sentence.
C) Rooting is the process of reducing inflected words to their base form.
D) A term is a single word or multiword phrase extracted directly from the corpus by means of NLP methods.
Question
Commercial software tools include all of the following except:

A) GATE
B) IBM Intelligent Miner Data Mining Suite
C) SAS Text Miner
D) SPSS Text Mining
Question
Forward-thinking companies like Ask.com, Scholastic, and St. John Health System are actively using Web mining systems to answer important questions of "Who?" "Why?" and "How?" The benefits of integrating these systems:

A) are measured qualitatively in terms of customer satisfaction, but not measured using financial or other quantitative measures.
B) can be significant in terms of incremental financial growth and increasing customer loyalty and satisfaction.
C) have not yet outweighed the costs of the Web mining systems and analysis.
D) can be infinitely measurable.
Question
________ is a branch of the field of linguistics and a part of natural language processing that studies the internal structure of words.

A) Morphology
B) Corpus
C) Stemming
D) Polysemes
Question
A ________ is one or more Web pages that provide a collection of links to authoritative pages, reference sites, or a resource list on a specific topic.

A) hub
B) hyperlink-induced topic search
C) spoke
D) community
Question
A simple keyword-based search engine suffers from several deficiencies, which include all of the following except:

A) a topic of any breadth can easily contain hundreds or thousands of documents
B) many documents that are highly relevant to a topic may not contain the exact keywords defining them
C) Web mining can identify authoritative Web pages
D) many of the search results are marginally or not relevant to the topic
Question
Which of the following refers to developing useful information from the links included in the Web documents?

A) Web content mining
B) Web subject mining
C) Web structure mining
D) Web matter mining
Question
Using ________ as a rich source of knowledge and a strategic weapon, Kodak not only survives but excels in its market segment defined by innovation and constant change.

A) visualization
B) deception detection
C) patent analysis
D) semantic cues
Question
It has been shown that the bag-of-words method may not produce good enough information content for text mining tasks. More advanced techniques such as ________ are needed.

A) classification
B) natural language processing
C) evidence-based processing
D) symbolic processing
Question
At a very high level, the text mining process consists of each of the following tasks except:

A) create log frequencies
B) establish the corpus
C) create the term-document matrix
D) extract the knowledge
Question
Why will computers probably not be able to understand natural language the same way and with the same accuracy that humans do?

A) A true understanding of meaning requires extensive knowledge of a topic beyond what is in the words, sentences, and paragraphs.
B) The natural human language is too specific.
C) The part of speech depends only on the definition and not on the context within which it is used.
D) All of the above.
Question
Which of the following is not one of the three main areas of Web mining?

A) Web search mining
B) Web content mining
C) Web structure mining
D) Web usage mining
Question
When registered users revisit Amazon.com, they are greeted by name. This task involves recognizing the user by ________.

A) pattern discovery
B) association
C) text mining
D) reading a cookie
Question
Text mining is the semi-automated process of extracting ________ from large amounts of unstructured data sources.

A) patterns
B) useful information
C) knowledge
D) all of the above
Question
The two main approaches to text classification are ________ and ________.

A) knowledge engineering; machine learning
B) categorization; clustering
C) association; trend analysis
D) knowledge extraction; association
Question
Why does the Web pose great challenges for effective and efficient knowledge discovery?

A) The Web search engines are index-based.
B) The Web is too dynamic.
C) The Web is too specific to a domain.
D) The Web infrastructure contains hyperlink information.
Question
A vast majority of business data are stored in text documents that are ________.

A) mostly quantitative
B) virtually unstructured
C) semi-structured
D) highly structured
Question
All of the following are popular application areas of text mining except:

A) information extraction
B) document summarization
C) question answering
D) data structuring
Question
All of the following are types of data generated through Web page visits except:

A) data stored in server access logs, referrer logs, agent logs, and client-side cookies
B) user profiles
C) hyperlink analysis
D) metadata, such as page attributes, content attributes, and usage data
Question
________ is an important component of text mining and is a subfield of artificial intelligence and computational linguistics. It studies the problem of understanding the natural human language.
Question
________ mining is the extraction of useful information from data generated through Web page visits and transactions.
Question
________ mining is the process of extracting useful information from the links embedded in Web documents.
Question
________ is the process of reducing inflected words to their base or root form.
Question
One of the main approaches to text classification is ________ in which an expert's knowledge is encoded into the system either declaratively or in the form of procedural classification rules.
Question
A(n) ________ is one or more Web pages that provide a collection of links to authoritative pages.
Question
Fundamental to the optimization process is ________, gathering data and information that can then be transformed into tangible analysis and recommendations for improvement using Web mining tools and techniques.
Question
The term "stop words" is used in text mining to ________ commonly used words.
Question
At a very high level, the first of three consecutive tasks in the text mining process is to establish the ________, which is a list of organized documents.
Question
________ is the semi-automated process of extracting patterns from large amounts of unstructured data sources.
Question
In linguistics, a(n) ________ is a large and structured set of texts prepared for the purpose of conducting knowledge discovery.
Question
In the text mining process, the output of task two is a flat file called a ________ matrix where the cells are populated with the term frequencies.
Question
________ words or noise words are words that are filtered out prior to or after processing of natural language data.
Question
Web analytics, CEM, and VOC applications form the foundation of the Web site ________ ecosystem that supports the online business' ability to positively influence desired outcomes.
Question
The ________ model, which is one where multiple sources of data describing the same population are integrated to increase the depth and richness of the resulting analysis, forms the framework of the Web site optimization ecosystem.
Question
________ applications focus on "who and how" questions by gathering and reporting direct feedback from site visitors, by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.
Question
Analysis of the information collected by Web servers can help better understand user behavior. Analysis of this data is called ________ analysis.
Question
________ is the grouping of similar documents without having a predefined set of categories.
Question
________ analysis is a technique used to detect favorable and unfavorable opinions toward specific products and services using textual data sources, such as customer feedback in Web postings and the detection of unfavorable rumors.
Question
________ is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases, where the data are organized in records structured by categorical, ordinal, or continuous variables.
Question
List three business applications of Web mining.
Question
List two options for managing or reducing the dimensionality (size) of the term-document matrix (TDM).
Question
What is the primary purpose of text mining within the context of knowledge discovery?
Question
Compare and contrast text mining and data mining.
Question
Diagram and explain the three-step text mining process.
Question
NLP has successfully been applied to a variety of tasks via computer programs to automatically process natural human language that previously could only be done by humans. List three of the most popular of these tasks.
Question
Define the three main areas of Web mining and each area's source of information.
Question
Why will computers probably not be able to understand natural language the same way and with the same accuracy that humans do?
Question
Describe a marketing application of text mining.
Question
What are three of the challenges for effective and efficient knowledge discovery posed by the Web?
1
Two advantages associated with the implementation of NLP are word sense disambiguation and syntactic ambiguity.
False
2
A vast majority of business data is captured and stored in text documents that are structured.
False
3
The main purpose of establishing the corpus is to collect all of the documents related to the context being studied.
True