Thursday, July 23, 2015

Statistical Methods: Course Outline
Week	Progress Description
1	Introduction
2	Probability (Ⅰ)
3	Probability (Ⅱ)
4	Sampling and Sampling Distribution
5	Interval Estimation
6	Midterm exam (in the computer laboratory)
7	Hypothesis Testing (Ⅰ)
8	Hypothesis Testing (Ⅱ)
9	Test of Independence and Goodness of Fit
10	ANOVA and Experimental Design (Ⅰ)
11	ANOVA and Experimental Design (Ⅱ)
12	ANOVA and Experimental Design (Ⅲ)
13	Midterm exam (in the computer laboratory)
14	Regression Analysis (Ⅰ)
15	Regression Analysis (Ⅱ)
16	Regression Analysis (Ⅲ)
17	Regression Analysis (Ⅳ)
18	Final exam (in the computer laboratory)
※ The instructor may adjust the weekly schedule above as needed, depending on how the class progresses.

Tuesday, July 21, 2015

Data Envelopment Analysis
http://www.wunan.com.tw/www2/download/preview/1FQF.PDF

[Link list]

Friday, May 15, 2015

Data Mining Literature

Journals and Conferences

IEEE Transactions on Knowledge and Data Engineering (TKDE)

Journal of Data Mining and Knowledge Discovery (JDMKD)

Journal of Very Large Database Systems (JVLDS)

Journal of Visual Language and Computing (JVLC)

Journal of Intelligent Information Systems (JIIS)

Journal of Intelligent Data Analysis (JIDA)

Data and Knowledge Engineering (DKE)

Machine Learning (ML)

ACM SIGMOD Record (SIGMODR)

ACM Int’l Conf. on Management of Data (ICMOD)

IEEE Int’l Conf. on Data Engineering (ICDE)

IEEE Int’l Conf. on Information Visualization (ICIV)

Int’l Conf. on Knowledge Discovery and Data Mining (ICKDD)

Int’l Conf. on Very Large Databases (ICVLDB)

Int’l Conf. on Information and Knowledge Management (CIKM)

Int’l Symp. on Methodologies for Intelligent Systems (ISMIS)

Conference on Machine Learning

Online Resources

• http://www.acm.org/sigkdd/

• http://www.nautilus-systems.com/books.html

• http://www.lib.iastate.edu/

Iowa State University has made a systematic effort to identify and acquire the more important monographs and conference proceedings on Data Mining and Knowledge Discovery in Databases. Select 'Library Catalog' and search for 'data mining' or 'knowledge discovery' in the keyword [General Keyword] search.

• http://www.ct.monash.edu.au/research.html

The data mining group there offers many resources.

• The collection of Computer Science bibliographies

http://liinwww.ira.uka.de/bibliography/index.html

http://www.cs.monash.edu.au/mirrors/bibliography/

• http://www.research.att.com/~lewis/reuters21578.html

Lewis, D., 1997. Reuters-21578, Distribution 1.0

• http://src.doc.ac.uk/bysubject/computing/overview.html

• http://www.cbu.edu/sciences/inetcs.html

• http://www.ulb.ac.be/di/bookmarks/book.html#cs

Overview

Fayyad, U., “From data mining to knowledge discovery: an overview”, Advances in KDD

Brachman, R., “Mining business databases”, Communications of the ACM, Nov. 1996

Simoudis, E., “Reality check for data mining”, IEEE EXPERT, Oct. 1996

Fayyad, U., “Knowledge discovery and data mining: towards a unifying framework”, KDD96

Piatetsky-Shapiro, G., “An overview of issues in developing industrial data mining and knowledge discovery applications”, ICKDD 96

Mitchell, T., Machine Learning, McGraw-Hill, 1997

Data Mining Applications

John, G., “Stock selection using rule induction”, IEEE EXPERT, Oct. 1996

Dao, S., “Applying a data miner to heterogeneous schema integration”, KDD95

Dzeroski, S., “Knowledge discovery in a water quality database”, KDD95

Ezawa, K., “Knowledge discovery in telecommunication services data using Bayesian network models”, KDD95

Feelders, A., “Data mining for loan evaluation at ABN AMRO: a case study”, KDD95

Sanjeev, A., “Discovering enrollment knowledge in university databases”, KDD95

Tsumoto, S., “Automated discovery of functional components of proteins from Amino-Acid sequences based on rough sets and change of representation”, KDD95

Fitzsimons, M., “The application of rule induction and neural networks for television audience prediction”, Proc. of ESOMAR/EMAC/AFM symposium on information based decision making in marketing, 1993, pp.69-82

Schmitz, J., “CoverStory – automated news finding in marketing”, DSS Transactions, ed. L. Volino, 46-54. Providence, R.I.: Institute of Management Sciences

Anand, T., “Opportunity explorer: navigating large databases using knowledge discovery templates”, JIIS 4(1): 27-38

Hall, J., “Applying computational intelligence to the investment process”, Proc. of CIFER-96: computational intelligence in financial engineering, IEEE Press

Senator, T., “The financial crimes enforcement network AI system (FAIS)”, AI magazine, winter 1995, 21-39

Davis, A., “Management of cellular fraud: knowledge-based detection, classification and prevention”, Proc. of 13^th Int. Conf. on AI, expert systems and natural language, v2, p.155-164

Data mining applications section in KDD96

Web Data Mining

Carbonell, J., “Learning from the WEB”, ISMIS 97

Chen, M.-S., “Data mining for path traversal patterns in a web environment”, Int’l Conf. on Distributed Computing Systems, 1996 (COMPENDEX 91~)

Etzioni, O., “The World-Wide Web: quagmire or gold mine?”, CACM, v.39, no.11, 1996

Hsu, Y.-J. and Wen-Tan Yih, (of Taiwan U.) “Template-based information mining from HTML documents”, Proc. of 14^th National Conf. on A.I., 1997

Soderland, S. “Learning to extract text-based information from the world wide web”, ICKDD 97

Zaiane, O., “Resource and knowledge discovery in global information systems: a preliminary design and experiment”, KDD95

Zamir, O., “Fast and intuitive clustering of web documents”, ICKDD 97

http://netmining.dfw.ibm.com

http://www.almaden.ibm.com/cs/k53/clever.html

IPO Keywords: world wide web AND information retrieval

Document Data Mining

Soderland, S. “Learning to extract text-based information from the world wide web”, ICKDD 97

Hahn, U., “Deep knowledge mining from natural language text sources”, (CIKM97)

Feldman, R., “Knowledge discovery in textual databases”, KDD95

Feldman, R., “Mining associations in text in the presence of background knowledge”, KDD96

Feldman, R., “Document explorer: discovering knowledge in document collections”, ISMIS 97

Zari, G., “Conceptual modeling of the “meaning” of textual narrative documents”, ISMIS 97

Esposito, F., “Knowledge revision for document understanding”, ISMIS 97

Reuters-22173 corpus: a collection of 22,173 indexed documents appearing on the Reuters newswire in 1987; Reuters Ltd, Carnegie Group, David Lewis, Information Retrieval Laboratory at the University of Massachusetts; available via ftp from: ciir-ftp.cs.umass.edu:/pub/reuters1/corpus.tar.Z.

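Lee-Feng Chien (簡立峰), Chinese Information Processing Laboratory, Institute of Information Science, Academia Sinica: Csmart system

Multimedia Database Data Mining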

Ester, M., “A database interface for clustering in large spatial databases”, KDD95

Li, C., “Knowledge-based scientific discovery in geological databases”, KDD95

Stolorz, P., “Fast spatio-temporal data mining of large geophysical datasets”, KDD95

Knorr, E., “Extraction of spatial proximity patterns by concept generalization”, KDD96

Padmanabhan, B., “Pattern discovery in temporal databases: a temporal logic approach”, KDD96

Czyzewski, A., “Mining knowledge in noisy audio data”, KDD96

Ester, M., “A density-based algorithm for discovering clusters in large spatial databases with noise”, KDD96

Kaufman, K., “A method for reasoning with structured and continuous attributes in the INLEN-2 multistrategy knowledge discovery system”, KDD96

Lagus, K., “Self-organizing maps of document collections: a new approach to interactive exploration”, KDD96

Association Rules

Holsheimer, M., “A perspective on databases and data mining”, KDD95

Feldman, R., “Mining associations in text in the presence of background knowledge”, KDD96

Cheung, D., “Maintenance of discovered knowledge: a case in multi-level association rules”, KDD96

Agrawal, R., “Mining association rules between sets of items in large databases”, ICMOD 1993

Agrawal, R., “Fast algorithms for mining association rules”, ICVLDB 94

Savasere, A., “An efficient algorithm for mining association rules in large databases”, ICVLDB 95

Srikant, R., “Mining quantitative association rules in large relational tables”, ICMOD 96

Fukuda, T., “Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization”, ICMOD 96

Brin, S., “Dynamic itemset counting and implication rules for market basket data”, ICMOD 97

Brin, S., “Beyond market baskets: generalizing association rules to correlations”, ICMOD 97

Han, E.-H., “Scalable parallel data mining for association rules”, ICMOD 97

Lent, B., “Clustering association rules”, ICDE 97

Park, J., “Mining association rules with adjustable accuracy”, CIKM 97

Singh, L., “Generating association rules from semi-structured documents using a concept hierarchy”, CIKM 97

Time Series

Mannila, H., “Discovering frequent episodes in sequences”, KDD95

Mannila, H., “Discovering generalized episodes using minimal occurrences”, KDD96

Mannila, H., “Rule discovery from time series”, KDD98

Agrawal, R., “Efficient Similarity Search in Sequence Databases”, 4th Int’l Conf. on Foundations of Data Organization and Algorithms, 1993

Agrawal, R., “Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases”, 21st Int’l Conf. on VLDB, 1995

Agrawal, R., “Mining Sequential Patterns”, Int’l Conf. on Data Engineering, 1995

Agrawal, R., “Querying shapes of histories”, VLDB95 Proc.

Berndt, D., “Finding patterns in time series: a dynamic programming approach”, Advances in KDD, 1996

Goldin, D., “On similarity queries for time-series data: constraint specification and implementation”, 1^st int’l conf. on the principles and practice of constraint programming, LNCS 976, Sept. 1995

Jagadish, H., “Similarity-based queries”, PODS95 Proc.

Keogh, E., “A probabilistic approach to fast pattern matching in time series databases”, KDD97

Laird, P., “Identifying and using patterns in sequential data”, 4^th Int’l Workshop on Algorithmic Learning Theory, 1993, Springer-Verlag, pp.1-18

Lent, B., “Discovering trends in text databases”, KDD97 Proc.

Rafiei, D., “Similarity-based queries for time series data”, SIGMOD97 Proc.

Shim, K., “High-dimensional Similarity Joins”, 13th Int’l Conf. on Data Engineering, 1997

Srikant, R., “Mining Sequential Patterns: Generalizations and Performance Improvements”, Fifth Int’l Conf. on Extending Database Technology, 1996

Visualization and Data Exploration

Brunk, C., “MineSet: an integrated system for data mining”, ICKDD 97

Catarci, T., “Visual query systems for databases: a survey”, JVLC 97

Derthick, M., “An interactive visualization environment for data exploration”, ICKDD 97

Feldman, R., “Visualization techniques to explore data mining results for document collections”, ICKDD 97

Gebhardt, M., “A toolkit for negotiation support interfaces to multi-dimensional data”, ICMOD97

Hee, H.-Y., “Visualization support for data mining”, IEEE EXPERT, Oct. 1996

Livny, M., “DEVise: integrated querying and visual exploration of large datasets”, ICMOD97

Mihalisin, T., “Fast robust visual data mining”, ICKDD97

Rao, S., “Providing better support for a class of decision support queries”, ICMOD96

Roth, S., “Visage: a user interface environment for exploring information”, ICIV 96

Selfridge, P., “IDEA: interactive data exploration and analysis”, ICMOD 96

Ahlberg, C., “Spotfire: an information exploration environment”, SIGMODR v25 n4, Dec. 96

Kennedy, J., “A framework for information visualization”, SIGMODR v25 n4, Dec. 96

Keim, D. “Pixel-oriented database visualizations”, SIGMODR v25 n4, Dec. 96

Ioannidis, Y., “Dynamic information visualization”, SIGMODR v25 n4, Dec. 96

Hasan, M., “Applying database visualization to the world wide web”, SIGMODR v25 n4, Dec. 96

OLAP, Data Cube, and Data Warehousing

Chaudhuri, S., “An overview of data warehousing and OLAP technology”, SIGMODR, March, 97

Colliat, G., “OLAP, relational, and multidimensional database systems”, SIGMODR, Sept. 1996

Gray, J., “Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals”, JDMKD 97

Harinarayan, V., “Implementing data cubes efficiently”, ICMOD 96

Ho, C.-T., “Range queries in OLAP data cubes”, ICMOD 97

Roussopoulos, N., “Cubetree: organization of and bulk updates on the data cube”, ICMOD97

Mumick, I., “Maintenance of data cubes and summary tables in a warehouse”, ICMOD97

Agrawal, R., “Modeling multidimensional databases”, ICDE 97

Gupta, H., “Index selection for OLAP”, ICDE 97

Labio, W., “Physical database design for data warehouses”, ICDE 97

Gyssens, M., “A foundation for multi-dimensional databases”, ICVLDB 97

Ross, K., “Fast computation of sparse datacubes”, ICVLDB 97

Clustering

Similarity

Weber, R. et al., A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, Int’l Conf. on VLDB, 1998.

Measures of interestingness

Kamber, M., “Evaluating the interestingness of characteristic rules”, KDD96

Silberschatz, A., “On subjective measures of interestingness in knowledge discovery”, KDD95

Suzuki, E., “Exceptional knowledge discovery in database based on information theory”, KDD96

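Knowledge Representation and Data Mining

Textbooks on artificial intelligence and expert systems, or monographs on knowledge representation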

Aronis, J., “Exploiting background knowledge in automated discovery”, KDD96

Feature Selection

Kohavi, R., “Feature subset selection using the wrapper method: overfitting and dynamic search space topology”, KDD95

Seshadri, V., “Feature extraction for massive data mining”, KDD95

Cherkauer, K., “Growing simpler decision trees to facilitate knowledge discovery”, KDD96

Urpani, D., “RITIO - rule induction two in one”, KDD96

Monday, May 11, 2015

Excel Solver: Example 1

http://yes.nctu.edu.tw/Lecture/PC/Office/Excel/Tutor/Solution/Example/Exam1/Index.htm

Saturday, April 11, 2015

Introduction to Ensemble Learning

(Reposted)
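“Supervised learning” is the simplest and most easily understood form among the various statistical learning methods.
In general, the formal definition of supervised learning can be described as follows.
Each data point consists of a feature vector, denoted X, and a class label y; in addition, an unknown (underlying) function f is assumed to exist such that y = f(X) holds for every training data point (X, y).
The goal of the learning algorithm is then to find a satisfactory approximating function h such that, for any new feature vector X_new, the predicted class label h(X_new) is as close as possible to the result computed by the original function, f(X_new).
This approximating function is called a classifier, so named because it assigns an input feature vector to a class that is the true one or close to it.
Supervised learning can be applied to many problems, including handwriting recognition, medical diagnosis, and part-of-speech tagging in language processing. (Full article: Introduction to Ensemble Learning)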
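
As a rough illustration of the setup described above, the sketch below trains several weak classifiers h on labeled points (X, y) and combines them by majority vote, which is one common form of ensemble learning. The toy data set, the one-feature threshold learner ("decision stump"), and every function name here are assumptions made for this example; none of them come from the reposted article.

```python
from collections import Counter
from itertools import product

def train_stump(points, labels, feature):
    """Fit a one-feature threshold classifier (a 'decision stump').

    Every observed value is tried as a threshold, in both orientations,
    and the rule with the fewest training errors is kept.
    """
    best = None
    for threshold in sorted({p[feature] for p in points}):
        for above_label in (0, 1):
            errors = sum(
                (above_label if p[feature] >= threshold else 1 - above_label) != y
                for p, y in zip(points, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above_label)
    _, threshold, above_label = best

    def h(x):
        return above_label if x[feature] >= threshold else 1 - above_label
    return h

def majority_vote(classifiers, x):
    """Return the label predicted by most of the classifiers."""
    votes = Counter(h(x) for h in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(predict, points, labels):
    return sum(predict(x) == y for x, y in zip(points, labels)) / len(points)

# Toy data (an assumption): three features, each either small (1) or large (9).
# The hidden function f labels a point 1 when at least two features are large.
X_train = list(product((1, 9), repeat=3))
y_train = [1 if sum(v >= 5 for v in x) >= 2 else 0 for x in X_train]

# One weak classifier per feature, then an ensemble that votes over them.
stumps = [train_stump(X_train, y_train, f) for f in range(3)]

def ensemble(x):
    return majority_vote(stumps, x)

for f, h in enumerate(stumps):
    print(f"stump on feature {f}: accuracy {accuracy(h, X_train, y_train):.2f}")
print(f"majority vote: accuracy {accuracy(ensemble, X_train, y_train):.2f}")
```

The data set is deliberately contrived so that each single-feature classifier is wrong on some points while the vote is not; real ensembles rarely behave this cleanly.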

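Clustering

(Reposted)
Journal of Science and Engineering Technology, Vol. 3, No. 1, 2007 (ROC year 96)
Developing a Two-Stage Document Clustering Method Based on Co-occurring Term Characteristics
http://journal.dyu.edu.tw/dyujo/document/setjournal/s3-1-9-18.pdf

Abstract
Clustering is a widely used technique in data mining, and applying it to text mining has recently become a popular research topic.
When clustering is applied to document data, the vector space model (VSM) is commonly used to represent the documents, but research has identified two weaknesses. First, the model cannot recognize relationships between the terms in a text, which leads to documents being misjudged.
In the vector space model, the dimension formed by each keyword is independent, so relationships between terms (including polysemy, synonymy, and co-occurring terms) cannot be distinguished; as a result, document similarity comparisons may be misjudged, lowering the quality of the document clusters.
Second, when the dimensionality is too high, clustering tends to become inaccurate.
The dimensionality of the vector space model is determined by the number of keywords across the whole document collection, so when too many keywords are extracted from the documents, the dimensionality grows and the clustering results become less accurate.
To address these two weaknesses, this paper proposes a two-stage document clustering method: the first stage clusters the keywords, and the second stage uses those keyword clusters to group the documents. By applying association rule techniques, the paper aims to remedy the weaknesses of the vector space model and improve the quality of document clustering.
In addition, the keyword clusters can help give a summary description of each document cluster. The method is evaluated experimentally on the Reuters-21578 collection and compared with traditional document clustering methods; the experimental results confirm that the proposed method does produce high-quality document clusters.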
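
The first weakness named in the abstract, that every keyword forms its own independent dimension, can be seen in a small bag-of-words example. The sketch below is only an illustration under stated assumptions: the three toy documents, whitespace tokenization, and raw term counts (rather than a weighting scheme such as TF-IDF) are all choices made here, not details from the paper.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, split on whitespace)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# doc_a and doc_b describe the same topic with synonyms; doc_c shares words
# with doc_a but is about something else.
doc_a = tf_vector("car engine repair")
doc_b = tf_vector("automobile motor maintenance")
doc_c = tf_vector("car engine price")

print("a vs b:", cosine_similarity(doc_a, doc_b))  # no shared dimension
print("a vs c:", cosine_similarity(doc_a, doc_c))

# Every distinct term adds a dimension, which is the second weakness.
vocabulary = set(doc_a) | set(doc_b) | set(doc_c)
print("dimensions:", len(vocabulary))
```

Because "car" and "automobile" occupy unrelated dimensions, the two synonymous documents look completely dissimilar, while three short documents already need seven dimensions.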

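What Is a Confidence Interval

(Reposted) Wikipedia

In statistics, the confidence interval of a probability sample is an interval estimate of some population parameter of that sample. A confidence interval expresses the extent to which the true value of the parameter has a certain probability of falling around the measured result. It gives the degree of credibility of the measured value of the parameter, that is, the "certain probability" mentioned above. This probability is called the confidence level. For example, if a candidate's support in an election is 55% and the confidence interval at the 0.95 confidence level is (50%, 60%), then his true support has a 95% probability of lying between 50% and 60%, so the chance that his true support is below one half is less than 2.5% (assuming the distribution is symmetric).

As in the example, the confidence level is usually expressed as a percentage, so a confidence interval at the 0.95 confidence level can also be written as a 95% confidence interval. The two ends of a confidence interval are called confidence limits. For an estimate in a given situation, the higher the confidence level, the wider the corresponding confidence interval.

Computing a confidence interval usually requires assumptions about the estimation process (so it belongs to parametric statistics), for example the assumption that the estimation error is normally distributed.

Confidence intervals are used only in frequentist statistics. The corresponding concept in Bayesian statistics is the credible interval. However, credible intervals and confidence intervals rest on different conceptual foundations, so in general their values will not be the same. The confidence interval is the interval, obtained by calculation, in which the estimate lies. The confidence level is the probability that the exact value falls within that interval. A confidence interval is a concrete range of values; a confidence level is a probability. For example, suppose a task is estimated to finish in 10 to 12 days, but this estimate is only about 80% accurate: the confidence interval is (10, 12) and the confidence level is 80%. To raise the confidence level, the confidence interval has to be widened.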
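
The election example can be reproduced with the normal-approximation (Wald) interval for a proportion, which relies on the normality assumption mentioned above. This is a minimal sketch: the text gives only the 55% estimate and the resulting interval, so the sample size of 380 respondents is a hypothetical value picked here because it yields roughly (50%, 60%), and the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# n = 380 is a hypothetical sample size, not a figure from the text.
low, high = proportion_confidence_interval(0.55, 380)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

With a larger sample the same 55% estimate would give a narrower interval, since the margin shrinks in proportion to 1/sqrt(n).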

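Confidence Level (excerpted from Baidu Baike)

(Reposted) Confidence level
http://translate.google.com.tw/translate?hl=zh-TW&sl=zh-CN&u=http://baike.baidu.com/view/434404.htm&prev=search
In statistics, the confidence interval of a probability sample is an interval estimate of some population parameter of that sample. A confidence interval expresses the extent to which the true value of the parameter has a certain probability of falling around the measured result. It gives the degree of credibility of the measured value of the parameter, that is, the “certain probability” mentioned above. This probability is called the confidence level.

[Introduction]

If a candidate's support in an election is 55% and the confidence interval at the 0.95 confidence level is (50%, 60%), then his true support has a 95% probability of lying between 50% and 60%, so the chance that his true support is below one half is less than 2.5% (assuming the distribution is symmetric).

As in the example, the confidence level is usually expressed as a percentage, so a confidence interval at the 0.95 confidence level can also be written as a 95% confidence interval. The two ends of a confidence interval are called confidence limits. For an estimate in a given situation, the higher the confidence level, the wider the corresponding confidence interval.

Computing a confidence interval usually requires assumptions about the estimation process (so it belongs to parametric statistics), for example the assumption that the estimation error is normally distributed.

Confidence intervals are used only in frequentist statistics. The corresponding concept in Bayesian statistics is the credible interval.

However, credible intervals and confidence intervals rest on different conceptual foundations, so in general their values will not be the same.

The confidence interval is the interval, obtained by calculation, in which the estimate lies.

The confidence level is the probability that the exact value falls within that interval.

A confidence interval is a concrete range of values; a confidence level is a probability.

For example, suppose a task is estimated to finish in 10 to 12 days, but this estimate is only about 80% accurate:

the confidence interval is (10, 12) and the confidence level is 80%. To raise the confidence level, the confidence interval has to be widened.

The confidence level is the probability that the population parameter falls within a certain range of the sample statistic, while the confidence interval is the margin of error between the sample statistic and the population parameter at a given confidence level.

The wider the confidence interval, the higher the confidence level.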
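
The trade-off in the 10-to-12-day example can be made concrete with a short calculation. The sketch below assumes a normally distributed estimation error, which the text mentions only as a typical assumption; under it, the 80% interval (10, 12) implies a standard error, and that standard error in turn fixes how wide the interval must be at other confidence levels. The helper name `z_value` and the extra levels (90%, 95%, 99%) are choices made for this illustration.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided standard-normal critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

# From the text: the interval (10, 12) holds at a confidence level of 80%.
center, half_width_80 = 11.0, 1.0

# Assumption: normal estimation error, so the implied standard error is
# the 80% half-width divided by the 80% critical value.
std_error = half_width_80 / z_value(0.80)

intervals = {}
for level in (0.80, 0.90, 0.95, 0.99):
    margin = z_value(level) * std_error
    intervals[level] = (center - margin, center + margin)
    print(f"{level:.0%}: ({center - margin:.2f}, {center + margin:.2f})")
```

Each step up in confidence level widens the interval around the same center, matching the statement that a higher confidence level requires a wider interval.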

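Monday, April 6, 2015

Change Your Password After the First Login

E-mail address (school)

Notes:
The school e-mail account is an important channel through which the school contacts students. After completing registration, log in through the Cheng Kung Portal (成功入口) on the university home page to send and receive mail with the school e-mail service. For problems using the Cheng Kung Portal system, contact the help desk on the first floor of the Computer and Network Center.

New students (those admitted in academic year 98):
(Some new students cannot log in until they have completed registration and their student records have been created.)

Local students: last 4 digits of the [national ID number] + last 4 digits of the [date of birth] (MMDD).

For example:
[National ID number]: A123456789
[Date of birth]: June 12, ROC year 78
======>> The first-login password is then: 67890612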
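
The rule above amounts to simple string concatenation, sketched below. The function name and its parameters are hypothetical and used only for illustration; the sample values are the ones given in the example.

```python
def initial_password(id_number: str, birth_month: int, birth_day: int) -> str:
    """Last 4 characters of the ID number followed by the birthday as MMDD."""
    return f"{id_number[-4:]}{birth_month:02d}{birth_day:02d}"

# Example values from the post: ID A123456789, born on June 12.
print(initial_password("A123456789", 6, 12))  # 67890612
```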

https://myidp.sso2.ncku.edu.tw/nidp/idff/sso?id=53&sid=0&option=credential&sid=0

Saturday, March 28, 2015

Minitab Document Download Center
http://www.minitab.com.tw/support/filedownload_list.php

Minitab/QC standalone edition

Minitab License Manager (Server)

Minitab/QC network edition (client)

User manuals

Other
