
Calling Weka Classification Algorithms from Java

Published: 2023-05-21 06:51:14

Ⅰ How do I extract classification rules from a decision tree built with Weka?

Do you mean automatically extracting classification rules from a trained decision tree model? Weka doesn't seem to have a feature for extracting rules directly from the tree structure.

That said, if the model isn't too complex, it's quite easy to enumerate by hand every path from the root node to a leaf: the internal nodes plus their branches along a path give you the if-conditions, and the leaf node gives the then-result. If the model is fairly complex, consider a bit of simple custom development.

Assuming you're using J48: save the trained decision tree from the Weka Explorer (or write it to a file with an output stream directly in code), then read the tree back in through an input stream as a Sourcable object, and call the object's toSource method to turn the decision tree into code. From there it's a text-processing problem: analyze the generated code structure to obtain the corresponding classification rules.
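If you go the code route for saving, it's ordinary Java serialization. A minimal sketch, assuming `model` holds your trained J48 instance (the variable and file names are placeholders):

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// Serialize the trained classifier to disk so it can be read back later
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("j48.model"));
out.writeObject(model);
out.close();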

The reading-in part looks roughly like this:

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import weka.core.Sourcable;

// Deserialize the saved model and print it as Java source
FileInputStream j48 = new FileInputStream("j48.model");
ObjectInputStream j48object = new ObjectInputStream(j48);
Sourcable j48code = (Sourcable) j48object.readObject();
System.out.println(j48code.toSource("J48Tree")); // the argument becomes the generated class name
j48object.close();

The few lines above are just an example; hope it gives you some inspiration ^^

Ⅱ How do I use Java to turn word-segmented text into an ARFF file that Weka can process?

First read the files into an Instances object, then save the Instances as an ARFF file. The code for saving the ARFF file is as follows:


import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;

ArffSaver saver = new ArffSaver();
saver.setInstances(data); // `data` is the Instances object you built
saver.setFile(new File("c:/test.arff"));
saver.writeBatch();
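For the first half, reading the segmented text into an Instances object, one workable route is building a dataset with a string attribute by hand. A rough sketch using the pre-3.7 Weka API that the rest of this page assumes; the relation name, attribute names, class values, and the `content` variable are all illustrative:

import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

// One string attribute for the text, one nominal class attribute
FastVector classValues = new FastVector();
classValues.addElement("pos");
classValues.addElement("neg");
FastVector attrs = new FastVector();
attrs.addElement(new Attribute("text", (FastVector) null)); // null values => string attribute
attrs.addElement(new Attribute("class", classValues));
Instances data = new Instances("texts", attrs, 0);
data.setClassIndex(data.numAttributes() - 1);

// Add one document; `content` is assumed to hold the segmented text
Instance inst = new Instance(2);
inst.setValue(data.attribute(0), content);
inst.setValue(data.attribute(1), "pos");
data.add(inst);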

Ⅲ How do I run the ID3 and C4.5 algorithms in WEKA?

By "run them concretely" you presumably mean using these two algorithms to train classification models and complete a classification task... In the Weka Explorer, pick the training set and test set you have on hand plus the corresponding classification algorithm, and just run it... or write a few lines against the Weka API in your own code to do the same job (a sketch follows below)... It's really simple; there are plenty of examples of this online, and a quick look at any of them will make it clear.
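For the API route, a minimal hedged sketch (the dataset path is a placeholder; swap Id3 for J48 to get C4.5):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.Id3;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("weather.nominal.arff"); // placeholder path
data.setClassIndex(data.numAttributes() - 1);
Id3 tree = new Id3();
tree.buildClassifier(data);                   // train on the full set
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(new Id3(), data, 10, new Random(1)); // 10-fold cross-validation
System.out.println(eval.toSummaryString());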

Ⅳ How do I use the WEKA data mining tool for text classification? I have 800-odd test documents. Please give concrete, easy-to-follow steps.

Step 1: you need a Chinese dataset.
Step 2: get the dataset into a structure Weka can handle. That's easy to do: the expected layout is one folder per class, with that class's files inside it. One remaining problem is that your machine often won't have enough memory to process the whole dataset, so pick out a few classes and put a few dozen documents in each; that's enough.
Step 3: word segmentation.
Step 4: convert the dataset to ARFF format following the example on the weka wiki (a sketch follows this list).
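For step 4, the weka wiki approach loads the folder-per-class layout with TextDirectoryLoader and then turns the raw strings into word-count features. A hedged sketch, with the directory path as a placeholder:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

TextDirectoryLoader loader = new TextDirectoryLoader();
loader.setDirectory(new File("path/to/corpus")); // one subfolder per class
Instances raw = loader.getDataSet();             // string attribute + class label
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(raw);
Instances vectorized = Filter.useFilter(raw, filter); // bag-of-words dataset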

Weka is a collection of machine learning algorithms that can be used for classification, prediction, and so on. Since Weka supports data in ARFF or CSV format, you must preprocess your data before running a Weka experiment. Typically you can import the TXT into Excel and save it as a .CSV file (a format WEKA can also recognize), then open WEKA, go to Tools > ArffViewer, open the .CSV file you just made, and save it as .arff. That's it!
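If you'd rather skip Excel and the GUI, the same CSV-to-ARFF conversion works in a few lines of code (file names are placeholders):

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

CSVLoader loader = new CSVLoader();
loader.setSource(new File("data.csv"));  // placeholder input
Instances data = loader.getDataSet();
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("data.arff"));    // placeholder output
saver.writeBatch();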

Ⅴ How do I combine several classification algorithms in Weka?

You need to convert the fields to nominal type first. Weka treats the numbers coming from Excel as numeric, which it can't handle here, so the Apriori algorithm refuses to run.
WEKA's full name is the Waikato Environment for Knowledge Analysis; weka is also the name of a bird found in New Zealand, and WEKA's main developers come from New Zealand. As an open data mining workbench, WEKA collects a large number of machine learning algorithms able to take on data mining tasks, including data preprocessing, classification, regression, clustering, association rules, and visualization on its interactive interface.
If you want to implement a data mining algorithm yourself, you can consult Weka's interface documentation. Integrating your own algorithm into Weka, or even borrowing its approach to build your own visualization tool, is not particularly difficult.
In August 2005, at the 11th ACM SIGKDD international conference, the Weka team from the University of Waikato won the field's highest service award for data mining and knowledge discovery. The Weka system has gained broad recognition, is hailed as a milestone in the history of data mining and machine learning, and is one of the most complete data mining tools available today (with 11 years of development behind it at the time). Weka is downloaded more than ten thousand times per month.
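As for the question itself (combining several classification algorithms), one built-in route is Weka's meta-classifiers. A hedged sketch using weka.classifiers.meta.Vote, assuming `data` is an already-loaded training set with its class index set:

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;

// Combine two base classifiers; Vote averages their class probabilities by default
Vote vote = new Vote();
vote.setClassifiers(new Classifier[] { new J48(), new NaiveBayes() });
vote.buildClassifier(data); // `data` is an assumed, already-loaded Instances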

Ⅵ Is it better to learn data mining with Python, or Weka development with Java?

Mainly convenience. Python's third-party modules are plentiful, its syntax is very concise, and you get a lot of freedom. The numpy, scipy, and matplotlib modules can cover everything SPSS does, and you can clean and reduce data in whatever customized way you need, connecting to SQL when necessary. For machine learning, the data is often collected from the internet with a web crawler, and Python's urllib module makes that simple; when the crawler has to deal with some site's CAPTCHA, the PIL module makes recognition convenient. If you need neural networks or genetic algorithms, scipy can handle that work too. Decision trees come down to if-then style code. Clustering can't be confined to a handful of fixed methods; you may need to adapt to the actual situation, using k-means clustering, DBSCAN clustering, or sometimes a combination of both to cluster large-scale data, all of which you code yourself. On top of that, distance-based classification offers many distance measures to choose from, such as Euclidean distance, cosine distance, Minkowski distance, and city-block distance; none of them is complicated, and all are easy to implement in Python. For content-based classification, Python has the powerful nltk natural language processing module for segmenting, collecting, classifying, and counting phrases.
In short, it is extremely convenient. Once you understand Python well enough, you will find you can quickly realize all of your ideas with this one tool.

Ⅶ A question about calling Weka's classification functions from Java

Instances instances = getArffData("arff file path");

instances.setClassIndex(instances.numAttributes() - 1); sets the class index, which fixes the "Class index not set!" error.
instances.setClass(attribute); sets the class attribute, which fixes the "Class attribute not set!" error.
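Putting it together, a hedged end-to-end sketch; here Weka's own DataSource stands in for the asker's getArffData helper, and the path is a placeholder:

import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances instances = DataSource.read("path/to/train.arff"); // placeholder path
instances.setClassIndex(instances.numAttributes() - 1);      // avoids "Class index not set!"
J48 classifier = new J48();
classifier.buildClassifier(instances);
Instance first = instances.instance(0);
double label = classifier.classifyInstance(first);
System.out.println("predicted: " + instances.classAttribute().value((int) label));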

Ⅷ What are the steps for Bayesian classification in Weka? Detailed steps would be much appreciated, thanks!

You can trace it with a debugger. Below are the main functions of Weka's NaiveBayesSimple class and what they do.
(1) globalInfo()
Returns a description string for this classifier.
(2) getTechnicalInformation()
Returns a TechnicalInformation object instance containing the technical background of this class.
(3) getCapabilities()
Returns the classifier's capabilities (which attribute and class types it can handle).
(4) buildClassifier(Instances instances)
buildClassifier() constructs a classifier from the training dataset instances. It computes the posterior probabilities of all nominal attributes, the prior probability of the class attribute, and the mean and variance of each numeric attribute, in preparation for the classification work that follows.
(5) distributionForInstance(Instance instance)
Computes the probability of the instance to be classified belonging to each class, stores those values in an array, and returns that array.
(6) toString()
Returns the classifier's parameters (means, variances, priors, posteriors) as a string.
(7) normalDens(double x, double mean, double stdDev)
Computes, for a numeric attribute, the probability density at value x under a normal distribution with the given mean and standard deviation stdDev.
(8) getRevision()
Returns the program's revision number.
(9) main()
Called when the class is executed from the command line. It just tells Weka's Evaluation class to evaluate naive Bayes with the given command-line options and prints the resulting string. The one-line expression that does this is wrapped in a try-catch statement, which catches the various exceptions thrown by Weka routines or other Java methods.
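To see steps (4) to (6) in action, a hedged sketch; NaiveBayesSimple may not exist in your Weka version, so this uses the standard NaiveBayes class, which exposes the same methods (the path is a placeholder):

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("path/to/train.arff"); // placeholder path
data.setClassIndex(data.numAttributes() - 1);
NaiveBayes nb = new NaiveBayes();
nb.buildClassifier(data);                                     // step (4)
double[] dist = nb.distributionForInstance(data.instance(0)); // step (5)
for (int i = 0; i < dist.length; i++) {
    System.out.println(data.classAttribute().value(i) + ": " + dist[i]);
}
System.out.println(nb);                                       // step (6): toString()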

Ⅸ How do I add a new algorithm to Weka?

1. Write the new algorithm; it must conform to Weka's interface standard. The example below walks through adding an algorithm downloaded from the Weka Chinese site (which seems to be offline now; for an experiment you can simply copy an existing algorithm in the clusterers directory and rename it): a fuzzy C-means clustering algorithm, FuzzyCMeans.
2. Since FuzzyCMeans is a clustering algorithm, copy the FuzzyCMeans.java source file directly into the weka.clusterers package.
3. Then edit weka.gui.GenericObjectEditor.props: under "# Lists the Clusterers I want to choose from", add weka.clusterers.FuzzyCMeans below weka.clusterers.Clusterer=\ (a sample edit follows these steps).
4. Modify weka.gui.GenericPropertiesCreator.props accordingly. In this case nothing needs to change, because the weka.clusterers package already exists; if you add a brand-new package, you must edit this file and register the new package.
After the changes, recompile and run; you will find the newly added FuzzyCMeans algorithm among the clustering algorithms on the Cluster tab of Weka's Explorer interface.
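The step-3 edit to GenericObjectEditor.props might look like this (the neighboring entries are illustrative; keep whatever your file already lists):

# Lists the Clusterers I want to choose from
weka.clusterers.Clusterer=\
 weka.clusterers.EM,\
 weka.clusterers.SimpleKMeans,\
 weka.clusterers.FuzzyCMeans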
The key to the whole process is understanding Weka's internals and its interface standard, and then writing a new algorithm that meets that specification.
The algorithm conventions and the Weka source code are still being studied.

Ⅹ Looking for the Java source code of Weka's ID3 algorithm

/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
/*
* Id3.java
* Copyright (C) 1999 University of Waikato, Hamilton, New Zealand
*
*/
package weka.classifiers.trees;
import weka.classifiers.Classifier;
import weka.classifiers.Sourcable;
import weka.core.Attribute;
import weka.core.Capabilities;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.NoSupportForMissingValuesException;
import weka.core.RevisionUtils;
import weka.core.TechnicalInformation;
import weka.core.TechnicalInformationHandler;
import weka.core.Utils;
import weka.core.Capabilities.Capability;
import weka.core.TechnicalInformation.Field;
import weka.core.TechnicalInformation.Type;
import java.util.Enumeration;
/**
<!-- globalinfo-start -->
* Class for constructing an unpruned decision tree based on the ID3 algorithm. Can only deal with nominal attributes. No missing values allowed. Empty leaves may result in unclassified instances. For more information see: <br/>
* <br/>
* R. Quinlan (1986). Induction of decision trees. Machine Learning. 1(1):81-106.
* <p/>
<!-- globalinfo-end -->
*
<!-- technical-bibtex-start -->
* BibTeX:
* <pre>
* &#64;article{Quinlan1986,
* author = {R. Quinlan},
* journal = {Machine Learning},
* number = {1},
* pages = {81-106},
* title = {Induction of decision trees},
* volume = {1},
* year = {1986}
* }
* </pre>
* <p/>
<!-- technical-bibtex-end -->
*
<!-- options-start -->
* Valid options are: <p/>
*
* <pre> -D
* If set, classifier is run in debug mode and
* may output additional info to the console</pre>
*
<!-- options-end -->
*
* @author Eibe Frank ([email protected])
* @version $Revision: 6404 $
*/
public class Id3
extends Classifier
implements TechnicalInformationHandler, Sourcable {
/** for serialization */
static final long serialVersionUID = -2693678647096322561L;
/** The node's successors. */
private Id3[] m_Successors;
/** Attribute used for splitting. */
private Attribute m_Attribute;
/** Class value if node is leaf. */
private double m_ClassValue;
/** Class distribution if node is leaf. */
private double[] m_Distribution;
/** Class attribute of dataset. */
private Attribute m_ClassAttribute;
/**
* Returns a string describing the classifier.
* @return a description suitable for the GUI.
*/
public String globalInfo() {
return "Class for constructing an unpruned decision tree based on the ID3 "
+ "algorithm. Can only deal with nominal attributes. No missing values "
+ "allowed. Empty leaves may result in unclassified instances. For more "
+ "information see: "
+ getTechnicalInformation().toString();
}
/**
* Returns an instance of a TechnicalInformation object, containing
* detailed information about the technical background of this class,
* e.g., paper reference or book this class is based on.
*
* @return the technical information about this class
*/
public TechnicalInformation getTechnicalInformation() {
TechnicalInformation result;
result = new TechnicalInformation(Type.ARTICLE);
result.setValue(Field.AUTHOR, "R. Quinlan");
result.setValue(Field.YEAR, "1986");
result.setValue(Field.TITLE, "Induction of decision trees");
result.setValue(Field.JOURNAL, "Machine Learning");
result.setValue(Field.VOLUME, "1");
result.setValue(Field.NUMBER, "1");
result.setValue(Field.PAGES, "81-106");
return result;
}
/**
* Returns default capabilities of the classifier.
*
* @return the capabilities of this classifier
*/
public Capabilities getCapabilities() {
Capabilities result = super.getCapabilities();
result.disableAll();
// attributes
result.enable(Capability.NOMINAL_ATTRIBUTES);
// class
result.enable(Capability.NOMINAL_CLASS);
result.enable(Capability.MISSING_CLASS_VALUES);
// instances
result.setMinimumNumberInstances(0);
return result;
}
/**
* Builds Id3 decision tree classifier.
*
* @param data the training data
* @exception Exception if classifier can't be built successfully
*/
public void buildClassifier(Instances data) throws Exception {
// can classifier handle the data?
getCapabilities().testWithFail(data);
// remove instances with missing class
data = new Instances(data);
data.deleteWithMissingClass();
makeTree(data);
}
/**
* Method for building an Id3 tree.
*
* @param data the training data
* @exception Exception if decision tree can't be built successfully
*/
private void makeTree(Instances data) throws Exception {
// Check if no instances have reached this node.
if (data.numInstances() == 0) {
m_Attribute = null;
m_ClassValue = Instance.missingValue();
m_Distribution = new double[data.numClasses()];
return;
}
// Compute attribute with maximum information gain.
double[] infoGains = new double[data.numAttributes()];
Enumeration attEnum = data.enumerateAttributes();
while (attEnum.hasMoreElements()) {
Attribute att = (Attribute) attEnum.nextElement();
infoGains[att.index()] = computeInfoGain(data, att);
}
m_Attribute = data.attribute(Utils.maxIndex(infoGains));
// Make leaf if information gain is zero.
// Otherwise create successors.
if (Utils.eq(infoGains[m_Attribute.index()], 0)) {
m_Attribute = null;
m_Distribution = new double[data.numClasses()];
Enumeration instEnum = data.enumerateInstances();
while (instEnum.hasMoreElements()) {
Instance inst = (Instance) instEnum.nextElement();
m_Distribution[(int) inst.classValue()]++;
}
Utils.normalize(m_Distribution);
m_ClassValue = Utils.maxIndex(m_Distribution);
m_ClassAttribute = data.classAttribute();
} else {
Instances[] splitData = splitData(data, m_Attribute);
m_Successors = new Id3[m_Attribute.numValues()];
for (int j = 0; j < m_Attribute.numValues(); j++) {
m_Successors[j] = new Id3();
m_Successors[j].makeTree(splitData[j]);
}
}
}
/**
* Classifies a given test instance using the decision tree.
*
* @param instance the instance to be classified
* @return the classification
* @throws NoSupportForMissingValuesException if instance has missing values
*/
public double classifyInstance(Instance instance)
throws NoSupportForMissingValuesException {
if (instance.hasMissingValue()) {
throw new NoSupportForMissingValuesException("Id3: no missing values, "
+ "please.");
}
if (m_Attribute == null) {
return m_ClassValue;
} else {
return m_Successors[(int) instance.value(m_Attribute)].
classifyInstance(instance);
}
}
/**
* Computes class distribution for instance using decision tree.
*
* @param instance the instance for which distribution is to be computed
* @return the class distribution for the given instance
* @throws NoSupportForMissingValuesException if instance has missing values
*/
public double[] distributionForInstance(Instance instance)
throws NoSupportForMissingValuesException {
if (instance.hasMissingValue()) {
throw new NoSupportForMissingValuesException("Id3: no missing values, "
+ "please.");
}
if (m_Attribute == null) {
return m_Distribution;
} else {
return m_Successors[(int) instance.value(m_Attribute)].
distributionForInstance(instance);
}
}
/**
* Prints the decision tree using the private toString method from below.
*
* @return a textual description of the classifier
*/
public String toString() {
if ((m_Distribution == null) && (m_Successors == null)) {
return "Id3: No model built yet.";
}
return "Id3 " + toString(0);
}
/**
* Computes information gain for an attribute.
*
* @param data the data for which info gain is to be computed
* @param att the attribute
* @return the information gain for the given attribute and data
* @throws Exception if computation fails
*/
private double computeInfoGain(Instances data, Attribute att)
throws Exception {
double infoGain = computeEntropy(data);
Instances[] splitData = splitData(data, att);
for (int j = 0; j < att.numValues(); j++) {
if (splitData[j].numInstances() > 0) {
infoGain -= ((double) splitData[j].numInstances() /
(double) data.numInstances()) *
computeEntropy(splitData[j]);
}
}
return infoGain;
}
/**
* Computes the entropy of a dataset.
*
* @param data the data for which entropy is to be computed
* @return the entropy of the data's class distribution
* @throws Exception if computation fails
*/
private double computeEntropy(Instances data) throws Exception {
double [] classCounts = new double[data.numClasses()];
Enumeration instEnum = data.enumerateInstances();
while (instEnum.hasMoreElements()) {
Instance inst = (Instance) instEnum.nextElement();
classCounts[(int) inst.classValue()]++;
}
double entropy = 0;
for (int j = 0; j < data.numClasses(); j++) {
if (classCounts[j] > 0) {
entropy -= classCounts[j] * Utils.log2(classCounts[j]);
}
}
entropy /= (double) data.numInstances();
return entropy + Utils.log2(data.numInstances());
}
/**
* Splits a dataset according to the values of a nominal attribute.
*
* @param data the data which is to be split
* @param att the attribute to be used for splitting
* @return the sets of instances produced by the split
*/
private Instances[] splitData(Instances data, Attribute att) {
Instances[] splitData = new Instances[att.numValues()];
for (int j = 0; j < att.numValues(); j++) {
splitData[j] = new Instances(data, data.numInstances());
}
Enumeration instEnum = data.enumerateInstances();
while (instEnum.hasMoreElements()) {
Instance inst = (Instance) instEnum.nextElement();
splitData[(int) inst.value(att)].add(inst);
}
for (int i = 0; i < splitData.length; i++) {
splitData[i].compactify();
}
return splitData;
}
/**
* Outputs a tree at a certain level.
*
* @param level the level at which the tree is to be printed
* @return the tree as string at the given level
*/
private String toString(int level) {
StringBuffer text = new StringBuffer();
if (m_Attribute == null) {
if (Instance.isMissingValue(m_ClassValue)) {
text.append(": null");
} else {
text.append(": " + m_ClassAttribute.value((int) m_ClassValue));
}
} else {
for (int j = 0; j < m_Attribute.numValues(); j++) {
text.append(" ");
for (int i = 0; i < level; i++) {
text.append("| ");
}
text.append(m_Attribute.name() + " = " + m_Attribute.value(j));
text.append(m_Successors[j].toString(level + 1));
}
}
return text.toString();
}
/**
* Adds this tree recursively to the buffer.
*
* @param id the unique id for the method
* @param buffer the buffer to add the source code to
* @return the last ID being used
* @throws Exception if something goes wrong
*/
protected int toSource(int id, StringBuffer buffer) throws Exception {
int result;
int i;
int newID;
StringBuffer[] subBuffers;
buffer.append("\n");
buffer.append("  protected static double node" + id + "(Object[] i) {\n");
// leaf?
if (m_Attribute == null) {
result = id;
if (Double.isNaN(m_ClassValue)) {
buffer.append("    return Double.NaN;");
} else {
buffer.append("    return " + m_ClassValue + ";");
}
if (m_ClassAttribute != null) {
buffer.append(" // " + m_ClassAttribute.value((int) m_ClassValue));
}
buffer.append("\n");
buffer.append("  }\n");
} else {
buffer.append("    checkMissing(i, " + m_Attribute.index() + ");\n\n");
buffer.append("    // " + m_Attribute.name() + "\n");
// subtree calls
subBuffers = new StringBuffer[m_Attribute.numValues()];
newID = id;
for (i = 0; i < m_Attribute.numValues(); i++) {
newID++;
buffer.append("    ");
if (i > 0) {
buffer.append("else ");
}
buffer.append("if (((String) i[" + m_Attribute.index()
+ "]).equals(\"" + m_Attribute.value(i) + "\"))\n");
buffer.append("      return node" + newID + "(i);\n");
subBuffers[i] = new StringBuffer();
newID = m_Successors[i].toSource(newID, subBuffers[i]);
}
buffer.append("    else\n");
buffer.append("      throw new IllegalArgumentException(\"Value '\" + i["
+ m_Attribute.index() + "] + \"' is not allowed!\");\n");
buffer.append("  }\n");
// output subtree code
for (i = 0; i < m_Attribute.numValues(); i++) {
buffer.append(subBuffers[i].toString());
}
subBuffers = null;
result = newID;
}
return result;
}
/**
* Returns a string that describes the classifier as source. The
* classifier will be contained in a class with the given name (there may
* be auxiliary classes),
* and will contain a method with the signature:
* <pre><code>
* public static double classify(Object[] i);
* </code></pre>
* where the array <code>i</code> contains elements that are either
* Double, String, with missing values represented as null. The generated
* code is public domain and comes with no warranty. <br/>
* Note: works only if class attribute is the last attribute in the dataset.
*
* @param className the name that should be given to the source class.
* @return the object source described by a string
* @throws Exception if the source can't be computed
*/
public String toSource(String className) throws Exception {
StringBuffer result;
int id;
result = new StringBuffer();
result.append("class " + className + " {\n");
result.append("  private static void checkMissing(Object[] i, int index) {\n");
result.append("    if (i[index] == null)\n");
result.append("      throw new IllegalArgumentException(\"Null values "
+ "are not allowed!\");\n");
result.append("  }\n\n");
result.append("  public static double classify(Object[] i) {\n");
id = 0;
result.append("    return node" + id + "(i);\n");
result.append("  }\n");
toSource(id, result);
result.append("}\n");
return result.toString();
}
/**
* Returns the revision string.
*
* @return the revision
*/
public String getRevision() {
return RevisionUtils.extract("$Revision: 6404 $");
}
/**
* Main method.
*
* @param args the options for the classifier
*/
public static void main(String[] args) {
runClassifier(new Id3(), args);
}
}
