
Calling Weka Classification Algorithms from Java

Published: 2023-05-21 06:51:14

Ⅰ How do I extract classification rules from a decision tree built in Weka?

Do you mean automatically extracting classification rules from a trained decision tree model? Weka does not seem to offer a way to extract rules directly from the tree structure.

Still, if the model is not too complex, it is quite easy to tabulate by hand every path from the root node to a leaf node: the internal nodes plus their branches along a path form the if-conditions, and the leaf node is the then-result. If the model is fairly complex, consider a bit of simple custom development.

Assuming you are using J48: save the trained decision tree from the weka explorer (or write it to a file with an output stream directly in code), then read the decision tree back in with an input stream as a Sourcable object and call the object's toSource method to render the tree as code. From there it is a text-processing problem: analyze the structure of the generated code to obtain the corresponding classification rules.

The reading-in process looks roughly like this:

FileInputStream j48 = new FileInputStream("j48.model");
ObjectInputStream j48object = new ObjectInputStream(j48);

// toSource uses its argument as a Java class name, so avoid spaces: "J48Tree"
Sourcable j48code = (Sourcable) j48object.readObject();
System.out.println(j48code.toSource("J48Tree"));
j48object.close();

The few lines above are just an example; I hope it gives you some ideas ^^
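For the saving half mentioned above (writing the trained tree out with an output stream), here is a minimal sketch, assuming a trained weka.classifiers.trees.J48 model already exists in a variable named j48tree (a hypothetical name):

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// j48tree is an already-trained J48 model; serialize it to j48.model
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("j48.model"));
out.writeObject(j48tree);
out.close();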

Ⅱ How can Java code turn word-segmented text files into an ARFF file that Weka can process?

First read the files in and put them into an Instances object, then save the Instances as an ARFF file. The code to save the ARFF file is as follows:


ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("c:\\test.arff"));
saver.writeBatch();
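For the first half of the task, a minimal sketch of building the Instances object from word-segmented text, assuming the Weka 3.6-era API used elsewhere on this page; the attribute layout (one string attribute plus a nominal class) and all names are illustrative assumptions:

import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

// hypothetical layout: one string attribute for the text, one nominal class
FastVector classValues = new FastVector();
classValues.addElement("pos");
classValues.addElement("neg");

FastVector attrs = new FastVector();
attrs.addElement(new Attribute("text", (FastVector) null)); // string attribute
attrs.addElement(new Attribute("class", classValues));

Instances data = new Instances("tokenized_text", attrs, 0);
data.setClassIndex(data.numAttributes() - 1);

// one row per document: the segmented text joined by spaces, plus its label
double[] vals = new double[data.numAttributes()];
vals[0] = data.attribute(0).addStringValue("token1 token2 token3");
vals[1] = classValues.indexOf("pos");
data.add(new Instance(1.0, vals));

An Instances object built this way can be passed straight to the ArffSaver code above.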

Ⅲ How do I run the ID3 and C4.5 algorithms in WEKA?

By "how to run them" I assume you mean using these two algorithms to train classification models and complete a classification task... In the weka explorer, choose the training set and test set you have at hand plus the corresponding classification algorithm, and just run it once... Or write a few lines against Weka's API in your own code to do the same job (see the sketch below)... It is really simple; there are plenty of examples like this online, grab any one, glance through it, and you will get it.
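As an illustration of the API route, a minimal sketch that trains ID3 and runs a 10-fold cross-validation; the file name and evaluation setup are assumptions, and you can substitute weka.classifiers.trees.J48 for C4.5:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.Id3;
import weka.core.Instances;

// load the training data (hypothetical file name)
Instances data = new Instances(new BufferedReader(new FileReader("train.arff")));
data.setClassIndex(data.numAttributes() - 1);

// train the classifier
Id3 tree = new Id3();
tree.buildClassifier(data);

// evaluate with 10-fold cross-validation and print a summary
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(new Id3(), data, 10, new Random(1));
System.out.println(eval.toSummaryString());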

Ⅳ How do I use the WEKA data mining tool for text classification? There are 800+ test documents; please give concrete, easy-to-follow steps

Step 1: you need a Chinese dataset.
Step 2: prepare the dataset in a structure Weka can process. This is easy to do: just zip the dataset up accordingly, because the format Weka expects is one folder per class, with that class's files inside it. One more problem: your machine often will not have enough memory to process the full dataset, so pick out a few classes and put a few dozen documents in each class to work with.
Step 3: word segmentation.
Step 4: convert the dataset to ARFF format using the example on the weka wiki (see the sketch below).
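A minimal sketch of step 4, modeled on the weka wiki's TextDirectoryLoader example; the directory name is an assumption, and the StringToWordVector step turns the loaded strings into word-count attributes:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

// each class's documents live in their own subfolder of text_dataset/
TextDirectoryLoader loader = new TextDirectoryLoader();
loader.setDirectory(new File("text_dataset"));
Instances raw = loader.getDataSet(); // one string attribute plus the class

// turn the raw strings into a sparse word-vector representation
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(raw);
Instances vectorized = Filter.useFilter(raw, filter);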

Weka is a collection of machine learning algorithms usable for classification, prediction, and more. Since Weka accepts data in ARFF or CSV format, you must preprocess your data before running a Weka experiment. Typically you can import the TXT file into Excel and save it as a .CSV file (a format WEKA also recognizes), then open WEKA, go to Tools -> ArffViewer, open the .CSV file you just created, and save it as .arff. Done!
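The same CSV-to-ARFF conversion can also be done in code; a minimal sketch using Weka's converters, with hypothetical file names:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// read the CSV exported from Excel
CSVLoader loader = new CSVLoader();
loader.setSource(new File("data.csv"));
Instances data = loader.getDataSet();

// write it back out as ARFF
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("data.arff"));
saver.writeBatch();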

Ⅴ How do I combine multiple classification algorithms in Weka?

You need to convert the file to nominal type. Weka treats the numbers from Excel as numeric data, which it cannot handle here, so the Apriori algorithm is unusable.
WEKA's full name is the Waikato Environment for Knowledge Analysis; a weka is also a New Zealand bird, and WEKA's main developers come from New Zealand. As an open data mining workbench, WEKA gathers a large number of machine learning algorithms that can take on data mining tasks, including data preprocessing, classification, regression, clustering, association rules, and visualization in its interactive interface.
If you want to implement data mining algorithms yourself, consult Weka's interface documentation. Integrating your own algorithm into Weka, or even borrowing its approach to build your own visualization tool, is not particularly difficult.
In August 2005, at the 11th ACM SIGKDD international conference, the Weka team from the University of Waikato received the highest service award in the data mining and knowledge discovery field. The Weka system has been widely recognized and hailed as a milestone in the history of data mining and machine learning; it is one of the most complete data mining tools available (with 11 years of development behind it at the time), and Weka is downloaded more than ten thousand times per month.
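As for the actual question in the heading, Weka's meta-classifiers can combine several base learners; a minimal sketch using the Vote meta-classifier, where the choice of base classifiers is an assumption and data is a training Instances object with its class index already set:

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;

// combine three base classifiers; by default Vote averages their probability estimates
Vote vote = new Vote();
vote.setClassifiers(new Classifier[] { new J48(), new NaiveBayes(), new IBk() });
vote.buildClassifier(data);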

Ⅵ Is it better to learn data mining with Python, or Weka development with Java?

Mainly convenience. Python's third-party modules are plentiful, its syntax is very concise, and it gives you a lot of freedom. The numpy, scipy, and matplotlib modules can cover all of SPSS's functionality, you can clean and reduce data with whatever custom procedure you need, and you can connect to SQL when required. For machine learning, data is often collected from the internet with web crawlers; Python's urllib module makes that simple, and when a crawler has to cope with some site's CAPTCHA, Python's PIL module helps with the recognition. If you need neural networks or genetic algorithms, scipy can do that work as well. Decision trees come down to if-then style code. Clustering cannot be limited to a few canned methods; you may need to adapt to the actual situation, using k-means clustering, DBSCAN clustering, or sometimes a combination of the two for large-scale data, all of which calls for your own code. In addition, distance-based classification offers many distance measures to choose from, such as Euclidean distance, cosine distance, Minkowski distance, and city-block distance; none are complicated, but all are very convenient to implement in Python. For content-based classification, Python has the powerful nltk natural language processing module for segmenting, collecting, classifying, and counting words and phrases.
In short: extremely convenient. Once you know Python well enough, you will find you can turn all your ideas into reality quickly with this one tool.

Ⅶ A problem calling Weka's classification functions from Java

Instances instances = getArffData("arff file path");

instances.setClassIndex(instances.numAttributes() - 1); // sets the class index, fixing the "Class index not set!" error
instances.setClass(attribute); // sets the class attribute, fixing the "Class attribute not set!" error
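Putting it together, a minimal end-to-end sketch; the file name and the choice of J48 are assumptions, and getArffData above is the asker's own helper:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// load the ARFF file and mark the last attribute as the class
Instances instances = DataSource.read("train.arff");
instances.setClassIndex(instances.numAttributes() - 1);

// train a classifier and classify the first instance
J48 classifier = new J48();
classifier.buildClassifier(instances);
double label = classifier.classifyInstance(instances.instance(0));
System.out.println("predicted: " + instances.classAttribute().value((int) label));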

Ⅷ What are the steps for classification with Weka's Bayesian decision approach? Please advise... Detailed steps would be best, thanks!

You can trace through it with a debugger. The following are the main functions of Weka's NaiveBayesSimple class and what they do.
(1) globalInfo()
Returns the description string for this classifier.
(2) getTechnicalInformation()
Returns a TechnicalInformation object instance containing the technical background of the class, such as the paper it is based on.
(3) getCapabilities()
Returns the classifier's default capabilities.
(4) buildClassifier(Instances instances)
buildClassifier() constructs a classifier from a training dataset instances. It computes the probabilities of all nominal attributes given the class, the prior probabilities of the class attribute, and the means and variances of numeric attributes, in preparation for the classification work to come.
(5) distributionForInstance(Instance instance)
Computes the probability of the instance to be classified belonging to each class label, stores the values in an array, and returns that array.
(6) toString()
Returns the classifier's parameters (means, variances, prior probabilities, conditional probabilities) as a string.
(7) normalDens(double x, double mean, double stdDev)
Computes the probability density of a numeric attribute at value x under a normal distribution with mean mean and standard deviation stdDev.
(8) getRevision()
Returns the program's revision number.
(9) main()
Called when the class is executed from the command line. It simply tells Weka's Evaluation class, with the given command-line options, to evaluate naive Bayes and prints the result. The one-line expression that does this is wrapped in a try-catch statement, which catches the various exceptions thrown by Weka routines or other Java methods.

Ⅸ How do I add a new algorithm to Weka?

1. Write the new algorithm; it must conform to Weka's interface standards. The example here adds an algorithm downloaded from the Weka Chinese site (which seems to be offline now; for a quick experiment you can simply copy an existing algorithm from the clusterers directory and rename it): a fuzzy C-means clustering algorithm, FuzzyCMeans.
2. Since FuzzyCMeans is a clustering algorithm, copy the FuzzyCMeans.java source file directly into the weka.clusterers package.
3. Then edit weka.gui.GenericObjectEditor.props: under "# Lists the Clusterers I want to choose from", add weka.clusterers.FuzzyCMeans below the weka.clusterers.Clusterer=\ line (the snippet after these steps shows the result).
4. Edit weka.gui.GenericPropertiesCreator.props accordingly. In this case nothing needs to change, because the weka.clusterers package already exists; if you add a new package, you must register it here.
After these changes, recompile and run; the newly added FuzzyCMeans algorithm appears among the clustering algorithms in the Cluster tab of Weka's Explorer interface.
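For step 3, the edited section of GenericObjectEditor.props would look roughly like this; the neighboring entries are illustrative:

# Lists the Clusterers I want to choose from
weka.clusterers.Clusterer=\
 weka.clusterers.Cobweb,\
 weka.clusterers.EM,\
 weka.clusterers.SimpleKMeans,\
 weka.clusterers.FuzzyCMeans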
The key to the whole process is understanding Weka's internals and its interface standards, and then writing a new algorithm that conforms to that specification.
I am still analyzing and studying the algorithm specification and the Weka source code.

Ⅹ Looking for the Java source of Weka's ID3 algorithm

/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
/*
* Id3.java
* Copyright (C) 1999 University of Waikato, Hamilton, New Zealand
*
*/
package weka.classifiers.trees;
import weka.classifiers.Classifier;
import weka.classifiers.Sourcable;
import weka.core.Attribute;
import weka.core.Capabilities;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.NoSupportForMissingValuesException;
import weka.core.RevisionUtils;
import weka.core.TechnicalInformation;
import weka.core.TechnicalInformationHandler;
import weka.core.Utils;
import weka.core.Capabilities.Capability;
import weka.core.TechnicalInformation.Field;
import weka.core.TechnicalInformation.Type;
import java.util.Enumeration;
/**
<!-- globalinfo-start -->
* Class for constructing an unpruned decision tree based on the ID3 algorithm. Can only deal with nominal attributes. No missing values allowed. Empty leaves may result in unclassified instances. For more information see: <br/>
* <br/>
* R. Quinlan (1986). Induction of decision trees. Machine Learning. 1(1):81-106.
* <p/>
<!-- globalinfo-end -->
*
<!-- technical-bibtex-start -->
* BibTeX:
* <pre>
* &#64;article{Quinlan1986,
* author = {R. Quinlan},
* journal = {Machine Learning},
* number = {1},
* pages = {81-106},
* title = {Induction of decision trees},
* volume = {1},
* year = {1986}
* }
* </pre>
* <p/>
<!-- technical-bibtex-end -->
*
<!-- options-start -->
* Valid options are: <p/>
*
* <pre> -D
* If set, classifier is run in debug mode and
* may output additional info to the console</pre>
*
<!-- options-end -->
*
* @author Eibe Frank ([email protected])
* @version $Revision: 6404 $
*/
public class Id3
extends Classifier
implements TechnicalInformationHandler, Sourcable {
/** for serialization */
static final long serialVersionUID = -2693678647096322561L;
/** The node's successors. */
private Id3[] m_Successors;
/** Attribute used for splitting. */
private Attribute m_Attribute;
/** Class value if node is leaf. */
private double m_ClassValue;
/** Class distribution if node is leaf. */
private double[] m_Distribution;
/** Class attribute of dataset. */
private Attribute m_ClassAttribute;
/**
* Returns a string describing the classifier.
* @return a description suitable for the GUI.
*/
public String globalInfo() {
return "Class for constructing an unpruned decision tree based on the ID3 "
+ "algorithm. Can only deal with nominal attributes. No missing values "
+ "allowed. Empty leaves may result in unclassified instances. For more "
+ "information see: "
+ getTechnicalInformation().toString();
}
/**
* Returns an instance of a TechnicalInformation object, containing
* detailed information about the technical background of this class,
* e.g., paper reference or book this class is based on.
*
* @return the technical information about this class
*/
public TechnicalInformation getTechnicalInformation() {
TechnicalInformation result;
result = new TechnicalInformation(Type.ARTICLE);
result.setValue(Field.AUTHOR, "R. Quinlan");
result.setValue(Field.YEAR, "1986");
result.setValue(Field.TITLE, "Inction of decision trees");
result.setValue(Field.JOURNAL, "Machine Learning");
result.setValue(Field.VOLUME, "1");
result.setValue(Field.NUMBER, "1");
result.setValue(Field.PAGES, "81-106");
return result;
}
/**
* Returns default capabilities of the classifier.
*
* @return the capabilities of this classifier
*/
public Capabilities getCapabilities() {
Capabilities result = super.getCapabilities();
result.disableAll();
// attributes
result.enable(Capability.NOMINAL_ATTRIBUTES);
// class
result.enable(Capability.NOMINAL_CLASS);
result.enable(Capability.MISSING_CLASS_VALUES);
// instances
result.setMinimumNumberInstances(0);
return result;
}
/**
* Builds Id3 decision tree classifier.
*
* @param data the training data
* @exception Exception if classifier can't be built successfully
*/
public void buildClassifier(Instances data) throws Exception {
// can classifier handle the data?
getCapabilities().testWithFail(data);
// remove instances with missing class
data = new Instances(data);
data.deleteWithMissingClass();
makeTree(data);
}
/**
* Method for building an Id3 tree.
*
* @param data the training data
* @exception Exception if decision tree can't be built successfully
*/
private void makeTree(Instances data) throws Exception {
// Check if no instances have reached this node.
if (data.numInstances() == 0) {
m_Attribute = null;
m_ClassValue = Instance.missingValue();
m_Distribution = new double[data.numClasses()];
return;
}
// Compute attribute with maximum information gain.
double[] infoGains = new double[data.numAttributes()];
Enumeration attEnum = data.enumerateAttributes();
while (attEnum.hasMoreElements()) {
Attribute att = (Attribute) attEnum.nextElement();
infoGains[att.index()] = computeInfoGain(data, att);
}
m_Attribute = data.attribute(Utils.maxIndex(infoGains));
// Make leaf if information gain is zero.
// Otherwise create successors.
if (Utils.eq(infoGains[m_Attribute.index()], 0)) {
m_Attribute = null;
m_Distribution = new double[data.numClasses()];
Enumeration instEnum = data.enumerateInstances();
while (instEnum.hasMoreElements()) {
Instance inst = (Instance) instEnum.nextElement();
m_Distribution[(int) inst.classValue()]++;
}
Utils.normalize(m_Distribution);
m_ClassValue = Utils.maxIndex(m_Distribution);
m_ClassAttribute = data.classAttribute();
} else {
Instances[] splitData = splitData(data, m_Attribute);
m_Successors = new Id3[m_Attribute.numValues()];
for (int j = 0; j < m_Attribute.numValues(); j++) {
m_Successors[j] = new Id3();
m_Successors[j].makeTree(splitData[j]);
}
}
}
/**
* Classifies a given test instance using the decision tree.
*
* @param instance the instance to be classified
* @return the classification
* @throws NoSupportForMissingValuesException if instance has missing values
*/
public double classifyInstance(Instance instance)
throws NoSupportForMissingValuesException {
if (instance.hasMissingValue()) {
throw new NoSupportForMissingValuesException("Id3: no missing values, "
+ "please.");
}
if (m_Attribute == null) {
return m_ClassValue;
} else {
return m_Successors[(int) instance.value(m_Attribute)].
classifyInstance(instance);
}
}
/**
* Computes class distribution for instance using decision tree.
*
* @param instance the instance for which distribution is to be computed
* @return the class distribution for the given instance
* @throws NoSupportForMissingValuesException if instance has missing values
*/
public double[] distributionForInstance(Instance instance)
throws NoSupportForMissingValuesException {
if (instance.hasMissingValue()) {
throw new NoSupportForMissingValuesException("Id3: no missing values, "
+ "please.");
}
if (m_Attribute == null) {
return m_Distribution;
} else {
return m_Successors[(int) instance.value(m_Attribute)].
distributionForInstance(instance);
}
}
/**
* Prints the decision tree using the private toString method from below.
*
* @return a textual description of the classifier
*/
public String toString() {
if ((m_Distribution == null) && (m_Successors == null)) {
return "Id3: No model built yet.";
}
return "Id3 " + toString(0);
}
/**
* Computes information gain for an attribute.
*
* @param data the data for which info gain is to be computed
* @param att the attribute
* @return the information gain for the given attribute and data
* @throws Exception if computation fails
*/
private double computeInfoGain(Instances data, Attribute att)
throws Exception {
double infoGain = computeEntropy(data);
Instances[] splitData = splitData(data, att);
for (int j = 0; j < att.numValues(); j++) {
if (splitData[j].numInstances() > 0) {
infoGain -= ((double) splitData[j].numInstances() /
(double) data.numInstances()) *
computeEntropy(splitData[j]);
}
}
return infoGain;
}
/**
* Computes the entropy of a dataset.
*
* @param data the data for which entropy is to be computed
* @return the entropy of the data's class distribution
* @throws Exception if computation fails
*/
private double computeEntropy(Instances data) throws Exception {
double [] classCounts = new double[data.numClasses()];
Enumeration instEnum = data.enumerateInstances();
while (instEnum.hasMoreElements()) {
Instance inst = (Instance) instEnum.nextElement();
classCounts[(int) inst.classValue()]++;
}
double entropy = 0;
for (int j = 0; j < data.numClasses(); j++) {
if (classCounts[j] > 0) {
entropy -= classCounts[j] * Utils.log2(classCounts[j]);
}
}
entropy /= (double) data.numInstances();
return entropy + Utils.log2(data.numInstances());
}
/**
* Splits a dataset according to the values of a nominal attribute.
*
* @param data the data which is to be split
* @param att the attribute to be used for splitting
* @return the sets of instances produced by the split
*/
private Instances[] splitData(Instances data, Attribute att) {
Instances[] splitData = new Instances[att.numValues()];
for (int j = 0; j < att.numValues(); j++) {
splitData[j] = new Instances(data, data.numInstances());
}
Enumeration instEnum = data.enumerateInstances();
while (instEnum.hasMoreElements()) {
Instance inst = (Instance) instEnum.nextElement();
splitData[(int) inst.value(att)].add(inst);
}
for (int i = 0; i < splitData.length; i++) {
splitData[i].compactify();
}
return splitData;
}
/**
* Outputs a tree at a certain level.
*
* @param level the level at which the tree is to be printed
* @return the tree as string at the given level
*/
private String toString(int level) {
StringBuffer text = new StringBuffer();
if (m_Attribute == null) {
if (Instance.isMissingValue(m_ClassValue)) {
text.append(": null");
} else {
text.append(": " + m_ClassAttribute.value((int) m_ClassValue));
}
} else {
for (int j = 0; j < m_Attribute.numValues(); j++) {
text.append(" ");
for (int i = 0; i < level; i++) {
text.append("| ");
}
text.append(m_Attribute.name() + " = " + m_Attribute.value(j));
text.append(m_Successors[j].toString(level + 1));
}
}
return text.toString();
}
/**
* Adds this tree recursively to the buffer.
*
* @param id the unique id for the method
* @param buffer the buffer to add the source code to
* @return the last ID being used
* @throws Exception if something goes wrong
*/
protected int toSource(int id, StringBuffer buffer) throws Exception {
int result;
int i;
int newID;
StringBuffer[] subBuffers;
buffer.append(" ");
buffer.append(" protected static double node" + id + "(Object[] i) { ");
// leaf?
if (m_Attribute == null) {
result = id;
if (Double.isNaN(m_ClassValue)) {
buffer.append(" return Double.NaN;");
} else {
buffer.append(" return " + m_ClassValue + ";");
}
if (m_ClassAttribute != null) {
buffer.append(" // " + m_ClassAttribute.value((int) m_ClassValue));
}
buffer.append(" ");
buffer.append(" } ");
} else {
buffer.append(" checkMissing(i, " + m_Attribute.index() + "); ");
buffer.append(" // " + m_Attribute.name() + " ");
// subtree calls
subBuffers = new StringBuffer[m_Attribute.numValues()];
newID = id;
for (i = 0; i < m_Attribute.numValues(); i++) {
newID++;
buffer.append(" ");
if (i > 0) {
buffer.append("else ");
}
buffer.append("if (((String) i[" + m_Attribute.index()
+ "]).equals("" + m_Attribute.value(i) + "")) ");
buffer.append(" return node" + newID + "(i); ");
subBuffers[i] = new StringBuffer();
newID = m_Successors[i].toSource(newID, subBuffers[i]);
}
buffer.append(" else ");
buffer.append(" throw new IllegalArgumentException("Value '" + i["
+ m_Attribute.index() + "] + "' is not allowed!"); ");
buffer.append(" } ");
// output subtree code
for (i = 0; i < m_Attribute.numValues(); i++) {
buffer.append(subBuffers[i].toString());
}
subBuffers = null;
result = newID;
}
return result;
}
/**
* Returns a string that describes the classifier as source. The
* classifier will be contained in a class with the given name (there may
* be auxiliary classes),
* and will contain a method with the signature:
* <pre><code>
* public static double classify(Object[] i);
* </code></pre>
* where the array <code>i</code> contains elements that are either
* Double, String, with missing values represented as null. The generated
* code is public domain and comes with no warranty. <br/>
* Note: works only if class attribute is the last attribute in the dataset.
*
* @param className the name that should be given to the source class.
* @return the object source described by a string
* @throws Exception if the source can't be computed
*/
public String toSource(String className) throws Exception {
StringBuffer result;
int id;
result = new StringBuffer();
result.append("class " + className + " { ");
result.append(" private static void checkMissing(Object[] i, int index) { ");
result.append(" if (i[index] == null) ");
result.append(" throw new IllegalArgumentException("Null values "
+ "are not allowed!"); ");
result.append(" } ");
result.append(" public static double classify(Object[] i) { ");
id = 0;
result.append(" return node" + id + "(i); ");
result.append(" } ");
toSource(id, result);
result.append("} ");
return result.toString();
}
/**
* Returns the revision string.
*
* @return the revision
*/
public String getRevision() {
return RevisionUtils.extract("$Revision: 6404 $");
}
/**
* Main method.
*
* @param args the options for the classifier
*/
public static void main(String[] args) {
runClassifier(new Id3(), args);
}
}
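To try the class from the command line, the usual Weka conventions apply; for example, assuming a training file named train.arff:

java weka.classifiers.trees.Id3 -t train.arff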
