Getting web page information in Java: how to fetch web page data with Java

1. How to get page content in Java code

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;

public class Test {
    public static void main(String[] args) throws Exception {
        // d:\\test.xml is the path of the file the page is saved to
        try (PrintWriter pw = new PrintWriter("d:\\test.xml")) {
            // http://www..com stands for the page you want to fetch
            pw.println(getHtmlConentByUrl("http://www..com"));
        }
    }

    public static String getHtmlConentByUrl(String ssourl) {
        try {
            URL url = new URL(ssourl);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();

            con.setInstanceFollowRedirects(false);
            con.setUseCaches(false);
            con.setAllowUserInteraction(false);
            con.connect();

            StringBuilder sb = new StringBuilder();
            // The platform default charset is used here; pass one to InputStreamReader if the page needs it
            try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Keep the line breaks so that text on adjacent lines does not run together
                    sb.append(line).append('\n');
                }
            } finally {
                con.disconnect();
            }

            // Lower-cased to make later string matching case-insensitive; note this also changes the page text
            return sb.toString().toLowerCase();
        } catch (Exception e) {
            // Any failure (bad URL, connection error) is reported as null
            return null;
        }
    }
}
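The page content you get back is a string, and there are two ways to parse it. The first is to convert the string into a DOM with dom4j and parse that. This is the better approach, but the other site's page may not be well-formed enough to fit a DOM structure. The second is to parse the string directly and filter out the content you want, which is more tedious and takes a few tricks. I use the second approach.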
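The second approach (filtering the string directly) can be sketched roughly as follows. This is only a minimal illustration: the class name MarkerExtract and the helper between() are hypothetical names introduced here, not part of the original answer. It assumes the markers are plain literal strings; since getHtmlConentByUrl above lower-cases the page, the markers would be written in lower case.

```java
public class MarkerExtract {

    /**
     * Returns the trimmed text between the first occurrence of start and the
     * next occurrence of end, or null if either marker is missing.
     * (Hypothetical helper, for illustration only.)
     */
    public static String between(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return null; // start marker not found
        }
        from += start.length();
        int to = html.indexOf(end, from);
        if (to < 0) {
            return null; // end marker not found after the start marker
        }
        return html.substring(from, to).trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title> Demo page </title></head><body></body></html>";
        System.out.println(between(html, "<title>", "</title>")); // prints "Demo page"
    }
}
```

Returning null for a missing marker avoids the StringIndexOutOfBoundsException that a bare substring(indexOf(...), indexOf(...)) call throws when a tag is absent.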

2. Two ways to read website content in Java

HttpClient

Use Apache's HTTP client package (Commons HttpClient 3.x) to fetch the content of an address:

import java.io.UnsupportedEncodingException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

public class catchMain {

    /**
     * @param args
     */
    public static void main(String[] args) {
        String url = "";     // the target URL is missing from the source; fill it in
        String keyword = ""; // the search keyword is garbled in the source; fill it in
        String response = createClient(url, keyword);
    }

    public static String createClient(String url, String param) {
        HttpClient client = new HttpClient();
        String response = null;
        String keyword = null;
        PostMethod postMethod = new PostMethod(url);

        try {
            if (param != null)
                // Charset names reconstructed from the garbled source ("gb", "ISO")
                keyword = new String(param.getBytes("gb2312"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }

        NameValuePair[] data = { new NameValuePair("keyword", keyword) };
        // Put the form values into postMethod
        postMethod.setRequestBody(data);

        try {
            int statusCode = client.executeMethod(postMethod);
            // Re-decode the response body as GBK
            response = new String(postMethod.getResponseBodyAsString()
                    .getBytes("ISO-8859-1"), "GBK");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Return the connection to the client once the body has been read
            postMethod.releaseConnection();
        }

        return response;
    }
}

Java's built-in HttpURLConnection

public static String getPageContent(String strUrl, String strPostRequest,
        int maxLength) {
    // Buffer for the fetched page
    StringBuffer buffer = new StringBuffer();

    // Timeouts in milliseconds. The values are missing from the source; 5000 is only a placeholder.
    System.setProperty("sun.net.client.defaultConnectTimeout", "5000");
    System.setProperty("sun.net.client.defaultReadTimeout", "5000");

    try {
        URL newUrl = new URL(strUrl);
        HttpURLConnection hConnect = (HttpURLConnection) newUrl.openConnection();

        // Extra data for a POST request
        if (strPostRequest.length() > 0) {
            hConnect.setDoOutput(true);
            OutputStreamWriter out = new OutputStreamWriter(hConnect.getOutputStream());
            out.write(strPostRequest);
            out.flush();
            out.close();
        }

        // Read the content, one character at a time, up to maxLength characters (no limit if maxLength <= 0)
        BufferedReader rd = new BufferedReader(new InputStreamReader(
                hConnect.getInputStream()));
        int ch;
        for (int length = 0; (ch = rd.read()) > -1
                && (maxLength <= 0 || length < maxLength); length++)
            buffer.append((char) ch);

        rd.close();
        hConnect.disconnect();
        return buffer.toString().trim();
    } catch (Exception e) {
        // return "Error: failed to read the page!";
        return null;
    }
}
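The reading loop in getPageContent can also be written as a small helper that takes any InputStream, which makes it possible to try it without a network connection. The sketch below is an assumption-laden variant, not the original author's code: the class name BoundedRead and the method readAtMost() are hypothetical, and it decodes with an explicitly supplied charset instead of the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BoundedRead {

    /**
     * Reads characters from the stream using the given charset.
     * Stops at end of stream, or after maxLength characters when maxLength > 0.
     * (Hypothetical helper, for illustration only.)
     */
    public static String readAtMost(InputStream in, Charset charset, int maxLength)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the stream) even if read() fails
        try (Reader reader = new InputStreamReader(in, charset)) {
            int ch;
            while ((maxLength <= 0 || sb.length() < maxLength)
                    && (ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        // No limit: prints the whole body
        System.out.println(readAtMost(new ByteArrayInputStream(body), StandardCharsets.UTF_8, 0));
        // Limit of 6 characters: prints "<html>"
        System.out.println(readAtMost(new ByteArrayInputStream(body), StandardCharsets.UTF_8, 6));
    }
}
```

With a real connection the call would look like readAtMost(hConnect.getInputStream(), StandardCharsets.UTF_8, maxLength), assuming the page is UTF-8. Per-connection timeouts can be set with setConnectTimeout(int) and setReadTimeout(int) on the connection rather than through JVM-wide system properties.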


3. How to fetch web page data with Java

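Is this a case where system A wants to know the page information of system B?

If so, there is one problem that is fairly hard to solve, namely the data source: system A has no knowledge of system B's data.

If you want to get the company names from job postings, there are a few approaches:
1. Have system A host an iframe that embeds the URL you want to visit, then use JS to read everything inside the iframe tag. That solves the data source problem.
2. Write a browser plugin whose job is to collect all the text of the page currently being viewed and send it to system A.
3. Take a screenshot of the page, extract the text from the image with OCR software, and save it as a text file that system A then reads. That also solves the data source problem.

These three are just what came to mind offhand. You could also try one of Java's full-text search frameworks; I have not used one, so I do not know whether it would work.

Once you have the data source, what remains is the business processing, which depends on the specific business. Technically it is just text processing: the technology is easy to implement, while the business logic is the more complex part.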

4. Fetching a web page in Java

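A StringBuffer must be initialized before it is used, e.g. StringBuffer sb = new StringBuffer();
StringBuffer document = new StringBuffer();
String line; // read in the page content

// reader is assumed to be a BufferedReader over the page's input stream
while ((line = reader.readLine()) != null) {
    document.append(line + "\n");
}
String title = document.toString();
// Assumes the page contains both <title> and </title>; indexOf returns -1 otherwise
title = title.substring(title.indexOf("<title>") + 7,
        title.indexOf("</title>"));
System.out.println(title);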

5. Getting the content of a web page tag in Java

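The weather value on Sina is loaded dynamically by JS; the original HTML page contains only <div id="SI_Weather_Wrap" class="now-wea-wrap clearfix"></div>.
jsoup only parses the HTML, so it cannot find content that is generated dynamically by JS.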

6. How to scrape specific data from a web page with Java code

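The steps for scraping specific data from a web page with Java code are as follows:
1. Import the Jsoup jar into the project.

2. Fetch the HTML at the given URL, or the body of the given document.

3. Get the titles and links of the hyperlinks in the page.

4. Get the content of a specific blog post.

5. Check the resulting hyperlink titles and links.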
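Step 3 (collecting link titles and addresses) can be illustrated without any extra library. To be clear, the sketch below is not Jsoup: it is a JDK-only stand-in that uses a regular expression, and the class name LinkExtract and method links() are hypothetical. A regex handles only simple, well-formed anchors; with Jsoup itself the usual route is to load the page with Jsoup.connect(url).get() and select the anchors with doc.select("a[href]").

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtract {

    // Matches <a ... href="..."> ... </a>; group 1 is the href, group 2 the inner HTML
    private static final Pattern LINK = Pattern.compile(
            "<a\\b[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Returns a map from link address to link title, in document order.
     * (Hypothetical helper, for illustration only.)
     */
    public static Map<String, String> links(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            // Strip any nested tags such as <b> from the link text
            String title = m.group(2).replaceAll("<[^>]+>", "").trim();
            result.put(href, title);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/a.html\">First</a> "
                + "<a class='x' href='/b.html'><b>Second</b></a></p>";
        System.out.println(links(html)); // prints {/a.html=First, /b.html=Second}
    }
}
```

A real HTML parser copes with unquoted attributes, comments and broken markup, which this pattern does not attempt to handle.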

7. Getting HTML in Java

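Accessing a network URL from Java and getting the page's HTML code
Approach 1:
The first option is the openStream() method of the URL class:
openStream() opens a connection to the specified URL and returns an InputStream object from which the data on that connection can be read;
openStream() can only read from the resource.
The second option is the openConnection() method of the URL class:
openConnection() creates a URLConnection object, which sets up an HTTP data channel between the local machine and the remote node named by the URL and allows data to flow in both directions. URLConnection provides many methods for setting and getting connection parameters; the most commonly used are getInputStream() and getOutputStream().
openConnection() can both read and send data.
For example: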
public static void main(String args[]) throws Exception {
    try {
        // Enter the URL here
        URL url = new URL("url路径");
        InputStream in = url.openStream();
        InputStreamReader isr = new InputStreamReader(in);
        BufferedReader bufr = new BufferedReader(isr);
        String str;
        while ((str = bufr.readLine()) != null) {
            System.out.println(str);
        }
        bufr.close();
        isr.close();
        in.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
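The openStream() pattern above can be tried without network access by pointing the URL at a local file. The following is a minimal sketch under that assumption: the class name OpenStreamDemo and the method fetch() are hypothetical, the page is assumed to be UTF-8, and a file: URL is used purely so the example runs offline; an http: URL is read the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenStreamDemo {

    /**
     * Reads everything behind the URL as UTF-8 text, one line at a time.
     * (Hypothetical helper, for illustration only.)
     */
    public static String fetch(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small page to a temporary file and read it back through a file: URL
        Path page = Files.createTempFile("page", ".html");
        Files.write(page, "<html>\n<body>hi</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(fetch(page.toUri().toURL()));
        Files.delete(page);
    }
}
```

Compared with closing bufr, isr and in by hand, try-with-resources also releases the stream when readLine() throws.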

8. How to get the text in a web page with Java

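If you want to get the content of a form,
<form>
<input type="text" name="username" value=""/>
</form>
request.getParameter("username");

If you want the content of a web page, you probably have to take the URL and crawl the page from start to finish.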
