Java web page fetching: how to get paths in Java

『壹』 How to get a web application's URL in Java

Take this URL, for example, where the web application's context path is /servlet and DemoServlet is mapped to /DemoServlet:

http://localhost:8080/servlet/DemoServlet?name=test
// request is the HttpServletRequest passed to doGet/doPost
String scheme = request.getScheme();               // request scheme: "http"
int serverPort = request.getServerPort();          // server port: 8080
String serverName = request.getServerName();       // server host name: "localhost"
String requestURI = request.getRequestURI();       // request URI path: "/servlet/DemoServlet"
String servletPath = request.getServletPath();     // servlet path: "/DemoServlet"
String contextPath = request.getContextPath();     // context path: "/servlet"
String queryString = request.getQueryString();     // query string: "name=test" (null if there is none)
StringBuffer requestURL = request.getRequestURL(); // URL without the query string: "http://localhost:8080/servlet/DemoServlet"
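These values fit together as scheme://serverName[:serverPort]requestURI[?queryString], and requestURI itself is contextPath followed by servletPath (plus any extra path info). The sketch below reassembles a full URL from the getters' results; UrlRebuilder and rebuild are hypothetical names, the parts are passed as plain arguments so it runs without a servlet container, and leaving out the default port (80 for http, 443 for https) is an assumed convention.

```java
/**
 * Hypothetical helper: rebuilds the full request URL from the values
 * returned by HttpServletRequest's getters.
 */
final class UrlRebuilder {

    static String rebuild(String scheme, String serverName, int serverPort,
                          String requestURI, String queryString) {
        StringBuilder url = new StringBuilder();
        url.append(scheme).append("://").append(serverName);
        // Omit the port when it is the default for the scheme (assumed convention).
        boolean defaultPort = ("http".equals(scheme) && serverPort == 80)
                || ("https".equals(scheme) && serverPort == 443);
        if (!defaultPort) {
            url.append(':').append(serverPort);
        }
        url.append(requestURI);
        // getQueryString() returns null when the URL has no "?..." part.
        if (queryString != null) {
            url.append('?').append(queryString);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Prints http://localhost:8080/servlet/DemoServlet?name=test
        System.out.println(rebuild("http", "localhost", 8080,
                "/servlet/DemoServlet", "name=test"));
    }
}
```

Inside a servlet you would call rebuild(request.getScheme(), request.getServerName(), request.getServerPort(), request.getRequestURI(), request.getQueryString()).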

『贰』 Which open-source Java web crawler is good to use?

1. Nutch
Repository: apache/nutch · GitHub
An open-source crawler from Apache; feature-rich and well documented. It has modules for fetching, parsing, and storing data.

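2. Heritrix
Repository: internetarchive/heritrix3 · GitHub
It has been around for a long time and has gone through many releases. It is widely used, full-featured, and well documented, and there is plenty of material about it online. It has its own web-based management console, which includes an HTTP server; operators control the crawler by issuing Crawler commands from that console.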

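3. crawler4j
Repository: yasserg/crawler4j · GitHub
Because it provides only the core crawling functionality, it is very easy to pick up: you can write a multithreaded crawler in a few minutes.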

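Of course, listing a feature such as data storage under Nutch doesn't mean Heritrix lacks it, and vice versa. Which one suits you best can only be decided after reading the documentation carefully and experimenting.
There are others as well, such as JSpider, WebEater, Java Web Crawler, WebLech, Ex-Crawler, and JoBo, but I haven't used them, so I can't say.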
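Whichever framework you choose, the core loop it hides is similar: keep a queue of URLs to visit (the frontier) and a set of URLs already seen, fetch a page, extract its links, and enqueue the ones not seen before. The following is a library-free sketch of that breadth-first loop, not the API of crawler4j, Nutch, or Heritrix. To keep it runnable offline, fetching is abstracted as a Function, and the hypothetical in-memory map in main stands in for real HTTP requests; links are pulled out with a naive regex, which a real crawler would replace with an HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal breadth-first crawler sketch (hypothetical class name). */
final class MiniCrawler {

    // Naive link extractor: only handles href="..." with double quotes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /**
     * Crawls breadth-first from seed, visiting at most maxPages pages.
     * fetch returns a page body, or null if the page cannot be fetched.
     * Returns the URLs in the order they were visited.
     */
    static List<String> crawl(String seed, Function<String, String> fetch, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            String body = fetch.apply(url);
            if (body == null) {
                continue; // unreachable page: skip it
            }
            visited.add(url);
            Matcher m = HREF.matcher(body);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) { // add() is false for URLs already queued
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "web"; http://d is linked but missing.
        Map<String, String> web = Map.of(
                "http://a", "<a href=\"http://b\">b</a> <a href=\"http://c\">c</a>",
                "http://b", "<a href=\"http://a\">back</a> <a href=\"http://d\">d</a>",
                "http://c", "no links here");
        // Prints [http://a, http://b, http://c]
        System.out.println(crawl("http://a", web::get, 10));
    }
}
```

The seen set is what stops the crawler from looping forever on pages that link back to each other (b links back to a above); frameworks such as crawler4j add politeness delays, robots.txt handling, and multithreading on top of a loop like this.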

『叁』 With Java web, how can I read the address typed into an input box when the Load button is clicked, and display that page below the button?

I'm not sure if this is the effect you want. Note that sites which forbid framing (via the X-Frame-Options header or a CSP frame-ancestors directive) will not display inside the iframe.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script>
function btn() {
  // Setting src navigates the iframe to the entered address; no reload() call is
  // needed, and calling location.reload() on a cross-origin frame throws a SecurityError.
  document.getElementById('myframe').src = document.getElementById('in').value;
}
</script>
</head>

<body style="text-align:center">
<div style="border:1px solid red;width:200px;height:200px;margin:0 auto">
<p>URL: <input id="in" /></p>
<button id="btn" style="width:75px" onclick="btn()">Load</button>
</div>
<div>
<p>
<iframe id="myframe" width="800" height="500" src="http://www..com"></iframe>
</p>
</div>
</body>
</html>
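The page above does everything in the browser. If the address is instead sent to a Java web back end (for example, read with request.getParameter in a servlet), it is worth normalizing and checking it before using it, because an input box can contain anything, including javascript: or file: URLs. A minimal sketch using only java.net.URI; the class AddressInput and its normalize method are hypothetical, and the rules (prepend http:// when no scheme is given, accept only http and https, require a host) are assumptions you may want to change.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper for cleaning up an address typed into a form field. */
final class AddressInput {

    static URI normalize(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("empty address");
        }
        String s = raw.trim();
        // Assumption: a bare host such as "example.com" means plain http.
        if (!s.contains("://")) {
            s = "http://" + s;
        }
        URI uri;
        try {
            uri = new URI(s);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed address: " + raw, e);
        }
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        // Only web addresses are allowed; this rejects ftp:, file:, and similar.
        if (!scheme.equals("http") && !scheme.equals("https")) {
            throw new IllegalArgumentException("unsupported scheme: " + scheme);
        }
        // Inputs like "javascript:alert(1)" end up here without a parsable host.
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("missing host: " + raw);
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(normalize("example.com")); // http://example.com
    }
}
```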

『肆』 How to get paths in Java (classpath, class, working-directory, and resource paths)

Method 1:
File f = new File(this.getClass().getResource("/").getPath());
System.out.println(f);
Result:
C:Documents%20and%
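This gives the root of the classpath, i.e. the directory the compiled classes are loaded from (bin in an Eclipse project). The %20 appears because URL.getPath() leaves the path URL-encoded.
Without the leading "/":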
File f = new File(this.getClass().getResource("").getPath());
System.out.println(f);
Result:
C:Documents%20and%comtest
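This gives the directory of the current class's package under the classpath root (here com/test), not the path of the .class file itself.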
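The %20 in these results is URL encoding, not part of the real folder name: getResource returns a URL, and URL.getPath() does not decode it, so the space in "Documents and Settings" shows up as %20. Converting the URL with toURI() before building the File decodes it. A minimal sketch; ResourcePaths and toFilePath are hypothetical names, and it assumes a plain file: URL (a resource packed inside a JAR has a jar: URL and cannot be turned into a File this way).

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper: turns a file: URL from getResource into a usable path. */
final class ResourcePaths {

    /** Decodes a file: URL into a file-system path (e.g. %20 becomes a space). */
    static String toFilePath(URL url) {
        try {
            // File(URI) throws IllegalArgumentException for non-file URIs.
            return new File(url.toURI()).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid URI: " + url, e);
        }
    }

    public static void main(String[] args) {
        // Classpath root, as in Method 1; can be null in some launch modes.
        URL root = ResourcePaths.class.getResource("/");
        if (root != null && "file".equals(root.getProtocol())) {
            System.out.println("raw getPath(): " + root.getPath());
            System.out.println("decoded:       " + toFilePath(root));
        }
    }
}
```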
Method 2:
File directory = new File(""); // empty relative path
String courseFile = directory.getCanonicalPath(); // may throw IOException
System.out.println(courseFile);
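Result:
C:Documents and
This is the JVM's current working directory. Eclipse starts programs in the project folder by default, so it looks like the project path, but it does not depend on the class at all.
Method 3: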
URL xmlpath = this.getClass().getClassLoader().getResource("selected.txt");
System.out.println(xmlpath);
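Result:
file:/C:/Documents%20and%20Settings/Administrator/workspace/projectName/bin/selected.txt
This looks up selected.txt at the root of the classpath. The file was placed in src, but the build copies it into bin, which is why the URL points there.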
Method 4:
System.out.println(System.getProperty("user.dir"));
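Result:
C:Documents and
This prints the user.dir system property, the JVM's current working directory (the project folder when run from Eclipse).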
Method 5:
System.out.println(System.getProperty("java.class.path"));
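Result:
C:Documents and bin
This prints the classpath the JVM was started with; here it contains the project's bin output directory.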
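Methods 2 and 4 report the same location: a relative File is resolved against the user.dir system property, so new File("") ends up as the working directory. Method 5 returns the whole classpath as a single string whose entries are separated by File.pathSeparator (";" on Windows, ":" elsewhere). A small sketch showing both; WorkingDirDemo and its method names are illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative class comparing the working-directory and classpath methods. */
final class WorkingDirDemo {

    /** Working directory obtained the way Method 2 does it. */
    static String workingDirViaFile() {
        // An empty relative path resolves to the user.dir directory.
        return new File("").getAbsolutePath();
    }

    /** Splits java.class.path (Method 5) into its individual entries. */
    static List<String> classpathEntries() {
        String cp = System.getProperty("java.class.path", "");
        return Arrays.asList(cp.split(Pattern.quote(File.pathSeparator)));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("user.dir:              " + System.getProperty("user.dir"));
        System.out.println("new File(\"\") absolute:  " + workingDirViaFile());
        // getCanonicalPath() additionally resolves symbolic links and "..".
        System.out.println("new File(\"\") canonical: " + new File("").getCanonicalPath());
        classpathEntries().forEach(e -> System.out.println("classpath entry: " + e));
    }
}
```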
