1. Looking for an original English-language article related to big data or big data information security, together with a translation, about 3,000 words. Please help, kind soul!
Big data refers to a volume of data so huge that it cannot be stored and processed within an acceptable time frame by a traditional file system.
The next question that comes to mind is how big this data needs to be in order to be classified as big data. There is a lot of misconception around the term big data. We usually call data big if its size is in gigabytes, terabytes, petabytes, exabytes, or anything larger. This alone does not define big data completely: even a small file can be considered big data, depending on the context in which it is being used.
Let's take an example to make this clear. If we try to attach a 100 MB file to an email, we cannot do so, because email does not support an attachment of this size. Therefore, with respect to email, this 100 MB file can be referred to as big data. Similarly, if we want to process 1 TB of data within a given time frame, we cannot do it with a traditional system, since its resources are not sufficient to accomplish the task.
As you are aware, various social sites such as Facebook, Twitter, Google+, LinkedIn, and YouTube contain huge amounts of data. But as the user bases of these social sites grow, storing and processing the enormous data becomes a challenging task. Storing this data is important for various firms in order to generate huge revenue, which is not possible with a traditional file system. This is where Hadoop comes into existence.
Big data simply means huge amounts of structured, unstructured, and semi-structured data that can be processed for information. Nowadays, massive amounts of data are produced because of the growth in technology and digitalization, and by a variety of sources, including business application transactions, videos, pictures, electronic mails, social media, and so on. The big data concept was introduced to process such data.
Structured data: data that has a proper format associated with it is known as structured data, for example the data stored in database files or in Excel sheets.
Semi-structured data: data that has some structure but no strict format associated with it is known as semi-structured data, for example the data stored in mail files or in .docx files.
Unstructured data: data that has no format associated with it is known as unstructured data, for example image files, audio files, and video files.
Big data is characterized by the 3 V's associated with it, which are as follows: [1]
Volume: the amount of data being generated, i.e., its huge quantity.
Velocity: the speed at which the data is being generated.
Variety: the different kinds of data being generated.
A. Challenges Faced by Big Data
There are two main challenges faced by big data: [2]
i. How to store and manage a huge volume of data efficiently.
ii. How to process and extract valuable information from a huge volume of data within a given time frame.
These two challenges led to the development of the Hadoop framework.
Hadoop is an open-source framework created by Doug Cutting in 2006 and managed by the Apache Software Foundation. Hadoop was named after his son's yellow toy elephant.
Hadoop was designed to store and process data efficiently. The Hadoop framework comprises two main components:
i. HDFS: the Hadoop Distributed File System, which takes care of storing data within a Hadoop cluster.
ii. MapReduce: which takes care of processing the data that is present in HDFS.
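As a minimal sketch of how these two components are typically exercised from the command line (assuming a configured Hadoop installation on the PATH; the paths and the example JAR name below are illustrative):

#!/bin/sh
# Storage layer: put a local file into HDFS.
hdfs dfs -mkdir -p /user/demo/input
hdfs dfs -put access.log /user/demo/input/
# Processing layer: run a MapReduce job over it. The examples JAR
# ships with standard distributions; its exact path varies by install.
hadoop jar hadoop-mapreduce-examples.jar wordcount \
    /user/demo/input /user/demo/output
# Inspect the result produced by the reducers.
hdfs dfs -cat /user/demo/output/part-r-00000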
Now let's have a look at a Hadoop cluster.
There are two kinds of nodes here: the master node and the slave nodes.
The master node is responsible for the NameNode and JobTracker daemons. Here, node is the technical term used to denote a machine present in the cluster, and daemon is the technical term for a background process running on a Linux machine.
The slave nodes, on the other hand, are responsible for running the DataNode and TaskTracker daemons.
The NameNode and DataNodes are responsible for storing and managing the data and are commonly referred to as storage nodes, whereas the JobTracker and TaskTrackers are responsible for processing and computing the data and are commonly known as compute nodes.
Normally, the NameNode and JobTracker run on a single machine, whereas the DataNodes and TaskTrackers run on different machines.
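A quick way to see which daemons a given machine is running is the jps tool that ships with the JDK; the output below is only an illustration (hypothetical process IDs, with the classic Hadoop 1.x daemon names described above):

# On the master node:
$ jps
2101 NameNode
2289 JobTracker
2410 Jps
# On a slave node:
$ jps
1887 DataNode
1956 TaskTracker
2034 Jps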
B. Features of Hadoop [3]
i. Cost-effective system: it does not require any special hardware. It can simply be implemented on common machines, technically known as commodity hardware.
ii. Large cluster of nodes: a Hadoop cluster can support a large number of nodes, which provides a huge storage and processing capacity.
iii. Parallel processing: a Hadoop cluster provides the ability to access and process data in parallel, which saves a lot of time.
iv. Distributed data: Hadoop takes care of splitting and distributing the data across all nodes within a cluster. It also replicates the data over the cluster.
v. Automatic failover management: once AFM is configured on a cluster, the admin need not worry about a failed machine. Hadoop replicates the configuration: a copy of each piece of data is replicated to a node in the same rack, and Hadoop takes care of the internetworking between the two racks.
vi. Data locality optimization: this is the most powerful feature of Hadoop and what makes it so efficient. Instead of pulling a huge dataset across the network to the code that needs it, Hadoop sends the code to the machine where the data resides and executes it there, which saves a lot of bandwidth.
vii. Heterogeneous cluster: nodes can come from different vendors and can run different flavors of operating systems.
viii. Scalability: in Hadoop, adding or removing a machine does not affect the cluster; even adding or removing components of a machine does not (a sketch of removing a node follows this list).
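As a hedged illustration of the scalability point (a sketch only, assuming HDFS admin access; the exclude file is whatever dfs.hosts.exclude points to in hdfs-site.xml, so the path depends on the cluster):

# Add the hostname of the node to be removed to the exclude file,
# then tell the NameNode to re-read its host lists:
hdfs dfsadmin -refreshNodes
# HDFS re-replicates the decommissioning node's blocks elsewhere;
# once that finishes, the machine can be switched off safely.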
C. Hadoop Architecture
Hadoop comprises of two components
i. HDFS
ii. MAPREDUCE
Hadoop distributes big data in several chunks and store
data in several nodes within a cluster which
significantly reces the time.
Hadoop replicates each part of data into each machine
that are present within the cluster.
The no. of copies replicated depends on the replication
factor. By default the replication factor is 3. Therefore
in this case there are 3 copies to each data on 3 different
machines。
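As a small hedged illustration (assuming an HDFS installation; the file path is hypothetical), the replication factor can be inspected and changed per file from the shell:

# Upload a file; it inherits the default replication factor (3).
hdfs dfs -put big.log /user/demo/big.log
# Show each block of the file and how many replicas it has.
hdfs fsck /user/demo/big.log -files -blocks
# Raise this file's replication factor to 4 and wait until it applies.
hdfs dfs -setrep -w 4 /user/demo/big.log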
Reference: Mahajan, P., Gaba, G., & Chauhan, N. S. (2016). Big Data Security. IITM Journal of Management and IT, 7(1), 89-94.
Take it to a translation site and translate it yourself; if you don't understand something, you can ask.
2. Linux: how can I make it access a certain network address on a schedule?
Scheduled access under Linux can be implemented with crontab.
1. First edit the crontab (crontab -e) and add the following entry:
# every two hours
0 */2 * * * sometask.sh
The entry above runs sometask.sh every two hours (use the script's full path in practice), so we can implement the access to the network address inside sometask.sh.
2. The code of sometask.sh:
#!/bin/sh
# POST a simple form to the target address with curl.
curl -d "user=test&password=123456" www.some123.com
After these two steps, the scheduled access is in place.
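As an optional hedged extra (the log path is just an example), redirecting the job's output makes it easy to confirm that cron actually fires:

# crontab entry with logging:
0 */2 * * * sometask.sh >> /tmp/sometask.log 2>&1
# Watch the runs come in:
tail -f /tmp/sometask.log
# List the installed schedule:
crontab -l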
3. ISO documentation hierarchy
Taking ISO 9001 as an example to explain this question:
Level-1 documents: the quality manual.
Level-2 documents: the procedure documents, which must include the six procedures required by ISO 9001: document control, record control, internal audit, control of nonconforming product, corrective action, and preventive action (corrective and preventive action may be written together). Beyond these six, you can add others according to your company's actual situation.
Level-3 documents: supporting documents or management rules, of which there are many, including work instructions, inspection procedures, management measures, management systems, and so on.
For an ISO 14001 system, the level-1 document is the environmental manual; for ISO 18001, it is the occupational health and safety management manual; and so on.