[10490010] |
Linux
[10490020] |'''Linux''' (commonly pronounced {{IPAEng|ˈlɪnəks}} in English; variants exist) is a [[Unix-like]] computer [[operating system]]. [10490030] |Linux is one of the most prominent examples of [[free software]] and [[open source]] development: typically all underlying [[source code]] can be freely modified, used, and redistributed by anyone. [10490040] |The name "Linux" comes from the [[Linux kernel]], originally written in 1991 by [[Linus Torvalds]]. [10490050] |The system's [[system utility|utilities]] and [[library (computer science)|libraries]] usually come from the [[GNU operating system]], announced in 1983 by [[Richard Stallman]]. [10490060] |The GNU contribution is the basis for the alternative name '''GNU/Linux'''. [10490070] |Predominantly known for its use in [[server (computing)|server]]s, Linux is supported by corporations such as [[Dell]], [[Hewlett-Packard]], [[IBM]], [[Novell]], [[Oracle Corporation]], [[Red Hat]], and [[Sun Microsystems]]. [10490080] |It is used as an operating system for a wide variety of computer [[hardware]], including [[desktop computer]]s, [[supercomputers]], video game systems, such as the [[PlayStation 2]] and [[PlayStation 3]], several [[arcade games]], and [[embedded devices]] such as [[mobile phone]]s, [[routers]], and [[stage lighting]] systems. [10490090] |== History == [10490100] |The [[Unix]] operating system was conceived and implemented in the 1960s and first released in 1970. [10490110] |Its wide availability and [[Porting|portability]] meant that it was widely adopted, copied and modified by academic institutions and businesses, with its design being influential on authors of other systems. [10490120] |The [[GNU Project]], started in 1984, had the goal of creating a "''complete Unix-compatible software system''" made entirely of [[free software]]. [10490130] |In 1985, [[Richard Stallman]] created the [[Free Software Foundation]] and developed the [[GNU General Public License]] (GNU GPL). 
[10490140] |Many of the programs required in an OS (such as libraries, [[compiler]]s, [[text editor]]s, a [[Unix shell]], and a windowing system) were completed by the early 1990s, although low-level elements such as [[device driver]]s, [[daemon (computer software)|daemon]]s, and the [[Kernel (computer science)|kernel]] were stalled and incomplete. [10490150] |Linus Torvalds has said that if the GNU kernel had been available at the time (1991), he would not have decided to write his own. [10490160] |=== MINIX === [10490170] |[[MINIX]], a Unix-like system intended for academic use, was released by [[Andrew S. Tanenbaum]] in 1987. [10490180] |While source code for the system was available, modification and redistribution were restricted (that is not the case today). [10490190] |In addition, MINIX's [[16-bit]] design was not well adapted to the [[32-bit]] design of the increasingly cheap and popular [[Intel 386]] architecture for personal computers. [10490200] |In 1991, Torvalds began to work on a non-commercial replacement for MINIX while he was attending the [[University of Helsinki]]. [10490210] |This eventually became the [[Linux kernel]]. [10490220] |In 1992, Tanenbaum posted an article on [[Usenet]] claiming Linux was obsolete. [10490230] |In the article, he criticized the operating system as being [[Monolithic kernel|monolithic]] in design and as being tied closely to the x86 architecture and thus not portable, which he described as "a fundamental error." [10490240] |Tanenbaum suggested that those who wanted a modern operating system should look into one based on the [[microkernel]] model. [10490250] |The posting elicited responses from Torvalds and [[Ken Thompson]], one of the founders of [[Unix]], which resulted in a well-known debate over the microkernel and monolithic kernel designs. [10490260] |Linux was dependent on the MINIX [[user space]] at first. 
[10490270] |With code from the GNU system freely available, it was advantageous if this could be used with the fledgling OS. Code licensed under the GNU GPL can be used in other projects, so long as they also are released under the same or a compatible license. [10490280] |In order to make the Linux kernel compatible with the components from the GNU Project, Torvalds initiated a switch from his original license (which prohibited commercial redistribution) to the GNU GPL. [10490290] |Linux and GNU developers worked to integrate GNU components with Linux to make a fully functional and free operating system. [10490300] |=== Commercial and popular uptake === [10490310] |Today Linux is used in numerous domains, from [[embedded system]]s to [[supercomputer]]s, and has secured a place in [[server (computing)|server]] installations with the popular [[LAMP (software bundle)|LAMP]] application stack. [10490320] |Torvalds continues to direct the development of the kernel. [10490330] |Stallman heads the Free Software Foundation, which in turn supports the GNU components. [10490340] |Finally, individuals and corporations develop third-party non-GNU components. [10490350] |These third-party components comprise a vast body of work and may include both kernel modules and user applications and libraries. [10490360] |Linux vendors and communities combine and distribute the kernel, GNU components, and non-GNU components, with additional package management software in the form of [[Linux distribution]]s. [10490370] |== Design == [10490380] |Linux is a modular [[Unix-like]] operating system. [10490390] |It derives much of its basic design from principles established in Unix during the 1970s and 1980s. [10490400] |Linux uses a [[monolithic kernel]], the [[Linux kernel]], which handles process control, networking, and [[peripheral]] and [[file system]] access. [10490410] |[[Device drivers]] are integrated directly with the kernel. 
[10490420] |Much of Linux's higher-level functionality is provided by separate projects which interface with the kernel. [10490430] |The GNU [[Userland (computing)|userland]] is an important part of most Linux systems, providing the [[shell (computing)|shell]] and [[Unix tool]]s which carry out many basic operating system tasks. [10490440] |On top of these tools, a [[graphical user interface]] can be added, usually running on the [[X Window System]]. [10490450] |=== User interface === [10490460] |Linux can be controlled through one or more of the following: a text-based [[command line interface]] (CLI), a [[graphical user interface]] (GUI) (usually the default for desktops), or controls on the device itself (common on embedded machines). [10490470] |On desktop machines, [[KDE]], [[GNOME]] and [[Xfce]] are the most popular user interfaces, though a variety of other user interfaces exist. [10490480] |Most popular user interfaces run on top of the [[X Window System]] (X), which provides [[network transparency]], enabling a graphical application running on one machine to be displayed and controlled from another. [10490490] |Other GUIs include [[X window manager]]s such as [[FVWM]], [[Enlightenment (window manager)|Enlightenment]] and [[Window Maker]]. [10490500] |The window manager provides a means to control the placement and appearance of individual application windows, and interacts with the X window system. [10490510] |A Linux system usually provides a [[CLI]] of some sort through a [[Shell (computing)|shell]], which is the traditional way of interacting with a Unix system. [10490520] |A Linux distribution specialized for servers may use the CLI as its only interface. [10490530] |A “headless system” run without even a monitor can be controlled by the command line via a protocol such as [[Secure Shell|SSH]] or [[telnet]]. [10490540] |Most low-level Linux components, including the GNU [[Userland (computing)|Userland]], use the CLI exclusively. 
[10490550] |The CLI is particularly suited for automation of repetitive or delayed tasks, and provides very simple [[inter-process communication]]. [10490560] |A graphical [[terminal emulator]] program is often used to access the CLI from a Linux desktop. [10490570] |== Development == [10490580] |The primary difference between Linux and many other popular contemporary operating systems is that the [[Linux kernel]] and other components are [[free software|free]] and [[open source software]]. [10490590] |Linux is not the only such operating system, although it is the best-known and most widely used. [10490600] |Some [[free software license|free]] and [[open source license|open source]] software licences are based on the principle of [[copyleft]], a kind of reciprocity: any work derived from a copyleft piece of software must also be copyleft itself. [10490610] |The most common free software license, the [[GNU GPL]], is a form of copyleft, and is used for the Linux kernel and many of the components from the [[GNU project]]. [10490620] |As an operating system [[underdog (competition)|underdog]] competing with mainstream operating systems, Linux cannot rely on a [[monopoly]] advantage; in order for Linux to be convenient for users, Linux aims for [[interoperability]] with other operating systems and established computing standards. [10490630] |Linux systems adhere to [[POSIX]], [[Single UNIX Specification|SUS]], [[International Organization for Standardization|ISO]] and [[American National Standards Institute|ANSI]] standards where possible, although to date only one Linux distribution has been POSIX.1 certified, Linux-FT. [10490640] |Free software projects, although developed in a [[Collaboration|collaborative]] fashion, are often produced independently of each other. 
[10490650] |However, given that the software licenses explicitly permit redistribution, this provides a basis for larger scale projects that collect the software produced by stand-alone projects and make it available all at once in the form of a [[Linux distribution]]. [10490660] |A [[Linux distribution]], commonly called a “distro”, is a project that manages a remote collection of Linux-based software, and facilitates installation of a Linux operating system. [10490670] |Distributions are maintained by individuals, loose-knit teams, volunteer organizations, and commercial entities. [10490680] |They include system software and [[application software]] in the form of ''packages'', and distribution-specific software for initial system installation and configuration as well as later package upgrades and installs. [10490690] |A distribution is responsible for the default configuration of installed Linux systems, system security, and more generally integration of the different software packages into a coherent whole. [10490700] |=== Community === [10490710] |Linux is largely driven by its developer and user communities. [10490720] |Some vendors develop and fund their distributions on a volunteer basis, [[Debian]] being a well-known example. [10490730] |Others maintain a community version of their commercial distributions, as [[Red Hat]] does with [[Fedora (Linux distribution)|Fedora]]. [10490740] |In many cities and regions, local associations known as [[Linux Users Group]]s (LUGs) seek to promote Linux and by extension free software. [10490750] |They hold meetings and provide free demonstrations, training, technical support, and operating system installation to new users. [10490760] |There are also many [[Internet]] communities that seek to provide support to Linux users and developers. [10490770] |Most distributions and open source projects have [[IRC]] chatrooms or [[newsgroup]]s. 
[10490780] |[[Online forum]]s are another means for support, with notable examples being [[LinuxQuestions.org]] and the [[Gentoo Linux|Gentoo]] forums. [10490790] |Linux distributions host [[mailing list]]s; commonly there will be a specific topic such as usage or development for a given list. [10490800] |There are several technology websites with a Linux focus. [10490810] |[[Linux Weekly News]] is a weekly digest of Linux-related news; the [[Linux Journal]] is an online magazine of Linux articles published monthly; [[Slashdot]] is a technology-related news website with many stories on Linux and open source software; [[Groklaw]] has written in depth about Linux-related legal proceedings and there are many articles relevant to the Linux kernel and its relationship with [[GNU]] on the [[GNU Project|GNU project's]] website. [10490820] |Print [[magazine]]s on Linux often include [[cover disk]]s including software or even complete Linux distributions. [10490830] |Although Linux is generally available free of charge, several large corporations have established business models that involve selling, supporting, and contributing to Linux and free software. [10490840] |These include [[Dell]], [[IBM]], [[Hewlett-Packard|HP]], [[Sun Microsystems]], [[Novell]], and [[Red Hat]]. [10490850] |The free software licenses on which Linux is based explicitly accommodate and encourage commercialization; the relationship between Linux as a whole and individual vendors may be seen as [[symbiosis|symbiotic]]. [10490860] |One common business model of commercial suppliers is charging for support, especially for business users. [10490870] |A number of companies also offer a specialized business version of their distribution, which adds proprietary support packages and tools to administer higher numbers of installations or to simplify administrative tasks. [10490880] |Another business model is to give away the software in order to sell hardware. 
[10490890] |=== Programming on Linux === [10490900] |Most Linux distributions support dozens of [[programming language]]s. [10490910] |The most common collection of utilities for building both Linux applications and operating system programs is found within the [[GNU toolchain]], which includes the [[GNU Compiler Collection]] (GCC) and the [[GNU build system]]. [10490920] |Amongst others, GCC provides compilers for [[Ada (programming language)|Ada]], [[C (programming language)|C]], [[C++]], [[Java (programming language)|Java]], and [[Fortran]]. [10490930] |The Linux kernel itself is written to be compiled with GCC. [10490940] |[[Proprietary software|Proprietary]] compilers for Linux include the [[Intel C++ Compiler]] and IBM XL C/C++ Compiler. [10490950] |Most distributions also include support for [[Perl]], [[Ruby programming language|Ruby]], [[Python programming language|Python]] and other [[Dynamic programming language|dynamic languages]]. [10490960] |Examples of languages that are less common, but still well-supported, are [[C Sharp (programming language)|C#]] via the [[Mono (software)|Mono]] project, sponsored by [[Novell]], and [[Scheme programming language|Scheme]]. [10490970] |A number of [[Java Virtual Machine]]s and development kits run on Linux, including the original Sun Microsystems JVM ([[HotSpot]]), and IBM's J2SE RE, as well as many open-source projects like [[Kaffe]]. [10490980] |The two main frameworks for developing graphical applications are those of [[GNOME]] and [[KDE]]. [10490990] |These projects are based on the [[GTK+]] and [[Qt (toolkit)|Qt]] [[widget toolkit]]s, respectively, which can also be used independently of the larger framework. [10491000] |Both support a wide variety of languages. 
[10491010] |There are a number of [[Integrated development environment]]s available including [[Anjuta]], [[Code::Blocks]], [[Eclipse (computing)|Eclipse]], [[KDevelop]], [[Lazarus (software)|Lazarus]], [[MonoDevelop]], [[NetBeans]], and [[Omnis Studio]] while the long-established editors [[Vim (text editor)|Vim]] and [[Emacs]] remain popular. [10491020] |== Uses == [10491030] |As well as those designed for general purpose use on desktops and servers, distributions may be specialized for different purposes including: [[computer architecture]] support, [[Embedded Linux|embedded systems]], stability, security, localization to a specific region or language, targeting of specific user groups, support for [[real-time computing|real-time]] applications, or commitment to a given desktop environment. [10491040] |Furthermore, some distributions deliberately include only [[free software]]. [10491050] |Currently, over three hundred distributions are actively developed, with about a dozen distributions being most popular for general-purpose use. [10491060] |Linux is a widely [[porting|ported]] operating system. [10491070] |While the Linux kernel was originally designed only for [[Intel 80386]] [[microprocessor]]s, it now runs on a more diverse range of [[computer architecture]]s than any other operating system: in the hand-held [[ARM architecture|ARM]]-based [[iPAQ]] and the [[mainframe computer|mainframe]] [[IBM]] [[System z9]], in devices ranging from [[mobile phone]]s to [[supercomputer]]s. [10491080] |Specialized distributions exist for less mainstream architectures. [10491090] |The [[ELKS]] kernel [[fork (software development)|fork]] can run on [[Intel 8086]] or [[Intel 80286]] [[16-bit]] microprocessors, while the [[µClinux]] kernel fork may run on systems without a [[memory management unit]]. 
[10491100] |The kernel also runs on architectures that were only ever intended to use a manufacturer-created operating system, such as [[Macintosh]] computers, [[Personal digital assistant|PDA]]s, [[video game console]]s, [[Digital audio player|portable music players]], and [[mobile phone]]s. [10491110] |=== Desktop === [10491120] |Although there is a lack of Linux ports for some [[Mac OS X]] and [[Microsoft Windows]] programs in domains such as [[desktop publishing]] and [[professional audio]], applications equivalent to those available for Mac and Windows are available for Linux. [10491130] |Most Linux distributions provide a program for browsing a list of thousands of [[free software]] applications that have already been tested and configured for a specific distribution. [10491140] |These free programs can be downloaded and installed with one mouse click, and a digital signature guarantees that no one has added a virus or spyware to these programs. [10491150] |Many [[free software]] titles that are popular on Windows, such as [[Pidgin (software)|Pidgin]], [[Mozilla Firefox]], [[Openoffice.org]], and [[GIMP]], are available for Linux. [10491160] |A growing amount of proprietary desktop software is also supported under Linux, examples being [[Adobe Flash Player]], [[Adobe Acrobat|Acrobat Reader]], [[Matlab]], [[Nero Burning ROM]], [[Opera (Internet suite)|Opera]], [[RealPlayer]], and [[Skype]]. [10491170] |In the field of animation and visual effects, most high-end software, such as Autodesk Maya, Softimage XSI and Apple Shake, is available for Linux, Windows and/or Mac OS X. [10491180] |[[CrossOver]] is a proprietary solution based on the open source [[Wine (software)|Wine]] project that supports running older Windows versions of [[Microsoft Office]] and [[Adobe Photoshop]] versions through CS2. [10491190] |[[Microsoft Office 2007]] and Adobe Photoshop CS3 are known not to work. 
[10491200] |Besides the free Windows compatibility layer [[Wine (software)|Wine]], most distributions offer [[Dual boot]] and [[X86 virtualization]] for running both Linux and Windows on the same computer. [10491210] |Linux's open nature allows distributed teams to [[L10n|localize]] Linux distributions for use in locales where localizing proprietary systems would not be cost-effective. [10491220] |For example the [[Sinhalese language]] version of the [[Knoppix]] distribution was available for a long time before [[Microsoft Windows XP]] was translated to Sinhalese. [10491230] |In this case the Lanka Linux User Group played a major part in developing the localized system by combining the knowledge of university professors, [[linguist]]s, and local developers. [10491240] |The performance of Linux on the desktop has been a controversial topic, with at least one key Linux kernel developer, Con Kolivas, accusing the Linux community of favouring performance on servers. [10491250] |He quit Linux development because he was frustrated with this lack of focus on the desktop, and then gave a 'tell all' interview on the topic. [10491260] |=== Servers and supercomputers === [10491270] |Historically, Linux has mainly been used as a [[Server (computing)|server]] operating system, and has risen to prominence in that area; [[Netcraft]] reported in September 2006 that eight of the ten most reliable internet hosting companies run Linux on their [[web server]]s. [10491280] |This is due to its relative stability and long uptime, and the fact that desktop software with a graphical user interface for servers is often unneeded. [10491290] |Enterprise and non-enterprise Linux distributions may be found running on servers. 
[10491300] |Linux is the cornerstone of the [[LAMP (software bundle)|LAMP]] server-software combination (Linux, [[Apache HTTP Server|Apache]], [[MySQL]], [[Perl]]/[[PHP]]/[[Python (programming language)|Python]]) which has achieved popularity among developers, and which is one of the more common platforms for website hosting. [10491310] |Linux is commonly used as an operating system for [[supercomputer]]s. [10491320] |As of [[November 2007]], out of the top 500 systems, 426 (85.2%) run Linux. [10491330] |=== Embedded devices === [10491340] |Due to its low cost and ability to be easily modified, an [[embedded Linux]] is often used in [[embedded systems]]. [10491350] |Linux has become a major competitor to the proprietary [[Symbian OS]] found in the majority of smartphones (16.7% of [[smartphone]]s sold worldwide during 2006 were using Linux), and it is an alternative to the proprietary [[Windows CE]] and [[Palm OS]] operating systems on [[mobile device]]s. [10491360] |Mobile phones and PDAs running Linux and built on open source platforms became a trend from 2007; examples include the [[Nokia N810]], [[Openmoko]]'s [[Neo1973]], and the ongoing [[Google Android]] project. [10491370] |The popular [[TiVo]] digital video recorder uses a customized version of Linux. [10491380] |Several network [[firewall]] and [[router]] standalone products, including several from [[Linksys]], use Linux internally, using its advanced firewall and routing capabilities. [10491390] |The [[Korg OASYS]] and the [[Yamaha Motif|Yamaha Motif XS]] [[music workstation]]s also run Linux. [10491400] |Furthermore, Linux is used in the leading [[stage lighting]] control system, the FlyingPig/HighEnd WholeHogIII console. [10491410] |=== Market share and uptake === [10491420] |Many quantitative studies of open source software focus on topics including market share and reliability, with numerous studies specifically examining Linux. 
[10491430] |The Linux market is growing rapidly, and the revenue of servers, desktops, and packaged software running Linux is expected to exceed $35.7 billion by 2008. [10491440] |[[International Data Corporation|IDC]]'s report for Q1 2007 says that Linux now holds 12.7% of the overall server market. [10491450] |This estimate was based on the number of Linux servers sold by various companies. [10491460] |Desktop adoption of Linux is approximately 1%. [10491470] |In comparison, [[List of Microsoft operating systems|Microsoft operating systems]] hold more than 90%. [10491480] |The frictional cost of switching operating systems and the lack of support for certain hardware and application programs designed for [[Microsoft Windows]] are two factors that have inhibited adoption. [10491490] |Proponents and analysts attribute the relative success of Linux to its security, reliability, low cost, and freedom from [[vendor lock-in]]. [10491500] |More recently, Google has begun to fund [[Wine (software)|Wine]], which acts as a compatibility layer, allowing users to run some Windows programs under Linux. [10491510] |The [[OLPC XO-1|XO laptop]] project of One Laptop Per Child is creating a new and potentially much larger Linux community, planned to reach [http://www.laptop.org/en/vision/mission/index.shtml several hundred million schoolchildren] and their families and communities in developing countries. [http://wiki.laptop.org/go/countries Six countries] have ordered a million or more units each for delivery in 2007 to distribute to schoolchildren at no charge. [10491520] |[[Google]], [[Red Hat]], and [[eBay]] are major supporters of the project. [10491530] |== Copyright and naming == [10491540] |The Linux kernel and most GNU software are [[software license|license]]d under the [[GNU General Public License]] (GPL). 
[10491550] |The GPL requires that anyone who distributes the Linux kernel must make the source code (and any modifications) available to the recipient under the same terms. [10491560] |In 1997, Linus Torvalds stated, “Making Linux GPL'd was definitely the best thing I ever did.” [10491570] |Other key components of a Linux system may use other licenses; many libraries use the [[GNU Lesser General Public License]] (LGPL), a more permissive variant of the GPL, and the [[X Window System]] uses the [[MIT License]]. [10491580] |Torvalds has publicly stated that he would not move the Linux kernel (currently licensed under GPL version 2) to version 3 of the GPL, released in mid-2007, specifically citing some provisions in the new license which prohibit the use of the software in [[digital rights management]]. [10491590] |A 2001 study of [[Red Hat Linux]] 7.1 found that this distribution contained 30 million [[source lines of code]]. [10491600] |Using the [[COCOMO|Constructive Cost Model]], the study estimated that this distribution required about eight thousand man-years of development time. [10491610] |According to the study, if all this software had been developed by conventional [[proprietary software|proprietary]] means, it would have cost about 1.08 billion dollars (year 2000 U.S. dollars) to develop in the United States. [10491620] |Most of the code (71%) was written in the [[C (programming language)|C]] [[computer programming|programming]] [[programming language|language]], but many other languages were used, including [[C++]], [[assembly language]], [[Perl]], [[Python (programming language)|Python]], [[Fortran]], and various [[shell script]]ing languages. [10491630] |Slightly over half of all lines of code were licensed under the GPL. [10491640] |The Linux kernel itself was 2.4 million lines of code, or 8% of the total. [10491650] |In a later study, the same analysis was performed for Debian GNU/Linux version 4.0. 
[10491660] |This distribution contained over 283 million source lines of code, and the study estimated that it would have cost 5.4 billion euros to develop by conventional means. [10491670] |In the United States, the name ''Linux'' is a [[trademark]] registered to Linus Torvalds. [10491680] |Initially, nobody registered it, but on [[August 15]] [[1994]], William R. Della Croce, Jr. filed for the trademark ''Linux'', and then demanded royalties from Linux distributors. [10491690] |In 1996, Torvalds and some affected organizations sued him to have the trademark assigned to Torvalds, and in 1997 the case was settled. [10491700] |The licensing of the trademark has since been handled by the [[Linux Mark Institute]]. [10491710] |Torvalds has stated that he only trademarked the name to prevent someone else from using it, but was bound in 2005 by [[United States trademark law]] to take active measures to enforce the trademark. [10491720] |As a result, the LMI sent out a number of letters to distribution vendors requesting that a fee be paid for the use of the name, and a number of companies have complied. [10491730] |=== GNU/Linux === [10491740] |The [[Free Software Foundation]] views Linux distributions which use GNU software as [[GNU variants]] and asks that such operating systems be referred to as ''GNU/Linux'' or ''a Linux-based GNU system''. [10491750] |However, the media and the population at large refer to this family of operating systems simply as ''Linux''. [10491760] |While some distributors make a point of using the aggregate form, most notably [[Debian]] with the ''[[Debian GNU/Linux]]'' distribution, the term's use outside of the enthusiast community is limited. [10491770] |The distinction between the Linux kernel and distributions based on it plus the GNU system is a source of confusion to many newcomers, and the naming remains controversial, as many large Linux distributions (e.g. 
[10491780] |[[Ubuntu]] and [[SuSE]] Linux) are simply using the ''Linux'' name, rather than ''GNU/Linux''. [10500010] |
List of chatterbots
[10500020] |==Chatterbot Directories== [10500040] |*[http://www.simonlaven.com Chatterbot Central] at [http://www.simonlaven.com The Simon Laven Page] [10500050] |*[http://www.aidreams.co.uk/chatterbotcollection/index.htm The Chatterbot Collection] [10500060] |*[http://www.aihub.org AI Hub] - A directory of news, programs, and links all related to chatterbots and Artificial Intelligence [10500070] |*[http://www.chatterboxchallenge.com/bots_dir.php The Chatterbox Challenge Bots Directory] at [http://www.chatterboxchallenge.com The Chatterbox Challenge] [10500080] |==Classic Chatterbots== [10500090] |*[[Dr. Sbaitso]] [10500100] |*[[ELIZA]] [10500110] |*[[PARRY]] [10500120] |*[[Racter]] [10500130] |==General Chatterbots== [10500140] |*[[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]] and other [[Alicebot]]/pandorabot-based ([http://www.titane.ca/concordia/dfar251/igod/main.html iGod], [http://www.mousebreaker.com/games/chatbot/play.php Mitsuku], [http://www.friendbot.co.uk FriendBot], etc.) [10500150] |*[[Albert One]] [10500160] |*[[ALIMbot]] [10500170] |*[[CHAT and TIPS]] [10500180] |*[http://www.chat-bot.com Chat-bot] [10500190] |*[[Claude Chatterbot|Claude]] [10500200] |*[http://www.dadorac.com Dadorac] [10500210] |*[http://www.dai2.co.uk/ DAI2] - A dynamic artificial intelligence which learns from its surrounding community [10500220] |*[http://www.elbot.com/ Elbot] [10500230] |*[[Ella Chatterbot|Ella]] [10500240] |*[[Fred Chatterbot|Fred]] [10500250] |*[[Jabberwacky]] [10500260] |*[http://www.abenteuermedien.de/jabberwock Jabberwock] [10500270] |*[http://www.jeeney.com/ Jeeney AI] [10500280] |*[http://www.jixperts.com?lang=en JIxperts] – collection of wiki chatterbots. [10500290] |*[http://www.iaindustrie.fr.nf KAR Intelligent Computer] [10500300] |*[http://www.leeds-city-guide.com/kyle Kyle] – A unique learning Artificial Intelligence chatbot, which employs contextual learning algorithms. 
[10500310] |*[[MegaHal]] [10500320] |*[[Mr Know-It-All]] [10500330] |*Oliverbot [10500340] |*[http://uk.geocities.com/mattbrown1101/ Poseidon] [10500350] |*[http://www.infradrive.com/robomatic.php RoboMatic X1] - A chatbot which controls the user's PC through chatting by their voice or by typing. [10500360] |*[http://www.cooldictionary.com/splotchy.mpl Splotchy] [10500370] |*[[Starship Titanic#Spookitalk|Spookitalk]] - A chatterbot used for [[Non-player character|NPC]]s in [[Douglas Adams]]' ''Starship Titanic'' video game. [10500380] |*[http://www.onebigspace.com/ Thomas] [10500390] |*[[Ultra Hal Assistant]] [10500400] |*[[Verbot]] [10500410] |*[http://www.yhaken.com/ Yhaken] [10500420] |*[http://www.scientiobot.com ScientioBot] - A new technology chatterbot using concept mining techniques accessible via a free web service. [10500430] |*[http://nicole.jetaylor.net NICOLE] A simple chatterbot with the ability to learn new phrases. [10500440] |==[[Instant messenger|IM]] Chatterbots== [10500450] |*DAI2 is also available on the MSN / Windows Live network as dai2@dai2.co.uk [10500460] |*[http://www.dnreg.org/bot/ MSN Quickbot] [10500470] |*[http://www.smarterchild.com SmarterChild] [10500480] |*[http://www.spleak.com Spleak] [10500490] |*[http://www.mrmovie.com MrMovie] - searching actors/movies/dvd's in IM (Skype, AOL/AIM or MSN/Live) [10500500] |*[[Inside Messenger Bot|InsideMessenger]] [10500510] |*[http://www.inocu.jt-online.co.uk Inocu] - (MSN/Live) [10500520] |*[http://www.friendbot.co.uk FriendBot-An AIM Chatterbot] [10500530] |*[http://www.amsn-project.net/plugins.php amsnEliza plugin for aMSN] [10500540] |*[[Inside Messenger Bot|TrixieMouse]] [10500550] |*[http://www.infobot.pl/ Infobot] - Polish informational bot for Gadu-gadu, Skype and Jabber [10500560] |==AIML Chatterbots== [10500570] |*[http://www.taik.fi/turingenigma Alan] - In ''Turing Enigma'' Alan Turing's spirit has infiltrated the World War II encrypting device Enigma. 
[10500580] |*[http://www.dustyant.com/projects/deebot/ Deeb0t] [10500590] |*[http://www.pandorabots.com/pandora/talk?botid=b0dafd24ee35a477 Chomsky] A chatbot that uses a smiley face to convey emotions. [10500600] |It uses the information in Wikipedia to build its conversations and has links to Wikipedia articles. [10500610] |*[[John Lennon Artificial Intelligence Project]] [10500620] |*[[SitePal]] [10500630] |==JFred Chatterbots== [10500640] |*[[The Turing Hub]] [10500650] |==Educational Chatterbots== [10500660] |*[http://www.philocomp.net/?pageref=ai&page=elizabeth Elizabeth] Aims to teach AI techniques and concepts, starting from chatterbot design. [10500670] |Accompanied by self-teaching materials, as used at the University of Leeds. [10500680] |==Non-English Chatterbots== [10500690] |*[http://www.geocities.com/brizglace/amanda.htm Amanda] - (French) with source code for Windows. [10500700] |*[[Proteus]] [10500710] |*[msnim:chat?contact=senhorbot@hotmail.com Senhor Bot] (Brazilian bot for MSN) [10500720] |[[Category:Chatterbots|*]] [10500730] |[[bn:চ্যাটারবটসমূহের তালিকা]] [10510010] |
Loebner prize
[10510020] |The '''Loebner Prize''' is an annual competition that awards prizes to the [[Chatterbot]] considered by the judges to be the most [[Artificial intelligence|humanlike]] of those entered. [10510030] |The format of the competition is that of a standard [[Turing test]]. [10510040] |In the Loebner Prize, as in a Turing test, a human judge is faced with two computer screens. [10510050] |One is under the control of a computer, the other is under the control of a human. [10510060] |The judge poses questions to the two screens and receives answers. [10510070] |Based upon the answers, the judge must decide which screen is controlled by the human and which is controlled by the computer program. [10510080] |The contest was begun in 1990 by [[Hugh Loebner]] in conjunction with the [[Cambridge Center for Behavioral Studies]] of [[Massachusetts]], [[United States]]. [10510090] |It has since been associated with [[Flinders University]], [[Dartmouth College]], the [[Science Museum (London)|Science Museum]] in [[London]], and most recently the [[University of Reading]]. [10510100] |Within the field of artificial intelligence, the Loebner Prize is somewhat controversial; the most prominent critic, [[Marvin Minsky]], has called it a publicity stunt that does not help the field along. [10510110] |==Prizes== [10510120] |The prizes for each year include: [10510130] |* $2,000 for the most human-seeming of all chatterbots for that year - awarded every year. [10510140] |In 2005, the prize was increased to $3,000, and the prize was $2,250 in 2006. [10510150] |In 2008 the prize will be $3000.00 [10510160] |* $25,000 for the first chatterbot that judges cannot distinguish from a real human in a text-only Turing test, and that can convince judges that the other (human) entity they are talking to simultaneously is a computer. 
''(to be awarded once only)'' [10510170] |* $100,000 to the first chatterbot that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input. ''(to be awarded once only)'' [10510180] |The Loebner Prize dissolves once the $100,000 prize is won. [10510190] |==2008 Loebner Prize== [10510200] |The 2008 Competition is to be held on Sunday [[12 October]] in University of Reading, [[United Kingdom|UK]]. [10510210] |The event, which is being co-directed by [[Kevin Warwick]], will include a direct challenge on the [[Turing test]] as originally proposed by [[Alan Turing]]. [10510220] |The first place winner will receive $3000.00 and a bronze medal. [10510230] |==2007 Loebner Prize== [10510240] |The 2007 Competition was held on Sunday, [[21 October]] in [[New York City]]. [10510250] |The participants in the contest were: [10510260] |* [[Rollo Carpenter]] from Icogno, creator of [[Jabberwacky]] [10510270] |* Noah Duncan, private entry, creator of Cletus [10510280] |* Robert Medeksza from Zabaware, creator of [[Ultra Hal Assistant]] [10510290] |No bot passed the Turing test but the judges ranked the bots as "most human". [10510300] |The results of the contest were: [10510310] |* 1st place: Robert Medeksza [10510320] |* 2nd place: Noah Duncan [10510330] |* 3rd place: Rollo Carpenter [10510340] |The winner received $2250 and the Annual Medal. [10510350] |The runners up received $250 each. [10510360] |==2006 Loebner Prize== [10510370] |On Wednesday, [[August 30]], the finalists for the 2006 Loebner Prize were announced. [10510380] |The finalists were: [10510390] |* Rollo Carpenter [10510400] |* Richard Churchill and Marie-Claire Jenkins [10510410] |* Noah Duncan [10510420] |* Robert Medeksza [10510430] |The contest was held on Sunday, [[17 September]] at the Torrington Theatre, [[University College London]]. [10510440] |==Winners== [10520010] |
Machine learning
[10520020] |As a broad subfield of [[artificial intelligence]], '''machine learning''' is concerned with the design and development of [[algorithm]]s and techniques that allow computers to "learn". [10520030] |At a general level, there are two types of learning: [[Inductive reasoning|inductive]], and [[Deductive reasoning|deductive]]. [10520040] |Inductive machine learning methods extract rules and patterns out of massive data sets. [10520050] |The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. [10520060] |Hence, machine learning is closely related not only to [[data mining]] and [[statistics]], but also [[theoretical computer science]]. [10520070] |==Applications== [10520080] |Machine learning has a wide spectrum of applications including [[natural language processing]], [[syntactic pattern recognition]], [[search engines]], [[diagnosis|medical diagnosis]], [[bioinformatics]], [[brain-machine interfaces]] and [[cheminformatics]], detecting [[credit card fraud]], [[stock market]] analysis, classifying [[DNA sequence]]s, [[speech recognition|speech]] and [[handwriting recognition]], [[object recognition]] in [[computer vision]], [[strategy game|game playing]] and [[robot locomotion]]. [10520090] |== Human interaction == [10520100] |Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the data, while others adopt a collaborative approach between human and machine. [10520110] |Human intuition cannot be entirely eliminated since the designer of the system must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data. [10520120] |Machine learning can be viewed as an attempt to automate parts of the [[scientific method]]. [10520130] |Some statistical machine learning researchers create methods within the framework of [[Bayesian statistics]]. 
[10520140] |== Algorithm types == [10520150] |Machine learning [[algorithm]]s are organized into a [[taxonomy]], based on the desired outcome of the algorithm. [10520160] |Common algorithm types include: [10520170] |* [[Supervised learning]] — in which the algorithm generates a function that maps inputs to desired outputs. [10520180] |One standard formulation of the supervised learning task is the [[statistical classification|classification]] problem: the learner is required to learn (to approximate) the behavior of a function which maps a vector <math>[X_1, X_2, \ldots, X_N]</math> into one of several classes by looking at several input-output examples of the function. [10520190] |* [[Unsupervised learning]] — in which an agent models a set of inputs; labeled examples are not available. [10520200] |* [[Semi-supervised learning]] — which combines both labeled and unlabeled examples to generate an appropriate function or classifier. [10520210] |* [[Reinforcement learning]] — in which the algorithm learns a policy of how to act given an observation of the world. [10520220] |Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm. [10520230] |* [[Transduction (machine learning)|Transduction]] — similar to supervised learning, but does not explicitly construct a function: instead, it tries to predict new outputs based on training inputs, training outputs, and test inputs which are available while training. [10520240] |* [[Learning to learn]] — in which the algorithm learns its own [[inductive bias]] based on previous experience. [10520250] |The computational analysis of machine learning algorithms and their performance is a branch of [[theoretical computer science]] known as [[computational learning theory]]. 
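The classification formulation of supervised learning described above can be illustrated with a minimal sketch. It uses a 1-nearest-neighbour rule, chosen here only because it is short; the function names <code>train</code> and <code>predict</code> and the toy examples are hypothetical and are not taken from any particular library.

```python
import math

def train(examples):
    """'Training' a nearest-neighbour learner just stores the labelled examples."""
    return list(examples)

def predict(model, x):
    """Return the class of the stored example closest to x (Euclidean distance)."""
    nearest = min(model, key=lambda example: math.dist(example[0], x))
    return nearest[1]

# Hypothetical input-output examples: each vector [X_1, X_2] is paired with a class.
examples = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 7.5), "B"),
]

model = train(examples)
print(predict(model, (2.0, 1.0)))  # a point near the "A" examples
print(predict(model, (7.0, 9.0)))  # a point near the "B" examples
```

The learner never sees the underlying function directly; it only approximates it from the stored input-output pairs, which is the sense of "learning" used in the classification problem.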
[10520260] |== Machine learning topics == [10520270] |:''This list represents the topics covered on a typical machine learning course.'' [10520280] |;Prerequisites [10520290] |*[[Bayesian theory]] [10520300] |;Modeling [[conditional probability|conditional probability density functions]]: [[Regression analysis|regression]] and [[Statistical classification|classification]] [10520310] |*[[Artificial neural network]]s [10520320] |*[[Decision tree]]s [10520330] |*[[Gene expression programming]] [10520340] |*[[Genetic algorithms]] [10520350] |*[[Genetic programming]] [10520360] |*[[Holographic associative memory]] [10520370] |*[[Inductive Logic Programming]] [10520380] |*[[Kriging|Gaussian process regression]] [10520390] |*[[Linear discriminant analysis]] [10520400] |*[[Nearest neighbor (pattern recognition)|K-nearest neighbor]] [10520410] |*[[Minimum message length]] [10520420] |*[[Perceptron]] [10520430] |*[[Quadratic classifier]] [10520440] |*[[Radial basis function network]]s [10520450] |*[[Support vector machine]]s [10520460] |;Algorithms for estimating model parameters: [10520470] |*[[Dynamic programming]] [10520480] |*[[Expectation-maximization algorithm]] [10520490] |;Modeling [[probability density function]]s through [[generative model]]s: [10520500] |*[[Graphical model]]s including [[Bayesian network]]s and [[Markov network|Markov random fields]] [10520510] |*[[Generative topographic map]] [10520520] |;Approximate inference techniques [10520530] |*[[Monte Carlo method]]s [10520540] |*[[Variational Bayes]] [10520550] |*[[Variable-order Markov model]]s [10520560] |*[[Variable-order Bayesian network]]s [10520570] |*[[Loopy belief propagation]] [10520580] |;Optimization [10520590] |*Most of methods listed above either use [[Optimization (mathematics)|optimization]] or are instances of optimization algorithms [10520600] |;Meta-learning (ensemble methods) [10520610] |*[[Boosting]] [10520620] |*[[Bootstrap aggregating]] [10520630] |*[[Random forest]] [10520640] 
|*[[Weighted majority algorithm]] [10520650] |;Inductive transfer and learning to learn [10520660] |*[[Inductive transfer]] [10520670] |*[[Reinforcement learning]] [10520680] |*[[Temporal difference learning]] [10520690] |*[[Monte-Carlo method]] [10530010] |
Machine translation
[10530020] |'''Machine translation''', sometimes referred to by the abbreviation '''MT''', is a sub-field of [[computational linguistics]] that investigates the use of [[computer software]] to [[translation|translate]] text or speech from one [[natural language]] to another. [10530030] |At its basic level, MT performs simple [[substitution]] of words in one natural language for words in another. [10530040] |Using [[corpus linguistics|corpus]] techniques, more complex translations may be attempted, allowing for better handling of differences in [[linguistic typology]], phrase [[recognition]], and translation of [[idiom]]s, as well as the isolation of anomalies. [10530050] |Current machine translation software often allows for customisation by domain or [[profession]] (such as [[meteorology|weather reports]]) — improving output by limiting the scope of allowable substitutions. [10530060] |This technique is particularly effective in domains where formal or formulaic language is used. [10530070] |It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text. [10530080] |Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has [[word sense disambiguation|unambiguously identified]] which words in the text are names. [10530090] |With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". [10530100] |However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language. [10530110] |==History== [10530120] |The history of machine translation begins in the 1950s, after [[World War II]]. 
[10530130] |The [[Georgetown-IBM experiment|Georgetown experiment]] (1954) involved fully-automatic translation of over sixty [[Russian language|Russian]] sentences into [[English language|English]]. [10530140] |The experiment was a great success and ushered in an era of substantial funding for machine-translation research. [10530150] |The authors claimed that within three to five years, machine translation would be a solved problem. [10530160] |Real progress was much slower, however, and after the [[ALPAC|ALPAC report]] (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. [10530170] |Beginning in the late 1980s, as [[computation]]al power increased and became less expensive, more interest was shown in [[statistical machine translation|statistical models for machine translation]]. [10530180] |The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. [10530190] |The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (London Univ.) of a rudimentary translation of English into French. [10530200] |Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). [10530210] |A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer. [10530220] |Recently, the Internet has emerged as a global information infrastructure, revolutionizing access to information as well as fast information transfer and exchange. [10530230] |Using the Internet and e-mail, people communicate rapidly over long distances and across continental boundaries. 
[10530240] |Not all Internet users, however, can use their own language to communicate with people who speak other languages. [10530250] |With machine translation software, people may in the near future be able to communicate with one another around the world in their own mother tongue. [10530260] |==Translation process== [10530270] |The [[translation process]] may be stated as: [10530280] |# [[Decoding]] the [[meaning (linguistic)|meaning]] of the [[source text]]; and [10530290] |# Re-[[encoding]] this [[meaning (linguistic)|meaning]] in the [[target language]]. [10530300] |Behind this ostensibly simple procedure lies a complex [[cognitive]] operation. [10530310] |To decode the meaning of the [[source text]] in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the [[grammar]], [[semantics]], [[syntax]], [[idiom]]s, etc., of the [[source language]], as well as the [[culture]] of its speakers. [10530320] |The translator needs the same in-depth knowledge to re-encode the meaning in the [[target language]]. [10530330] |Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the [[target language]] that "sounds" as if it has been written by a person. [10530340] |This problem may be approached in a number of ways. [10530350] |==Approaches== [10530360] |Machine translation can use a method based on [[Expert System|linguistic rules]], which means that words will be translated in a linguistic way — the most suitable (orally speaking) words of the target language will replace the ones in the source language. [10530370] |It is often argued that the success of machine translation requires the problem of [[natural language processing|natural language understanding]] to be solved first. 
[10530380] |Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. [10530390] |According to the nature of the intermediary representation, an approach is described as [[interlingual machine translation]] or [[transfer-based machine translation]]. [10530400] |These methods require extensive [[lexicon]]s with [[morphology (linguistics)|morphological]], [[syntax|syntactic]], and [[semantics|semantic]] information, and large sets of rules. [10530410] |Given enough data, machine translation programs often work well enough for a [[native speaker]] of one language to get the approximate meaning of what is written by the other native speaker. [10530420] |The difficulty is getting enough data of the right kind to support the particular method. [10530430] |For example, the large multilingual [[Text corpus|corpus]] of data needed for statistical methods to work is not necessary for the grammar-based methods. [10530440] |But then, the grammar methods need a skilled linguist to carefully design the grammar that they use. [10530450] |To translate between closely related languages, a technique referred to as [[shallow-transfer machine translation]] may be used. [10530460] |===Rule-based=== [10530470] |The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms. [10530480] |'''''Transfer-based machine translation''''' [10530490] |'''''Interlingual''''' [10530500] |Interlingual machine translation is one instance of rule-based machine-translation approaches. [10530510] |In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual, i.e. source-/target-language-independent representation. [10530520] |The target language is then generated out of the [[interlinguistics|interlingua]]. 
[10530530] |'''''Dictionary-based''''' [10530540] |Machine translation can use a method based on [[dictionary]] entries, which means that the words will be translated as they are by a dictionary. [10530550] |===Statistical=== [10530560] |Statistical machine translation tries to generate translations using [[statistical methods]] based on bilingual text corpora, such as the [[Hansard#Canadian hansard and machine translation|Canadian Hansard]] corpus, the English-French record of the Canadian parliament, and [[EUROPARL]], the record of the [[European Parliament]]. [10530570] |Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. [10530580] |The first statistical machine translation software was [[CANDIDE]] from [[IBM]]. [10530590] |Google used [[SYSTRAN]] for several years, but switched to a statistical translation method in October 2007. [10530600] |Recently, they improved their translation capabilities by inputting approximately 200 billion words from [[United Nations]] materials to train their system. [10530610] |Accuracy of the translation has improved. [10530620] |===Example-based=== [10530630] |The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual [[corpus]] as its main knowledge base, at run-time. [10530640] |It is essentially a translation by [[analogy]] and can be viewed as an implementation of the [[case-based reasoning]] approach of [[machine learning]]. [10530650] |==Major issues== [10530660] |===Disambiguation=== [10530670] |Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. [10530680] |The problem was first raised in the 1950s by [[Yehoshua Bar-Hillel]]. [10530690] |He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. 
[10530700] |Today there are numerous approaches designed to overcome this problem. [10530710] |They can be approximately divided into "shallow" approaches and "deep" approaches. [10530720] |Shallow approaches assume no knowledge of the text. [10530730] |They simply apply statistical methods to the words surrounding the ambiguous word. [10530740] |Deep approaches presume a comprehensive knowledge of the word. [10530750] |So far, shallow approaches have been more successful. [10530760] |===Named entities=== [10530770] |Related to [[named entity recognition]] in [[information extraction]]. [10530780] |==Applications== [10530790] |There are now many [[software]] programs for translating natural language, several of them [[online]], such as the [[SYSTRAN]] system which powers both [[Google]] translate and [[AltaVista]]'s [[Babel Fish (website)|Babel Fish]] as well as [[Promt]] that powers online translation services at Voila.fr and Orange.fr. [10530800] |Although no system provides the holy grail of "fully automatic high quality machine translation" (FAHQMT), many systems produce reasonable output. [10530810] |Despite their inherent limitations, MT programs are used around the world. [10530820] |Probably the largest institutional user is the [[European Commission]]. [10530830] |[[Toggletext]] uses a transfer-based system (known as Kataku) to translate between [[English language|English]] and [[Indonesian language|Indonesian]]. [10530840] |[[Google]] has claimed that promising results were obtained using a proprietary statistical machine translation engine. [10530850] |The statistical translation engine used in the [[Google tools#anchor_language_tools|Google language tools]] for Arabic <-> English and Chinese <-> English has an overall score of 0.4281 over the runner-up IBM's BLEU-4 score of 0.3954 (Summer 2006) in tests conducted by the National Institute for Standards and Technology. 
[10530860] |[[Uwe Muegge]] has implemented a demo website that uses a [[controlled language]] in combination with the [[Google tools#anchor_language_tools|Google tool]] to produce fully automatic, high-quality machine translations of his English, German, and French web sites. [10530870] |With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. [10530880] |''In-Q-Tel'' (a [[venture capital]] fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) brought up companies like [[Language Weaver]]. [10530890] |Currently the military community is interested in translation and processing of languages like [[Arabic language|Arabic]], [[Pashto language|Pashto]], and [[Dari language|Dari]]. [10530900] |The Information Processing Technology Office at [[DARPA]] hosts programs like [[DARPA TIDES program|TIDES]] and [[Babylon translator|Babylon Translator]]. [10530910] |The US Air Force has awarded a $1 million contract to develop a language translation technology. [10530920] |== Evaluation == [10530930] |There are various means for evaluating the performance of machine-translation systems. [10530940] |The oldest is the use of human judges to assess a translation's quality. [10530950] |Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems such as rule-based and statistical systems. [10530960] |[[Automate]]d means of evaluation include [[Bilingual evaluation understudy|BLEU]], [[NIST (metric)|NIST]] and [[METEOR]]. [10530970] |Relying exclusively on machine translation ignores that communication in [[natural language|human language]] is [[wiktionary:context|context]]-embedded, and that it takes a human to adequately comprehend the context of the original text. [10530980] |Even purely human-generated translations are prone to error. 
[10530990] |Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human. [10531000] |It has, however, been asserted that in certain applications, e.g. product descriptions written in a [[controlled language]], a [[dictionary-based machine translation|dictionary-based machine-translation]] system has produced satisfactory translations that require no human intervention. [10540010] |
Metadata
[10540020] |'''Metadata''' ('''meta data''', or sometimes '''metainformation''') is "data about data", of any sort in any media. [10540030] |An item of metadata may describe an individual [[datum]], or content item, or a collection of data including multiple content items and hierarchical levels, for example a [[database schema]]. [10540040] |== Purpose == [10540050] |Metadata provides context for data. [10540060] |Metadata is used to facilitate the understanding, use, and management of data. [10540070] |The metadata required for effective data management varies with the type of data and context of use. [10540080] |In a [[library]], where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the [[author]], the publication date and the physical location. [10540090] |== Examples of Metadata == [10540100] |=== Camera === [10540110] |In the context of a [[camera]], where the data is the photographic image, metadata would typically include the date the [[photograph]] was taken and details of the camera settings (lens, focal length, aperture, shutter timing, white balance, etc.). [10540120] |=== Digital Music Player === [10540130] |On a digital portable music player, the album names, song titles and album art embedded in the music files are used to generate the artist and song listings, and are considered the metadata. [10540140] |=== Information system === [10540150] |In the context of an [[information system]], where the data is the content of the [[computer]] files, metadata about an individual data item would typically include the name of the field and its length. [10540160] |Metadata about a collection of data items, a computer file, might typically include the name of the file, the type of file and the name of the data administrator. 
[10540180] |=== Real world location === [10540190] |If we consider a particular place in the real world, this may be described by data, for example: [10540200] |* 1 "E83BJ" [10540210] |* 2 "17" [10540220] |* 3 "Sunny" [10540230] |To make sense of and use this data, context is important, and can be provided by metadata. [10540240] |The metadata for the above three items of data might include: [10540250] |* 1.1 "Post Code" – This is a brief description (or name) of the data item "E83BJ" [10540260] |* 1.2 "The unique identifier of a postal district" – This is another description (a definition) of "E83BJ" [10540270] |* 1.3 "27 June 2006" – This could also help describe "E83BJ", for example by giving the date it was last updated [10540280] |* 2 "Average temperature in degrees Celsius" – This is a possible description of "17" [10540290] |* 3 "Yesterday's weather" – This is a description of "sunny" [10540300] |An item of metadata is itself data and therefore may have its own metadata. [10540310] |For example, "Post Code" might have the following metadata: [10540320] |* 1.1.1 "data item name" [10540330] |* 1.1.2 "5 characters, starting with A – Z" [10540340] |"27 June 2006" might have the following metadata: [10540350] |* 1.3.1 "date last changed" [10540360] |* 1.3.2 "dd MMM yyyy" [10540370] |== Levels == [10540380] |The hierarchy of metadata descriptions can go on forever, but usually context or semantic understanding makes extensively detailed explanations unnecessary. [10540390] |The role played by any particular [[datum]] depends on the context. [10540400] |For example, when considering the geography of London, "E83BJ" would be a datum and "Post Code" would be metadatum. [10540410] |But, when considering the data management of an automated system that manages geographical data, "Post Code" might be a datum and then "data item name" and "5 characters, starting with A – Z" would be metadata. 
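The nesting in the post code example, where an item of metadata is itself data with its own metadata, can be sketched as a small recursive data structure. This is only an illustration under stated assumptions: the class name <code>Item</code> and the helper <code>levels</code> are hypothetical, and the values are the ones used in the example above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """A datum together with the metadata items that describe it (hypothetical type)."""
    value: str
    metadata: List["Item"] = field(default_factory=list)

def levels(item: Item) -> int:
    """Count how many levels of metadata sit beneath an item."""
    if not item.metadata:
        return 0
    return 1 + max(levels(m) for m in item.metadata)

# "Post Code" describes "E83BJ" and is itself described by two further items.
post_code = Item("Post Code", [
    Item("data item name"),
    Item("5 characters, starting with A – Z"),
])
last_updated = Item("27 June 2006", [
    Item("date last changed"),
    Item("dd MMM yyyy"),
])
datum = Item("E83BJ", [
    post_code,
    Item("The unique identifier of a postal district"),
    last_updated,
])

print(levels(datum))      # levels of metadata below the datum "E83BJ"
print(levels(post_code))  # levels below "Post Code" when it is treated as the datum
```

Calling <code>levels</code> on <code>post_code</code> instead of <code>datum</code> mirrors the change of context described above: the same object plays the role of metadatum in one setting and of datum in another.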
[10540420] |In any particular context, metadata characterizes the data it describes, not the entity described by that data. [10540430] |So, in relation to "E83BJ", the datum "is in London" is a further description of the place in the real world which has the post code "E83BJ", not of the code itself. [10540440] |Therefore, although it is providing information connected to "E83BJ" (telling us that this is the post code of a place in London), this would not normally be considered metadata, as it is describing "E83BJ" ''qua'' place in the real world and not ''qua'' data. [10540450] |== Definitions == [10540460] |=== Etymology === [10540470] |[[Meta]] is a classical Greek preposition (μετ’ αλλων εταιρων) and prefix (μεταβασις) conveying the following senses in English, depending upon the case of the associated noun: among; along with; with; by means of; in the midst of; after; behind. [10540480] |In [[epistemology]], the word means "about (its own category)"; thus metadata is "data about the data". [10540490] |=== Varying definitions === [10540500] |The term was introduced intuitively, without a formal definition. [10540510] |Because of that, today there are various definitions. [10540520] |The most common one is the literal translation: [10540530] |* "Data about data are referred to as metadata." [10540540] |Example: "12345" is data, and with no additional context is meaningless. [10540550] |When "12345" is given a meaningful name (metadata) of "[[ZIP code]]", one can understand (at least in the [[United States]], and further placing "ZIP code" within the context of a [[postal address]]) that "12345" refers to the [[General Electric]] plant in [[Schenectady, New York]]. [10540560] |As for most people the difference between data and [[information]] is merely a [[philosophical]] one of no relevance in practical use, other definitions are: [10540570] |* Metadata is information about data. [10540580] |* Metadata is information about information. 
[10540590] |* Metadata contains information about that data or other data [10540600] |There are more sophisticated definitions, such as: [10540610] |*"Metadata is structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities." [10540620] |* "[Metadata is a set of] optional structured descriptions that are publicly available to explicitly assist in locating objects." [10540630] |These are used more rarely because they tend to concentrate on one purpose of metadata — to find "objects", "entities" or "resources" — and ignore others, such as using metadata to optimize [[data compression|compression algorithms]], or to perform additional computations using the data. [10540640] |The metadata concept has been extended into the world of systems to include any "data about data": the names of tables, columns, programs, and the like. [10540650] |Different views of this "system metadata" are detailed below, but beyond that is the recognition that metadata can describe all aspects of systems: data, activities, people and organizations involved, locations of data and processes, access methods, limitations, timing and events, as well as motivation and rules. [10540660] |Fundamentally, then, metadata is "the data that describe the structure and workings of an organization's use of information, and which describe the systems it uses to manage that information". [10540670] |To do a model of metadata is to do an "[[Enterprise modeling|Enterprise model]]" of the information technology industry itself. [10540680] |=== Metadata and Markup === [10540690] |In the context of the web and the work of the [[W3C]] in providing markup technologies of [[HTML]], [[XML]] and [[SGML]] the concept of metadata has specific context that is perhaps clearer than in other information domains. [10540700] |With markup technologies there is metadata, markup and data content. 
[10540710] |The metadata describes characteristics of the data, while the markup identifies the specific type of data content and acts as a container for that document instance. [10540720] |This page in Wikipedia is itself an example of such usage, where the textual information is data, how it is packaged, linked, referenced, styled and displayed is markup, and the aspects and characteristics of that markup are metadata set globally across Wikipedia. [10540730] |In the context of markup the metadata is architected to allow optimization of document instances to contain only a minimum amount of metadata, while the metadata itself is likely referenced externally such as in a [[schema]] definition ([[XSD]]) instance. [10540740] |Markup also provides specialised mechanisms that handle referential data, again avoiding confusion over what is metadata or data, and allowing optimizations. [10540750] |The reference and ID mechanisms in markup allow reference links between related data items, and links to data items that can then be repeated about a data item, such as an address or product details. [10540760] |These are then all themselves simply more data items and markup instances rather than metadata. [10540770] |Similarly there are concepts such as classifications, ontologies and associations for which markup mechanisms are provided. [10540780] |A data item can then be linked to such categories via markup, providing a clean delineation between what is metadata and what are actual data instances. [10540790] |Therefore the concepts and descriptions in a classification would be metadata, but the actual classification entry for a data item is simply another data instance. [10540800] |Some examples can illustrate the points here. [10540810] |Items in bold are data content, in italic are metadata, normal text items are all markup.
[10540820] |The two examples show in-line use of metadata within markup relating to a data instance (XML) compared to simple markup (HTML). [10540830] |A simple [[HTML]] instance example: [10540840] |<span style="normalText">'''Example'''</span> [10540850] |And then a [[XML]] instance example with metadata: [10540860] |'''John''' [10540870] |Where the inline assertion that a person's middle name may be an empty data item is metadata about the data item. [10540880] |Such definitions however are usually not placed inline in XML. [10540890] |Instead these definitions are moved away into the [[schema]] definition that contains the metadata for the entire document instance. [10540900] |This again illustrates another important aspect of metadata in the context of markup. [10540910] |The metadata is optimally defined only once for a collection of data instances. [10540920] |Hence repeated items of markup are rarely metadata, but rather more markup data instances themselves. [10540930] |=== Hierarchies of metadata === [10540940] |When structured into a hierarchical arrangement, metadata is more properly called an [[Ontology (computer science)|ontology]] or [[schema]]. [10540950] |Both terms describe "what exists" for some purpose or to enable some action. [10540960] |For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to what subjects "exist" in the library's own ontology and how more specialized topics are related to or derived from the more general subject headings. [10540970] |Metadata is frequently stored in a central location and used to help organizations standardize their data. [10540980] |This information is typically stored in a [[metadata registry]]. 
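The library-catalog example in the hierarchy discussion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject headings, the `BROADER` table, and both function names are hypothetical and do not come from any real cataloguing scheme.

```python
# Hypothetical subject headings: each heading maps to its broader (parent)
# heading, and the root heading maps to None.
BROADER = {
    "Science": None,
    "Physics": "Science",
    "Optics": "Physics",
    "Chemistry": "Science",
}


def broader_terms(term):
    """Return the chain of broader headings from `term` up to the root."""
    chain = []
    parent = BROADER[term]
    while parent is not None:
        chain.append(parent)
        parent = BROADER[parent]
    return chain


def narrower_terms(term):
    """Return the headings filed directly beneath `term`, sorted by name."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


print(broader_terms("Optics"))
print(narrower_terms("Science"))
```

The table itself is the metadata: it records which subjects "exist" and how specialized topics derive from general ones, independently of any particular book filed under them.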
[10540990] |=== Difference between data and metadata === [10541000] |Usually it is not possible to distinguish between (plain) data and metadata because: [10541010] |*Something can be data and metadata at the same time. [10541020] |The headline of an article is both its title (metadata) and part of its text (data). [10541030] |* Data and metadata can change their roles. [10541040] |A poem, as such, would be regarded as data, but if there were a song that used it as lyrics, the whole poem could be attached to an audio file of the song as metadata. [10541050] |Thus, the labeling depends on the point of view. [10541060] |These considerations apply no matter which of the above definitions is considered, except where explicit markup is used to denote what is data and what is metadata. [10541070] |== Use == [10541080] |Metadata has many different applications; this section lists some of the most common. [10541090] |Metadata is used to speed up and enrich searching for resources. [10541100] |In general, search queries using metadata can save users from performing more complex filter operations manually. [10541110] |It is now common for web browsers (with the notable exception of Mozilla Firefox), P2P applications and media management software to automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched. [10541120] |Metadata may also be associated to files manually. [10541130] |This is often the case with documents which are scanned into a document storage repository such as FileNet or Documentum. [10541140] |Once the documents have been converted into an electronic format a user brings the image up in a viewer application, manually reads the document and keys values into an online application to be stored in a metadata repository. [10541150] |Metadata provide additional information to users of the data it describes. 
[10541160] |This information may be descriptive ("These pictures were taken by children in the school's third grade class.") or algorithmic ("Checksum=139F"). [10541170] |Metadata helps to bridge the [[semantic gap]]. [10541180] |By telling a computer how data items are related and how these relations can be evaluated automatically, it becomes possible to process even more complex filter and search operations. [10541190] |For example, if a search engine understands that "Van Gogh" was a "Dutch painter", it can answer a search query on "Dutch painters" with a link to a web page about Vincent Van Gogh, although the exact words "Dutch painters" never occur on that page. [10541200] |This approach, called knowledge representation, is of special interest to the [[semantic web]] and [[artificial intelligence]]. [10541210] |Certain metadata is designed to optimize [[lossy compression]]. [10541220] |For example, if a video has metadata that allows a computer to tell foreground from background, the latter can be compressed more aggressively to achieve a higher compression rate. [10541230] |Some metadata is intended to enable variable content presentation. [10541240] |For example, if a picture has metadata that indicates the most important region — the one where there is a person — an image viewer on a small screen, such as on a mobile phone's, can narrow the picture to that region and thus show the user the most interesting details. [10541250] |A similar kind of metadata is intended to allow blind people to access diagrams and pictures, by converting them for special output devices or reading their description using [[speech synthesis|text-to-speech]] software. [10541260] |Other descriptive metadata can be used to automate workflows. [10541270] |For example, if a "smart" software tool knows content and structure of data, it can convert it automatically and pass it to another "smart" tool as input. 
[10541280] |As a result, users save the many [[cut, copy and paste|copy-and-paste]] operations required when analyzing data with "dumb" tools. [10541290] |Metadata is becoming an increasingly important part of [[electronic discovery]]. [http://www.lexbe.com/hp/indepth-e-discovery-rule-metadata.htm] Application and file system metadata derived from [[electronic document]]s and files can be important evidence. [10541300] |Recent changes to the [[Federal Rules of Civil Procedure]] make metadata routinely discoverable as part of [[Civil law (common law)|civil litigation]]. [10541310] |Parties to litigation are required to maintain and produce metadata as part of [[discovery (law)|discovery]], and [[spoliation of evidence|spoliation]] of metadata can lead to sanctions. [10541320] |Metadata has become important on the [[World Wide Web]] because of the need to find useful information from the mass of information available. [10541330] |Manually-created metadata adds value because it ensures consistency. [10541340] |If a web page about a certain topic contains a word or phrase, then all web pages about that topic should contain that same word or phrase. [10541350] |Metadata also ensures variety, so that if a topic goes by two names each will be used. [10541360] |For example, an article about "[[sport utility vehicle]]s" would also be [[tag (metadata)|tagged]] "4 wheel drives", "4WDs" and "four wheel drives", as this is how SUVs are known in some countries. [10541370] |Examples of metadata for an [[Compact Disc|audio CD]] include the [[MusicBrainz]] project and [[All Media Guide]]'s [[Allmusic]]. [10541380] |Similarly, [[MP3]] files have metadata tags in a format called [[ID3]]. [10541390] |== Types of metadata == [10541400] |Metadata can be classified by: [10541410] |* Content. [10541420] |Metadata can either describe the ''resource'' itself (for example, name and size of a file) or the ''content'' of the resource (for example, "This video shows a boy playing football"). 
[10541430] |* Mutability. [10541440] |With respect to the whole resource, metadata can be either ''immutable'' (for example, the "Title" of a video does not change as the video itself is being played) or ''mutable'' (the "Scene description" does change). [10541450] |* Logical function. [10541460] |There are three layers of logical function: at the bottom the ''subsymbolic'' layer that contains the raw data itself, then the ''symbolic'' layer with metadata describing the raw data, and on the top the ''logical'' layer containing metadata that allows logical reasoning using the symbolic layer. [10541470] |== Important issues == [10541480] |To successfully develop and use metadata, several important issues should be treated with care: [10541490] |=== Metadata risks === [10541500] |[[Microsoft Office]] files include metadata beyond their printable content, such as the original author's name, the creation date of the document, and the amount of time spent editing it. [10541510] |Unintentional disclosure can be awkward or even, in professional practices requiring confidentiality, raise malpractice concerns. [10541520] |Some of a Microsoft Office document's metadata can be seen by clicking ''File'' then ''Properties'' from the program's menu. [10541530] |Other metadata is not visible except through external analysis of a file, such as is done in forensics. [10541540] |The author of the Microsoft Word-based [[Melissa (computer worm)|Melissa]] computer virus in 1999 was caught due to Word metadata that uniquely identified the computer used to create the original infected document. [10541550] |=== Metadata lifecycle === [10541560] |Even in the early phases of planning and designing it is necessary to keep track of all metadata created. [10541570] |It is not economical to start attaching metadata only after the production process has been completed.
[10541580] |For example, if metadata created by a digital camera at recording time is not stored immediately, it may have to be restored afterwards manually with great effort. [10541590] |Therefore, it is necessary for different groups of resource producers to cooperate using compatible methods and standards. [10541600] |* Manipulation. [10541610] |Metadata must adapt if the resource it describes changes. [10541620] |It should be merged when two resources are merged. [10541630] |These operations are seldom performed by today's software; for example, image editing programs usually do not keep track of the [[Exchangeable image file format|Exif]] metadata created by digital cameras. [10541640] |* Destruction. [10541650] |It can be useful to keep metadata even after the resource it describes has been destroyed, for example in change histories within a text document or to archive file deletions due to digital rights management. [10541660] |None of today's metadata standards considers this phase. [10541670] |=== Storage === [10541680] |Metadata can be stored either ''internally'', in the same file as the data, or ''externally'', in a separate file. [10541690] |Metadata that is embedded with content is called ''embedded metadata''. [10541700] |A data repository typically stores the metadata ''detached'' from the data. [10541710] |Both ways have advantages and disadvantages: [10541720] |*Internal storage allows transferring metadata together with the data it describes; thus, metadata is always at hand and can be manipulated easily. [10541730] |This method creates high redundancy and does not allow metadata to be held together in one place. [10541740] |* External storage allows bundling metadata, for example in a database, for more efficient searching. [10541750] |There is no redundancy and metadata can be transferred simultaneously when using [[streaming media|streaming]].
[10541760] |However, as most formats use [[Uniform Resource Identifier|URI]]s for that purpose, the method of how the metadata is linked to its data should be treated with care. [10541770] |What if a resource does not have a URI (resources on a local hard disk or web pages that are created on-the-fly using a content management system)? [10541780] |What if metadata can only be evaluated if there is a connection to the Web, especially when using [[Resource Description Framework|RDF]]? [10541790] |How to realize that a resource is replaced by another with the same name but different content? [10541800] |Moreover, there is the question of data format: storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. [10541810] |On the other hand, these formats are not optimized for storage capacity; it may be useful to store metadata in a binary, non-human-readable format instead to speed up transfer and save memory. [10541820] |== Criticisms == [10541830] |Although the majority of computer scientists see metadata as a chance for better interoperability, some critics argue: [10541840] |*Metadata is too expensive and time-consuming. [10541850] |The argument is that companies will not produce metadata without need because it costs extra money, and private users also will not produce complex metadata because its creation is very time-consuming. [10541860] |* Metadata is too complicated. [10541870] |Private users will not create metadata because existing formats, especially [[MPEG-7]], are too complicated. [10541880] |As long as there are no automatic tools for creating metadata, it will not be created. [10541890] |* Metadata is subjective and depends on context. [10541900] |Most probably, two persons will attach different metadata to the same resource due to their different points of view. [10541910] |Moreover, metadata can be misinterpreted due to its dependency on context. 
[10541920] |For example, searching for "post-modern art" may miss a certain item because the expression was not in use at the time when that work of art was created, or searching for "pictures taken at 1:00" may produce confusing results due to local time differences. [10541930] |* There is no end to metadata. [10541940] |For example, when annotating a soccer match with metadata, one can describe all the players and their actions in time and stop there. [10541950] |One can also describe the advertisements in the background and the clothes the players wear. [10541960] |One can also describe each fan in the stands and the clothes they wear. [10541970] |All of this metadata can be interesting to one party or another (such as the spectators, sponsors or a counter-terrorist unit of the police), and even for a simple resource the amount of possible metadata can be gigantic. [10541980] |* Metadata is useless. [10541990] |Many of today's search engines are very efficient at finding text. [10542000] |Other techniques for finding pictures, videos and music (namely query-by-example) will become more and more powerful in the future. [10542010] |Thus, there is no real need for metadata. [10542020] |Opponents of metadata sometimes use the term [[metacrap]] to refer to the unsolved problems of metadata in some scenarios. [10542030] |These people are also referred to as "Meta Haters." [10542040] |== Types == [10542050] |In general, there are two distinct classes of metadata: structural or control metadata and guide metadata. [10542060] |Structural metadata is used to describe the structure of computer systems such as tables, columns and indexes. [10542070] |Guide metadata is used to help humans find specific items and is usually expressed as a set of keywords in a natural language.
[10542080] |Metadata can be divided into three distinct categories: [10542090] |* Descriptive [10542100] |* Administrative [10542110] |* Structural [10542120] |=== Relational database metadata === [10542130] |Each [[relational database]] system has its own mechanisms for storing metadata. [10542140] |Examples of relational-database metadata include: [10542150] |*Tables of all tables in a database, their names, sizes and number of rows in each table. [10542160] |* Tables of columns in each database, what tables they are used in, and the type of data stored in each column. [10542170] |In database terminology, this set of metadata is referred to as the [[database catalog|catalog]]. [10542180] |The [[SQL]] standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases implement it, even if they implement other aspects of the SQL standard. [10542190] |For an example of database-specific metadata access methods, see [[Oracle metadata]]. [10542200] |=== Data warehouse metadata === [10542210] |[[Data warehouse]] metadata systems are sometimes separated into two sections: [10542220] |# '''back room''' metadata that are used for [[Extract, transform, load]] functions to get [[OLTP]] data into a data warehouse [10542230] |# '''front room''' metadata that are used to label screens and create reports [10542240] |Kimball lists the following types of metadata in a data warehouse (See also [http://www.fortunecity.com/skyscraper/oracle/699/orahtml/dbmsmag/9803d05.html]): [10542250] |* [[source system]] metadata [10542260] |** source specifications, such as [[repository|repositories]], and source [[logical schema]]s [10542270] |** source descriptive information, such as ownership descriptions, update frequencies, legal limitations, and [[access method]]s [10542280] |** process information, such as job schedules and extraction code [10542290] |* [[data staging]] metadata [10542300] |** [[data acquisition]] information, such as [[data
transmission]] scheduling and results, and file usage [10542310] |** [[dimension table]] management, such as definitions of dimensions, and [[surrogate key]] assignments [10542320] |** [[Program transformation|transformation]] and [[aggregation]], such as [[data enhancement]] and mapping, [[DBMS]] load scripts, and aggregate definitions [10542330] |** audit, job logs and documentation, such as [[data lineage]] records, [[data transform]] logs [10542340] |* DBMS metadata, such as: [10542350] |** DBMS system table contents [10542360] |** processing hints [10542370] |Michael Bracket defines metadata (what he calls "Data resource data") as "any data about the organization's data resource". [10542380] |Adrienne Tannenbaum defines metadata as "the detailed description of instance data. [10542390] |The format and characteristics of populated instance data: instances and values, dependent on the role of the metadata recipient". [10542400] |These definitions are characteristic of the "data about data" definition. [10542410] |=== Business Intelligence metadata === [10542420] |[[Business Intelligence]] is the process of analyzing large amounts of corporate data, usually stored in large databases such as the [[Data Warehouse]], tracking business performance, detecting patterns and trends, and helping enterprise business users make better decisions. [10542430] |Business Intelligence metadata describes how data is queried, filtered, analyzed, and displayed in Business Intelligence software tools, such as Reporting tools, OLAP tools, Data Mining tools. 
[10542440] |Examples: [10542450] |* [[Online analytical processing|OLAP]] metadata: The descriptions and structures of Dimensions, Cubes, Measures (Metrics), Hierarchies, Levels, Drill Paths [10542460] |* Reporting metadata: The descriptions and structures of Reports, Charts, Queries, DataSets, Filters, Variables, Expressions [10542470] |* [[Data Mining]] metadata: The descriptions and structures of DataSets, Algorithms, Queries [10542480] |Business Intelligence metadata can be used to understand how corporate financial reports reported to [[Wall Street]] are calculated, how the revenue, expense and profit are aggregated from individual sales transactions stored in the data warehouse. [10542490] |A good understanding of Business Intelligence metadata is required to solve complex problems such as compliance with corporate governance standards, such as [[Sarbanes Oxley]] (SOX) or Basel II. [10542500] |=== General IT metadata === [10542510] |In contrast, David Marco, another metadata theorist, defines metadata as "all physical data and knowledge from inside and outside an organization, including information about the physical data, technical and business processes, rules and constraints of the data, and structures of the data used by a corporation." [10542520] |Others have included web services, systems and interfaces. [10542530] |In fact, the entire [[Zachman framework]] (see [[Enterprise Architecture]]) can be represented as metadata. [10542540] |Notice that such definitions expand metadata's scope considerably, to encompass most or all of the data required by the [[Management Information System]]s capability. [10542550] |In this sense, the concept of metadata has significant overlaps with the [[ITIL]] concept of a Configuration Management Database ([[CMDB]]), and also with disciplines such as [[Enterprise Architecture]] and [[IT portfolio management]]. [10542560] |This broader definition of metadata has precedent. 
[10542570] |Third generation corporate repository products (such as those eventually merged into the CA Advantage line) not only store information about data definitions (COBOL copybooks, DBMS schema), but also about the programs accessing those data structures, and the [[Job Control Language]] and batch job infrastructure dependencies as well. [10542580] |These products (some of which are still in production) can provide a very complete picture of a mainframe computing environment, supporting exactly the kinds of impact analysis required for ITIL-based processes such as [[ITIL#Incident Management|Incident]] and [[Change Management (ITIL)|Change Management]]. [10542590] |The [[ITIL]] [http://www.tso.co.uk/itil/ Back Catalogue] includes the ''Data Management'' volume which recognizes the role of these metadata products on the mainframe, posing the [[CMDB]] as the distributed computing equivalent. [10542600] |CMDB vendors however have generally not expanded their scope to include data definitions, and metadata solutions are also available in the distributed world. [10542610] |Determining the appropriate role and scope for each is thus a challenge for large IT organizations requiring the services of both. [10542620] |Since metadata is pervasive, centralized attempts at tracking it need to focus on the most highly leveraged assets. [10542630] |Enterprise Assets may only constitute a small percentage of the entire IT portfolio. [10542640] |Some practitioners have successfully managed IT metadata using the [[Dublin Core]] metamodel. [10542650] |==== IT metadata management products ==== [10542660] |First generation data dictionary/metadata repository tools would be those only supporting a specific [[DBMS]], such as [[IDMS]]'s IDD (integrated data dictionary), the [[Information Management System|IMS]] Data Dictionary, and [[ADABAS]]'s Predict. [10542670] |Second generation would be ASG's DATAMANAGER product which could support many different file and DBMS types. 
[10542680] |Third generation repository products became briefly popular in the early 1990s along with the rise of widespread use of [[RDBMS]] engines such as IBM's [[IBM DB2|DB2]]. [10542690] |Fourth generation products link the repository with more [[Extract, transform, load]] tools and can be connected with architectural modeling tools. [10542700] |Examples include [http://www.adaptive.com/products/mm.html Adaptive Metadata Manager] from Adaptive, [http://www.asg.com/products/product_details.asp?code=ROC&id=50 Rochade] from ASG, [http://www.infolibcorp.com/productsOverview.html InfoLibrarian Metadata Integration Framework] and [[Troux Technologies]] Metis Server product. [10542710] |=== File system metadata === [10542720] |Nearly all [[file system]]s keep metadata about files [[out-of-band]]. [10542730] |Some systems keep metadata in [[directory (file systems)|directory]] entries; others in specialized structures such as [[inode]]s or even in the name of a file. [10542740] |Metadata can range from simple [[timestamp]]s, [[mode bit]]s, and other special-purpose information used by the implementation itself, to [[icon (computing)|icon]]s and free-text comments, to arbitrary [[attribute-value pair]]s. [10542750] |With more complex and open-ended metadata, it becomes useful to search for files based on the metadata contents. [10542760] |The [[Unix]] [[find]] utility was an early example, although it is inefficient when scanning hundreds of thousands of files on a modern computer system. [10542770] |[[Apple Computer]]'s [[Mac OS X]] operating system supports cataloguing and searching for file metadata through a feature known as [[Spotlight (software)|Spotlight]], as of [[Mac OS X v10.4|version 10.4]]. [10542780] |[[Microsoft]] developed similar functionality with the [[Instant Search]] system in [[Windows Vista]]; it is also present in [[SharePoint Server]]. [10542790] |[[Linux]] implements file metadata using [[extended file attributes]].
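As a rough illustration of the timestamps and mode bits mentioned above, the following Python sketch reads a few metadata fields that the file system keeps about a file. It uses only the standard library; the `describe` helper and the temporary sample file are assumptions made for the example, and the fields shown are just a small subset of what a real file system records.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Collect a few pieces of file system metadata for `path`."""
    info = os.stat(path)  # queries the file system, not the file's contents
    return {
        "size_bytes": info.st_size,
        "mode": stat.filemode(info.st_mode),    # e.g. "-rw-------"
        "modified": time.ctime(info.st_mtime),  # last-modification timestamp
        "is_directory": stat.S_ISDIR(info.st_mode),
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    sample_path = handle.name

sample = describe(sample_path)
print(sample)
os.remove(sample_path)
```

None of the returned values is stored in the five bytes of file content; all of them come from the out-of-band record the file system maintains.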
[10542800] |=== Image metadata === [10542810] |Examples of image files containing metadata include [[Exchangeable image file format]] (EXIF) and [[Tagged Image File Format]] (TIFF). [10542820] |Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image. [10542830] |[[Tag (metadata)|Tagging]] pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections. [10542840] |A prime example of an image tagging service is [[Flickr]], where users upload images and then describe the contents. [10542850] |Other patrons of the site can then search for those tags. [10542860] |Flickr uses a [[folksonomy]]: a free-text keyword system in which the community defines the vocabulary through use rather than through a [[controlled vocabulary]]. [10542870] |Users can also tag photos for organization purposes using Adobe's [[Extensible Metadata Platform]] (XMP) language, for example. [10542880] |Digital photography is increasingly making use of technical metadata tags describing the conditions of exposure. [10542890] |Photographers shooting [[RAW image format|Camera RAW]] file formats can use applications such as [[Adobe Bridge]] or Apple Computer's [[Aperture (photography software)|Aperture]] to work with camera metadata for post-processing. [10542900] |=== Audio Metadata === [10542910] |Audio metadata generally relates to how the data should be written in order for a processor to efficiently process it. [10542920] |These technologies are usually seen in Audio Engine Programming such as Microsoft [[Resource Interchange File Format|RIFF (Resource Interchange File Format)]] technologies for .wav files. [10542930] |Codecs generally develop their own metadata standards for compression purposes.
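To make the RIFF case above concrete, here is a minimal Python sketch that writes a tiny WAV file in memory and then reads back the format fields stored in its header. It relies on the standard library `wave` module; the `wav_metadata` helper and the chosen sample values (mono, 16-bit, 8000 Hz, 100 frames of silence) are assumptions for illustration only.

```python
import io
import wave


def wav_metadata(data):
    """Read the format fields stored in a WAV file's header."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        return {
            "channels": reader.getnchannels(),
            "sample_width_bytes": reader.getsampwidth(),
            "frame_rate_hz": reader.getframerate(),
            "frames": reader.getnframes(),
        }


# Build a tiny in-memory WAV file: 100 frames of 16-bit mono silence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 100)

meta = wav_metadata(buffer.getvalue())
print(meta)
```

A player needs exactly these header values to interpret the sample bytes that follow, which is the sense in which audio metadata tells a processor how the data is laid out.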
[10542940] |=== Program metadata === [10542950] |Metadata is casually used to describe the controlling data used in software architectures that are more abstract or configurable. [10542960] |Most '''[[executable|executable file]]''' formats include what may be termed "metadata" that specifies certain, usually configurable, behavioral [[runtime]] characteristics. [10542970] |However, it is difficult if not impossible to precisely distinguish program "metadata" from general aspects of [[Von Neumann architecture|stored-program computing architecture]]; if the machine reads it and acts upon it, it is a computational [[Instruction (computer science)|instruction]], and the prefix "meta" has little significance. [10542980] |In [[Java (programming language)|Java]], the [[Class (file format)|class file format]] contains metadata used by the [[Java compiler]] and the [[Java virtual machine]] to [[dynamic linking|dynamically link]] [[class (computer science)|classes]] and to support [[reflection (computer science)|reflection]]. [10542990] |The [[J2SE]] 5.0 version of Java included a [[metadata facility for Java|metadata facility]] to allow additional annotations that are used by [[development tool]]s. [10543000] |In [[MS-DOS]], the [[COM file]] format does ''not'' include metadata, while the [[EXE]] file and Windows [[Portable Executable|PE]] formats do. [10543010] |These metadata can include the company that published the program, the date the program was created, the version number and more. [10543020] |In the [[.NET Framework|Microsoft .NET]] executable format, extra metadata is included to allow [[Reflection (computer science)|reflection]] at runtime. [10543030] |=== Existing software metadata === [10543040] |[[Object Management Group]] (OMG) has defined a metadata format for representing entire existing applications for the purposes of [[software mining]], [[software modernization]] and software assurance.
[10543050] |This specification, called the OMG [[Knowledge Discovery Metamodel]] (KDM) is the OMG's foundation for "modeling in reverse". [10543060] |KDM is a common language-independent intermediate representation that provides an integrated view of an entire enterprise application, including its behavior (program flow), data, and structure. [10543070] |One of the applications of KDM is Business Rules Mining. [10543080] |[[Knowledge Discovery Metamodel]] includes a fine grained low-level representation (called "micro KDM"), suitable for performing static analysis of programs. [10543090] |=== Document metadata === [10543100] |Most programs that create documents, including Microsoft [[SharePoint]], [[Microsoft Office Word|Microsoft Word]] and other [[Microsoft Office]] products, save metadata with the document files. [10543110] |These metadata can contain the name of the person who created the file (obtained from the operating system), the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made on the file. [10543120] |Other saved material, such as deleted text (saved in case of an undelete command), document comments and the like, is also commonly referred to as "metadata", and the inadvertent inclusion of this material in distributed files has sometimes led to undesirable disclosures. [10543130] |Document Metadata is particularly important in legal environments where litigation can request this sensitive information (metadata) which can include many elements of private detrimental data. [10543140] |This data has been linked to multiple lawsuits that have got corporations into legal complications. [10543150] |Many legal firms today use "Metadata Management Software", also known as "Metadata Removal Tools". [10543160] |This software can be used to clean documents before they are sent outside of their firm. 
[10543170] |This process, known as metadata management, protects law firms from potentially damaging leaks of sensitive data through [[Electronic Discovery]]. [10543180] |For a list of executable formats, see [[object file]]. [10543190] |=== Metamodels === [10543200] |Metadata on models are called [[Metamodel]]s. [10543210] |In [[Model Driven Engineering]], a [[Model (abstract)|model]] has to conform to a given [[Metamodel|metamodel]]. [10543220] |According to the [[model-driven architecture|MDA]] guide, a metamodel is a model and each model conforms to a given metamodel. [10543230] |[[Meta-modeling]] allows strict and agile automatic processing of models and metamodels. [10543240] |The [[Object Management Group]] (OMG) defines four layers of meta-modeling. [10543250] |Each level of modeling is defined and validated by the next layer: [10543260] |* M0: instance object, data row, record -> "John Smith" [10543270] |* M1: model, schema -> "Customer" UML Class or database Table [10543280] |* M2: metamodel -> [[Unified Modeling Language]] (UML), [[Common Warehouse Metamodel]] (CWM), [[Knowledge Discovery Metamodel]] (KDM) [10543290] |* M3: meta-metamodel -> [[Meta-Object Facility]] (MOF) [10543300] |=== Meta-metadata === [10543310] |Since metadata are also data, it is possible to have metadata of metadata: "meta-metadata." [10543320] |Machine-generated meta-metadata, such as the inverted index created by a free-text search engine, is generally not considered metadata, though. [10543330] |=== Digital library metadata === [10543340] |There are three categories of metadata that are frequently used to describe objects in a digital library: [10543350] |# '''descriptive''' - Information describing the intellectual content of the object, such as [[MARC]] cataloguing records, finding aids or similar schemes. [10543360] |It is typically used for bibliographic purposes and for search and retrieval. 
[10543370] |# '''structural''' - Information that ties each object to others to make up logical units (e.g., information that relates individual images of pages from a book to the others that make up the book). [10543380] |# '''administrative''' - Information used to manage the object or control access to it. [10543390] |This may include information on how it was scanned, its storage format, [[copyright]] and licensing information, and information necessary for the [[digital preservation|long-term preservation]] of the digital objects. [10543400] |=== Geospatial metadata === [10543410] |Metadata that describe geographic objects (such as datasets, maps, features, or simply documents with a geospatial component) have a history going back to at least 1994 (see the [http://libraries.mit.edu/guides/subjects/metadata/standards/fgdc.html MIT Library page on FGDC Metadata]). [10543420] |This class of metadata is described more fully on the [[Geospatial metadata]] page. [10550010] |
Microsoft Windows
[10550020] |'''Microsoft Windows''' is a series of [[software]] [[operating system]]s produced by [[Microsoft]]. [10550030] |Microsoft first introduced an operating environment named ''Windows'' in November 1985 as an add-on to [[MS-DOS]] in response to the growing interest in [[graphical user interface]]s (GUIs). [10550040] |Microsoft Windows came to [[Market dominance|dominate]] the world's [[personal computer]] market, overtaking [[Mac OS]], which had been introduced previously. [10550050] |At the 2004 [[International Data Corporation|IDC]] Directions conference, it was stated that Windows had approximately 90% of the [[Client (computing)|client]] operating system market. [10550060] |The most recent client version of Windows is [[Windows Vista]]; the current [[Server (computing)|server]] version is [[Windows Server 2008]]. [10550070] |==Versions== [10550080] |The term ''Windows'' collectively describes any or all of several generations of Microsoft (MS) operating system (OS) products. [10550090] |These products are generally categorized as follows: [10550100] |===16-bit operating environments=== [10550110] |The early versions of Windows were often thought of as just graphical user interfaces, mostly because they ran on top of [[MS-DOS]] and used it for [[file system]] services. [10550120] |However, even the earliest 16-bit Windows versions already assumed many typical operating system functions, notably, having their own [[executable file format]] and providing their own [[device driver]]s (timer, graphics, printer, mouse, keyboard and sound) for applications. [10550130] |Unlike [[MS-DOS]], Windows allowed users to execute multiple graphical applications at the same time, through [[computer multitasking|cooperative multitasking]]. 
[10550140] |Finally, Windows implemented an elaborate, segment-based, software virtual memory scheme, which allowed it to run applications larger than available memory: code segments and [[resource (Windows)|resource]]s were swapped in and thrown away when memory became scarce, and data segments moved in memory when a given application had relinquished processor control, typically waiting for user input. [10550150] |16-bit Windows versions include [[Windows 1.0]] (1985), [[Windows 2.0]] (1987) and its close relatives, ''[[Windows 2.1x|Windows/286-Windows/386]]''. [10550160] |===Hybrid 16/32-bit operating environments=== [10550170] |[[Windows 2.1x|Windows/386]] introduced a 32-bit [[protected mode]] [[kernel (computer science)|kernel]] and [[virtual machine]] monitor. [10550180] |For the duration of a Windows session, it created one or more [[virtual 8086 mode|virtual 8086 environments]] and provided device virtualization for the video card, keyboard, mouse, timer and [[interrupt]] controller inside each of them. [10550190] |The user-visible consequence was that it became possible to preemptively multitask multiple MS-DOS environments in separate windows, although graphical MS-DOS applications required full screen mode. [10550200] |Also, Windows applications were multi-tasked cooperatively inside one such virtual 8086 environment. [10550210] |[[Windows 3.0]] (1990) and [[Windows 3.1x|Windows 3.1]] (1992) improved the design, mostly because of [[virtual memory]] and loadable virtual device drivers ([[VxD]]s) which allowed them to share arbitrary devices between multitasked DOS windows. [10550220] |Also, Windows applications could now run in protected mode (when Windows was running in Standard or 386 Enhanced Mode), which gave them access to several megabytes of memory and removed the obligation to participate in the software virtual memory scheme. 
[10550230] |They still ran inside the same address space, where the segmented memory provided a degree of protection, and multi-tasked cooperatively. [10550240] |For Windows 3.0, Microsoft also rewrote critical operations from [[C (programming language)|C]] into [[Assembly language|assembly]], making this release faster and less memory-hungry than its predecessors. [10550250] |===Hybrid 16/32-bit operating systems=== [10550260] |With the introduction of the [[32-bit]] [[Windows 3.1x|Windows for Workgroups 3.11]], Windows was able to stop relying on DOS for file management. [10550270] |Leveraging this, [[Windows 95]] introduced [[Long filename|Long File Names]], reducing the [[8.3 filename]] DOS environment to the role of a [[boot loader]]. [10550280] |MS-DOS was now bundled with Windows; this notably made it (partially) aware of long file names when its utilities were run from within Windows. [10550290] |The most important novelty was the possibility of running 32-bit multi-threaded preemptively multitasked graphical programs. [10550300] |However, the necessity of keeping compatibility with 16-bit programs meant the GUI components were still 16-bit only and not fully reentrant, which resulted in reduced performance and stability. [10550310] |There were three releases of Windows 95 (the first in 1995, then subsequent bug-fix versions in 1996 and 1997, only released to OEMs, which added extra features such as [[File Allocation Table|FAT32]] and primitive USB support). [10550320] |Microsoft's next OS was [[Windows 98]]; there were two versions of this (the first in 1998 and the second, named "Windows 98 Second Edition", in 1999). [10550330] |In 2000, Microsoft released [[Windows Me]] (''Me'' standing for ''Millennium Edition''), which used the same core as Windows 98 but adopted some aspects of Windows 2000 and removed the option to boot into DOS mode. 
[10550340] |It also added a new feature called System Restore, allowing the user to set the computer's settings back to an earlier date. [10550350] |===32-bit operating systems=== [10550360] |The NT family of Windows systems was fashioned and marketed for higher-reliability business use, and was unencumbered by any Microsoft DOS patrimony. [10550370] |The first release was [[Windows NT 3.1]] (1993, numbered "3.1" to match the Windows version and to one-up [[OS/2]] 2.1, IBM's flagship OS, which had been co-developed by Microsoft and was Windows NT's main competitor at the time), which was followed by [[Windows NT 3.5|NT 3.5]] (1994), [[Windows NT 3.51|NT 3.51]] (1995), [[Windows NT 4.0|NT 4.0]] (1996), and [[Windows 2000]] (essentially NT 5.0). [10550380] |NT 4.0 was the first in this line to implement the "Windows 95" user interface (and the first to include Windows 95's built-in 32-bit runtimes). [10550390] |Microsoft then moved to combine their consumer and business operating systems. [10550400] |[[Windows XP]], coming in both home and professional versions (and later niche market versions for [[tablet PC]]s and [[media center]]s), improved stability, user experience and backwards compatibility. [10550410] |Then, [[Windows Server 2003]] brought [[Windows Server]] up to date with Windows XP. [10550420] |Since then, a new version, [[Windows Vista]], has been released, and [[Windows Server 2008]], released on [[February 27]], [[2008]], brings [[Windows Server]] up to date with [[Windows Vista]]. [10550430] |[[Windows CE]], Microsoft's offering in the mobile and embedded markets, is also a true 32-bit operating system that offers various services for all sub-operating workstations. [10550440] |===64-bit operating systems=== [10550450] |[[Windows NT]] included support for several different platforms before the [[X86 architecture|x86]]-based [[personal computer]] became dominant in the professional world. 
[10550460] |Versions of NT from 3.1 to 4.0 variously supported [[PowerPC]], [[DEC Alpha]] and [[MIPS Technologies|MIPS]] R4000, some of which were 64-bit processors, although the operating system treated them as 32-bit processors. [10550470] |With the introduction of the [[Intel]] [[Itanium]] architecture, which is referred to as [[IA-64]], Microsoft released new versions of Windows to support it. [10550480] |Itanium versions of [[Windows XP]] and [[Windows Server 2003]] were released at the same time as their mainstream x86 (32-bit) counterparts. [10550490] |On [[April 25]], [[2005]], Microsoft released [[Windows XP Professional x64 Edition]] and x64 versions of Windows Server 2003 to support the [[x86-64|AMD64/Intel64]] (or ''x64'' in Microsoft terminology) architecture. [10550500] |Microsoft dropped support for the Itanium version of Windows XP in 2005. [10550510] |[[Windows Vista]] is the first end-user version of Windows that Microsoft has released simultaneously in 32-bit and x64 editions. [10550520] |Windows Vista does not support the Itanium architecture. [10550530] |The modern 64-bit Windows family comprises AMD64/Intel64 versions of [[Windows Vista]], together with [[Windows Server 2003]] and [[Windows Server 2008]] in both Itanium and x64 editions. [10550540] |==History== [10550550] |Microsoft has taken two parallel routes in its operating systems. [10550560] |One route has been for the home user and the other has been for the professional IT user. [10550570] |The dual routes have generally led to home versions having greater [[multimedia]] support and less functionality in networking and security, and professional versions having inferior multimedia support and better networking and security. [10550580] |The first version of Microsoft Windows, [[Windows 1.0|version 1.0]], released in November 1985, lacked a degree of functionality and achieved little popularity; it was intended to compete with Apple's own operating system. 
[10550590] |Windows 1.0 is not a complete operating system; rather, it extends MS-DOS. [10550600] |Microsoft Windows version 2.0 was released in November 1987 and was slightly more popular than its predecessor. [10550610] |Windows 2.03 (release date January 1988) changed the OS from tiled windows to overlapping windows. [10550620] |This change led to Apple Computer filing a suit against Microsoft alleging infringement on Apple's copyrights. [10550630] |Microsoft Windows version 3.0, released in 1990, was the first Microsoft Windows version to achieve broad commercial success, selling 2 million copies in the first six months.[http://www.islandnet.com/~kpolsson/compsoft/soft1991.htm][http://www.thocp.net/companies/microsoft/microsoft_company.htm] It featured improvements to the user interface and to multitasking capabilities. [10550640] |It received a facelift in Windows 3.1, made generally available on [[March 1]], [[1992]]. [10550650] |Windows 3.1 support ended on [[December 31]], [[2001]]. [10550660] |In July 1993, Microsoft released [[Windows NT]] based on a new kernel. [10550670] |NT was considered to be the professional OS and was the first Windows version to utilize [[preemptive multitasking]]. [10550680] |Windows NT would later be retooled to also function as a home operating system, with Windows XP. [10550690] |On August 24, 1995, Microsoft released [[Windows 95]], a new and major consumer version that made further changes to the user interface, and also used [[preemptive multitasking]]. [10550700] |Windows 95 was designed to replace not only Windows 3.1, but also Windows for Workgroups and MS-DOS. [10550710] |It was also the first Windows operating system to use Plug and Play capabilities. [10550720] |The changes Windows 95 brought to the desktop were revolutionary, as opposed to the evolutionary changes in Windows 98 and Windows Me. 
[10550730] |Mainstream support for [[Windows 95]] ended on [[December 31]], [[2000]] and extended support for [[Windows 95]] ended on [[December 31]], [[2001]]. [10550740] |The next in the consumer line was Microsoft [[Windows 98]], released on June 25, 1998. [10550750] |It was substantially criticized for its slowness and for its unreliability compared with [[Windows 95]], but many of its basic problems were later rectified with the release of [[Windows 98]] Second Edition in 1999. [10550760] |Mainstream support for [[Windows 98]] ended on [[June 30]], [[2002]] and extended support for [[Windows 98]] ended on [[July 11]], [[2006]]. [10550770] |As part of its "professional" line, Microsoft released [[Windows 2000]] in February 2000. [10550780] |The consumer version following Windows 98 was [[Windows Me]] (Windows Millennium Edition). [10550790] |Released in September 2000, [[Windows Me]] implemented a number of new technologies for Microsoft, the most publicized of which was "[[Universal Plug and Play]]." [10550800] |In October 2001, Microsoft released [[Windows XP]], a version built on the Windows NT [[Kernel (computer science)|kernel]] that also retained the consumer-oriented usability of Windows 95 and its successors. [10550810] |This new version was widely praised in computer magazines. [10550820] |It shipped in two distinct editions, "Home" and "Professional", the former lacking many of the superior security and networking features of the Professional edition. [10550830] |Additionally, the first "Media Center" edition was released in 2002, with an emphasis on support for DVD and TV functionality including program recording and a remote control. [10550840] |Mainstream support for [[Windows XP]] will continue until [[April 14]], [[2009]] and extended support will continue until [[April 8]], [[2014]]. 
[10550850] |In April 2003, [[Windows Server 2003]] was introduced, replacing the [[Windows 2000]] line of server products with a number of new features and a strong focus on security; this was followed in December 2005 by Windows Server 2003 R2. [10550860] |On [[January 30]], [[2007]], Microsoft released [[Windows Vista]]. [10550870] |It contains a number of [[Features new to Windows Vista|new features]], from a redesigned shell and user interface to significant [[Technical features new to Windows Vista|technical changes]], with a particular focus on [[Security and safety features new to Windows Vista|security features]]. [10550880] |It is available in a number of [[Windows Vista editions and pricing|different editions]], and has been subject to [[Criticism of Windows Vista|some criticism]]. [10550890] |==Timeline of releases== [10550900] |==Security== [10550910] |[[Computer security|Security]] has been a prominent concern with Windows for many years, and even Microsoft itself has been the victim of security breaches. [10550920] |Consumer versions of Windows were originally designed for ease-of-use on a single-user PC without a network connection, and did not have security features built in from the outset. [10550930] |[[Windows NT]] and its successors are designed for security (including on a network) and multi-user PCs, but were not originally designed with Internet security in mind, since Internet use was less prevalent when NT was first developed in the early 1990s. [10550940] |These design issues, combined with flawed code (such as [[buffer overflow]]s) and the popularity of Windows, mean that it is a frequent target of [[computer worm|worm]] and [[computer virus|virus]] writers. [10550950] |In June 2005, [[Bruce Schneier]]'s ''Counterpane Internet Security'' reported that it had seen over 1,000 new viruses and worms in the previous six months. 
[10550960] |Microsoft releases security patches through its [[Windows Update]] service approximately once a month (usually the second Tuesday of the month), although critical updates are made available at shorter intervals when necessary. [10550970] |In Windows 2000 (SP3 and later), Windows XP and Windows Server 2003, updates can be automatically downloaded and installed if the user chooses to do so. [10550980] |As a result, Service Pack 2 for Windows XP, as well as Service Pack 1 for Windows Server 2003, were installed by users more quickly than they otherwise might have been. [10550990] |===Windows Defender=== [10551000] |On [[6 January]] [[2005]], Microsoft released a beta version of Microsoft AntiSpyware, based upon the previously released [[GIANT Company Software|Giant]] AntiSpyware. [10551010] |On [[14 February]] [[2006]], Microsoft AntiSpyware became [[Windows Defender]] with the release of beta 2. [10551020] |Windows Defender is a freeware program designed to protect against spyware and other unwanted software. [10551030] |[[Windows XP]] and [[Windows Server 2003]] users who have [[Windows Genuine Advantage|genuine]] copies of Microsoft Windows can freely download the program from Microsoft's web site, and Windows Defender ships as part of [[Windows Vista]]. [10551040] |===Third-party analysis=== [10551050] |In an article based on a report by Symantec, internetnews.com has described Microsoft Windows as having the "fewest number of patches and the shortest average patch development time of the five operating systems it monitored in the last six months of 2006." [10551060] |The same article also notes that the number of vulnerabilities found in Windows has significantly increased: Windows: 12+, Red Hat + Fedora: 2, Mac OS X: 1, HP-UX: 2, Solaris: 1. 
[10551070] |A study conducted by [[Kevin Mitnick]] and marketing communications firm Avantgarde in 2004 found that an unprotected and unpatched Windows XP system with Service Pack 1 lasted only 4 minutes on the Internet before it was compromised, and an unprotected and also unpatched [[Windows Server 2003]] system was compromised after being connected to the Internet for 8 hours. [10551080] |However, it is important to note that this study does not apply to Windows XP systems running the Service Pack 2 update (released in late 2004), which vastly improved the security of Windows XP. [10551090] |The computer that was running Windows XP Service Pack 2 was not compromised. [10551100] |The [[AOL]] National Cyber Security Alliance Online Safety Study of October 2004 determined that 80% of Windows users were infected by at least one [[spyware]]/[[adware]] product. [10551110] |Much documentation is available describing how to increase the security of Microsoft Windows products. [10551120] |Typical suggestions include deploying Microsoft Windows behind a hardware or software [[firewall]], running [[anti-virus]] and [[anti-spyware]] software, and installing patches as they become available through [[Windows Update]]. [10551130] |==Windows Lifecycle Policy== [10551140] |Microsoft has stopped releasing updates and hotfixes for many old Windows operating systems, including all versions of Windows 9x and earlier versions of Windows NT. [10551150] |Windows versions prior to [[Windows XP|XP]] are no longer supported, with the exception of [[Windows 2000]], which is currently in the Extended Support Period, which will end on [[July 13]], [[2010]]. [10551160] |Windows XP versions prior to SP2 are no longer supported either. [10551170] |Also, support for [[Windows XP 64-bit Edition]] ended after the release of the more recent [[Windows XP Professional x64 Edition]]. [10551180] |No new updates are created for unsupported versions of Windows. 
[10551190] |==Emulation software== [10551200] |Emulation allows the use of some Windows applications without using Microsoft Windows. [10551210] |These include: [10551220] |* [[Wine (software)|Wine]] - a [[free and open source software]] implementation of the [[Windows API]], allowing one to run many Windows applications on x86-based platforms, including [[Linux]]. [10551230] |Wine is technically not an emulator but a "compatibility layer"; while an emulator effectively 'pretends' to be a different CPU, Wine instead makes use of Windows-style APIs to 'simulate' the Windows environment directly. [10551240] |** [[CrossOver]] - A Wine package with licensed fonts. [10551250] |Its developers are regular contributors to Wine, and focus on Wine running officially supported applications. [10551260] |** [[Cedega]] - [[TransGaming Technologies]]' proprietary [[Fork (software development)|fork]] of Wine, designed specifically for running games written for Microsoft Windows under Linux. [10551270] |** [[Darwine]] - This project intends to port and develop Wine as well as other supporting tools that will allow [[Darwin (operating system)|Darwin]] and [[Mac OS X]] users to run Microsoft Windows applications, and to provide [[Win32]] [[Application Programming Interface|API]] compatibility at application source code level. [10551280] |* [[ReactOS]] - An open-source OS that is intended to run the same software as Windows, originally designed to imitate Windows NT 4.0, now aiming at Windows XP compatibility. [10551290] |It has been in the [[development stage]] since 1996.