AI Code Puppy

AI Code Puppy — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • JasPer

    JasPer

    JasPer is a computer software project to create a reference implementation of the codec specified in the JPEG-2000 Part-1 standard (i.e. ISO/IEC 15444-1) - started in 1997 at Image Power Inc. and at the University of British Columbia. It consists of a C library and some sample applications useful for testing the codec. The copyright owner began licensing the code to the public under an MIT License-style license in 2004 in response to requests from the open-source community. As of 2011 JasPer operated as a component of many software projects, both free and proprietary, including (but not limited to) netpbm (as of release 10.12), ImageMagick and KDE (as of version 3.2). As of 22 June 2010 the GEGL graphics library supported JasPer in its latest Git versions. In a series of objective JPEG-2000-compression quality tests conducted in 2004, "JasPer was the best codec, closely followed by IrfanView and Kakadu". However, Jasper remains one of the slowest implementations of the JPEG-2000 codec, as it was designed for reference, not performance. == Etymology == The name "JasPer" has simultaneous connotations with Canada's Jasper National Park, with the semi-precious gemstone, jasper, and with "JP" as an abbreviation of the JPEG-2000 standard.

  • Premature convergence

    Premature convergence

    Premature convergence is an unwanted effect in evolutionary algorithms (EA), a metaheuristic that mimics the basic principles of biological evolution as a computer algorithm for solving an optimization problem. The effect means that the population of an EA has converged too early, resulting in a suboptimal solution. In this context, the parental solutions, through the aid of genetic operators, are not able to generate offspring that are superior to, or outperform, their parents. Premature convergence is a common problem found in evolutionary algorithms, as it leads to a loss, or convergence, of a large number of alleles, subsequently making it very difficult to search for a specific gene in which the alleles were present. An allele is considered lost if all individuals in a population share the same value for the gene in question. An allele is, as defined by De Jong, considered to be a converged allele when 95% of a population share the same value for a certain gene. == Strategies for preventing premature convergence == Strategies to regain genetic variation can be: a mating strategy called incest prevention; uniform crossover; mimicking sexual selection; favored replacement of similar individuals (preselection or crowding); segmentation of individuals of similar fitness (fitness sharing); increasing population size; and niching and speciation. The genetic variation can also be regained by mutation, though this process is highly random. A general strategy to reduce the risk of premature convergence is to use structured populations instead of the commonly used panmictic ones. == Identification of the occurrence of premature convergence == It is hard to determine when premature convergence has occurred, and it is equally hard to predict its presence in the future. One measure is to use the difference between the average and maximum fitness values, as used by Patnaik & Srinivas, to then vary the crossover and mutation probabilities. 
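The two indicators mentioned so far (De Jong's 95% criterion for a converged allele, and the gap between maximum and average fitness) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `converged_genes` and `fitness_gap` are hypothetical, individuals are assumed to be equal-length sequences of gene values, and "95% of a population" is read here as "at least 95%".

```python
from collections import Counter


def converged_genes(population, threshold=0.95):
    """Return the indices of genes whose most common allele is shared by
    at least `threshold` of the individuals (assumed reading of De Jong's rule)."""
    if not population:
        return []
    size = len(population)
    converged = []
    for gene in range(len(population[0])):
        # Count how often each allele occurs at this gene position.
        counts = Counter(individual[gene] for individual in population)
        most_common_share = counts.most_common(1)[0][1] / size
        if most_common_share >= threshold:
            converged.append(gene)
    return converged


def fitness_gap(fitnesses):
    """Difference between the maximum and the average fitness of a population."""
    return max(fitnesses) - sum(fitnesses) / len(fitnesses)


# Hypothetical population: 20 individuals with 3 genes each.
population = [[1, 1, i % 2] for i in range(20)]
population[0][1] = 0  # gene 1 is now shared by 19 of 20 individuals (95%)
print(converged_genes(population))
print(fitness_gap([1.0, 2.0, 3.0]))
```

A shrinking fitness gap together with a growing list of converged genes would be one possible signal for raising the mutation or crossover probability; how to react is left open here.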
Population diversity is another measure which has been extensively used in studies to measure premature convergence. However, although it has been widely accepted that a decrease in population diversity directly leads to premature convergence, few studies have analyzed population diversity itself. In other words, an argument for preventing premature convergence that relies on the term population diversity lacks robustness unless it specifies which definition of population diversity is meant. There are models to counter the effect and risk of premature convergence that do not compromise core GA parameters like population size, mutation rate, and other core mechanisms. These models were inspired by biological ecology, where genetic interactions are limited by external mechanisms such as spatial topologies or speciation. These ecological models, such as the Eco-GA, adopt diffusion-based strategies to improve the robustness of GA runs and increase the likelihood of reaching near-global optima. == Causes for premature convergence == There are a number of presumed or hypothesized causes for the occurrence of premature convergence. === Self-adaptive mutations === Rechenberg introduced the idea of self-adaptation of mutation distributions in evolution strategies. According to Rechenberg, the control parameters for these mutation distributions evolve internally through self-adaptation, rather than being predetermined. He called it the 1/5-success rule of the (1 + 1)-ES evolution strategy: the step size control parameter is increased by some factor if the relative frequency of positive mutations over a determined period of time is larger than 1/5, and decreased if it is smaller than 1/5. Self-adaptive mutations may very well be one of the causes for premature convergence. Self-adaptive mutation can improve the accuracy with which optima are located, as well as accelerate the search for them. 
This has been widely recognized, though the underpinnings of the mechanism have been poorly studied, as it is often unclear whether the optimum found is local or global. Self-adaptive methods can converge to the global optimum, provided that the selection method uses elitism and that the rule of self-adaptation does not interfere with the mutation distribution, which must ensure a positive minimum probability of hitting a random subset. This holds for non-convex objective functions whose lower level sets are bounded and of non-zero measure. A study by Rudolph suggests that self-adaptation mechanisms among elitist evolution strategies that resemble the 1/5-success rule can get caught by a local optimum with positive probability. === Panmictic populations === Most EAs use unstructured or panmictic populations where basically every individual in the population is eligible for mate selection based on fitness. Thus, the genetic information of an only slightly better individual can spread in a population within a few generations, provided that no other, better offspring is produced during this time. Especially in comparatively small populations, this can quickly lead to a loss of genotypic diversity and thus to premature convergence. A well-known countermeasure is to switch to alternative population models which introduce substructures into the population that preserve genotypic diversity over a longer period of time and thus counteract the tendency towards premature convergence. This has been shown for various EAs such as genetic algorithms, the evolution strategy, other EAs or memetic algorithms.
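The 1/5-success rule described in the section on self-adaptive mutations can be illustrated with a minimal elitist (1 + 1)-ES. The sketch below is not taken from Rechenberg or Rudolph: the window of 20 generations, the adaptation factor of 1.5, the Gaussian mutation, and the sphere test function are all assumptions chosen for illustration.

```python
import random


def one_plus_one_es(objective, x0, sigma=1.0, generations=200,
                    window=20, factor=1.5, seed=0):
    """Minimize `objective` with an elitist (1 + 1)-ES and a 1/5-success rule.

    Every `window` generations the step size `sigma` is multiplied by
    `factor` if more than 1/5 of the mutations were improvements, and
    divided by `factor` if fewer than 1/5 were.
    """
    rng = random.Random(seed)
    parent = list(x0)
    parent_value = objective(parent)
    successes = 0
    for generation in range(1, generations + 1):
        # Mutate every coordinate with Gaussian noise of the current step size.
        child = [xi + rng.gauss(0.0, sigma) for xi in parent]
        child_value = objective(child)
        if child_value < parent_value:  # elitism: keep the child only if better
            parent, parent_value = child, child_value
            successes += 1
        if generation % window == 0:
            rate = successes / window
            if rate > 0.2:
                sigma *= factor
            elif rate < 0.2:
                sigma /= factor
            successes = 0
    return parent, parent_value, sigma


def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(xi * xi for xi in x)


start = [5.0, -3.0]
best, best_value, final_sigma = one_plus_one_es(sphere, start)
print(best_value, final_sigma)
```

On a multimodal objective the same loop can shrink `sigma` around a local optimum until escaping becomes very unlikely, which is the failure mode the section attributes to this kind of rule.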

  • Tensor product network

    Tensor product network

    A tensor product network, in artificial neural networks, is a network that exploits the properties of tensors to model associative concepts such as variable assignment. Orthonormal vectors are chosen to model the ideas (such as variable names and target assignments), and the tensor product of these vectors constructs a network whose mathematical properties allow the user to easily extract the association from it.
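As a rough illustration of the idea, the sketch below binds two variable names to two values by summing outer (tensor) products of orthonormal vectors, then recovers a value by contracting the network with a variable's vector. The helper names (`outer`, `add`, `unbind`) and the example vectors are hypothetical, and plain Python lists stand in for tensors.

```python
def outer(u, v):
    """Tensor (outer) product of two vectors as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def unbind(network, variable):
    """Contract the network with a variable vector to extract its bound value."""
    columns = len(network[0])
    return [sum(variable[i] * network[i][j] for i in range(len(variable)))
            for j in range(columns)]


# Orthonormal vectors for variable names and for the values assigned to them.
variables = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
values = {"apple": [1.0, 0.0, 0.0], "pear": [0.0, 1.0, 0.0]}

# The network stores the assignments x = apple and y = pear.
network = add(outer(variables["x"], values["apple"]),
              outer(variables["y"], values["pear"]))

print(unbind(network, variables["x"]))  # the vector chosen for "apple"
print(unbind(network, variables["y"]))  # the vector chosen for "pear"
```

Because the variable vectors are orthonormal, contracting with one of them cancels every other stored assignment, which is why the extraction is exact in this toy case.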

  • Julia (programming language)

    Julia (programming language)

    Julia is a dynamic general-purpose programming language. As a high-level language, distinctive aspects of Julia's design include a type system with parametric polymorphism, the use of multiple dispatch as a core programming paradigm, just-in-time compilation and a parallel garbage collection implementation. Notably, Julia does not support classes with encapsulated methods but instead relies on the types of all of a function's arguments to determine which method will be called. By default, Julia is run similarly to scripting languages, using its runtime, and allows for interactions, but Julia programs can also be compiled to small binary standalone executables (or to small libraries for e.g. Python), with e.g. the JuliaC.jl compiler. Julia programs can reuse libraries from other languages, and vice versa. Julia has interoperability with C, C++, Fortran, Rust, Python, and R. Additionally, some Julia packages have bindings to be used from Python and R as libraries. Julia is supported by programmer tools like IDEs (see below) and by notebooks like Pluto.jl, Jupyter, and since 2025, Google Colab officially supports Julia natively. Julia is sometimes used in embedded systems (e.g. has been used in a satellite in space on a Raspberry Pi Compute Module 4; 64-bit Pis work best with Julia, and Julia is supported in Raspbian). == History == Work on Julia began in 2009, when Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman set out to create a free language that was both high-level and fast. On 14 February 2012, the team launched a website with a blog post explaining the language's mission. In an interview with InfoWorld in April 2012, Karpinski said about the name of the language, Julia: "There's no good reason, really. It just seemed like a pretty name." Bezanson said he chose the name on the recommendation of a friend, then years later wrote: Maybe julia stands for "Jeff's uncommon lisp is automated"? 
Julia's syntax is stable, since version 1.0 in 2018, and Julia has a backward compatibility guarantee for 1.x and also a stability promise for the documented (stable) API, while in the years before in the early development prior to 0.7 the syntax (and semantics) was changed in new versions. All of the (registered package) ecosystem uses the new and improved syntax, and in most cases relies on new APIs that have been added regularly, and in some cases minor additional syntax added in a forward compatible way e.g. in Julia 1.7. In the 10 years since the 2012 launch of pre-1.0 Julia, the community has grown. The Julia package ecosystem has over 11.8 million lines of code (including docs and tests). The JuliaCon academic conference for Julia users and developers has been held annually since 2014 with JuliaCon2020 welcoming over 28,900 unique viewers, and then JuliaCon2021 breaking all previous records (with more than 300 JuliaCon2021 presentations available for free on YouTube, up from 162 the year before), and 43,000 unique viewers during the conference. Three of the Julia co-creators are the recipients of the 2019 James H. Wilkinson Prize for Numerical Software (awarded every four years) "for the creation of Julia, an innovative environment for the creation of high-performance tools that enable the analysis and solution of computational science problems." Also, Alan Edelman, professor of applied mathematics at MIT, has been selected to receive the 2019 IEEE Computer Society Sidney Fernbach Award "for outstanding breakthroughs in high-performance computing, linear algebra, and computational science and for contributions to the Julia programming language." Version 0.3 was released in August 2014. Both Julia 0.7 and version 1.0 were released on 8 August 2018. Julia 1.4 added syntax for generic array indexing to handle e.g. 0-based arrays. The memory model was also changed. 
Julia 1.5, released in August 2020, added record and replay debugging support for Mozilla's rr tool. The release changed the behavior in the REPL (to soft scope) to the one used in Jupyter, but remains fully compatible with non-REPL code (which retains hard scope). Julia 1.6 was the largest release since 1.0, and it was the long-term support (LTS) version for the longest time. With Julia 1.7, development returned to time-based releases; it was released in November 2021 with e.g. a new default random-number generator, and Julia 1.7.3 fixed at least one security issue. Julia 1.8 added options for hiding source code when compiling Julia source code to executables. Julia 1.9 added the ability to precompile packages to native machine code, done automatically; to improve precompilation of packages, a new package, PrecompileTools.jl, was introduced for use by package developers. Julia 1.10 was released on 25 December 2023 with new features such as parallel garbage collection. Julia 1.11 was released on 7 October 2024, and with it 1.10.5 became the next long-term support (LTS) version (i.e. those became the only two supported versions), since replaced by 1.10.10 released on 27 June, and 1.6 is no longer an LTS version. Julia 1.11 adds e.g. the new public keyword to signal safe public API (Julia users are advised to use such API, not internals, of Julia or packages, and package authors are advised to use the keyword, generally indirectly, e.g. prefixed with the @compat macro from Compat.jl, to also support older Julia versions, at least the LTS version). Julia 1.12 was released on 7 October 2025 (and 1.12.5 on 9 February 2026), and with it a JuliaC.jl package including the juliac compiler that works with it, for making rather small binary executables (much smaller than was possible before, through the use of a new so-called trimming feature). 
Julia 1.10 LTS is still an officially supported branch, but the 1.11 branch has also been maintained after the 1.12 release, with 1.11.8 released, followed by 1.11.9 on 8 February 2026. === JuliaCon === Since 2014, the Julia Community has hosted an annual Julia Conference focused on developers and users. The first JuliaCon took place in Chicago and kickstarted the annual occurrence of the conference. Since 2014, the conference has taken place across a number of locations including MIT and the University of Maryland, Baltimore. The event audience has grown from a few dozen people to over 28,900 unique attendees during JuliaCon 2020, which took place virtually. JuliaCon 2021 also took place virtually with keynote addresses from professors William Kahan, the primary architect of the IEEE 754 floating-point standard (which virtually all CPUs and languages, including Julia, use), Jan Vitek, Xiaoye Sherry Li, and Soumith Chintala, a co-creator of PyTorch. JuliaCon grew to 43,000 unique attendees and more than 300 presentations (still freely accessible, as are those from earlier years). JuliaCon 2022 was also held virtually, between July 27 and July 29, 2022, for the first time in several languages, not just in English. === Sponsors === The Julia language became a NumFOCUS fiscally sponsored project in 2014 in an effort to ensure the project's long-term sustainability. Jeremy Kepner at MIT Lincoln Laboratory was the founding sponsor of the Julia project in its early days. In addition, funds from the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, Intel, and agencies such as NSF, DARPA, NIH, NASA, and FAA have been essential to the development of Julia. Mozilla, the maker of the Firefox web browser, with its research grants for H1 2019, sponsored "a member of the official Julia team" for the project "Bringing Julia to the Browser", meaning to Firefox and other web browsers. The Julia language is also supported by individual donors on GitHub. 
=== The Julia company === JuliaHub, Inc. was founded in 2015 as Julia Computing, Inc. by Viral B. Shah, Deepak Vinchhi, Alan Edelman, Jeff Bezanson, Stefan Karpinski and Keno Fischer. In June 2017, Julia Computing raised US$4.6 million in seed funding from General Catalyst and Founder Collective, the same month was "granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community", and in December 2019 the company got $1.1 million funding from the US government to "develop a neural component machine learning tool to reduce the total energy consumption of heating, ventilation, and air conditioning (HVAC) systems in buildings". In July 2021, Julia Computing announced they raised a $24 million Series A round led by Dorilton Ventures, which also owns Formula One team Williams Racing, that partnered with Julia Computing. Williams' Commercial Director said: "Investing in companies building best-in-class cloud technology is a strategic focus for Dorilton and Julia's versatile platform, with revolutionary capabilities in simulation and modelling, is hugely relevant to our business. We look forward to embedding Julia Computing in the world's most technologically advanced sport". In June 2023, JuliaHub received (again, now

  • Time-inhomogeneous hidden Bernoulli model

    Time-inhomogeneous hidden Bernoulli model

    The time-inhomogeneous hidden Bernoulli model (TI-HBM) is an alternative to the hidden Markov model (HMM) for automatic speech recognition. Contrary to the HMM, the state transition process in the TI-HBM is not a Markov-dependent process; rather, it is a generalized Bernoulli (an independent) process. This difference eliminates dynamic programming at the state level in the TI-HBM decoding process. Thus, the computational complexity of the TI-HBM for probability evaluation and state estimation is $O(NL)$ instead of $O(N^{2}L)$ in the HMM case, where $N$ and $L$ are the number of states and the observation sequence length respectively. The TI-HBM is able to model acoustic-unit duration (e.g. phone/word duration) by using a built-in parameter named survival probability. In a phoneme recognition task, the TI-HBM is simpler and faster than the HMM, while its recognition performance is comparable. For details, see [1] or [2].

  • Synaptic transistor

    Synaptic transistor

    A synaptic transistor is an electrical device that can learn in ways similar to a neural synapse. It optimizes its own properties for the functions it has carried out in the past. The device mimics the behavior of the property of neurons called spike-timing-dependent plasticity, or STDP. == Structure == Its structure is similar to that of a field effect transistor, where an ionic liquid takes the place of the gate insulating layer between the gate electrode and the conducting channel. That channel is composed of samarium nickelate (SmNiO3, or SNO) rather than the field effect transistor's doped silicon. == Function == A synaptic transistor has a traditional immediate response whose amount of current that passes between the source and drain contacts varies with voltage applied to the gate electrode. It also produces a much slower learned response such that the conductivity of the SNO layer varies in response to the transistor's STDP history, essentially by shuttling oxygen ions between the SNO and the ionic liquid. The analog of strengthening a synapse is to increase the SNO's conductivity, which essentially increases gain. Similarly, weakening a synapse is analogous to decreasing the SNO's conductivity, lowering the gain. The input and output of the synaptic transistor are continuous analog values, rather than digital on-off signals. While the physical structure of the device has the potential to learn from history, it contains no way to bias the transistor to control the memory effect. An external supervisory circuit converts the time delay between input and output into a voltage applied to the ionic liquid that either drives ions into the SNO or removes them. A network of such devices can learn particular responses to "sensory inputs", with those responses being learned through experience rather than explicitly programmed.

  • NSynth

    NSynth

    NSynth (a portmanteau of "Neural Synthesis") is a WaveNet-based autoencoder for synthesizing audio, outlined in a paper in April 2017. == Overview == The model generates sounds through a neural network based synthesis, employing a WaveNet-style autoencoder to learn its own temporal embeddings from four different sounds. Google then released an open source hardware interface for the algorithm called NSynth Super, used by notable musicians such as Grimes and YACHT to generate experimental music using artificial intelligence. The research and development of the algorithm was part of a collaboration between Google Brain, Magenta and DeepMind. == Technology == === Dataset === The NSynth dataset is composed of 305,979 one-shot instrumental notes featuring a unique pitch, timbre, and envelope, sampled from 1,006 instruments from commercial sample libraries. For each instrument the dataset contains four-second 16 kHz audio snippets by ranging over every pitch of a standard MIDI piano, as well as five different velocities. The dataset is made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. === Machine learning model === A spectral autoencoder model and a WaveNet autoencoder model are publicly available on GitHub. The baseline model uses a spectrogram with fft_size 1024 and hop_size 256, MSE loss on the magnitudes, and the Griffin-Lim algorithm for reconstruction. The WaveNet model trains on mu-law encoded waveform chunks of size 6144. It learns embeddings with 16 dimensions that are downsampled by 512 in time. == NSynth Super == In 2018 Google released a hardware interface for the NSynth algorithm, called NSynth Super, designed to provide an accessible physical interface to the algorithm for musicians to use in their artistic production. Design files, source code and internal components are released under an open source Apache License 2.0, enabling hobbyists and musicians to freely build and use the instrument. 
At the core of the NSynth Super there is a Raspberry Pi, extended with a custom printed circuit board to accommodate the interface elements. == Influence == Despite not being publicly available as a commercial product, NSynth Super has been used by notable artists, including Grimes and YACHT. Grimes reported using the instrument in her 2020 studio album Miss Anthropocene. YACHT announced an extensive use of NSynth Super in their album Chain Tripping. Claire L. Evans compared the potential influence of the instrument to the Roland TR-808. The NSynth Super design was honored with a D&AD Yellow Pencil award in 2018.
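The "Machine learning model" section above mentions mu-law encoded waveform chunks and embeddings downsampled by 512 in time. The following sketch shows what standard mu-law companding looks like; it is not taken from the NSynth code. The value mu = 255 (the common 8-bit setting), the rounding scheme, and the function names are assumptions, and the frame count assumes the downsampling factor applies directly to the 6144-sample chunk.

```python
import math


def mu_law_encode(sample, mu=255):
    """Compress a sample in [-1, 1] with the mu-law curve and quantize it
    to an integer code in [0, mu]."""
    sample = max(-1.0, min(1.0, sample))
    magnitude = math.log1p(mu * abs(sample)) / math.log1p(mu)
    compressed = math.copysign(magnitude, sample)
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(code, mu=255):
    """Map an integer code in [0, mu] back to an approximate sample in [-1, 1]."""
    compressed = 2.0 * code / mu - 1.0
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


# Round trip of a single sample: quantization makes the result approximate.
code = mu_law_encode(0.5)
restored = mu_law_decode(code)
print(code, restored)

# Embedding frames per training chunk, if 512x downsampling applies directly.
frames_per_chunk = 6144 // 512
print(frames_per_chunk)
```

Under these assumptions each training chunk would map to 12 embedding frames of 16 dimensions each.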

  • IBM Watsonx

    IBM Watsonx

    Watsonx is a platform by IBM for building and managing artificial intelligence (AI) applications for business use. Released on May 9, 2023, the platform provides software tools and infrastructure for companies to work with both IBM's own AI models and models from third-party sources. The platform consists of three main components: watsonx.ai, a studio for training, validating, and deploying AI models; watsonx.data, a system for storing and managing data used by the models; and watsonx.governance, a toolkit to ensure AI applications are compliant with company policies and regulations. A key feature of the platform is that it can be trained on a company's private data to perform specialized tasks, a process known as fine-tuning. IBM states that this client-specific data is not used to train its own models. == History == Watsonx was introduced on May 9, 2023, at the annual IBM Think conference, as a platform that includes multiple services. Just like Watson AI computer with the similar name, Watsonx was named after Thomas J. Watson, IBM's founder and first CEO. On February 13, 2024, Anaconda partnered with IBM to embed its open-source Python packages into Watsonx. Watsonx is used at ESPN's Fantasy Football App for managing players' performance, and by Italian telecommunications company Wind Tre. It was employed to generate editorial content around nominees during the 66th Annual Grammy Awards. In 2025, Wimbledon integrated IBM watsonx generative AI into its app and website. Integrated with IBM Safer Payments, IBM watsonx has been used in banking sector fraud detection and anti-money laundering (AML) systems. == Services == === watsonx.ai === Watsonx.ai is a platform that allows AI developers to leverage a wide range of LLMs under IBM's own Granite series and others such as Facebook's LLaMA-2, free and open-source model Mistral, and many others present in the Hugging Face community. 
These models come pre-trained and optimized for various natural language processing (NLP) applications. The platform also allows fine-tuning with its Tuning Studio. === watsonx.data === Watsonx.data is a platform designed to assist clients in addressing issues related to data volume, complexity, cost, and governance. The platform facilitates seamless data access, whether stored in the cloud or on-premises, through a single entry point. === watsonx.governance === Watsonx.governance is a platform that utilizes IBM's AI capabilities to implement AI lifecycle governance. This helps organizations manage risks and maintain compliance with evolving AI and industry regulations, while reducing AI bias through automated oversight.

  • Read Along

    Read Along

    Read Along, formerly known as Bolo, is a language-learning app for children developed by Google for the Android operating system. The application was released on the Play Store on March 7, 2019. It features a character named Diya who helps children learn to read through illustrated stories. It supports learning English and major Indian languages (Hindi, Bengali, Tamil, Telugu, Marathi and Urdu), as well as Spanish, Portuguese and Arabic. == Technology == The app uses text-to-speech technology, through which Diya reads the story, as well as speech-to-text technology, which automatically identifies matches between the text and the user's reading. Stories from Chhota Bheem and Katha Kids were added in September 2019. In April 2020, a new version of the application was released. In September 2020, Arabic was added as a language option. A web version was launched in August 2022.

  • Ordinal regression

    Ordinal regression

    In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem between regression and classification. Examples of ordinal regression are ordered logit and ordered probit. Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning. == Linear models for ordinal regression == Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset. Suppose one has a set of observations, represented by length-$p$ vectors $\mathbf{x}_1$ through $\mathbf{x}_n$, with associated responses $y_1$ through $y_n$, where each $y_i$ is an ordinal variable on a scale $1, \ldots, K$. For simplicity, and without loss of generality, we assume $y$ is a non-decreasing vector, that is, $y_i \leq y_{i+1}$. To this data, one fits a length-$p$ coefficient vector $\mathbf{w}$ and a set of thresholds $\theta_1, \ldots, \theta_{K-1}$ with the property that $\theta_1 < \theta_2 < \cdots < \theta_{K-1}$. This set of thresholds divides the real number line into $K$ disjoint segments, corresponding to the $K$ response levels. The model can now be formulated as $\Pr(y \leq i \mid \mathbf{x}) = \sigma(\theta_i - \mathbf{w} \cdot \mathbf{x})$ or, in words, the cumulative probability of the response $y$ being at most $i$ is given by a function $\sigma$ (the inverse link function) applied to a linear function of $\mathbf{x}$. 
Several choices exist for $\sigma$; the logistic function $\sigma(\theta_i - \mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-(\theta_i - \mathbf{w} \cdot \mathbf{x})}}$ gives the ordered logit model, while using the CDF of the standard normal distribution gives the ordered probit model. A third option is to use an exponential function $\sigma(\theta_i - \mathbf{w} \cdot \mathbf{x}) = 1 - \exp(-\exp(\theta_i - \mathbf{w} \cdot \mathbf{x}))$ which gives the proportional hazards model. === Latent variable model === The probit version of the above model can be justified by assuming the existence of a real-valued latent variable (unobserved quantity) $y^*$, determined by $y^* = \mathbf{w} \cdot \mathbf{x} + \varepsilon$ where $\varepsilon$ is normally distributed with zero mean and unit variance, conditioned on $\mathbf{x}$. The response variable $y$ results from an "incomplete measurement" of $y^*$, where one only determines the interval into which $y^*$ falls: $$y = \begin{cases} 1 & \text{if } y^* \leq \theta_1, \\ 2 & \text{if } \theta_1 < y^* \leq \theta_2, \\ 3 & \text{if } \theta_2 < y^* \leq \theta_3, \\ \vdots \\ K & \text{if } \theta_{K-1} < y^*. \end{cases}$$
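To make the cumulative formulation concrete, the sketch below turns the ordered logit model's cumulative probabilities into per-level probabilities by differencing adjacent thresholds. The function names and the example coefficients and thresholds are hypothetical; only the formula for the cumulative probability of $y \leq i$ comes from the text.

```python
import math


def sigmoid(z):
    """Logistic function, the inverse link of the ordered logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, w, thresholds):
    """Return [Pr(y = 1 | x), ..., Pr(y = K | x)] for K = len(thresholds) + 1.

    Uses Pr(y <= i | x) = sigmoid(theta_i - w . x) and differences the
    cumulative probabilities; `thresholds` must be strictly increasing.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    # Pr(y <= K | x) is 1 by definition, so it is appended explicitly.
    cumulative = [sigmoid(theta - score) for theta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical fitted model with K = 4 response levels.
w = [0.5, -0.25]
thresholds = [-1.0, 0.0, 1.0]
probs = ordered_logit_probs([1.0, 2.0], w, thresholds)
print(probs)
```

Because the thresholds are increasing, every difference is positive and the probabilities sum to one; swapping `sigmoid` for the standard normal CDF would give the ordered probit version.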

  • Operational taxonomic unit

    Operational taxonomic unit

    An operational taxonomic unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was originally introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "operational taxonomic unit" is simply the group of organisms currently being studied. In this sense, an OTU is a pragmatic definition to group individuals by similarity, equivalent to but not necessarily in line with classical Linnaean taxonomy or modern evolutionary taxonomy. Nowadays, however, the term is commonly used in a different context and refers to clusters of (uncultivated or unknown) organisms, grouped by DNA sequence similarity of a specific taxonomic marker gene (originally coined as mOTU; molecular OTU). In other words, OTUs are pragmatic proxies for "species" at different taxonomic levels, in the absence of traditional systems of biological classification as are available for macroscopic organisms. For several years, OTUs have been the most commonly used units of diversity, especially when analysing small subunit 16S (for prokaryotes) or 18S rRNA (for eukaryotes) marker gene sequence datasets. == Molecular OTU by clustering of marker gene sequences == In the approach represented by DNA barcoding, a particular locus is chosen to be used as the marker gene for classification. This locus should be universally present in the scope selected, variable enough to be different among close-related species, and be flanked by conservative sequences that allow for easy amplification and detection. There are databases containing sequences for such marker genes from many different species, allowing for comparison. (Sometimes only using one locus does not provide sufficient resolution, so multiple marker genes are used. This is the case for plants, where rbcL+matK is common.) 
Sequences obtained this way can be clustered according to their similarity to one another, and operational taxonomic units are defined based on the similarity threshold set by the researcher. The exact threshold depends on the taxa in question and the mutational rates of the selected locus in the taxon. Thresholds of 97–99% are commonly used, but "it is now recognized to be somewhat arbitrary as sequence variation within and among species varies across taxa". A threshold of 100% similarity (fully identical sequences) is also common; the resulting units are also known as single variants. It remains debatable how well this commonly used method recapitulates true microbial species phylogeny or ecology. Although OTUs can be calculated differently when using different algorithms or thresholds, research by Schmidt et al. (2014) demonstrated that 16S-derived microbial OTUs were generally ecologically consistent across habitats and several clustering approaches. The number of OTUs defined may be inflated due to errors in DNA sequencing. === OTU clustering approaches === There are three main approaches to clustering OTUs: de novo, for which the clustering is based on similarities between sequencing reads; closed-reference, for which the clustering is performed against a reference database of sequences; and open-reference, where clustering is first performed against a reference database of sequences, then any remaining sequences that could not be mapped to the reference are clustered de novo. Using a reference provides taxonomic context for the OTUs found. Alternatively, taxonomic context can be found after the construction of clusters by comparing representative sequences from clusters against a reference database. There are also specialized classifiers for this purpose which are much faster than naive comparison using BLAST. 
=== OTU clustering algorithms === Hierarchical clustering algorithms (HCA): uclust, cd-hit and ESPRIT. Bayesian clustering: CROP. == Molecular OTU by other methods == In addition to similarity-based grouping, marker gene sequences can be sorted into OTUs using molecular phylogeny, k-mer composition, or hybrid methods combining these methods with similarity. There are also Bayesian tree-less methods and machine learning approaches. Using phylogeny often involves manually assigning terminal clades or single nodes to an OTU, so this is usually only done for refinement. Genome skimming can be used to obtain high-copy DNA without the need to choose marker genes or to design PCR primers for the chosen genes. It can provide fairly good coverage of organelle DNA and repetitive elements such as ribosomal DNA, both of which can be used like marker genes in OTU analysis. Whole-genome sequencing is more expensive and involves the production and processing of more data. By considering the entire genome, many (sometimes over 100) marker genes can be used at the same time, producing highly resolved phylogenies that correctly identify problematic taxa. It is also possible to use entire genomes for OTU assignment. For example, genomes from different bacterial species almost always have an average nucleotide identity lower than 95%, a fact that can be used to define new OTUs (and likely new species).
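The 95% average nucleotide identity (ANI) cut-off mentioned above can be turned into a grouping rule. The sketch below assumes pairwise ANI values have already been computed by some external tool; the function name `group_by_ani`, the input layout, and the choice of single-linkage grouping are all assumptions made for illustration.

```python
def group_by_ani(names, ani, threshold=95.0):
    """Single-linkage grouping: genomes joined by any pair with ANI >= threshold share an OTU.

    `ani` maps (genome_a, genome_b) tuples to percent identity.
    Pairs that are missing from `ani` are treated as unlinked.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

For example, three genomes where only the first two share an ANI above 95% come out as two groups. Single linkage can chain distant genomes together through intermediates, so a stricter linkage rule may be preferable in practice.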

    Read more →
  • Bioz

    Bioz

    Bioz is a search engine for life science experimentation. == History == Bioz was founded by Karin Lachmi and Daniel Levitt. Lachmi is a scientist who completed her postdoc in molecular and cellular biology at the Stanford University School of Medicine. During her lab work she found little available data regarding preferable lab tools, reagents and related products for experimentation. There are 50,000 vendors selling 300 million scientific products. She decided to start the company in order to provide researchers with adequate information for that purpose. Co-founder Daniel Levitt is an entrepreneur who sold his company WebAppoint to Microsoft in the year 2000. He also co-founded the company StemRad. At Bioz, Lachmi serves as the Chief Scientific Officer and Levitt serves as the chief executive officer. Bioz claims to have over a million researcher-users from 196 countries. Among the investors are Esther Dyson and the Stanford-StartX Fund. The company's advisory board includes Nobel Laureates in Chemistry Michael Levitt, Roger Kornberg, and Ada Yonath. == Technology == The company uses artificial intelligence, machine learning and natural language processing in order to extract experimentation data from scientific articles, such as the products that researchers used, the companies that supply the products, the protocol conditions that researchers selected, and the types of experiments and techniques. The algorithm ranks products based on how frequently they were used by researchers in their experiments, how recently a product was used, and the impact factor of the journal. The algorithm's output is a Bioz stars score for each product that was mentioned in an article. Bioz is a data-driven platform for product recommendations, which is contrary to platforms such as TripAdvisor and OpenTable that are based on user-generated reviews and ratings. 
The recommendations and scoring system that the company has developed are meant to assist researchers with the process of developing future medications and finding cures for diseases. They are guided towards products and techniques that were previously used by other researchers when planning and performing experiments. The company's revenue is based on selling SaaS subscriptions to researchers in biopharma companies. They also charge product suppliers for content syndication.

    Read more →
  • Alias Eclipse

    Alias Eclipse

    Eclipse was a professional 2D image editing program available on Silicon Graphics and Windows workstations. Designed to manipulate high-resolution images like digitized movie frames and photographs for print, it offered color correction tools, image processing effects, rudimentary paint features, and spline-based drawing and masking. == History == Eclipse was originally developed in the late 1980s by Full Color Computing, an early provider of photo retouch and color prepress software for Silicon Graphics workstations. Alias Research (later Alias Systems Corporation), a developer of professional 3D graphics applications for the SGI platform, purchased the rights to Eclipse in fall 1990. Alias developed Eclipse through the early to mid-1990s, releasing version 2.5 in 1995 with improvements to the speed of color correction, effects, and rendering. Xyvision's Contex Prepress division purchased exclusive rights to Eclipse from Alias in 1996, and released version 3.0 the following year. Eclipse was subsequently sold to German developer Form & Vision GmbH, which continued development and ported it to the Windows platform. In 1999, Form & Vision released a demo of Eclipse 3.1.3 on the SGI platform which was limited to 1600 x 1600 pixel images, then ceased development of Eclipse on the SGI platform. Eclipse was thereafter developed exclusively for the Windows platform, culminating with version 3.1.4 in 2001. In the same year the firm went bankrupt. == Features == Eclipse was designed to work with very large images that could not be manipulated in real time on contemporary computer systems due to memory limitations, and thus allowed the user to make modifications to a lower-resolution copy of the original image in "proxy mode." Brush strokes, color corrections, and other edits were saved in proxy mode, then applied to the full-size image in post processing. 
This method also allowed for batch processing of a high-resolution image sequence using the edits applied to the original proxy image. Other features included color correction and separation, warping, special effects, text, and shape masking. Wavelet image compression created by LuraTech was added in Eclipse 3.1.4.

    Read more →
  • Win–stay, lose–switch

    Win–stay, lose–switch

    In psychology, game theory, statistics, and machine learning, win–stay, lose–switch (also win–stay, lose–shift or Pavlov, named after Ivan Pavlov) is a heuristic learning strategy used to model learning in decision situations. It was first devised as an improvement over randomization in bandit problems. It was later applied to the prisoner's dilemma in order to model the evolution of altruism. In most versions, it either opens by cooperating and then follows the rule, or opens with a "probe" of cooperate-defect-cooperate to determine the other player's strategy. A mutual cooperation is regarded as a win. The learning rule bases its decision only on the outcome of the previous play. Outcomes are divided into successes (wins) and failures (losses). If the play on the previous round resulted in a success, then the agent plays the same strategy on the next round. Alternatively, if the play resulted in a failure, the agent switches to another action. A large-scale empirical study of players of the game rock, paper, scissors shows that a variation of this strategy is adopted by real-world players of the game, instead of the Nash equilibrium strategy of choosing entirely at random between the three options.
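The rule described above fits in a few lines. The following sketch applies it to a two-action game with actions "C" (cooperate) and "D" (defect); the function names are hypothetical, and the win criterion used here (a round counts as a win whenever the opponent cooperated) is an assumption chosen for illustration.

```python
def wsls_next(prev_action: str, won: bool) -> str:
    """Repeat the previous action after a win, switch to the other one after a loss."""
    if won:
        return prev_action
    return "D" if prev_action == "C" else "C"


def play_pavlov(opponent_moves, first="C"):
    """Return the moves win-stay, lose-switch makes against a fixed opponent sequence."""
    moves = []
    action = first
    for opp in opponent_moves:
        moves.append(action)
        won = opp == "C"  # assumption: the round is a win when the opponent cooperated
        action = wsls_next(action, won)
    return moves
```

Against an opponent playing C, D, D, C, the strategy plays C, C, D, C: it stays after the first win, switches to D after being exploited, and switches back to C after mutual defection.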

    Read more →
  • Perceptron

    Perceptron

    In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function that can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. == History == The artificial neuron and artificial neural network were invented in 1943 by Warren McCulloch and Walter Pitts in their seminal paper "A Logical Calculus of the Ideas Immanent in Nervous Activity". In 1957, Frank Rosenblatt was at the Cornell Aeronautical Laboratory. He simulated the perceptron on an IBM 704. Later, he obtained funding from the Information Systems Branch of the United States Office of Naval Research and the Rome Air Development Center to build a custom-made computer, the Mark I Perceptron. It was first publicly demonstrated on 23 June 1960. The machine was "part of a previously secret four-year NPIC [the US' National Photographic Interpretation Center] effort from 1963 through 1966 to develop this algorithm into a useful tool for photo-interpreters". Rosenblatt described the details of the perceptron in a 1958 paper. His organization of a perceptron is constructed of three kinds of cells ("units"): S, A, R, which stand for "sensory", "association" and "response". He presented at the first international symposium on AI, Mechanisation of Thought Processes, which took place in November 1958. Rosenblatt's project was funded under Contract Nonr-401(40) "Cognitive Systems Research Program", which lasted from 1959 to 1970, and Contract Nonr-2381(00) "Project PARA" ("PARA" means "Perceiving and Recognition Automata"), which lasted from 1957 to 1963. In 1959, the Institute for Defense Analysis awarded his group a $10,000 contract. 
By September 1961, the ONR had awarded a further $153,000 worth of contracts, with $108,000 committed for 1962. The ONR research manager, Marvin Denicoff, stated that ONR, instead of ARPA, funded the Perceptron project, because the project was unlikely to produce technological results in the near or medium term. Funding from ARPA ran to the order of millions of dollars, while funding from ONR was on the order of 10,000 dollars. Meanwhile, the head of IPTO at ARPA, J.C.R. Licklider, was interested in 'self-organizing', 'adaptive' and other biologically-inspired methods in the 1950s; but by the mid-1960s he was openly critical of these, including the perceptron. Instead he strongly favored the logical AI approach of Simon and Newell. === Mark I Perceptron machine === The perceptron was intended to be a machine, rather than a program, and while its first implementation was in software for the IBM 704, it was subsequently implemented in custom-built hardware as the Mark I Perceptron with the project name "Project PARA", designed for image recognition. The machine is currently in the Smithsonian National Museum of American History. The Mark I Perceptron had three layers. One version was implemented as follows: An array of 400 photocells arranged in a 20x20 grid, named "sensory units" (S-units), or "input retina". Each S-unit can connect to up to 40 A-units. A hidden layer of 512 perceptrons, named "association units" (A-units). An output layer of eight perceptrons, named "response units" (R-units). Rosenblatt called this three-layered perceptron network the alpha-perceptron, to distinguish it from other perceptron models he experimented with. The S-units are connected to the A-units randomly (according to a table of random numbers) via a plugboard, to "eliminate any particular intentional bias in the perceptron". The connection weights are fixed, not learned. 
Rosenblatt was adamant about the random connections, as he believed the retina was randomly connected to the visual cortex, and he wanted his perceptron machine to resemble human visual perception. The A-units are connected to the R-units, with adjustable weights encoded in potentiometers, and weight updates during learning were performed by electric motors. The hardware details are in an operators' manual. In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." The Photo Division of the Central Intelligence Agency, from 1960 to 1964, studied the use of the Mark I Perceptron machine for recognizing militarily interesting silhouetted targets (such as planes and ships) in aerial photos. === Principles of Neurodynamics (1962) === Rosenblatt described his experiments with many variants of the Perceptron machine in the book Principles of Neurodynamics (1962). The book is a published version of the 1961 report. Among the variants are: "cross-coupling" (connections between units within the same layer) with possibly closed loops, "back-coupling" (connections from units in a later layer to units in a previous layer), four-layer perceptrons where the last two layers have adjustable weights (and thus a proper multilayer perceptron), incorporating time-delays to perceptron units, to allow for processing sequential data, and analyzing audio (instead of images). The machine was shipped from Cornell to the Smithsonian in 1967, under a government transfer administered by the Office of Naval Research. 
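The three-layer arrangement of the Mark I described above (fixed random S-to-A wiring, adjustable A-to-R weights) can be imitated in software. The sketch below is a toy model under stated assumptions, not a reconstruction of the hardware: the layer sizes are far smaller than the 400, 512 and 8 units of the machine, the per-A-unit `fan_in`, the +1/-1 wiring weights and the threshold `theta` are invented for illustration, and the update is the standard error-correction perceptron rule applied to a single R-unit.

```python
import random


def make_a_units(n_s, n_a, fan_in, rng):
    """Wire each A-unit to `fan_in` randomly chosen S-units with fixed +1/-1 weights."""
    units = []
    for _ in range(n_a):
        inputs = rng.sample(range(n_s), fan_in)
        units.append([(i, rng.choice((-1, 1))) for i in inputs])
    return units


def a_layer(units, retina, theta=0):
    """Binary A-unit activations for one retina image (a list of 0/1 values)."""
    return [1 if sum(w * retina[i] for i, w in unit) > theta else 0 for unit in units]


def r_output(weights, bias, acts):
    """Threshold response unit over the A-unit activations."""
    return 1 if sum(w * a for w, a in zip(weights, acts)) + bias > 0 else 0


def train_r_unit(units, samples, epochs=200, lr=1.0):
    """Error-correction training of one R-unit; the S-to-A wiring is never modified."""
    weights, bias = [0.0] * len(units), 0.0
    for _ in range(epochs):
        errors = 0
        for retina, target in samples:
            acts = a_layer(units, retina)
            delta = target - r_output(weights, bias, acts)
            if delta:
                errors += 1
                weights = [w + lr * delta * a for w, a in zip(weights, acts)]
                bias += lr * delta
        if errors == 0:
            break  # every sample is classified correctly
    return weights, bias
```

Only `weights` and `bias` change during training, mirroring the potentiometer-encoded A-to-R weights of the machine. Whether a given set of images can be learned depends on the random wiring: if two images with different labels produce the same A-layer code, no setting of the R-unit can separate them.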
=== Perceptrons (1969) === Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognise many classes of patterns. This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had greater processing power than perceptrons with one layer (also called a single-layer perceptron). Single-layer perceptrons are only capable of learning linearly separable patterns. For a classification task with some step activation function, a single node will have a single line dividing the data points forming the patterns. More nodes can create more dividing lines, but those lines must somehow be combined to form more complex classifications. A second layer of perceptrons, or even linear nodes, is sufficient to solve many otherwise non-separable problems. In 1969, a famous book entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often incorrectly believed that they also conjectured that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. (See the page on Perceptrons (book) for more information.) Nevertheless, the often-miscited Minsky and Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. This text was reprinted in 1987 as "Perceptrons - Expanded Edition" where some errors in the original text are shown and corrected. === Subsequent work === Rosenblatt continued working on perceptrons despite diminishing funding. The last attempt was Tobermory, built between 1961 and 1967 for speech recognition. 
It occupied an entire room. It had 4 layers with 12,000 weights implemented by toroidal magnetic cores. By the time of its completion, simulation on digital computers had become faster than purpose-built perceptron machines. Rosenblatt died in a boating accident in 1971. A simulation program for neural networks was written for the IBM 7090/7094, and was used to study various pattern recognition applications, such as character recognition; particle tracks in bubble-chamber photographs; phoneme, isolated word, and continuous speech recognition; speaker verification; and center-of-attention mechanisms for image processing. The kernel perceptron algorithm was already introduced in 1964 by Aizerman et al. Margin-bound guarantees were given for the Perceptron algorithm in the general non-separable case first by Freund and Schapire (1998), and more recently by Mohri and Rostamizadeh (2013), who extend previous results and give new and more favorable L1 bounds. The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research suggests a perceptron-like linear model can produce some behavior seen in real neurons. The solution spaces of decision boundaries for all binary functions and learning behaviors are studied in. == Definition == In the modern sense, the perceptron is an algori

    Read more →