Detecting Ambiguities in Regression Problems Using TSK Models

A.K. Nandi, F. Klawonn:

Regression refers to the problem of approximating measured data that are assumed to be produced by an underlying, possibly noisy function. However, in real applications the assumption that the data represent samples from one function is sometimes wrong. For instance, in process control different strategies might be used to achieve the same goal. Any regression model that tries to fit such data as well as possible must fail, since it can only find an intermediate compromise between the different strategies by which the data were produced. To tackle this problem, an approach is proposed here to detect ambiguities in regression problems by selecting a subset of data from the total data set using TSK models, which work in parallel by sharing the data with each other in every step. The proposed approach is verified with artificial data and finally applied to real data from grinding, a manufacturing process used to generate smooth surfaces on workpieces.

A Novel Approach to Noise Clustering for Outlier Detection

F. Rehm, F. Klawonn, R. Kruse:

Noise clustering, as a robust clustering method, partitions data sets while reducing errors caused by outliers. Noise clustering defines outliers in terms of a certain distance, which is called the noise distance. The probability or membership degree of data points belonging to the noise cluster increases with their distance to the regular clusters. The main purpose of noise clustering is to reduce the influence of outliers on the regular clusters; the emphasis is not on identifying outliers exactly. However, in many applications outliers contain important information and their correct identification is crucial. In this paper we present a method to estimate the noise distance in noise clustering based on the preservation of the hypervolume of the feature space. Our examples demonstrate the efficiency of this approach.
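To make the role of the noise distance concrete, the following is a minimal sketch of the membership computation in Davé-style noise clustering: the noise cluster is treated as a virtual cluster lying at a fixed distance delta from every data point, so points far from all regular clusters receive a high noise membership. The function names and the choice of fuzzifier are illustrative, not taken from the paper.

```python
def euclid(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def noise_memberships(point, centers, delta, m=2.0):
    """Memberships of one data point to each regular cluster and the noise cluster.

    The noise cluster is a virtual cluster at constant distance delta from every
    point; memberships follow the usual probabilistic (FCM-style) update and sum to 1.
    """
    dists = [max(euclid(point, c), 1e-12) for c in centers] + [delta]
    exp = 2.0 / (m - 1.0)
    u = [1.0 / sum((d / d_other) ** exp for d_other in dists) for d in dists]
    return u[:-1], u[-1]   # regular memberships, noise membership
```

A point close to a regular cluster centre gets almost all of its membership there, while an outlier far from all centres is absorbed by the noise cluster; how far is "far" is governed entirely by delta, which is exactly the parameter the paper proposes to estimate.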

Cluster Analysis via the Dynamic Data Assigning Assessment Algorithm

O. Georgieva, F. Klawonn:

A new clustering algorithm based on the principles of noise clustering, which finds good clusters step by step, is presented and examined. The algorithm can be applied to find just a few substructures, but also as an iterative method for partitioning data, including the identification of the number of clusters and of noise data. The algorithm is applicable to both hard and fuzzy clustering. Its capabilities are studied for different distance metrics used in the cluster computation. The algorithm's advantages and drawbacks are discussed with illustrative and real data examples. A parallel is drawn between the algorithm proposed here and other clustering algorithms that are based on the idea of searching for one cluster at a time.

Evolving Extended Naive Bayes Classifiers

F. Klawonn, P. Angelov:

Naive Bayes classifiers are a very simple but often effective tool for classification problems, although they are based on independence assumptions that do not hold in most cases. Extended naive Bayes classifiers also rely on independence assumptions, but break them down to artificial subclasses, in this way becoming more powerful than ordinary naive Bayes classifiers. Since the computations involved in Bayes classifiers are basically generalised mean value calculations, they easily lend themselves to incremental and online learning. However, for extended naive Bayes classifiers it is necessary to choose and construct the subclasses, a problem whose solution is not obvious, especially in the case of online learning. In this paper we propose an evolving extended naive Bayes classifier that can learn and evolve in an online manner.
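The observation that Bayes classifier training reduces to generalised mean value calculations can be sketched as follows: a Gaussian naive Bayes classifier only needs class priors and per-class feature means and variances, and all of these are running statistics that can be updated one sample at a time (here via Welford's algorithm). This is a generic illustration of the online-learning point, not the authors' evolving classifier.

```python
from collections import defaultdict
import math

class OnlineGaussianNB:
    """Gaussian naive Bayes trained incrementally: every statistic is a running mean."""

    def __init__(self):
        self.n = defaultdict(int)              # samples seen per class
        self.mean = {}                         # per-class feature means
        self.m2 = {}                           # per-class sums of squared deviations
        self.total = 0

    def learn(self, x, label):
        """Update class prior and per-feature statistics with one sample (O(features))."""
        self.total += 1
        self.n[label] += 1
        if label not in self.mean:
            self.mean[label] = list(x)
            self.m2[label] = [0.0] * len(x)
            return
        for i, xi in enumerate(x):             # Welford's incremental update
            d = xi - self.mean[label][i]
            self.mean[label][i] += d / self.n[label]
            self.m2[label][i] += d * (xi - self.mean[label][i])

    def predict(self, x):
        """Return the class with the highest log-posterior under the naive assumption."""
        best, best_score = None, -math.inf
        for label in self.n:
            score = math.log(self.n[label] / self.total)      # log prior
            for i, xi in enumerate(x):
                var = self.m2[label][i] / max(self.n[label] - 1, 1) + 1e-9
                score += -0.5 * math.log(2 * math.pi * var)
                score += -(xi - self.mean[label][i]) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = label, score
        return best
```

Because no pass over past data is ever needed, such a classifier can keep learning indefinitely from a stream, which is the property the extended, evolving variant builds on.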

Equi-sized, Homogeneous Partitioning

F. Klawonn, F. Höppner:

We consider the problem of partitioning a data set of $n$ data objects into $c$ homogeneous subsets (that is, data objects in the same subset should be similar to each other), such that each subset is of approximately the same size. This problem has applications wherever a population has to be distributed among a limited number of resources and the workload for each resource shall be balanced. We modify an existing clustering algorithm in this respect, present some empirical evaluation and discuss the results.

A Maximum Likelihood Approach to Noise Estimation for Intensity Measurements in Biology

F. Klawonn, C. Hundertmark, L. Jänsch:

Often, the measurement of biological components generates results that are corrupted by noise. Noise can be caused by various factors, such as the detectors themselves, sample properties or the data processing itself, and appears independently of the applied technology. When measuring two identical samples, it can be observed that similar signal intensities may have inherent but varying levels of noise and that the ratio of noise decreases with increasing signal intensities. In this paper a statistical approach is introduced to estimate the noise inherent in the measured data. Based on this estimation, it is possible to provide information about the reliability of a measured signal and whether the difference between intensities is mainly caused by noise or by biologically relevant cellular alterations.

Subgroup Discovery Based on Ideas from Fuzzy Cluster Analysis

F. Klawonn:

This extended abstract provides a brief outline of a new algorithm for subgroup discovery, exploiting some principles from fuzzy cluster analysis.

Reducing the Number of Parameters of a Fuzzy System Using Scaling Functions

F. Klawonn:

Learning techniques are tailored to fuzzy systems in order to tune them or even to derive fuzzy rules from data. However, a compromise between accuracy and interpretability has to be found. Flexible fuzzy systems with a large number of parameters and high degrees of freedom tend to function as black boxes. In this paper, we introduce an interpretation of fuzzy systems that enables us to work with a small number of parameters without losing flexibility or interpretability. In this way, we can provide a learning algorithm that is efficient and yields accuracy as well as interpretability. Our fuzzy system is based on extremely simple fuzzy sets and transformations using interpretable scaling functions of the input variables.

JProGO: A Novel Tool for the Functional Interpretation of Prokaryotic Microarray Data Using Gene Ontology Information

M. Scheer, F. Klawonn, R. Münch, A. Grote, K. Hiller, C. Choi, I. Koch, M. Schobert, E. Härtig, U. Klages, D. Jahn:

A novel program suite was implemented for the functional interpretation of high-throughput gene expression data based on the identification of Gene Ontology (GO) nodes. The focus of the analysis lies on the interpretation of microarray data from prokaryotes. The three well-established statistical methods of the threshold value-based Fisher's exact test, as well as the threshold value-independent Kolmogorov–Smirnov and Student's t-test, were employed in order to identify the groups of genes with a significantly altered expression profile. Furthermore, we provide the application of the rank-based unpaired Wilcoxon's test for a GO-based microarray data interpretation. Further features of the program include the recognition of alternative gene names and the correction for multiple testing. Obtained results are visualized interactively both as a table and as a GO subgraph including all significant nodes. Currently, JProGO enables the analysis of microarray data from more than 20 different prokaryotic species, including all important model organisms, and thus constitutes a useful web service for the microbial research community. JProGO is freely accessible via the web.

Clustering of Unevenly Sampled Gene Expression Time-Series Data

C.S. Möller-Levet, F. Klawonn, K.-H. Cho, O. Wolkenhauer:

Time course measurements are becoming a common type of experiment in the use of microarrays. The temporal order of the data and the varying lengths of the sampling intervals are important and should be considered when clustering time-series data. However, the shortness of gene expression time-series data limits the use of conventional statistical models and techniques for time-series analysis. To address this problem, this paper proposes the Fuzzy Short Time-Series (FSTS) clustering algorithm, which clusters profiles based on the similarity of their relative change of expression level and the corresponding temporal information. One of the major advantages of fuzzy clustering is that genes can belong to more than one group, revealing distinctive features of each gene's function and regulation. Several examples are provided to illustrate the performance of the proposed algorithm. In addition, we present a validation of the algorithm by clustering the genes which define the model profiles in Chu et al (1998, Science, vol. 284, pp. 699-705). The fuzzy c-means, k-means, average linkage hierarchical and random clustering algorithms are compared to the proposed FSTS algorithm. The performance is evaluated with a well-established cluster validity measure, showing that the FSTS algorithm performs better than the compared algorithms in clustering similar rates of change of expression at successive, unevenly distributed time points. Moreover, the FSTS algorithm was able to cluster the genes defining the model profiles in a biologically meaningful way.
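The core idea of comparing profiles by their relative change rather than their raw values can be sketched with a few lines of code: compute the slope of each profile on every sampling interval, which naturally accounts for uneven spacing, and measure the distance between the slope sequences. This is an illustrative simplification, not the exact FSTS similarity measure.

```python
def rates_of_change(times, values):
    """Slope of the profile on each sampling interval (handles uneven spacing)."""
    pairs = zip(zip(times, values), zip(times[1:], values[1:]))
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in pairs]

def slope_distance(times, profile_a, profile_b):
    """Distance between two profiles sampled at the same (uneven) time points,
    based only on their rates of change, so constant offsets are ignored."""
    ra = rates_of_change(times, profile_a)
    rb = rates_of_change(times, profile_b)
    return sum((x - y) ** 2 for x, y in zip(ra, rb)) ** 0.5
```

Under this measure, two profiles that rise and fall together are close even if one is shifted by a constant expression level, which is the behaviour a clustering of co-regulated genes needs.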

Detecting and Identifying Ambiguities in Regression Problems: An Approach Using a Modified Mountain Method

A.K. Nandi, F. Klawonn:

Regression problems occur in many data analysis applications. The aim of regression is to approximate a function from which measurements were taken. When considering a regression problem, we have to take a number of aspects into account: how noisy the data are, whether they sufficiently cover the domain in which we want to find the regression function, and what kind of regression function we should choose. However, the underlying assumption is always that the data actually are (noisy) samples of a single function. In some cases, this might not be true. For instance, when we consider data from a technical process that is controlled by human operators, these operators might use different strategies to reach a particular goal. Even a single operator might not stick to the same strategy all the time. Thus, a data set containing a mixture of samples from different strategies does not represent (noisy) samples from a single function. Therefore, an ambiguity arises when selecting data from a large data set to fit a single regression model. In this paper, we suggest an approach using a modified mountain method (MMM) to select data from a large jumble of data samples that come from different functions, in order to cope with the ambiguities in the underlying regression problem. The proposed method may also serve to identify the best local (approximation) function(s). These are determined using a weighted regression analysis method. The proposed methodology is explained with a one-dimensional problem, a single input single output system, and the performance of the proposed approach is then analysed with artificial data for a two-dimensional case study.

A Formal Study of Linearity Axioms for Fuzzy Orderings

U. Bodenhofer, F. Klawonn:

This contribution is concerned with a detailed investigation of linearity axioms for fuzzy orderings. Different existing concepts are evaluated with respect to three fundamental correspondences from the classical case - linearizability of partial orderings, intersection representation, and one-to-one correspondence between linearity and maximality. As a main result, we obtain that it is virtually impossible to simultaneously preserve all these three properties in the fuzzy case. If we do not require a one-to-one correspondence between linearity and maximality, however, we obtain that an implication-based definition appears to constitute a sound compromise, in particular, if Lukasiewicz-type logics are considered.

Learning Fuzzy Systems – An Objective Function-Approach

F. Höppner, F. Klawonn:

One of the most important aspects of fuzzy systems is that they are easily understandable and interpretable. This property, however, does not come for free but poses some essential constraints on the parameters of a fuzzy system (like the linguistic terms), which are sometimes overlooked when learning fuzzy systems automatically from data. In this paper, an objective function-based approach to learning fuzzy systems is developed, taking these constraints explicitly into account. Starting from fuzzy c-means clustering, several modifications of the basic algorithm are proposed, affecting the shape of the membership functions, the partition of individual variables and the coupling of input space partitioning and local function approximation.

Fuzzy Clustering: Insights and a New Approach

F. Klawonn:

Fuzzy clustering extends crisp clustering in the sense that objects can belong to various clusters with different membership degrees at the same time, whereas crisp or deterministic clustering assigns each object to a unique cluster. The standard approach to fuzzy clustering introduces the so-called fuzzifier, which controls how much clusters may overlap. In this paper we illustrate how this fuzzifier can help to reduce the number of undesired local minima of the objective function that is associated with fuzzy clustering. Apart from this advantage, the fuzzifier also has some drawbacks that are discussed in this paper. A deeper analysis of the fuzzifier concept leads us to a more general approach to fuzzy clustering that can overcome the problems caused by the fuzzifier.

A Contribution to Convergence Theory of Fuzzy c-Means and its Derivatives

F. Höppner, F. Klawonn:

In this paper we revisit the convergence and optimization properties of fuzzy clustering algorithms in general and the fuzzy c-means (FCM) algorithm in particular. Our investigation includes probabilistic and (a slightly modified implementation of) possibilistic memberships, which will be discussed under a unified view. We give a convergence proof for the axis-parallel variant of the algorithm by Gustafson and Kessel, which can be generalized to other algorithms more easily than in the usual approach. Using reformulated fuzzy clustering algorithms, we apply Banach's classical contraction principle and establish a relationship between saddle points and attractive fixed points. For the special case of FCM we derive a sufficient condition for fixed points to be attractive, allowing their identification as (local) minima of the objective function (excluding the possibility of a saddle point).

Improved Fuzzy Partitions for Fuzzy Regression Models

F. Höppner, F. Klawonn:

Fuzzy clustering algorithms like the popular fuzzy c-means algorithm (FCM) are frequently used to automatically divide up the data space into fuzzy granules. When the fuzzy clusters are used to derive membership functions for a fuzzy rule-based system, the corresponding fuzzy sets should fulfil some requirements like boundedness of support or unimodality. Problems may also arise in the case when the fuzzy partition induced by the clusters is intended as a basis for local function approximation. In this case, a local model (function) is assigned to each cluster. Taking the fuzziness of the partition into account, continuous transitions between the single local models can be obtained easily. However, unless the overlapping of the clusters is very small, the local models tend to mix and no local model is actually valid. By rewarding crisp membership degrees, we modify the objective function used in fuzzy clustering and obtain different membership functions that better suit these purposes. We show that the modification can be interpreted as standard FCM using distances to the Voronoi cell of the cluster rather than distances to the cluster prototypes. In consequence, the resulting partitions of the modified algorithm are much closer to those of the crisp original methods. The membership functions can be generalized to a fuzzified minimum function. We apply this modified fuzzy clustering approach to building fuzzy models of the Takagi-Sugeno (TS) type automatically from data.

Should fuzzy equality and similarity satisfy transitivity?

F. Klawonn:

This brief paper addresses problems that were raised in the paper by DeCock in connection with modelling fuzzy equality and similarity. In addition to specific comments on the cited paper, we also point out the more fundamental questions underlying this discussion.

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results

F. Klawonn, F. Höppner:

The most common fuzzy clustering algorithms are based on the minimization of an objective function that evaluates (fuzzy) cluster partitions. The generalisation step from crisp clustering to fuzzy clustering requires the introduction of an additional parameter, the so-called fuzzifier. This fuzzifier does not only control how much clusters may overlap, but also has some undesired consequences. For example, data (almost) always have non-zero membership degrees to all clusters, no matter how far away they are from a cluster. We propose a concept that generalizes the idea of the fuzzifier and solves the mentioned problems.

Finding Informative Rules in Interval Sequences

F. Höppner, F. Klawonn:

Observing a binary feature over a period of time yields a sequence of observation intervals. To ease the access to continuous features (like time series), they are often broken down into attributed intervals, such that the attribute describes the series' behaviour within the segment (e.g. increasing, high-value, highly convex, etc.). In both cases, we obtain a sequence of interval data, in which temporal patterns and rules can be identified. A temporal pattern is defined as a set of labeled intervals together with their interval relationships described in terms of Allen's interval logic. In this paper, we consider the evaluation of such rules in order to find the most informative ones. We discuss rule semantics and outline deficiencies of the previously used rule evaluation. We apply the J-measure to rules with a modified semantics in order to better cope with different lengths of the temporal patterns. We also consider the problem of specializing temporal rules by additional attributes of the state intervals.

Learning Indistinguishability from Data

F. Höppner, F. Klawonn, P. Eklund:

In this paper we revisit the idea of interpreting fuzzy sets as representations of vague values. In this context a fuzzy set is induced by a crisp value, and the membership degree of an element is understood as the similarity degree between this element and the crisp value that determines the fuzzy set. Similarity is assumed to be a notion of distance. This means that fuzzy sets are induced by crisp values and an appropriate distance function. This distance function can be described in terms of scaling the ordinary distance between real numbers. With this interpretation in mind, the task of designing a fuzzy system corresponds to determining suitable crisp values and appropriate scaling functions for the distance. When we want to generate a fuzzy model from data, the parameters have to be fitted to the data. This leads to an optimisation problem that is very similar to the optimisation task to be solved in objective function-based clustering. We borrow ideas from the alternating optimisation schemes applied in fuzzy clustering in order to develop a new technique to determine our set of parameters from data, supporting the interpretability of the fuzzy system.
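The interpretation above can be sketched in a few lines: a fuzzy set arises from a crisp value plus a scaling of the distance along the real line, and membership falls off with the scaled distance. The numerical integration of the scaling function and the particular membership shape below are illustrative assumptions, not the paper's construction.

```python
def scaled_distance(x, y, scale, steps=100):
    """Distance between reals x and y under a pointwise scaling of the real line,
    approximated by integrating the scaling function over [min(x,y), max(x,y)]."""
    lo, hi = sorted((x, y))
    width = (hi - lo) / steps
    return sum(scale(lo + (i + 0.5) * width) * width for i in range(steps))

def membership(x, crisp_value, scale):
    """Membership of x in the fuzzy set induced by crisp_value and the scaling:
    1 at the crisp value, decreasing linearly in the scaled distance, clipped at 0."""
    return max(0.0, 1.0 - scaled_distance(x, crisp_value, scale))
```

With a constant scaling of 1 this reproduces an ordinary triangular fuzzy set around the crisp value; a larger scaling in some region makes distances there count more, narrowing the fuzzy set exactly where finer distinctions matter. Fitting the crisp values and scaling functions to data is then the optimisation task the paper addresses.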

Fuzzy Clustering with Weighting of Data Variables

A. Keller, F. Klawonn:

We introduce an objective function-based fuzzy clustering technique that assigns one influence parameter to each single data variable for each cluster. Our method is not only suited to detect structures or groups of data that are not uniformly distributed over the structure's single domains, but also gives information about the influence of individual variables on the detected groups. In addition, our approach can be seen as a generalization of the well-known fuzzy c-means clustering algorithm.

Linearity Axioms for Fuzzy Orderings: A Formal Review

U. Bodenhofer, F. Klawonn:

This contribution is concerned with a review of linearity axioms for fuzzy orderings with respect to three fundamental correspondences from the classical case - linearizability of partial orderings, intersection representation, and one-to-one correspondence between linearity and maximality. We obtain that it is virtually impossible to simultaneously preserve all these three properties in the fuzzy case. If we do not require a one-to-one correspondence between linearity and maximality, however, we obtain that an implication-based definition appears to constitute a sound compromise, in particular, if Lukasiewicz-type logics are considered.

Noise Clustering with a Fixed Fraction of Noise

F. Klawonn:

Cluster analysis is an exploratory data analysis technique that is designed to group data and to detect structures within data. Exploratory techniques are applied as first steps in data analysis, immediately after elementary cleaning and visualisation of the data has been carried out. This means that a specific model for the data is not available at this stage. However, exploratory methods usually incorporate control parameters that influence the result of this early data analysis step. Therefore, it is desirable to design methods that are robust w.r.t. the variation of such control parameters. In this paper, we modify the so-called noise clustering technique, making it more robust against a wrong choice of its main control parameter, the noise distance. Section 2 briefly reviews the necessary background in fuzzy clustering. Section 3 introduces our modified noise clustering approach, including a computationally efficient algorithm. We finish the paper with an example and some concluding remarks.

The Inherent Indistinguishability in Fuzzy Systems

F. Klawonn, R. Kruse:

This paper provides an overview of fuzzy systems from the viewpoint of similarity relations. Similarity relations turn out to be an appealing framework in which typical concepts and techniques applied in fuzzy systems and fuzzy control can be better understood and interpreted. They can also be used to describe the indistinguishability inherent in any fuzzy system that cannot be avoided.

Adaptation of Cluster Sizes in Objective Function Based Fuzzy Clustering

A. Keller, F. Klawonn:

This paper discusses new approaches in objective function based fuzzy clustering. Some well-known approaches are extended by a supplementary component. The resulting new clustering techniques are able to adapt single clusters to the expansion of the corresponding group of data in an iterative optimization procedure. Another new approach based on volume centers as cluster representatives with varying radii for individual groups is also described. The corresponding objective functions are presented and alternating optimization schemes are derived. Experimental results demonstrate the significance of the presented techniques.

Visual Inspection of Fuzzy Clustering Results

F. Klawonn, V. Chekhtman, E. Janz:

Clustering is an explorative data analysis method applied to data in order to discover structures or certain groupings in a data set. Therefore, clustering can be seen as an unsupervised classification technique. Fuzzy clustering accepts the fact that the clusters or classes in the data are usually not completely well separated and thus assigns a membership degree between 0 and 1 for each cluster to every datum. The most common fuzzy clustering techniques aim at minimizing an objective function whose (main) parameters are the membership degrees and the parameters determining the localisation as well as the shape of the clusters. The algorithm will always compute a result that might represent an undesired local minimum of the objective function. Even if the global minimum is found, it might correspond to a bad result when the cluster shapes or the number of clusters are not chosen properly. Since the data are usually multi-dimensional, the visual inspection of the data is very limited. Methods like multi-dimensional scaling are available, but very often lead to unsatisfactory results. Nevertheless, it is important to evaluate the clustering result. Although cluster validity measures try to solve this problem, they tend to reduce the information of a large data set and a number of cluster parameters to a single value. We propose in this paper to use the underlying principles of validity measures, but to refrain from the simplification to a single value and instead provide a graphical representation containing more information. This enables the user to identify inconsistencies in the clustering result or even in single clusters.

Fuzzy Points, Fuzzy Relations and Fuzzy Functions

F. Klawonn:

In many applications of fuzzy sets, especially in fuzzy control, the notions of fuzzy points and fuzzy functions play an important role. Nevertheless, these concepts are usually understood and used on a very intuitive basis without providing a formal definition. In this paper we discuss an approach to clarify and formalize these notions and show some consequences for fuzzy control and fuzzy interpolation. One of the main results shows that the Mamdani fuzzy controller provides a solution to the system of fuzzy relational equations induced by the if-then-rules if it determines a partial fuzzy function.

Sequence Mining for Customer Behaviour Predictions in Telecommunications

F. Eichinger, D.D. Nauck, F. Klawonn:

Predicting the behaviour of customers is challenging, but important for service oriented businesses. Data mining techniques are used to make such predictions, typically using only recent static data. In this paper, a sequence mining approach is proposed, which allows taking historic data and temporal developments into account as well. In order to form a combined classifier, sequence mining is combined with decision tree analysis. In the area of sequence mining, a tree data structure is extended with hashing techniques and a variation of a classic algorithm is presented. The combined classifier is applied to real customer data and produces promising results.

Identifying Single Good Clusters in Data Sets

F. Klawonn:

Local patterns in the form of single clusters are of interest in various areas of data mining. However, since the intention of cluster analysis is a global partition of a data set into clusters, it is not suitable to identify single clusters in a large data set where the majority of the data cannot be assigned to meaningful clusters. This paper presents a new objective function-based approach to identify a single good cluster in a data set, making use of techniques known from prototype-based, noise and fuzzy clustering. The proposed method can either be applied in order to identify single clusters or to carry out a standard cluster analysis by finding clusters step by step, determining the number of clusters automatically in this way.

Understanding and Controlling the Membership Degrees in Fuzzy Clustering

F. Klawonn:

Fuzzy cluster analysis uses membership degrees to assign data objects to clusters in order to better handle ambiguous data that share properties of different clusters. However, the introduction of membership degrees requires a new parameter called fuzzifier. In this paper the good and bad effects of the fuzzifier on the clustering results are analysed and based on these considerations a more general approach to fuzzy clustering is proposed, providing better control on the membership degrees and their influence in fuzzy cluster analysis.

Automatically Determine Initial Fuzzy Partitions for Neuro-Fuzzy Classifiers

F. Klawonn, D.D. Nauck:

Learning a fuzzy classifier from data is a well-known technique in fuzzy data analysis and many learning algorithms have been proposed, typically in the area of neuro-fuzzy systems. All learning algorithms require a number of parameters to be set by the user. These are typically initial fuzzy partitions for all variables and sometimes also the number of fuzzy rules. Especially for neuro-fuzzy algorithms, the initial choice of parameters can be crucial and, if ill-chosen, may lead to failure of the learning algorithm. Recent trends in data analysis show that automation is an important issue because it helps to provide advanced analytics to users who are not data analysis experts. In order to fully automate a learning algorithm for fuzzy classifiers, we preferably need an algorithm that can determine a suitable initial fuzzy partition for the learning algorithm to start with. In this paper we propose such an algorithm, which we have implemented to extend the neuro-fuzzy approach NEFCLASS. NEFCLASS has recently been integrated into an automatic soft computing platform for intelligent data analysis (SPIDA).

POLARMAP - Efficient Visualisation of High Dimensional Data

F. Rehm, F. Klawonn, R. Kruse:

Multidimensional scaling (MDS) provides low-dimensional visualizations of high-dimensional feature vectors. This is a very important step in data preprocessing because it helps the user to decide which methods to use for further data analysis. A well-known problem with conventional MDS, however, is its quadratic space and time requirements. Besides this, an MDS transformation must be completely recomputed if additional feature vectors have to be considered. The POLARMAP algorithm presented in this paper learns, similarly to NeuroScale but at lower computational cost, a function that maps high-dimensional feature vectors to a 2-dimensional feature space. With the obtained function, even new feature vectors can be mapped to the target space.

Rule Classification Visualization of High-Dimensional Data

F. Rehm, F. Klawonn, R. Kruse:

This paper presents an approach to visualize high-dimensional fuzzy classification rules and the corresponding classified data set in the plane. This enables the observer to check visually to which degree a feature vector is classified by a certain rule. Also misclassified feature vectors can be well spotted and conflicting or error-prone rules can be identified.

Visualization of Single Clusters

F. Rehm, F. Klawonn, R. Kruse:

The evaluation of clustering partitions is a crucial step in data processing. A multitude of measures exists, which, unfortunately, can give quite different results for the same data set. In this paper we present a technique to visualize single clusters of high-dimensional data. Our method maps single clusters to the plane, trying to preserve the membership degrees that describe a data point's gradual membership to a certain cluster. The resulting scatter plot illustrates the separation of the respective cluster as well as the need for additional prototypes. Since clusters are visualized individually, additional prototypes can be added locally where they are needed.

What is Fuzzy About Fuzzy Clustering? Understanding and Improving the Concept of the Fuzzifier

F. Klawonn, F. Höppner:

The step from the well-known c-means clustering algorithm to the fuzzy c-means algorithm and its vast number of sophisticated extensions and generalisations involves an additional clustering parameter, the so-called fuzzifier. This fuzzifier controls how much clusters may overlap. It also has some negative effects, causing problems for clusters with varying data density, noisy data and large data sets with a higher number of clusters. In this paper we take a closer look at the underlying general principle of the fuzzifier. Based on these investigations, we propose an improved, more general framework that avoids the undesired effects of the fuzzifier.
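The effect of the fuzzifier described above can be made concrete with the standard fuzzy c-means membership update: as the fuzzifier m approaches 1 the assignment becomes nearly crisp, larger m lets clusters overlap more, and under the standard update every point keeps a non-zero membership to every cluster no matter how distant. This is the textbook FCM formula, shown here only to illustrate the parameter's role.

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard probabilistic fuzzy c-means membership update for one data point.

    u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)); memberships sum to 1 and are strictly
    positive for every cluster, which is one of the effects discussed above.
    """
    dists = [max(sum((x - y) ** 2 for x, y in zip(point, c)) ** 0.5, 1e-12)
             for c in centers]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d / d_other) ** exp for d_other in dists) for d in dists]
```

Comparing the memberships for a small and a large fuzzifier on the same point shows the trade-off directly: crispness versus overlap, with the never-zero membership to far-away clusters as the unavoidable side effect the paper's generalised framework is designed to remove.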

Fuzzy Clustering of Macroarray Data

O. Georgieva, F. Klawonn, E. Haertig:

The complete sequence of bacterial genomes provides new perspectives for the study of gene expression and gene function. DNA array experiments allow measuring the expression levels of all genes of an organism in a single hybridization experiment. Computational analysis of macroarray data is used extensively to extract groups of similarly expressed genes. The aim is to organize DNA array data so that the underlying structures can be recognized and explored. The gene groups identified as clusters are searched for genes known to be involved in similar biological processes, implying that genes of unknown function may be involved in the same processes. Commonly used computational techniques include hierarchical clustering, K-means clustering and self-organizing maps. These share many features, particularly the distance metric, which measures the relationship between samples or genes in the data space formed by the expression values. The output of different clustering algorithms usually depends more on the type of distance metric used than on any other factor. The central limitation of most of the commonly used algorithms is that they are unable to identify genes whose expression is similar to multiple, distinct gene groups, thereby masking the relationships between genes that are coregulated with different groups of genes in response to different conditions. For example, K-means clustering partitions genes into a fixed number of discrete clusters, attempting to maximize the expression similarity of the genes within each cluster; since each gene is assigned to exactly one cluster, the relationships between conditionally coregulated genes are obscured. A recently implemented heuristic variant of fuzzy c-means (FCM) clustering shows the advantage of the fuzzy clustering technique as a valuable tool for gene expression analysis, as it produces overlapping clusters, pointing to distinct features of each gene's function and regulation.
The paper presents a methodology for macroarray data analysis. Each step of the data processing is described in detail. First, a crucial preprocessing step transforms the raw data into an appropriate and reliable form for clustering. Second, subtractive clustering is applied in order to obtain a good initial data partition. The obtained cluster centres are used to initialize the fuzzy c-means clustering algorithm, by which the optimal values of the cluster centres and the partition matrix are obtained. The partition matrix is used to determine the gene groups that belong to a given cluster with a prescribed membership degree. The proposed procedure is applied to macroarray data of B. subtilis. The results show that informative clusters are obtained. The extracted gene clusters overlap, pointing to distinct aspects of gene function and regulation.
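The initialisation step can be sketched as follows. This is a generic subtractive clustering routine with commonly used radius parameters, not the authors' exact implementation: every data point receives a potential from the density of its neighbourhood, the point with the highest potential becomes a centre, and potentials near it are reduced before the next pick.

```python
import numpy as np

def subtractive_centers(data, n_centers, ra=1.0):
    """Pick n_centers initial cluster centres by subtractive clustering.

    ra is the assumed neighbourhood radius; rb = 1.5 * ra is the usual
    choice for the penalty radius around an accepted centre.
    """
    alpha = 4.0 / ra ** 2
    beta = alpha / 2.25                      # corresponds to rb = 1.5 * ra
    # pairwise squared distances between all data points
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    potential = np.exp(-alpha * d2).sum(axis=1)
    centers = []
    for _ in range(n_centers):
        best = int(np.argmax(potential))
        centers.append(data[best])
        # suppress potentials in the neighbourhood of the chosen centre
        potential = potential - potential[best] * np.exp(-beta * d2[best])
    return np.array(centers)
```

The returned centres would then seed the fuzzy c-means iterations described in the abstract.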

MDS_polar: A new Approach for Dimension Reduction to Visualize High Dimensional Data

F. Rehm, F. Klawonn, R. Kruse:

Many applications in science and business, such as signal analysis or customer segmentation, deal with large amounts of data which are usually high dimensional. Before further analysis or processing of the data is carried out with more sophisticated data mining techniques, data preprocessing and exploratory data analysis are an important step. As a part of this process, visualization of the data helps to decide which kind of method is likely to lead to good results. Since the visual assessment of a feature space with more than three dimensions is not possible, it becomes necessary to find an appropriate visualization scheme for such data sets. The general data visualization problem we consider here is to map high dimensional data to a two-dimensional plane -- usually a computer screen -- trying to preserve as many properties or as much information of the high dimensional data as possible. In other words, we face a dimension reduction problem. A very simple approach is to look at scatter plots obtained from projections onto two selected features. However, each scatter plot only contains the information of the two chosen features, and with a high number of features it is infeasible to inspect the scatter plots resulting from all possible combinations of two features. Before finding a mapping of the high dimensional data to the two-dimensional plane, an error or quality measure must be defined in order to evaluate the suitability of such a mapping. Principal component analysis is one possible choice, producing an affine transformation that preserves as much of the variance in the data as possible. However, instead of the variance, other criteria like the distances between the single data vectors might be of higher interest. Multidimensional scaling (see for instance \cite{kruskalwish78}) is a technique that aims at preserving the distances between the data when mapping them to lower dimensions.
Although multidimensional scaling and related approaches yield promising and interesting results, they suffer from high computational demands concerning memory as well as computation time. In this paper we present a new approach for dimension reduction to visualize high dimensional data. Instead of trying to preserve the distances between feature vectors directly, our algorithm transforms high dimensional feature vectors into two-dimensional feature vectors under the constraints that the length of each vector is preserved and that the angles between vectors approximate the corresponding angles in the high dimensional space as closely as possible, enabling us to come up with an efficient computing scheme. After a brief review of the concept of multidimensional scaling and related approaches, we explain the theoretical background of our approach and discuss some illustrative examples at the end of the paper.
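To illustrate the idea, here is a hypothetical sketch: vector lengths are kept exactly, and the planar angles are tuned by plain gradient descent so that pairwise angle differences approximate those in the original space. The paper derives a more efficient scheme; the optimisation loop below is only a stand-in.

```python
import numpy as np

def mds_polar_sketch(X, iters=200, lr=0.01, seed=0):
    """Map high-dimensional rows of X to the plane in polar form.

    The radius of each mapped point equals the norm of the original
    vector; the angles phi are adjusted so that |phi_i - phi_j|
    approximates the angle between x_i and x_j in the original space.
    """
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1)
    unit = X / np.fmax(norms, 1e-12)[:, None]
    # target angles psi_ij between the original vectors
    psi = np.arccos(np.clip(unit @ unit.T, -1.0, 1.0))
    phi = rng.uniform(0, np.pi, len(X))          # planar angles to optimise
    for _ in range(iters):
        diff = phi[:, None] - phi[None, :]        # signed planar angle diffs
        err = np.abs(diff) - psi                  # deviation from target
        grad = (err * np.sign(diff)).sum(axis=1)  # d/d phi_i of squared error
        phi -= lr * grad / len(X)
    return np.column_stack((norms * np.cos(phi), norms * np.sin(phi)))
```

By construction the lengths are preserved exactly, whatever the state of the angle optimisation.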

Learning Methods for Air Traffic Management

F. Rehm, F. Klawonn:

Weather is an important source of delay for aircraft. Recent studies have shown that certain weather factors have a significant influence on air traffic. More than 50\% of all delay is attributable to weather and causes, among other things, high costs to airlines and passengers. In this work we show to what extent weather factors in the closer region of Frankfurt Airport have an impact on the delay of flights. Besides the results of a linear regression model, we also present the results of some modern data mining approaches, such as regression trees and fuzzy clustering techniques. With the clustering approach we show that several weather conditions have a similar influence on the delay of flights. Our analyses focus on the delay that is explicitly caused by weather factors in the vicinity of the airport, the so-called terminal management area (TMA). Thus, delay caused by weather at the departure airport or by other circumstances during the flight does not bias our results. With our methods it becomes possible to predict the delay of flights if certain weather factors are known. We specify these factors and quantify their effects on delay.

A Classifier System-Based Learning Algorithm for Interpretable, Adaptive Nearest Neighbour Classifiers

F. Klawonn, K. Tschumitschew:

Nearest neighbour classifiers provide intuitive classification decisions in the sense that they determine the class of an unknown object on the basis of the most similar cases in a sample database. However, in order to achieve good interpretability of a nearest neighbour classifier, two important aspects should be taken into account: the sample database should be kept reasonably small, and the similarity or distance measure should be transparent. Especially when objects with many attributes are considered, a distance measure like the Euclidean distance does not always lead to intuitive results. In this paper, we present an approach to constructing a nearest neighbour classifier that takes care of these two aspects. The sample database is kept small by adopting ideas from classifier systems. We use an adaptive distance measure that enables us to explain the similarity measure and the classification decision in terms of simple fuzzy rules.
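A minimal illustration of the second aspect, the transparent distance measure: with per-attribute weights, a weight near zero states plainly that an attribute is irrelevant to the decision. The weight adaptation itself, which the paper borrows from classifier systems, is omitted here, and the function name is ours.

```python
import numpy as np

def weighted_nn_classify(x, samples, labels, weights):
    """Classify x by its nearest neighbour under an attribute-weighted
    Euclidean distance. A weight of 0 switches an attribute off entirely,
    which is easy to phrase as a simple fuzzy rule about (ir)relevance."""
    d = np.sqrt(((samples - x) ** 2 * weights).sum(axis=1))
    return labels[int(np.argmin(d))]
```

Adapting the weights changes which stored case counts as "most similar", which is exactly what makes the classifier's decisions explainable.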

Learning Rules about the Development of Variables over Time

F. Höppner, F. Klawonn:

The human approach to learning from time series is guided by the impressive pattern recognition capabilities of the human brain. Rather than looking at quantitative values, a human automatically segments the time series appropriately and characterizes or classifies the segments. Similarity of time series is then decided on the basis of this abstracted representation. In this chapter, we propose an algorithm to support a human in this approach to dealing with large data sets. As an example, we consider the problem of deriving local weather forecasting rules that allow us to infer the wind strength from the qualitative behavior of the air-pressure curve. Formally, we learn these rules from a single series of labeled intervals, which has been obtained from (multivariate) time series by extracting various features of interest, for instance, segments of increasing and decreasing local trends. We seek the identification of frequent local patterns in this state series. A temporal pattern is defined as a set of states together with their interval relationships, described in terms of Allen's interval logic. In the spirit of association rule mining, we propose an algorithm to discover frequent temporal patterns and to generate temporal rules. We discuss an efficient algorithm for frequent pattern discovery in detail.
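A temporal pattern builds on the thirteen qualitative relations of Allen's interval logic. The following helper (our illustration, not part of the chapter's mining algorithm) determines the relation between two intervals given as (start, end) pairs:

```python
def allen_relation(a, b):
    """Return the Allen interval relation that holds between a and b.

    Covers all thirteen relations: before/after, meets/met-by, equals,
    starts/started-by, finishes/finished-by, during/contains,
    overlaps/overlapped-by.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if (s1, e1) == (s2, e2):
        return "equals"
    if s1 == s2:                       # same start, different end
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:                       # same end, different start
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"
```

A pattern such as "pressure decreasing *overlaps* wind increasing" is then a set of states plus a table of these relations, and frequent patterns of this form are what the mining algorithm searches for.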

Fuzzy Clustering of Short Time-Series and Unevenly Distributed Sampling Points

C.S. Möller-Levet, F. Klawonn, K.-H. Cho, O. Wolkenhauer:

This paper proposes a new clustering algorithm in the fuzzy c-means family, which is designed to cluster time series and is particularly suited for short time series and those with unevenly spaced sampling points. Short time series, which do not allow a conventional statistical model, and unevenly sampled time series appear in many practical situations. The algorithm developed here is motivated by experiments in biology. Conventional clustering algorithms based on the Euclidean distance or the Pearson correlation coefficient, such as hard k-means or hierarchical clustering, are not able to include the temporal information in the distance measurement. Uneven sampling commonly occurs in biological experiments. The temporal order of the data is important, and the varying length of the sampling intervals should be considered in clustering time series. The proposed short time series (STS) distance is able to measure similarity of shapes which are formed by the relative change of amplitude and the corresponding temporal information. We develop a fuzzy short time series (FSTS) clustering algorithm by incorporating the STS distance into the standard fuzzy clustering scheme. An example is provided to illustrate the performance of the proposed FSTS clustering algorithm in comparison with fuzzy c-means, k-means and single linkage hierarchical clustering.
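Under this description, the STS distance compares the slopes of the two series on each sampling interval, so varying interval lengths enter the measure directly. A minimal sketch of that reading (the exact normalisation used in the paper may differ):

```python
import numpy as np

def sts_distance(x, y, t):
    """Short time series (STS) distance between series x and y sampled at
    the (possibly unevenly spaced) time points t: the root of the summed
    squared differences of the slopes on each sampling interval. Two
    series with the same shape have distance zero regardless of offset."""
    dt = np.diff(t)
    return np.sqrt((((np.diff(x) - np.diff(y)) / dt) ** 2).sum())
```

Because only slopes are compared, a series and its vertically shifted copy are indistinguishable, which is exactly the shape-based notion of similarity the abstract describes.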