RedPel

Including "Parallel & Distributed System"

1. Use of FCM & Fuzzy Min-Max Algorithm in Lung Cancer
Project Code : JDM1601                    Year : 2016 (IEEE)

Abstract— Lung cancer is a disease characterized by uncontrolled cell growth in tissues of the lung and is the most common fatal malignancy in both men and women. Early detection and treatment of lung cancer can greatly improve the survival rate of patients. Artificial Neural Networks (ANN), Fuzzy C-Means (FCM) and the Fuzzy Min-Max Neural Network (FMNN) are useful in medical diagnosis because of several advantages: ANN offers fault tolerance, flexibility, and non-linearity; FCM gives the best results for overlapping data sets, allows a data point to belong to more than one cluster center, and always converges; and FMNN has advantages such as online adaptation, non-linear separability, short training time, and both soft and hard decisions. In this work, we propose to use FCM and FMNN on standard datasets to detect lung cancer.
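Purely as an illustration of the FCM component, here is a minimal Fuzzy C-Means sketch on made-up 2-D data; it is not the paper's pipeline, and the dataset, parameters, and stopping rule are all assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means: returns cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m                              # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U_new = dist ** (-2.0 / (m - 1))         # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:        # converged: memberships stable
            return centers, U_new
        U = U_new
    return centers, U

# Toy example: two well-separated blobs standing in for feature vectors.
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (20, 2)),
               np.random.default_rng(2).normal(4, 0.5, (20, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(centers)   # one center near (0, 0), one near (4, 4)
```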

2. Systematic Prediction of Keywords over the IMDB Database.
Project Code : JDM1602                       Year : 2016 (IEEE)

ABSTRACT: Keyword queries on databases provide easy access to data, but often suffer from low ranking quality, i.e., low precision and/or recall, as shown in recent benchmarks. It would be useful to identify queries that are likely to have low ranking quality to improve the user satisfaction. For instance, the system may suggest to the user alternative queries for such hard queries. In this paper, we analyze the characteristics of hard queries and propose a novel framework to measure the degree of difficulty for a keyword query over a database, considering both the structure and the content of the database and the query results. We evaluate our query difficulty prediction model against two effectiveness benchmarks for popular keyword search ranking methods. Our empirical results show that our model predicts the hard queries with high accuracy. Further, we present a suite of optimizations to minimize the incurred time overhead.

3. Performance Evaluation and Estimation Model Using Regression Method for Hadoop WordCount.
Project Code : JDM1603                          Year : 2016 (IEEE)

ABSTRACT: Given the rapid growth in cloud computing, it is important to analyze the performance of different Hadoop MapReduce applications and to understand the performance bottlenecks in a cloud cluster that contribute to higher or lower performance. It is also important to analyze the underlying hardware in cloud cluster servers to enable the optimization of software and hardware to achieve the maximum performance possible. Hadoop is based on MapReduce, which is one of the most popular programming models for big data analysis in a parallel computing environment. In this paper, we present a detailed performance analysis, characterization, and evaluation of the Hadoop MapReduce WordCount application.

We also propose an estimation model based on Amdahl's law regression method to estimate performance and total processing time versus different input sizes for a given processor architecture. The estimation regression model is verified to estimate performance and run time with an error margin of <5%.
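To give the flavor of the estimation model, the sketch below fits a linear "fixed cost plus per-unit cost" run-time model by least squares; the WordCount timings are invented, and the exact regression form used in the paper may differ.

```python
import numpy as np

# Hypothetical WordCount measurements: input size (GB) vs. run time (s).
sizes = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
times = np.array([42.0, 61.0, 98.0, 173.0, 321.0])

# Amdahl-style model T(n) = t_fixed + t_per_gb * n: the intercept captures the
# serial/setup fraction, the slope the parallelizable per-GB work.
A = np.column_stack([np.ones_like(sizes), sizes])
(t_fixed, t_per_gb), *_ = np.linalg.lstsq(A, times, rcond=None)

pred = t_fixed + t_per_gb * sizes
rel_err = np.abs(pred - times) / times
print(f"T(n) = {t_fixed:.1f} + {t_per_gb:.2f} * n; max relative error {rel_err.max():.1%}")
```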

4. An Efficient Privacy-Preserving Ranked Keyword Search Method
Project Code : JDM1604                      Year : 2016 (IEEE)

Abstract—Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preserving. Therefore, it is essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationship between documents will normally be concealed in the process of encryption, which will lead to significant search accuracy degradation. Also, the volume of data in data centers has experienced dramatic growth. This makes it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval over large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support more search semantics and also to meet the demand for fast ciphertext search within a big data environment. The proposed hierarchical approach clusters the documents based on the minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum size of cluster is reached. In the search phase, this approach can reach a linear computational complexity against an exponential size increase of the document collection. In order to verify the authenticity of search results, a structure called minimum hash sub-tree is designed in this paper. Experiments have been conducted using the collection set built from IEEE Xplore. The results show that with a sharp increase of documents in the dataset the search time of the proposed method increases linearly whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in the rank privacy and relevance of retrieved documents.
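Setting the encryption and the minimum hash sub-tree aside, the cluster-then-subdivide idea can be sketched as recursive bisection of document vectors until a size cap is met; the vectors, the cap, and the use of k-means are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_until_small(docs, max_size=3):
    """Recursively bisect document vectors until every cluster fits the size cap,
    mirroring the partition-into-sub-clusters step described in the abstract."""
    if len(docs) <= max_size:
        return [docs]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(docs)
    clusters = []
    for c in (0, 1):
        clusters.extend(split_until_small(docs[labels == c], max_size))
    return clusters

vectors = np.random.default_rng(1).random((10, 4))   # hypothetical doc vectors
for i, cl in enumerate(split_until_small(vectors)):
    print(f"cluster {i}: {len(cl)} documents")
```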

5. PRISM: PRivacy-aware Interest Sharing and Matching in Mobile Social Networks.
Project Code : JDM1605                      Year : 2016 (IEEE)

Abstract—In a profile matchmaking application of mobile social networks, users need to reveal their interests to each other in order to find the common interests. A malicious user may harm a user by knowing his personal information. Therefore, mutual interests need to be found in a privacy-preserving manner. In this paper, we propose an efficient privacy protection and interests sharing protocol referred to as PRivacy-aware Interest Sharing and Matching (PRISM). PRISM enables users to discover mutual interests without revealing their interests. Unlike existing approaches, PRISM does not require revealing the interests to a trusted server. Moreover, the protocol considers attacking scenarios that have not been addressed previously and provides an efficient solution. The inherent mechanism reveals any cheating attempt by a malicious user. PRISM also provides a procedure to eliminate Sybil attacks. We analyze the security of PRISM against both passive and active attacks. Through implementation, we also present a detailed analysis of the performance of PRISM and compare it with existing approaches. The results show the effectiveness of PRISM without any significant performance degradation.

6. Mapping Bug Reports to Relevant Files Using Instance Selection and Feature Selection.
Project Code : JDM1606                    Year : 2016 (IEEE)

Abstract— Open source projects, for example Eclipse and Firefox, have open bug repositories. Users report bugs to these repositories. Users of these repositories are usually non-technical and cannot assign the correct class to these bugs. Triaging bugs to the developers who will fix them is a tedious and time-consuming task. Developers are usually experts in particular areas; for example, some developers are experts in GUI code and others in Java functionality. Assigning a bug to the relevant developer saves time and helps maintain developers' interest by assigning bugs according to their expertise. However, assigning the right bug to the right developer is quite difficult for a triager without knowing the class the bug actually belongs to. In this research, we classify bugs into different labels on the basis of the bug summary. A multinomial Naive Bayes text classifier is used for classification. For feature selection, the Chi-Square and TF-IDF algorithms were used. Using Naive Bayes and Chi-Square, we achieve an average accuracy of 83%.
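A compact version of this classification pipeline can be sketched with scikit-learn; the bug summaries, labels, and the value of k are invented stand-ins, not the study's data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented bug summaries and component labels standing in for repository data.
summaries = ["button misaligned in preferences dialog",
             "nullpointerexception in expression parser",
             "toolbar icon rendered blurry on resize",
             "classcastexception when saving workspace"]
labels = ["GUI", "Core", "GUI", "Core"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),           # TF-IDF term weighting
    ("chi2", SelectKBest(chi2, k=5)),       # keep the 5 most class-correlated terms
    ("nb", MultinomialNB()),                # multinomial Naive Bayes classifier
])
clf.fit(summaries, labels)
print(clf.predict(["dialog renders off screen"]))   # -> ['GUI'] (illustrative)
```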

7. Inference Patterns from Big Data Using Aggregation, Filtering and Tagging - A Survey.
Project Code : JDM1607                      Year : 2016 (IEEE)

Abstract: This paper reviews various approaches to inferring patterns from Big Data using aggregation, filtering, and tagging. Earlier research shows that data aggregation is concerned with how gathered data can be utilized efficiently; understandably, at the time of data gathering one does not care much about whether the gathered data will be useful or not. Hence, filtering and tagging of the data are crucial steps in collecting the relevant data to fulfill the need. The main goal of this paper is therefore to present a detailed and comprehensive survey of the different approaches. To make the concept clearer, we provide a brief introduction to Big Data and how it works, the workings of two data aggregation tools (namely, Flume and Sqoop) and of data processing tools (Hive and Mahout), and various algorithms that are useful for understanding the topic. Finally, we include comparisons between the aggregation tools, the processing tools, and the various algorithms in terms of their pre-processing, matching time, results, and reviews.

8. Outsourced Similarity Search on Metric Data Assets.
Project Code : JDM1608                     Year : 2016 (IEEE)

ABSTRACT:

This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the data objects most similar to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

9. CCD: A Distributed Publish/Subscribe Framework for Rich Content Formats.
Project Code : JDM1609                      Year : 2016 (IEEE)

Abstract:

In this paper, we propose a content-based publish/subscribe (pub/sub) framework that delivers matching content to subscribers in their desired format. Such a framework enables the pub/sub system to accommodate richer content formats, including multimedia publications with image and video content. In our proposed framework, users (consumers), in addition to specifying their information needs (subscription queries), also specify a profile with information about their receiving context, including characteristics of the device used to receive the content (e.g., the resolution of a PDA used by a consumer). The pub/sub system, besides being responsible for matching and routing the published content, also becomes responsible for converting the content into the suitable format for each user. Content conversion is achieved through a set of content adaptation operators (e.g., image transcoder, document translator, etc.). We study algorithms for the placement of such operators in a heterogeneous pub/sub broker overlay in order to minimize communication and computation resource consumption. Our experimental results show that careful placement of operators in the pub/sub overlay network results in significant cost reduction.

10. Measuring the Sky: On Computing Data Cubes via Skylining the Measures.
Project Code : JDM1610                     Year : 2016 (IEEE)

ABSTRACT:

Data cube is a key element in supporting fast OLAP. Traditionally, an aggregate function is used to compute the values in data cubes. In this paper, we extend the notion of data cubes with a new perspective. Instead of using an aggregate function, we propose to build data cubes using the skyline operation as the "aggregate function." Data cubes built in this way are called "group-by skyline cubes" and can support a variety of analytical tasks. Nevertheless, there are several challenges in implementing group-by skyline cubes in data warehouses: 1) the skyline operation is computationally intensive, 2) the skyline operation is holistic, and 3) a group-by skyline cube contains both grouping and skyline dimensions, rendering it infeasible to pre-compute all cuboids in advance. This paper gives details on how to store, materialize, and query such cubes.
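For intuition, a skyline (the set of non-dominated points) can be computed naively as below; the hotel data is made up, and practical skyline-cube algorithms are far more efficient than this quadratic scan.

```python
def dominates(q, p):
    """q dominates p if q is at least as good in every dimension and better in one."""
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Example: hotels scored as (rating, -price), so both coordinates are maximized.
hotels = [(4.5, -120), (4.0, -80), (3.5, -200), (4.5, -100)]
print(skyline(hotels))   # [(4.0, -80), (4.5, -100)]
```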

11. Finding Frequently Occurring Itemset Pairs on Big Data.
Project Code : JDM1611                      Year : 2016 (IEEE)

Abstract—Frequent Itemset Mining (FIM) is one of the most well-known techniques to extract knowledge from data. The combinatorial explosion of FIM methods becomes even more problematic when they are applied to Big Data. Fortunately, recent improvements in the field of parallel programming already provide good tools to tackle this problem. However, these tools come with their own technical challenges, e.g., balanced data distribution and inter-communication costs. In this paper, we investigate the applicability of FIM techniques on the MapReduce platform. We introduce two new methods for mining large datasets: Dist-Eclat focuses on speed, while BigFIM is optimized to run on really large datasets. In our experiments we show the scalability of our methods.
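Dist-Eclat parallelizes the classical Eclat algorithm; a single-machine sketch of Eclat's vertical tid-list intersection on a toy transaction set looks like this (the data and minimum support are assumptions).

```python
from collections import defaultdict

def eclat(transactions, min_support=2):
    """Minimal Eclat: mine frequent itemsets by intersecting vertical tid-lists."""
    # Vertical layout: item -> set of transaction ids containing it.
    tidlists = defaultdict(set)
    for tid, txn in enumerate(transactions):
        for item in txn:
            tidlists[item].add(tid)
    frequent = {}

    def recurse(prefix, items):
        for i, (item, tids) in enumerate(items):
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(tids)
                # Extend the prefix with each remaining item by tid-list intersection.
                suffix = [(other, tids & otids) for other, otids in items[i + 1:]]
                recurse(itemset, suffix)

    recurse((), sorted(tidlists.items()))
    return frequent

txns = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
print(eclat(txns))   # {('a',): 3, ('a', 'c'): 2, ('b',): 2, ('b', 'c'): 2, ('c',): 3}
```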

12. Mining Social Media for Understanding Students’ Learning Experiences.
Project Code : JDM1612                    Year : 2016 (IEEE)

Abstract—Students' informal conversations on social media (e.g., Twitter, Facebook) shed light on their educational experiences: opinions, feelings, and concerns about the learning process. Data from such uninstrumented environments can provide valuable knowledge to inform student learning. Analyzing such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focused on engineering students' Twitter posts to understand issues and problems in their educational experiences. We first conducted a qualitative analysis on samples taken from about 25,000 tweets related to engineering students' college life. We found engineering students encounter problems such as heavy study load, lack of social engagement, and sleep deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000 tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and results that show how informal social media data can provide insights into students' experiences.
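A multi-label tweet classifier of this general shape can be sketched with scikit-learn's one-vs-rest wrapper; the tweets, the label set, and the choice of logistic regression are illustrative assumptions rather than the paper's exact method.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Invented tweets and the (possibly multiple) problem labels each one reflects.
tweets = ["three exams tomorrow and no sleep again",
          "stuck in the lab all weekend, missed the party",
          "coffee is the only thing keeping me awake in class",
          "no time for friends this semester"]
labels = [["heavy_study_load", "sleep_deprivation"],
          ["heavy_study_load", "lack_of_social_engagement"],
          ["sleep_deprivation"],
          ["lack_of_social_engagement"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)            # one binary indicator column per label
clf = Pipeline([("bow", CountVectorizer()),
                ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000)))])
clf.fit(tweets, Y)

# With data this tiny the prediction is only illustrative.
pred = clf.predict(["no sleep again, exams all week"])
print(mlb.inverse_transform(pred))
```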

13. Private Search and Content-Protecting Location-Based Queries on Google Maps.
Project Code : JDM1613                      Year : 2016 (IEEE)

ABSTRACT:
In this paper we present a solution to one of the location-based query problems. This problem is defined as follows: (i) a user wants to query a database of location data, known as Points Of Interest (POIs), and does not want to reveal his/her location to the server due to privacy concerns; (ii) the owner of the location data, that is, the location server, does not want to simply distribute its data to all users. The location server desires to have some control over its data, since the data is its asset. We propose a major enhancement upon previous solutions by introducing a two-stage approach, where the first step is based on Oblivious Transfer and the second step is based on Private Information Retrieval, to achieve a secure solution for both parties. The solution we present is efficient and practical in many scenarios. We implement our solution on a desktop machine and a mobile device to assess the efficiency of our protocol. We also introduce a security model and analyse the security in the context of our protocol. Finally, we highlight a security weakness of our previous work and present a solution to overcome it.

14. ClustBigFIM: Frequent Itemset Mining of Big Data Using Pre-Processing Based on the MapReduce Framework.
Project Code : JDM1614                     Year : 2016 (IEEE)

ABSTRACT:  
Nowadays an enormous amount of data is being generated through the Internet of Things (IoT) as technologies advance and people use these technologies in day-to-day activities; this data is termed Big Data, with its own characteristics and challenges. Frequent Itemset Mining algorithms aim to disclose frequent itemsets from a transactional database, but as dataset size increases, it cannot be handled by traditional frequent itemset mining. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost, which reduces execution efficiency. We propose a new pre-processed k-means technique applied to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering using the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before the BigFIM algorithm as a pre-processing technique.
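The pre-processing step can be sketched on a single machine: cluster one-hot transaction vectors with k-means, then mine each partition separately; here only 1-itemset supports are counted, and the matrix and cluster count are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented one-hot transaction matrix: rows = transactions, columns = items a..e.
X = np.array([[1, 1, 0, 0, 1],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [1, 1, 0, 0, 1]])

# Pre-processing in the ClustBigFIM spirit: partition transactions with k-means
# so that each partition can be mined (e.g. with Apriori or Eclat) by its own
# mapper instead of mining the whole dataset at once.
parts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in np.unique(parts):
    cluster = X[parts == c]
    support = cluster.mean(axis=0)        # per-cluster 1-itemset supports
    print(f"cluster {c}: {len(cluster)} transactions, item supports {support}")
```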

15. Clustering and Sequential Pattern Mining of Online Collaborative Learning Data.
Project Code : JDM1615                     Year : 2016 (IEEE)

Abstract: Group work is widespread in education. The growing use of online tools supporting group work generates huge amounts of data. We aim to exploit this data to support mirroring: presenting useful high-level views of information about the group, together with desired patterns characterizing the behavior of strong groups. The goal is to enable the groups and their facilitators to see relevant aspects of the group's operation, provide feedback on whether these are more likely to be associated with positive or negative outcomes, and indicate where the problems are. We explore how useful mirror information can be extracted via a theory-driven approach and a range of clustering and sequential pattern mining techniques. The context is a senior software development project where students use the collaboration tool TRAC. We extract patterns distinguishing the better from the weaker groups and gain insights into the success factors. The results point to the importance of leadership and group interaction, and give promising indications of whether they are occurring. Patterns indicating good individual practices were also identified. We found that some key measures can be mined from early data. The results are promising for advising groups at the start and for early identification of effective and poor practices, in time for remediation.

16. Monitoring Online Tests.
Project Code : JDM1616                     Year : 2016 (IEEE)

Abstract: E-testing systems are widely adopted in academic environments, as well as in combination with other assessment means, providing tutors with powerful tools to submit different types of tests in order to assess learners' knowledge. Among these, multiple-choice tests are extremely popular, since they can be automatically corrected. However, many learners do not welcome this type of test, because often it does not let them properly express their capacity, due to the closed-ended nature of multiple-choice questions. Even many examiners doubt the real effectiveness of structured tests in assessing learners' knowledge, and they wonder whether learners are more conditioned by the question type than by its actual difficulty.
      In this project, we propose a data exploration approach exploiting information visualization in order to involve tutors in a visual data mining process aiming to detect structures, patterns, and relations between data, which can potentially reveal previously unknown knowledge inherent in tests, such as the test strategies used by the learners, correlations among different questions, and many other aspects, including their impact on the final score. It captures the occurrence of question browsing and answering events by the learners and uses these data to visualize charts containing a chronological review of tests. Besides identifying the most frequently employed strategies, the tutor can determine their effectiveness by correlating their use with the final test scores.

17. Profile Matching in Social Networking.
Project Code : JDM1617                      Year : 2016 (IEEE)

ABSTRACT :  In this paper, we study user profile matching with privacy-preservation in mobile social networks (MSNs) and introduce a family of novel profile matching protocols. We first propose an explicit Comparison-based Profile Matching protocol (eCPM) which runs between two parties, an initiator and a responder. The eCPM enables the initiator to obtain the comparison-based matching result about a specified attribute in their profiles, while preventing their attribute values from disclosure. We then propose an implicit Comparison-based Profile Matching protocol (iCPM) which allows the initiator to directly obtain some messages instead of the comparison result from the responder. The messages unrelated to user profile can be divided into multiple categories by the responder. The initiator implicitly chooses the interested category which is unknown to the responder. Two messages in each category are prepared by the responder, and only one message can be obtained by the initiator according to the comparison result on a single attribute. We further generalize the iCPM to an implicit Predicate-based Profile Matching protocol (iPPM) which allows complex comparison criteria spanning multiple attributes. The anonymity analysis shows all these protocols achieve the confidentiality of user profiles. In addition, the eCPM reveals the comparison result to the initiator and provides only conditional anonymity; the iCPM and the iPPM do not reveal the result at all and provide full anonymity. We analyze the communication overhead and the anonymity strength of the protocols.

18. Analysis of Twitter Trends Based on Key Detection and Link Detection.
Project Code : JDM1618                     Year : 2016 (IEEE)

ABSTRACT:
Detection of emerging topics is now receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged in social-network posts includes not only text but also images, URLs, and videos. We focus on the emergence of topics signaled by social aspects of these networks. Specifically, we focus on mentions of users, i.e., links between users that are generated dynamically (intentionally or unintentionally) through replies, mentions, and retweets. We propose a probability model of the mentioning behavior of a social network user, and propose to detect the emergence of a new topic from the anomalies measured through the model. Aggregating anomaly scores from hundreds of users, we show that we can detect emerging topics based only on the reply/mention relationships in social-network posts. We demonstrate our technique on several real data sets we gathered from Twitter. The experiments show that the proposed mention-anomaly-based approaches can detect new topics at least as early as text-anomaly-based approaches, and in some cases much earlier, when the topic is poorly identified by the textual contents in posts.
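One toy way to score mention anomalies in this spirit is to model each user's probability of mentioning a given target from their history, with smoothing, and flag low-probability mentions; the stream and the smoothing scheme are assumptions, not the paper's probability model.

```python
import math
from collections import Counter

# Invented stream of (user, mentioned_user) pairs extracted from posts.
stream = [("u1", "u2"), ("u1", "u3"), ("u1", "u2"), ("u2", "u1"),
          ("u1", "u4"), ("u1", "u2"), ("u3", "u9"), ("u1", "u9")]

def anomaly_score(user, target, history, alpha=0.5):
    """Surprise (-log probability) of `user` mentioning `target`, given history."""
    counts = Counter(t for u, t in history if u == user)
    total = sum(counts.values())
    vocab = len(counts) + 1                  # one extra slot for unseen targets
    p = (counts[target] + alpha) / (total + alpha * vocab)
    return -math.log(p)

history = []
for user, target in stream:
    score = anomaly_score(user, target, history)
    history.append((user, target))
print(f"last mention scored {score:.2f}")    # u1 -> u9 is a first-time target
```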

19. Big Data Frequent Pattern Mining.
Project Code : JDM1619                      Year : 2016 (IEEE)

Abstract : 

Frequent pattern mining is an essential data mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. In this chapter, we review recent advances in parallel frequent pattern mining, analyzing them through the Big Data lens. We identify three areas as challenges to designing parallel frequent pattern mining algorithms: memory scalability, work partitioning, and load balancing. With these challenges as a frame of reference, we extract and describe key algorithmic design patterns from the wealth of research conducted in this domain.

20. Bootstrapping Privacy Ontology for Web Services.
Project Code : JDM1620                      Year : 2016 (IEEE)

ABSTRACT: Ontologies have become the de facto modeling tool of choice, employed in many applications and prominently in the semantic web. Nevertheless, ontology construction remains a daunting task. Ontological bootstrapping, which aims at automatically generating concepts and their relations in a given domain, is a promising technique for ontology construction. Bootstrapping an ontology based on a set of predefined textual sources, such as web services, must address the problem of multiple, largely unrelated concepts. In this paper, we propose an ontology bootstrapping process for web services. We exploit the advantage that web services usually consist of both WSDL and free text descriptors. The WSDL descriptor is evaluated using two methods, namely Term Frequency/Inverse Document Frequency (TF/IDF) and web context generation. Our proposed ontology bootstrapping process integrates the results of both methods and applies a third method to validate the concepts using the service free text descriptor, thereby offering a more accurate definition of ontologies. We extensively validated our bootstrapping method using a large repository of real-world web services and verified the results against existing ontologies. The experimental results indicate high precision. Furthermore, the recall versus precision comparison of the results when each method is separately implemented presents the advantage of our integrated bootstrapping approach.

21. Then and Now: On the Maturity of the Cybercrime Markets.
Project Code : JDM1621                      Year : 2016 (IEEE)

ABSTRACT: Due to the rise and rapid growth of e-commerce, use of credit cards for online purchases has dramatically increased, causing an explosion in credit card fraud. As credit cards become the most popular mode of payment for both online and regular purchases, cases of fraud associated with them are also rising. In real life, fraudulent transactions are scattered among genuine transactions, and simple pattern-matching techniques are often not sufficient to detect those frauds accurately. Implementation of efficient fraud detection systems has thus become imperative for all credit card issuing banks to minimize their losses. Many modern techniques based on Artificial Intelligence, Data Mining, Fuzzy Logic, Machine Learning, Sequence Alignment, Genetic Programming, etc., have evolved for detecting various credit card fraudulent transactions. A clear understanding of all these approaches will certainly lead to an efficient credit card fraud detection system. This paper presents a survey of various techniques used in credit card fraud detection mechanisms and evaluates each methodology based on certain design criteria.

22. Social Set Analysis: A Set Theoretical Approach to Big Data Analytics.
Project Code : JDM1622                     Year : 2016 (IEEE)

ABSTRACT: Current analytical approaches in computational social science can be characterized by four dominant paradigms: text analysis (information extraction and classification), social network analysis (graph theory), social complexity analysis (complex systems science), and social simulations (cellular automata and agent-based modeling). However, when it comes to organizational and societal units of analysis, there exists no approach to conceptualize, model, analyze, explain, and predict social media interactions as individuals' associations with ideas, values, identities, and so on. To address this limitation, based on the sociology of associations and the mathematics of set theory, this paper presents a new approach to big data analytics called social set analysis. Social set analysis consists of a generative framework for the philosophies of computational social science, theory of social data, conceptual and formal models of social data, and an analytical framework for combining big social data sets with organizational and societal data sets. Three empirical studies of big social data are presented to illustrate and demonstrate social set analysis in terms of fuzzy set-theoretical sentiment analysis, crisp set-theoretical interaction analysis, and event-studies-oriented set-theoretical visualizations. Implications for big data analytics, current limitations of the set-theoretical approach, and future directions are outlined.

23. Personalized Travel Sequence Recommendation on Multi-Source Big Social Media.
Project Code : JDM1622                      Year : 2016 (IEEE)

ABSTRACT:
Recent years have witnessed an increased interest in recommender systems. Despite significant progress in this field, there still remain numerous avenues to explore. Indeed, this paper provides a study of exploiting online travel information for personalized travel package recommendation. A critical challenge along this line is to address the unique characteristics of travel data, which distinguish travel packages from traditional items for recommendation. To that end, in this paper, we first analyze the characteristics of the existing travel packages and develop a tourist-area-season topic (TAST) model. This TAST model can represent travel packages and tourists by different topic distributions, where the topic extraction is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes. Then, based on this topic model representation, we propose a cocktail approach to generate the lists for personalized travel package recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-season topic (TRAST) model for capturing the latent relationships among the tourists in each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail recommendation approach on the real-world travel package data. Experimental results show that the TAST model can effectively capture the unique characteristics of the travel data and the cocktail approach is, thus, much more effective than traditional recommendation techniques for travel package recommendation. Also, by considering tourist relationships, the TRAST model can be used as an effective assessment for travel group formation.

24. A Parallel Patient Treatment Time Prediction Algorithm and Its Applications in Hospital Queuing-Recommendation in a Big Data Environment.
Project Code : JDM1623                     Year : 2016 (IEEE)

Abstract: There is a need for continuous monitoring of the vital parameters of patients in critical situations. In the current hospital scenario, such parameters are shown on a digital display and observed by a nurse, so a dedicated person (a nurse) is required for monitoring. But given the growing population, a ratio of one nurse per patient will be a considerable problem in the future, so manual patient monitoring should be replaced by some other method. Online monitoring has attracted considerable attention for many years. Its applications are not limited to industrial process monitoring and control but extend to civilian areas such as healthcare, home automation, and traffic control. This paper discusses the feasibility of an Instant Notification System in a Heterogeneous Sensor Network with deployment of the XMPP protocol for medical applications. The system aims to provide an environment that enables medical practitioners to remotely monitor various vital parameters of patients. For academic purposes, we have limited this system to monitoring patients' body temperature and blood pressure. The proposed system collects data from various heterogeneous sensor networks (for example, patients' body temperature and blood pressure), converts it to a standard packet, and provides the facility to send it over a network using the Extensible Messaging and Presence Protocol (XMPP), in more common terms, Instant Messaging (IM). Use of heterogeneous sensor networks (HSN) provides the much-required platform independence, while XMPP enables instant notification.

25. Relevance Feature Discovery for Text Mining.
Project Code : JDM1624                      Year : 2016 (IEEE)

Abstract—It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of large-scale terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. Over the years, the hypothesis has often been held that pattern-based methods should perform better than term-based ones in describing user preferences; yet, how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.

26. A Novel Methodology of Frequent Itemset Mining on Hadoop.
Project Code : JDM1626                      Year : 2016 (IEEE)

Abstract— Frequent Itemset Mining is one of the classical problems in many data mining applications. It requires very large computation and I/O traffic capacity, while resources like a single processor's memory and CPU are very limited, which degrades the performance of the algorithm. In this paper we propose a distributed algorithm that runs on Hadoop, one of the most popular recent distributed frameworks, which mainly focuses on the MapReduce paradigm. The proposed approach takes into account the inherent characteristics of the Apriori algorithm related to frequent itemset generation and, through block-based partitioning, uses dynamic workload management. The algorithm greatly enhances performance and achieves high scalability compared to the existing distributed Apriori-based approaches. The proposed algorithm is implemented and tested on large-scale datasets distributed over a cluster.

27. Online Java Compiler.
Project Code : JDM1624                   Year : 2016 (IEEE)

Abstract
In today's competitive and fast-moving world, everything is moving to the Internet and is available online. We have therefore created software called an "Online Java Compiler with Security Editor".
The main aim of this project is to make it easy to write, compile, and debug a Java program online. The client machine does not need to have the Java Development Kit installed; it only connects to the server. The server has the Java compiler, executes the Java code, and returns any error messages to the appropriate client machine.
This project also includes a security editor, which encrypts and decrypts files. Encryption and decryption are performed using the RSA algorithm. There are many security algorithms, but the RSA algorithm is very efficient for encrypting and decrypting files.
The project can also be used to view all types of Java APIs, which is very useful for writing Java programs easily; for example, if there is an error in the format of an API call, the API can be viewed through this module.
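The security editor's encrypt/decrypt step relies on RSA, which can be demonstrated with textbook arithmetic; the tiny primes below are for illustration only.

```python
# Textbook RSA with tiny primes, for illustration only: a real deployment needs
# large keys, padding, and a vetted library such as `cryptography`.
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)        # c = m^e mod n

def decrypt(c: int) -> int:
    return pow(c, d, n)        # m = c^d mod n

msg = 42                       # message must be an integer < n
cipher = encrypt(msg)
print(msg, "->", cipher, "->", decrypt(cipher))   # 42 -> ... -> 42
```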

28. A Cloud Service Architecture for Analyzing Big Monitoring Data.
Project Code : JDM1628                     Year : 2016 (IEEE)

Abstract: Cloud monitoring is a source of big data that is constantly produced from traces of infrastructures, platforms, and applications. Analysis of monitoring data delivers insights into the system's workload and usage pattern and ensures workloads are operating at optimum levels. The analysis process involves data query and extraction, data analysis, and result visualization. Since the volume of monitoring data is big, these operations require a scalable and reliable architecture to extract, aggregate, and analyze data at an arbitrary range of granularity. Ultimately, the results of analysis become the knowledge of the system and should be shared and communicated. This paper presents our cloud service architecture, which explores a search cluster for data indexing and query. We develop REST APIs so that the data can be accessed by different analysis modules. This architecture enables extensions to integrate with software frameworks for both batch processing (such as Hadoop) and stream processing (such as Spark) of big data. The analysis results are structured in Semantic MediaWiki pages in the context of the monitoring data source and the analysis process. This cloud architecture is empirically assessed to evaluate its responsiveness when processing a large set of data records under node failures.

29. A Tutorial on Secure Outsourcing of Large-scale Computations for Big Data.
Project Code : JDM1629                   Year : 2016 (IEEE)

ABSTRACT: Today's society is collecting a massive and exponentially growing amount of data that can potentially revolutionize scientific and engineering fields and promote business innovations. With the advent of cloud computing, in order to analyze data in a cost-effective and practical way, users can outsource their computing tasks to the cloud, which offers access to vast computing resources on an on-demand and pay-per-use basis. However, since users' data contains sensitive information that needs to be kept secret for ethical, security, or legal reasons, many users are reluctant to adopt cloud computing. To this end, researchers have proposed techniques that enable users to offload computations to the cloud while protecting their data privacy. In this paper, we review the recent advances in the secure outsourcing of large-scale computations for big data analysis. We first introduce the two most fundamental and common computational problems, i.e., linear algebra and optimization, and then provide an extensive review of the data privacy-preserving techniques. After that, we explain how researchers have exploited the data privacy-preserving techniques to construct secure outsourcing algorithms for large-scale computations.

30. Protection of Big Data Privacy.
Project Code : JDM1630                 Year : 2016 (IEEE)

ABSTRACT: In recent years, big data have become a hot research topic. The increasing amount of big data also increases the chance of breaching the privacy of individuals. Since big data require high computational power and large storage, distributed systems are used. As multiple parties are involved in these systems, the risk of privacy violation is increased. There have been a number of privacy-preserving mechanisms developed for privacy protection at different stages (e.g., data generation, data storage, and data processing) of a big data life cycle. The goal of this paper is to provide a comprehensive overview of the privacy preservation mechanisms in big data and present the challenges for existing mechanisms. In particular, in this paper, we illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the challenges and future research directions related to privacy preservation in big data.

31. Towards a Virtual Domain Based Authentication on MapReduce.
Project Code : JDM1631                      Year : 2016 (IEEE)

ABSTRACT: This paper proposes a novel authentication solution for the MapReduce (MR) model, a new distributed and parallel computing paradigm commonly deployed to process big data by major IT players such as Facebook and Yahoo. It identifies a set of security, performance, and scalability requirements that are specified from a comprehensive study of a job execution process using MR and of security threats and attacks in this environment. Based on the requirements, it critically analyzes the state-of-the-art authentication solutions, discovering that the authentication services currently proposed for the MR model are not adequate. This paper then presents a novel layered authentication solution for the MR model and describes the core components of this solution, including the virtual domain based authentication framework (VDAF). These novel ideas are significant because, first, the approach embeds the characteristics of MR-in-cloud deployments into security solution designs, which allows the MR model to be delivered as software as a service in a public cloud environment along with our proposed authentication solution; second, VDAF supports the authentication of every interaction by any MR components involved in a job execution flow, so long as the interactions are for accessing resources of the job; third, this continuous authentication service is provided in such a manner that the costs incurred in providing it are kept as low as possible.

32. Predicting Instructor Performance Using Data Mining Techniques in Higher Education.
Project Code : JDM1632                Year : 2016 (IEEE)

ABSTRACT: Data mining applications are becoming a more common tool in understanding and solving educational and administrative problems in higher education. In general, research in educational mining focuses on modeling students' performance instead of instructors' performance. One of the common tools to evaluate instructors' performance is the course evaluation questionnaire, which evaluates based on students' perception. In this paper, four different classification techniques (decision tree algorithms, support vector machines, artificial neural networks, and discriminant analysis) are used to build classifier models. Their performances are compared over a data set composed of responses of students to a real course evaluation questionnaire using accuracy, precision, recall, and specificity performance metrics. Although all the classifier models show comparably high classification performances, the C5.0 classifier is the best with respect to accuracy, precision, and specificity. In addition, an analysis of the variable importance for each classifier model is done. Accordingly, it is shown that many of the questions in the course evaluation questionnaire appear to be irrelevant. Furthermore, the analysis shows that the instructors' success based on the students' perception mainly depends on the interest of the students in the course. The findings of this paper indicate the effectiveness and expressiveness of data mining models in course evaluation and higher education mining. Moreover, these findings may be used to improve the measurement instruments.

33. Intra- and Inter-Fractional Variation Prediction of Lung Tumors Using Fuzzy Deep Learning.
Project Code : JDM1633          Year : 2016 (IEEE)

ABSTRACT: Tumor movements should be accurately predicted to improve delivery accuracy and reduce unnecessary radiation exposure to healthy tissue during radiotherapy. The tumor movements pertaining to respiration are divided into intra-fractional variation occurring in a single treatment session and inter-fractional variation arising between different sessions. Most studies of patients' respiration movements deal with intra-fractional variation. Previous studies on inter-fractional variation are hardly mathematized and cannot predict movements well due to inconstant variation. Moreover, the computation time of the prediction should be reduced. To overcome these limitations, we propose a new predictor for intra- and inter-fractional data variation, called intra- and inter-fraction fuzzy deep learning (IIFDL), where FDL, equipped with breathing clustering, predicts the movement accurately and decreases the computation time. Through the experimental results, we validated that the IIFDL improved root-mean-square error (RMSE) by 29.98% and prediction overshoot by 70.93%, compared with existing methods. The results also showed that the IIFDL enhanced the average RMSE and overshoot by 59.73% and 83.27%, respectively. In addition, the average computation time of IIFDL was 1.54 ms for both intra- and inter-fractional variation, which was much smaller than that of the existing methods. Therefore, the proposed IIFDL might achieve real-time estimation as well as better tracking techniques in radiotherapy.
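RMSE, the headline metric here, is just the square root of the mean squared prediction error, as the short sketch below shows on invented displacement values.

```python
import numpy as np

# Hypothetical predicted vs. actual tumor displacements (mm).
actual    = np.array([1.2, 1.5, 1.1, 0.9, 1.4])
predicted = np.array([1.0, 1.6, 1.3, 0.8, 1.5])

rmse = np.sqrt(np.mean((predicted - actual) ** 2))   # root-mean-square error
print(f"RMSE = {rmse:.3f} mm")
```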

34. Web Service Personalized Quality of Service Prediction via Reputation-Based Matrix Factorization.
Project Code : JDM1634                   Year : 2016 (IEEE)

Abstract—With the fast development of Web services in service-oriented systems, the requirement of efficient Quality of Service (QoS) evaluation methods becomes strong. However, many QoS values are unknown in reality. Therefore, it is necessary to predict the unknown QoS values of Web services based on the obtainable QoS values. Generally, the QoS values of similar users are employed to make predictions for the current user. However, the QoS values may be contributed by unreliable users, leading to inaccuracy of the prediction results. To address this problem, we present a highly credible approach, called reputation-based Matrix Factorization (RMF), for predicting the unknown Web service QoS values. RMF first calculates the reputation of each user based on their contributed QoS values to quantify the credibility of users, and then takes the users' reputation into consideration to achieve more accurate QoS prediction. Reputation-based matrix factorization is applicable to the prediction of QoS data in the presence of unreliable user-provided QoS values. Extensive experiments are conducted with real-world Web service QoS data sets, and the experimental results show that our proposed approach outperforms other existing approaches.
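A minimal sketch of reputation-weighted matrix factorization on a toy QoS matrix is shown below; the values, the fixed reputation weights, and the gradient scheme are assumptions, since RMF itself computes reputations from the users' contributed values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical user-by-service QoS matrix (e.g. response times); 0 = unobserved.
Q = np.array([[0.8, 0.0, 1.2],
              [0.9, 2.1, 0.0],
              [0.0, 2.0, 1.1]])
observed = Q > 0
# Hypothetical fixed reputation weights in [0, 1]; low weight down-plays an
# unreliable user's observations.
rep = np.array([1.0, 0.4, 1.0])

k, lam, lr = 2, 0.05, 0.05
U = rng.normal(scale=0.3, size=(Q.shape[0], k))   # latent user factors
S = rng.normal(scale=0.3, size=(Q.shape[1], k))   # latent service factors
for _ in range(5000):
    E = observed * (Q - U @ S.T) * rep[:, None]   # reputation-weighted residuals
    U += lr * (E @ S - lam * U)                   # gradient step with L2 regularization
    S += lr * (E.T @ U - lam * S)
print(np.round(U @ S.T, 2))                       # predictions fill the 0 cells
```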

If you want more of the latest topics with IEEE papers and abstracts, please download the project list and send us the topic names; we'll send you the IEEE paper, abstract, PPT, etc.