With the increasing prevalence of information networks, research on privacy-preserving network data publishing has received substantial attention recently. There are two streams of relevant research, targeting different privacy requirements.

Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of exponentially exploding textual data. It extracts an author's identity and sociolinguistic characteristics based on the writing styles reflected in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguistics, and political socialization. However, most of the previous techniques critically depend on a manual feature engineering process. Consequently, the choice of feature set has been shown to be scenario- or dataset-dependent. In this paper, to mimic the human sentence composition process using a neural network approach, we propose to incorporate different categories of linguistic features into the distributed representation of words in order to simultaneously learn writing style representations from unlabeled texts for authorship analysis. In particular, the proposed models allow topical, lexical, syntactical, and character-level feature vectors of each document to be extracted as stylometrics. We evaluate the performance of our approach on the problems of authorship characterization and authorship verification with the Twitter, novel, and essay datasets. The experiments suggest that our proposed text representation outperforms the bag-of-lexical-n-grams, Latent Dirichlet Allocation, Latent Semantic Analysis, PVDM, PVDBOW, and word2vec representations.

Most privacy protection studies for textual data focus on removing explicit sensitive identifiers. However, personal writing style, as a strong indicator of authorship, is often neglected. Recent studies on writing style anonymization can only output numeric vectors, which are difficult for the recipients to interpret. We propose a novel text generation model for authorship anonymization. Combined with a semantic embedding reward loss function and the exponential mechanism, our proposed auto-encoder can generate differentially private sentences that are semantically close and grammatically similar to the original text while removing personal traits of the writing style. It does not require any conditioned labels or parallel text data during training. We evaluate the performance of the proposed model on the real-life peer reviews dataset and the Yelp review dataset. The results suggest that our model outperforms the state-of-the-art on semantic preservation, authorship obfuscation, and stylometric transformation.
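
The anonymization abstract names the exponential mechanism as the source of its differential privacy guarantee but does not spell it out. As a rough illustration only, the generic mechanism can be sketched in Python as below. This is not the authors' auto-encoder: the function names, the candidate words, and the utility scores are all hypothetical, and the sensitivity of the score function is simply assumed to be 1.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity=1.0):
    """Return the exponential-mechanism probability of each candidate.

    Each candidate is weighted by exp(epsilon * score / (2 * sensitivity)).
    The maximum score is subtracted first for numerical stability; this
    does not change the normalized probabilities.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity))
        for s in scores
    ]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, scores, epsilon, sensitivity=1.0, rng=None):
    """Sample one candidate according to the exponential mechanism."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    rng = rng or random.Random()
    probs = selection_probabilities(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-word candidates with made-up utility scores
    # (e.g. how well each word preserves the meaning of the original).
    words = ["great", "good", "fine", "terrible"]
    utility = [0.9, 0.8, 0.6, 0.1]
    print(selection_probabilities(utility, epsilon=4.0))
    print(exponential_mechanism(words, utility, epsilon=4.0))
```

A smaller epsilon flattens the distribution toward uniform sampling (stronger privacy, less utility), while a larger epsilon concentrates probability on the highest-scoring candidate.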