Hello! My name is Adrian, and I am a Computer Scientist who likes traveling, mountain hiking, and the wilderness in general. I am currently living in Switzerland, and I have previously lived and worked in California, New York, and the UK (each more than once).
Like many people in my field, I have been interested in Computer Science since elementary school. After completing my Bachelor's degree, I went on to pursue a Master's and later a PhD at the University of Cambridge, UK, where I worked on Citation Recommendation and read about Natural Language Processing and Machine Learning.
Ideologically, I question the extent to which Natural Language Processing should be linguistically motivated, and I am a firm adherent of data-driven approaches. However, my understanding grows and evolves with every book, paper, and experiment, and I'm always happy to have my beliefs challenged.
In January 2016, I joined Google Switzerland to work on the Machine Learning side of automatically preventing abuse via Google accounts. In my spare time, I write for a Computer Science magazine based in New York City and manage its team of columnists; I also read and travel as much as possible.
Between June 15th and June 26th 2015, I designed, organized, and taught a summer school on Machine Learning for Natural Language Processing (ML4NLP) at the "Politehnica" University of Bucharest. The summer school comprised two stages: (A) a taught component, held between June 15th and June 19th, with two hours of lectures daily, each followed by a one-hour hands-on practical session in which participants solved a small problem applying the notions they had just learned; and (B) a weeklong hackathon, held between June 23rd and June 26th, during which a selected team of participants worked under my supervision to build a prototype post-OCR text regeneration tool.
In total, 13 applicants were selected to participate, with backgrounds ranging from high school students to a university lecturer. The feedback was overwhelmingly positive, with an average overall score of 4.6 out of 5 across the feedback forms.
The main page for the summer school, which includes links to all the presentation slides from the first week, can be publicly accessed here.
My MPhil research project focused on applying unsupervised and weakly supervised Machine Learning to Natural Language Processing, under the supervision of Dr. Diarmuid Ó Séaghdha and Prof. Stephen Clark.
In my thesis, titled "Multilingual Generative Models for Selectional Preference Learning" (available as a PDF), I investigated the use of Latent Dirichlet Allocation (LDA) for inducing plausibility estimates for the selectional preferences of Verb-Subject and Verb-DirectObject pairs in English (and verified state-of-the-art performance on three other European languages). I also tested the feasibility of Vector Space Alignment for transferring these estimates from resource-rich European languages like English (for which dependency parsers can be trained) to languages that do not benefit from a large body of research: German, Spanish, and Romanian.
As part of my research, I produced clean, dependency-parsed corpora in CoNLL format from the German, Spanish, and Romanian Wikipedia articles (excluding list-style pages), which I am releasing below. Please read the "README.txt" file in each archive for details on how the text was extracted and processed, and for licensing information.
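As an illustration of how an LDA model can yield plausibility estimates, here is a minimal sketch assuming the common formulation in which each verb is treated as an LDA "document" whose "words" are the nouns observed in its argument slot; the plausibility of noun n as an argument of verb v is then P(n | v) = Σ_z P(z | v) P(n | z). The topic tables and the `plausibility` function are hypothetical toy values and names, not parameters or code from the thesis, and model training is omitted.

```python
# Hypothetical toy parameters standing in for a trained LDA model, where each
# verb is a "document" and the nouns filling its argument slot are its "words".
# theta[verb][topic] plays the role of P(topic | verb);
# phi[topic][noun] plays the role of P(noun | topic).
theta = {
    "drink": {"liquids": 0.9, "food": 0.1},
}
phi = {
    "liquids": {"water": 0.5, "coffee": 0.4, "bread": 0.1},
    "food": {"water": 0.1, "coffee": 0.1, "bread": 0.8},
}


def plausibility(verb, noun, theta, phi):
    """Estimate P(noun | verb) = sum over topics z of P(z | verb) * P(noun | z)."""
    return sum(p_topic * phi[topic].get(noun, 0.0)
               for topic, p_topic in theta[verb].items())


print(plausibility("drink", "water", theta, phi))  # approximately 0.46
print(plausibility("drink", "bread", theta, phi))  # approximately 0.17
```

Because every topic contributes in proportion to how strongly the verb is associated with it, a noun scores highly only if it is likely under the topics that the verb itself favors.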
For evaluation purposes, I also created the first test dataset for selectional preference estimation in the Romanian language, by eliciting responses online from native raters. You can access the datasets in either PNG or CSV format.
The methodology used to compile the dataset is detailed in the main body of the dissertation. If you wish to publish results based on this dataset, please contact me by email first.
The code developed as part of the project is hosted publicly on my BitBucket account.
|The main project code, mostly written in Java to handle probability tables.||(repository)|
|The code I wrote to strip away Textile markup and retrieve the plain text from the extracted Wikipedia articles.||(repository)|
In 2013, I carried out a study (available as a PDF) on email spam classification to verify the empirical claim that the performance of word-based classifiers, as a function of the number K of leading tokens they see from each email, saturates quickly as K increases. To do so, I built on the work of Çıltık and Güngör (2008) and tested Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Bayesian Logistic Regression (BLR), and Interpolated Language Model (ILM) approaches on the GenSpam corpus.
The study concluded that the MNB, SVM, and BLR methods remain robust as message length is reduced down to approximately 100 tokens, while ILM remains robust down to approximately 20 tokens.
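To make the truncation experiment concrete, here is a minimal sketch of one of the tested approaches, Multinomial Naive Bayes, applied only to the leading K tokens of each email. It is not the code used in the study: whitespace tokenization, add-one smoothing, truncating the training messages as well as the test messages, and the toy messages are all assumptions made for illustration. Repeating the evaluation for increasing values of K on a real corpus such as GenSpam is what would reveal the saturation effect.

```python
import math
from collections import Counter


def leading_tokens(text, k):
    """Lowercase, split on whitespace, and keep only the first k tokens."""
    return text.lower().split()[:k]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc):
        # Score each class by log P(c) + sum of log P(token | c);
        # tokens unseen during training are skipped.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


# Hypothetical toy corpus; K is the number of leading tokens kept per email.
K = 3
train = [
    ("Win free money now", "spam"),
    ("Free prize claim now", "spam"),
    ("Meeting agenda for Monday", "ham"),
    ("Lunch on Monday with the team", "ham"),
]
docs = [leading_tokens(text, K) for text, _ in train]
model = MultinomialNB().fit(docs, [label for _, label in train])
print(model.predict(leading_tokens("Free money waiting for you", K)))  # spam
```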
For my undergraduate degree, I wrote my Bachelor's thesis on automated plagiarism detection in specialized corpora (academic papers in Computer Science), under the supervision of Dr. Traian Rebedea and Prof. Răzvan Rughiniș.
In my thesis, titled "The AuthentiCop System for Plagiarism Detection in Specialized Corpora: Algorithms and Data Processing" (available as a PDF), I described how I built an automated pipeline that performs plagiarism detection based on the Encoplot algorithm (Grozea and Popescu, 2011). Filip Buruiană and I also co-wrote a paper on the topic of the thesis, which won the best paper award at the Student Scientific Paper Session of the "Politehnica" University of Bucharest in 2012.
During my undergraduate years, I also served as a Teaching Assistant for several courses at the Politehnica University of Bucharest: Operating Systems Usage, Computer Programming, Data Structures, and Algorithm Design. As a TA, I taught laboratory classes, wrote laboratory exercises, designed homework assignments and final exam questions, wrote tutorials, and managed course repositories. I am publicly releasing some of this work below.
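The core idea behind Encoplot-style detection is to locate character n-grams shared by two documents; long diagonal runs of matching positions then point to copied passages. The sketch below is a simplified illustration in that spirit, not the published Encoplot algorithm or the AuthentiCop code. The published algorithm is designed to run in linear time, whereas this sketch simply uses a hash map; the rule of pairing the k-th occurrence of an n-gram in one text with its k-th occurrence in the other (so that each occurrence is used at most once) is an assumption, and the function name `shared_ngram_pairs` is hypothetical.

```python
from collections import defaultdict


def shared_ngram_pairs(a, b, n):
    """Return sorted (i, j) pairs such that a[i:i+n] == b[j:j+n].

    Occurrences of each n-gram are paired in order of appearance, so every
    position in either text takes part in at most one pair.
    """
    positions_a = defaultdict(list)
    for i in range(len(a) - n + 1):
        positions_a[a[i:i + n]].append(i)
    positions_b = defaultdict(list)
    for j in range(len(b) - n + 1):
        positions_b[b[j:j + n]].append(j)

    pairs = []
    for gram, occ_a in positions_a.items():
        # zip stops at the shorter list, so surplus occurrences stay unpaired.
        pairs.extend(zip(occ_a, positions_b.get(gram, [])))
    return sorted(pairs)


# Identical texts produce a full diagonal of matches.
print(shared_ngram_pairs("hello world", "hello world", 5))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]

# A repeated n-gram in `a` is matched only once against a single occurrence in `b`.
print(shared_ngram_pairs("abcdXabcd", "zzabcd", 4))  # [(0, 2)]
```

A full detector would then cluster these pairs into diagonal segments and report segments above some length as candidate plagiarized passages; that step is omitted here.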
|The source code for the official C++ solutions to the 12 coding laboratories is hosted publicly on my BitBucket account.||(repository)|
The repository also contains the source code I wrote for the automated grader and visualizer for the second assignment of the 2011-2012 academic year, which required students to write an engine that can beat our AI at the Connect4 board game. Two video tutorials explaining how to use the visualizer to test hand-written AI bots for Connect4 are given below (Romanian language only).
First as a TA, and later as head of the TA team for the Computer Programming course, I adopted and maintained an open-source online judge platform for the course laboratories (which is live here). The platform is built on the source code of Infoarena, a popular Romanian online judge for competitive algorithm design. By December 2014, the judge had 638 registered students.
A short video tutorial instructing the students how to use the website is given above (Romanian language only).
In Fall 2011, I took part in the Google AI Challenge together with 3 colleagues from my undergraduate program. We designed and implemented an AI bot that controls a swarm of ants as they forage for food and wage war against opposing swarms on the map. In the final ranking, our bot placed 63rd out of 7,897 teams from around the world. You can watch an online demo of our AI bot at work below.
|The source code for the project is hosted publicly on my BitBucket account.||(repository)|
In 2010, I led a team that designed a Chess engine in C++, which won the end-of-year tournament organized as part of my Algorithm Design class. We made the engine compatible with GNU XBoard, the chess interface from the GNU Project. The implementation was based on NegaMax search with Alpha-Beta pruning, extended with Quiescence Search, custom heuristics, and a database of openings from famous chess championships to save time in the early stages of the game. We implemented the algorithms over our own bitwise representation of the game board, hence the engine's name, "BitBoard". You can watch an online demo of the engine below.
|The source code for the project is hosted publicly on my BitBucket account.||(repository)|
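The search scheme described above can be sketched as follows. This is a minimal illustration, not BitBoard's source: the `Node` class is a hypothetical stand-in for a chess position (there is no move generation, bitboard, heuristic, or opening book here), and every score is expressed from the perspective of the side to move, which is what lets NegaMax negate child scores instead of alternating between max and min. Quiescence search keeps following capture moves once the depth limit is reached, so that a position is not evaluated in the middle of an exchange.

```python
from dataclasses import dataclass, field

INF = 10**9


@dataclass
class Node:
    """Hypothetical toy game-tree node standing in for a chess position."""
    static_eval: int  # heuristic score from the side to move's perspective
    quiet: list = field(default_factory=list)     # children reached by quiet moves
    captures: list = field(default_factory=list)  # children reached by captures


def quiescence(node, alpha, beta):
    """Search only capture moves until the position is quiet (fail-soft)."""
    best = node.static_eval  # "stand pat": the side to move may stop capturing
    if best >= beta:
        return best
    alpha = max(alpha, best)
    for child in node.captures:
        score = -quiescence(child, -beta, -alpha)
        best = max(best, score)
        if best >= beta:
            break  # the opponent would never allow this line
        alpha = max(alpha, best)
    return best


def negamax(node, depth, alpha, beta):
    """Depth-limited NegaMax with Alpha-Beta pruning (fail-soft)."""
    if depth == 0:
        return quiescence(node, alpha, beta)
    # Captures first: a simple move-ordering heuristic that tends to cut off earlier.
    moves = node.captures + node.quiet
    if not moves:
        return node.static_eval  # terminal position in this toy model
    best = -INF
    for child in moves:
        score = -negamax(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        if best >= beta:
            break  # beta cutoff
        alpha = max(alpha, best)
    return best


# Root move X looks one point better statically, but the opponent can then
# capture (leading to Y), leaving the root side five points down; Z is quiet and level.
Y = Node(static_eval=-5)
X = Node(static_eval=-1, captures=[Y])
Z = Node(static_eval=0)
root = Node(static_eval=0, quiet=[X, Z])
print(negamax(root, 1, -INF, INF))  # 0
```

In this toy tree, stopping at depth 1 without quiescence would score X at +1 and prefer it; quiescence search sees the capture available after X, so the root prefers Z and scores 0.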
In 2011, upon becoming a Teaching Assistant for the Algorithm Design course at the Politehnica University of Bucharest, I promoted the use of C++ in the labs and wrote a "Crash Course in C++ for Java Programmers" handbook (PDF, Romanian only), designed to integrate seamlessly with the university's OOP curriculum. The handbook proved popular with later cohorts of students as well, and was referenced externally on websites such as www.itassistant.org.
Following its success, I was invited in 2012 to teach a Crash Course in C++ workshop at THALES Systems Romania.
In 2010, a year after graduating from the first edition of the ROSEdu Community Development Lab (CDL) myself, I taught a crash course in C++ Templates and the Standard Template Library (PDF slides, Romanian only). The course was a success, and I returned to teach it again the following year.
Published Research/Academic Papers
“Automatic Plagiarism Detection System for Specialized Corpora”, F. Buruiană, A. Scoică, T. Rebedea, R. Rughiniș, in CSCS 19th International Conference on Control Systems and Computer Science, pp. 77-82, IEEE, 2013.
“The Impact of Competitiveness in Open Source on Education Quality: The Romanian Open Source Education Community”, A. Scoică, 2nd Workshop on Education by Research and Competition, 2012.
I have been on the editorial board of ACM XRDS, the Association for Computing Machinery’s international graduate student magazine, since December 2012. As a columnist for the Profile department, my job was to select, interview, and write stories about world-class computer scientists and tech leaders whose work relates to each issue’s topic. I was Issue Editor for the Fall 2014 issue on Natural Language Processing, and was subsequently promoted to Departments Chief. Below is a list of my published articles:
In 2012, I worked with American author and economist Andrew Tobias to translate one of his books, "The Best Little Boy in the World", into Romanian. The translation was subsequently published online on Google Books, and is available to read for free at this link.
In the summer of 2017, after collaborating with the UPES ACM Student Chapter in Dehradun, Uttarakhand, in northern India, I was featured in the Summer 2017 issue of "VOID" magazine, for which I gave a biographical interview.
The best and most reliable way to reach me is via e-mail. Please click on the small envelope icon at the bottom-left of the screen to get my address. I am usually quite responsive and reply within 48 hours. If I take longer, it is probably because I am traveling without Internet access.