About Me

Hello! My name is Adrian, and I am a Computer Scientist who likes traveling, mountain hiking, and the wilderness in general. I currently live in Switzerland, and I have previously lived and worked in California, New York, and the UK (each of them more than once).

Like many people in my field, I have been interested in Computer Science since elementary school. After completing my Bachelor's degree, I went on to pursue a Master's and later a PhD at the University of Cambridge, UK, where I worked on Citation Recommendation and read about Natural Language Processing and Machine Learning.

Ideologically, I question the extent to which Natural Language Processing should be linguistically motivated, and I am a firm adherent of data-driven approaches. However, my understanding grows and evolves with every book, paper, and experiment, and I am always happy to be challenged on my beliefs.

In January 2016, I joined Google Switzerland to work on the Machine Learning side of automatically preventing abuse carried out through Google accounts. In my spare time, I write for a Computer Science magazine based in New York City and manage a team of columnists there; I also read and travel as much as possible.

Machine Learning for NLP Workshop @ ROSEdu Summer Workshops

Between June 15th and June 26th 2015, I designed, organized, and taught a summer school on Machine Learning for Natural Language Processing (ML4NLP) at the "Politehnica" University of Bucharest. The workshop comprised two stages: (A) a taught component between June 15th and June 19th, with two hours of lectures each day followed by a one-hour hands-on practical session in which participants solved a small problem to apply the notions they had learned, and (B) a hackathon between June 23rd and June 26th, during which a selected team of participants worked under my supervision to build a prototype of a post-OCR text regeneration tool.

In total, 13 applicants were selected to participate, with backgrounds ranging from high school level to university lecturer. The feedback was overwhelmingly positive, with an overall average score of 4.6 out of 5 across the feedback forms.

The main page for the summer school, which includes links to all the presentation slides from the first week, can be publicly accessed here.

MPhil Research Dissertation

My MPhil Research Project focused on applying unsupervised and weakly supervised Machine Learning to Natural Language Processing, under the supervision of Dr. Diarmuid Ó Séaghdha and Prof. Stephen Clark.

In my thesis, titled "Multilingual Generative Models for Selectional Preference Learning" (click thumbnail to download PDF), I investigated the use of Latent Dirichlet Allocation (LDA) for inducing plausibility estimates for the selectional preferences of Verb-Subject and Verb-DirectObject pairs in English, and verified state-of-the-art performance on three other European languages. I also tested the feasibility of Vector Space Alignment for transferring these estimates from resource-rich languages like English, for which dependency parsers can be trained, to languages that do not benefit from as large a body of research: German, Spanish, and Romanian.
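To illustrate how an LDA-style model can yield a plausibility score: one common formulation treats each verb as a "document" whose "words" are the nouns observed in a given argument slot, so the plausibility of noun n for verb v is P(n | v) = Σ_z P(z | v) · P(n | z), summed over the latent classes z. The sketch below is not the dissertation code; it assumes the class distributions have already been estimated, and the verbs, nouns, and probabilities in it are made-up toy values.

```python
from typing import Dict

# theta[verb][z] = P(z | verb); phi[z][noun] = P(noun | z). Toy values, not learned.
Theta = Dict[str, Dict[int, float]]
Phi = Dict[int, Dict[str, float]]


def plausibility(verb: str, noun: str, theta: Theta, phi: Phi) -> float:
    """P(noun | verb) = sum over latent classes z of P(z | verb) * P(noun | z)."""
    return sum(p_z * phi[z].get(noun, 0.0) for z, p_z in theta[verb].items())


# Hypothetical two-class model for the Verb-DirectObject slot.
THETA: Theta = {
    "eat": {0: 0.9, 1: 0.1},
    "drive": {0: 0.05, 1: 0.95},
}
PHI: Phi = {
    0: {"apple": 0.5, "bread": 0.4, "car": 0.1},    # a "food-like" class
    1: {"apple": 0.05, "bread": 0.05, "car": 0.9},  # a "vehicle-like" class
}

if __name__ == "__main__":
    for verb, noun in [("eat", "apple"), ("eat", "car"), ("drive", "car")]:
        print(f"P({noun} | {verb}) = {plausibility(verb, noun, THETA, PHI):.3f}")
```

Because the verb only touches the nouns through its class mixture, a noun never seen with "eat" can still get a sensible score if it has high probability under the classes that "eat" prefers.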

Project Training Data

As part of my research, I produced clean, dependency-parsed corpora in CoNLL format from the German, Spanish, and Romanian Wikipedia articles (excluding list-style articles), which I am releasing below. Please read the "README.txt" file in each archive for details about how the text was extracted and processed, as well as for licensing information.

Download German (ZIP archive covering 105,820 Wikipedia articles), 2.05 GB (view README.txt)
Download Spanish (ZIP archive covering 72,764 Wikipedia articles), 1.37 GB (view README.txt)
Download Romanian (ZIP archive covering 9,668 Wikipedia articles), 0.19 GB (view README.txt)
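As an illustration of how dependency-parsed corpora like these might be used, the sketch below reads CoNLL-style files and extracts (verb, argument) lemma pairs of the kind used for selectional preference learning. It assumes the 10-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences. The real column layout, tag set, and dependency labels are described in each archive's README.txt, so the labels used here ("V" for verbs, "SB" and "OA" for subject and accusative object, borrowed from TIGER-style German annotation) are placeholders to adjust.

```python
from typing import Iterable, Iterator, List, Tuple

Row = List[str]

# Column indices in the assumed CoNLL-X layout.
ID, FORM, LEMMA, CPOSTAG, HEAD, DEPREL = 0, 1, 2, 3, 6, 7


def read_sentences(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Group tab-separated token rows into sentences separated by blank lines."""
    sentence: List[Row] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


def verb_argument_pairs(sentence: List[Row], relation: str,
                        verb_tag: str = "V") -> Iterator[Tuple[str, str]]:
    """Yield (verb lemma, argument lemma) for dependents attached to a verb via `relation`."""
    by_id = {row[ID]: row for row in sentence}
    for row in sentence:
        head = by_id.get(row[HEAD])  # HEAD "0" (the root) has no row, so head is None
        if row[DEPREL] == relation and head is not None and head[CPOSTAG].startswith(verb_tag):
            yield head[LEMMA], row[LEMMA]


# A hand-made example sentence ("Der Hund frisst Fleisch"), not taken from the corpora.
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["1", "Der", "der", "ART", "ART", "_", "2", "NK", "_", "_"],
        ["2", "Hund", "Hund", "N", "NN", "_", "3", "SB", "_", "_"],
        ["3", "frisst", "fressen", "V", "VVFIN", "_", "0", "ROOT", "_", "_"],
        ["4", "Fleisch", "Fleisch", "N", "NN", "_", "3", "OA", "_", "_"],
    ]
) + "\n\n"

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines(keepends=True)):
        print("subjects:", list(verb_argument_pairs(sent, "SB")))
        print("objects:", list(verb_argument_pairs(sent, "OA")))
```

For a real archive, the same functions can be fed an open file handle line by line, so a multi-gigabyte corpus never has to be loaded into memory at once.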

Test Dataset for the Romanian Language

For evaluation purposes, I also created the first test dataset for selectional preference estimation in Romanian, by collecting ratings online from native-speaker raters. You can access the datasets in either PNG or CSV format.

The methodology of compiling the dataset is detailed in the main body of the dissertation. If you wish to publish results based on this dataset, please contact me by email first.

Project Code

The code developed as part of the project is hosted publicly on my Bitbucket account:

The main project code, written mostly in Java, for building and handling the probability tables. (repository)
The code I wrote to strip away Textile markup and retrieve the plain text from the extracted Wikipedia articles. (repository)
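The repository above holds the actual clean-up code. Purely to illustrate the idea, a handful of regular expressions can remove the most common Textile constructs: block signatures such as "h1." or "bq.", list bullets, *strong* and _emphasis_ spans, "text":url links, and !image! tags. This is a simplified sketch that assumes well-formed, single-line markup; it does not handle tables, footnotes, nested spans, or escaped characters.

```python
import re

# Simplified patterns for a subset of Textile; an illustrative sketch only.
IMAGE = re.compile(r"!\S+?!")                                        # !image.png!
LINK = re.compile(r'"([^"]+)":\S*[^\s.,;:!?)]')                      # "text":url, keeps text
BLOCK_SIGNATURE = re.compile(r"^(?:h[1-6]|p|bq)\.\s", re.MULTILINE)  # h1. / p. / bq.
LIST_BULLET = re.compile(r"^[*#]+\s", re.MULTILINE)                  # * item / # item
SPAN = re.compile(r"(\*\*|\*|__|_)(\S(?:.*?\S)?)\1")                 # *b*, _i_, **b**, __i__


def strip_textile(text: str) -> str:
    """Return `text` with common Textile markup removed (simplified)."""
    text = IMAGE.sub("", text)
    # The URL pattern stops before trailing punctuation, so "...:http://x.org." keeps its period.
    text = LINK.sub(r"\1", text)
    text = BLOCK_SIGNATURE.sub("", text)
    # Bullets are removed before spans so a leading "* " is not taken as a *strong* opener.
    text = LIST_BULLET.sub("", text)
    text = SPAN.sub(r"\2", text)
    return text


if __name__ == "__main__":
    sample = ('h1. Zürich\n\nZürich is the *largest* city in "Switzerland":http://example.org.\n'
              "* first item\n* _second_ item")
    print(strip_textile(sample))
```

Ordering matters here: links are unwrapped before spans so underscores inside URLs are gone by the time the emphasis pattern runs.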

Spam Detector Robustness Study

In 2013, I carried out a study (click thumbnail to download PDF) on the problem of email spam classification in order to verify the empirical claim that the performance of word-based classifiers as a function of the leading K tokens of an email saturates quickly with increasing values of K. In doing so, I built on the work of (Çıltık and Güngör, 2008), and tested Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Bayesian Logistic Regression (BLR), and Interpolated Language Models (ILM) approaches on the GenSpam corpus.

The study concluded that the MNB, SVM, and BLR methods are robust to truncating emails down to approximately 100 tokens, while ILM remains robust down to approximately 20 tokens.
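The experimental idea (classify each email using only its leading K tokens and watch how accuracy changes as K shrinks) can be sketched with a from-scratch Multinomial Naive Bayes classifier. This is a toy illustration, not the study's code: it assumes whitespace-tokenized emails and add-one (Laplace) smoothing, skips out-of-vocabulary tokens, and uses invented training examples rather than the GenSpam corpus.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class MultinomialNB:
    """Multinomial Naive Bayes over token counts, with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior: Dict[str, float] = {}
        self.counts: Dict[str, Counter] = {}
        self.totals: Dict[str, int] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_score(self, doc: List[str], c: str) -> float:
        """log P(c) + sum of log P(token | c) over the tokens of `doc`."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in doc:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc: List[str]) -> str:
        return max(self.classes, key=lambda c: self.log_score(doc, c))


def accuracy_at_k(model: MultinomialNB, docs: Sequence[List[str]],
                  labels: Sequence[str], k: int) -> float:
    """Accuracy when every test document is truncated to its leading k tokens."""
    hits = sum(model.predict(doc[:k]) == y for doc, y in zip(docs, labels))
    return hits / len(docs)


TRAIN = [
    ("win cash now", "spam"),
    ("cheap cash offer", "spam"),
    ("meeting tomorrow agenda", "ham"),
    ("project meeting notes", "ham"),
]

if __name__ == "__main__":
    model = MultinomialNB().fit([t.split() for t, _ in TRAIN], [y for _, y in TRAIN])
    test_docs = [d.split() for d in ["cash meeting meeting", "win cash offer"]]
    test_labels = ["ham", "spam"]
    for k in (1, 3):
        print(k, accuracy_at_k(model, test_docs, test_labels, k))
```

With these toy examples, truncating to K=1 misclassifies the first email as spam (accuracy 0.5), while K=3 classifies both correctly; on real data the interesting question is the value of K beyond which accuracy stops improving.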


Bachelor Thesis

Upon graduation, I wrote my Bachelor Thesis on the topic of automated plagiarism detection in specialized corpora (academic papers in the field of Computer Science), under the supervision of Dr. Traian Rebedea and Prof. Răzvan Rughiniș.

In my thesis, titled "The AuthentiCop System for Plagiarism Detection in Specialized Corpora: Algorithms and Data Processing" (click thumbnail to download PDF), I described building an automated plagiarism detection pipeline based on the Encoplot algorithm (Grozea and Popescu, 2011). Together with Filip Buruiană, I co-wrote a paper on the topic of the thesis, which won the best paper award at the Student Scientific Paper Session of the "Politehnica" University of Bucharest, 2012.
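For readers curious about the core idea behind Encoplot: rather than keeping every pair of positions where the two documents share a character n-gram (a full dotplot, which can grow quadratically), it pairs the k-th occurrence of an n-gram in one document with its k-th occurrence in the other. Under that reading, the pairs can be found by sorting both n-gram lists and merging them. The sketch below illustrates only this matching step; it is not the AuthentiCop code, the n-gram length is a free parameter, and the later step of grouping matched pairs into plagiarized passages is omitted.

```python
from typing import List, Tuple


def sorted_ngrams(text: str, n: int) -> List[Tuple[str, int]]:
    """All character n-grams of `text` with their start positions, sorted by (n-gram, position)."""
    return sorted((text[i:i + n], i) for i in range(len(text) - n + 1))


def encoplot_pairs(a: str, b: str, n: int) -> List[Tuple[int, int]]:
    """Match the k-th occurrence of each shared n-gram in `a` with its k-th occurrence in `b`."""
    xs, ys = sorted_ngrams(a, n), sorted_ngrams(b, n)
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(xs) and j < len(ys):  # merge step, as in merge sort
        if xs[i][0] == ys[j][0]:
            pairs.append((xs[i][1], ys[j][1]))
            i += 1
            j += 1
        elif xs[i][0] < ys[j][0]:
            i += 1
        else:
            j += 1
    return sorted(pairs)


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    suspicious = "a quick brown fox leapt"
    # Long runs of consecutive pairs (i, j), (i+1, j+1), ... suggest a copied passage.
    print(encoplot_pairs(source, suspicious, n=5))
```

Because positions break ties in the sort, repeated n-grams are paired in order of occurrence: for "abab" against itself only the diagonal pairs survive, whereas a full dotplot would also report the off-diagonal matches.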


Teaching Assistant @ Politehnica University of Bucharest

During my undergraduate years, I also worked as a Teaching Assistant for several courses at the Politehnica University of Bucharest: Operating Systems Usage, Computer Programming, Data Structures, and Algorithm Design. As a teaching assistant, I taught laboratory classes, wrote laboratory exercises, designed homework assignments and final exam questions, wrote tutorials, and managed course repositories. I am publicly releasing some of this work below.

The Algorithm Design Course

The source code for the official C++ solutions to the 12 coding laboratories is hosted publicly on my Bitbucket account. (repository)

The repository also contains the source code I wrote for the automated grader and visualizer for the second assignment of the 2011-2012 academic year, which required students to write an engine that can beat our AI at the Connect4 board game. Two video tutorials explaining how to use the visualizer to test hand-written AI bots for Connect4 are given below (Romanian language only).

The Computer Programming Course

First as a TA, and later as head of the TA team for the Computer Programming course, I adopted and maintained an open-source online judge platform for the course laboratories (which is live here). The platform is built on the source code of Infoarena, a popular Romanian online judge for competitive algorithm design. By December 2014, the judge had 638 registered students.

A short video tutorial instructing the students how to use the website is given above (Romanian language only).

Google AI Challenge, Fall 2011

In Fall 2011, I participated in the Google AI Challenge together with a team of three colleagues from my undergraduate course. We designed and implemented an AI bot that controls a swarm of ants as they forage for food and wage war against opposing swarms on the map. In the final championship, our bot ranked 63rd out of 7,897 teams from all over the world. You can watch an online demo of our AI bot at work below.

The source code for the project is hosted publicly on my Bitbucket account. (repository)

Chess Engine

In 2010, I led a team that designed a Chess engine in C++, which won the end-of-year tournament organized as part of my Algorithm Design class. We developed the engine to be compatible with GNU XBoard, the chess interface from the GNU Project. The implementation was based on a NegaMax search with Alpha-Beta pruning, with added support for Quiescence Search, custom heuristics, and a database of openings from famous chess championships to save time in the early stages of the game. We implemented the algorithms over our own bit-wise representation of the game board, hence the engine's name, "BitBoard". You can watch an online demo of the engine below.

The source code for the project is hosted publicly on my Bitbucket account. (repository)
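The core search idea, NegaMax with Alpha-Beta pruning, can be sketched on an explicit game tree. This is an illustrative Python sketch, not the BitBoard engine: it assumes leaves hold static evaluations from the root player's point of view, and it leaves out everything a real engine needs around the search (move generation over bitboards, Quiescence Search, move ordering, the opening book).

```python
import math
from typing import List, Optional, Union

# A node is either a leaf score (from the root player's point of view) or a list of children.
Node = Union[float, List["Node"]]


def negamax(node: Node, alpha: float = -math.inf, beta: float = math.inf,
            color: int = 1, visited: Optional[List[float]] = None) -> float:
    """Value of `node` for the side to move (color=+1: root player, -1: opponent)."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return color * node
    best = -math.inf
    for child in node:
        # The opponent's best score is our worst: negate it, and negate/swap the window.
        score = -negamax(child, -beta, -alpha, -color, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the parent already has a better alternative, so skip the rest
    return best


if __name__ == "__main__":
    tree = [[3, 5], [2, 9]]
    seen: List[float] = []
    print(negamax(tree, visited=seen), seen)
```

On this tree the search returns 3 after evaluating only the leaves 3, 5, and 2: once the opponent can answer the second move with 2, which is worse for the root player than the 3 already secured, the leaf 9 is never examined.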

Crash Course in C++ for Java Programmers @ UPB

In 2011, upon becoming a Teaching Assistant for the Algorithm Design course at the Politehnica University of Bucharest, I promoted the use of C++ in the labs and wrote a Crash Course in C++ for Java Programmers handbook (click thumbnail to download PDF - Romanian language only), designed to integrate seamlessly with the OOP curriculum at the university. The handbook remained popular with later generations of students and was referenced externally, on websites such as www.itassistant.org.

Following its success, I was invited in 2012 to teach a Crash Course in C++ workshop at THALES Systems Romania.

Crash Course in C++ Templates and the STL @ ROSEdu CDL

In 2010, a year after having graduated from the first edition of the ROSEdu Community Development Lab (CDL) myself, I taught a crash course in C++ Templates and the Standard Template Library (click thumbnail to download PDF slides - Romanian language only). The course was a success, which prompted me to return and teach it the following year as well.

Publications

Published Research/Academic Papers



“Automatic Plagiarism Detection System for Specialized Corpora”, F. Buruiană, A. Scoică, T. Rebedea, R. Rughiniș, in 19th International Conference on Control Systems and Computer Science (CSCS), IEEE, 2013, pp. 77-82. PDF BibTeX

“The Impact of Competitiveness in Open Source on Education Quality: The Romanian Open Source Education Community”, A. Scoică, in 2nd Workshop on Education by Research and Competition, 2012. PDF

ACM XRDS International Grad Student Magazine


I have been on the editorial board of ACM XRDS, the Association for Computing Machinery’s international grad student magazine, since December 2012. As a columnist for the Profile department, my job was to select, interview, and write stories about world-class computer scientists and tech leaders whose work relates to each issue’s topic. I was Issue Editor for the Fall 2014 issue on Natural Language Processing and was subsequently promoted to Departments Chief. Below is a list of my published articles:

Issue | Article
Spring 2013 | Interview with Buddy Bland: High Performance Computing at the Oak Ridge National Laboratory
Summer 2013 | Interview with Ken Museth: A Career Shaped by Creativity
Fall 2013 | Interview with Jessica Staddon: Managing Google's privacy research
Winter 2013 | Interview with Ori Inbar: Making Augmented Reality a Reality
Spring 2014 | Interview with Benjamin Cichy: Writing Code to Run on Mars
Summer 2014 | Interview with Peter Havelock: How Does the World's Largest IT Company Understand Diversity?
Fall 2014 | Interview with Geoffrey Hinton: Unlocking the Language of the Brain
Winter 2014 | Interview with Trevor van Mierlo: The Story of Building a Startup in Health Informatics
Spring 2015 | Interview with Ian Pratt: Pioneering Security through Virtualization
Summer 2015 | Interview with Sriram Kosuri: Never Mind the Cloud, Back Up Your Selfies to DNA
Fall 2015 | Interview with Susumu Tachi: The Scientist Who Invented Telexistence
Winter 2015 | Interview with Matthew Pryor: Using Tech to Manage Drought, from Australia to California
Spring 2016 | Interview with Dennis Bormann: The Man Who Introduced Antarctica's Davis Station to 3-D Printing
Summer 2016 | Interview with Sanny Gaddafi: Living at the Forefront of Indonesia's Tech Emancipation
Fall 2016 | Interview with David Deutsch: Understanding Computation as a Consequence of Physics

Translation Work


In 2012, I worked with American author and economist Andrew Tobias to translate one of his books, "The Best Little Boy in the World", into Romanian. The translation was subsequently published on Google Books and is available to read for free at this link.

Curriculum Vitae

Contact

The best and most reliable way to reach me is via e-mail. Please click on the small envelope icon at the bottom left of the screen to get my address. I am usually quite responsive and reply within 48 hours. If I take longer, it is probably because I am traveling without Internet access.