Human Language Technology Center of Excellence, Johns Hopkins University

Position Title: SCALE Student Intern
Position Location: Baltimore, Maryland 21211, United States
Subject Areas: Computer Science
Computer Engineering
Human Language Technology
Machine Translation
Natural Language Processing
Electrical Engineering
2012/01/06
Position Description:    

Human Language Technology Center of Excellence (HLTCOE)

Johns Hopkins University

Summer Camp in Advanced Language Exploration (SCALE)

JHU Summer Internships in Speech and Language Processing


We are looking for outstanding graduate students for summer internships in speech and language processing. Interns will work one-on-one with a mentor as part of a larger workshop team. Research projects will be in the following areas:
- Natural Language Processing and Understanding: information extraction, knowledge distillation, semantics, sentiment, parsing, morphology
- Machine Translation: low-resource languages, large-scale training, phrase-based and syntax-based approaches
- Speech Processing: robust speech recognition and speaker identification (multiple languages, genres, and channels; limited resources), speech retrieval, language identification
- Machine Learning: large-scale learning, transfer learning, semi-supervised learning, data mining

Prior experience is not necessarily required: we will get you interested! Good programming skills and ability to work within a team are required.

Internships will be available for a 10 week period June 4-August 10, 2012.

What to Expect: Johns Hopkins is a very busy place during the summer. Besides the HLTCOE workshop (about 20 people), there are several CLSP summer workshops, as well as weekly seminar speakers. In addition to getting a lot of work done, you should expect to learn a lot about other people's research.

About Us: The HLTCOE is an independent research center that is part of Johns Hopkins University and located near its Homewood Campus. We work closely with the Center for Language and Speech Processing (CLSP) and with the Departments of Computer Science, Electrical and Computer Engineering, and Applied Mathematics and Statistics.

The HLTCOE is about 10 minutes from Baltimore's beautiful Inner Harbor and its many attractions.

Your Accommodations & Travel: Interns will be expected to find their own housing. A generous stipend will be offered to pay for housing, transportation, and food. Your transportation to and from the workshop will be paid for by the HLTCOE.

Salary: These are (well) paid internships! More information will be available during the application process.

Application requirements:
- Curriculum Vitae (please include a list of the languages you speak; after all, we do work on language!)
- Letter of recommendation from your adviser

Citizenship: Applicants will be required to obtain a US security clearance, which requires US citizenship. If you do not already have a clearance, we will work with you to obtain one.

DESCRIPTION OF TOPIC: Vertex Nomination on Attributed Graphs: If I know of a few "interesting" people, how can human language technology and graph theory help me find other "interesting" people? If I know of a few people committing a crime (e.g., fraud), how can I determine who their co-conspirators are? If I can infer basic properties of an individual, does this help?

Given a set of actors deemed "interesting", we aim to find other actors who are similarly "interesting". We are given a collection of informal communications (written and spoken) and a corresponding communications graph. In this graph, each vertex represents either a communication handle or a communication (e.g., an email), and each edge connects a handle to a communication in which that handle participated. Our goals are threefold: (1) posit a set of actors, each of whom uses one or more handles; (2) associate author attributes with actors, based on communication content; and (3) nominate an actor as "interesting", based on other actors already labeled "interesting".
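Because every edge joins a handle to a communication, the communications graph described above is bipartite. The Python sketch below is only an illustration under stated assumptions: the toy corpus is made up, and the function names `build_bipartite_edges` and `project_onto_handles` are hypothetical, not part of any workshop tooling. It builds the handle-communication edge list and then, optionally, collapses it into a handle-to-handle graph weighted by the number of shared communications.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each communication id maps to the handles that took part in it.
communications = {
    "email1": ["alice@corp", "bob@corp"],
    "email2": ["bob@corp", "admin@corp"],
    "call1": ["alice@corp", "admin@corp"],
}


def build_bipartite_edges(comms):
    """Return the (handle, communication) edge list of the communications graph.

    Every edge joins a handle vertex to a communication vertex, so no edge
    ever joins two handles or two communications: the graph is bipartite.
    """
    return [(handle, comm_id) for comm_id, handles in comms.items() for handle in handles]


def project_onto_handles(edges):
    """Collapse the bipartite graph into a weighted handle-handle graph.

    The weight of a handle pair is the number of communications both took part in.
    """
    participants = defaultdict(set)
    for handle, comm_id in edges:
        participants[comm_id].add(handle)
    weights = defaultdict(int)
    for handles in participants.values():
        # Sorting makes each pair key canonical, e.g. ("admin@corp", "bob@corp").
        for pair in combinations(sorted(handles), 2):
            weights[pair] += 1
    return dict(weights)


edges = build_bipartite_edges(communications)
print(len(edges))  # 6
print(project_onto_handles(edges))
```

The projection is convenient for graph algorithms that expect a single vertex type, but it discards the communication vertices and therefore their content; keeping the bipartite form preserves what is needed to attach language-derived attributes later.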

As an illustrative example, consider the email corpus of a hypothetical corporation, consisting of communications between actors, a few of whom are committing fraud. Some of their fraudulent activity is captured in emails among them, alongside many innocuous emails (both among the fraudsters and among the other employees of the company). Some accounts may be used by multiple actors, such as an administrative account shared by several administrators. Some actors may use multiple accounts, such as an administrator who uses the administrative account as well as an individual email address. Our task is to assign basic properties to the actors based on their language use.

We are then given the identities of a few fraudster vertices and asked to nominate one other vertex in the graph as likely representing another actor committing fraud.
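One very simple baseline for this nomination step, offered as an illustrative assumption rather than the approach the workshop will take, scores each unlabeled handle by how often it co-participates in communications with already-labeled handles and nominates the top scorer. The sketch works directly on handles, so it skips both the actor-resolution goal (1) and the attribute goal (2); the function name `nominate` and the toy data are hypothetical.

```python
from collections import Counter


def nominate(communications, interesting):
    """Nominate one handle outside `interesting`, or None if no candidate qualifies.

    communications: dict mapping a communication id to its participating handles.
    interesting: set of handles already labeled "interesting" (e.g., known fraudsters).
    A candidate earns one point for each labeled handle it shares a communication
    with, counted once per communication.
    """
    scores = Counter()
    for handles in communications.values():
        members = set(handles)
        seeds_present = len(members & interesting)
        if seeds_present == 0:
            continue  # this communication involves no labeled handle
        for candidate in members - interesting:
            scores[candidate] += seeds_present
    if not scores:
        return None
    # Highest score wins; ties are broken alphabetically so the result is deterministic.
    return min(scores, key=lambda h: (-scores[h], h))


# Hypothetical toy corpus: "ann" and "bo" are the known fraudsters.
corpus = {
    "e1": ["ann", "bo", "cy"],
    "e2": ["ann", "cy"],
    "e3": ["bo", "dee"],
    "e4": ["cy", "dee", "eve"],
}
print(nominate(corpus, {"ann", "bo"}))  # cy
```

Here `cy` scores 3 (two labeled partners on `e1`, one on `e2`) while `dee` scores 1, so `cy` is nominated. Pure co-occurrence counting ignores what was said; the attributed-graph setting above is meant to combine this kind of structural signal with content-derived attributes.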

Further Info:
810 Wyman Park Drive
Stieff Building
Baltimore, Maryland 21211

