|Position ID:||JHU-HLTCOE-SCALEIN [#1330]|
|Position Title:||SCALE Student Intern|
|Position Location:||Baltimore, Maryland 21211, United States|
|Subject Areas:||Computer Engineering, Human Language Technology, Natural Language Processing|
|Application Deadline:||2012/01/06 (posted 2011/12/01; closed 2012/03/05; listed until 2012/09/01)|
Human Language Technology Center of Excellence (HLTCOE)
Johns Hopkins University
Summer Camp in Advanced Language Exploration (SCALE)
JHU Summer Internships in Speech and Language Processing
APPLICATION DEADLINE: January 6, 2012
We are looking for outstanding graduate students for summer internships in speech and language processing. Interns will work one-on-one with a mentor as part of a larger workshop team. Research projects will be in the following areas:
- Natural Language Processing and Understanding: information extraction, knowledge distillation, semantics, sentiment, parsing, morphology
- Machine Translation: low-resource languages, large-scale training, phrase-based and syntax-based approaches
- Speech Processing: robust speech recognition and speaker identification (multiple languages, genres, and channels; limited resources), speech retrieval, language identification
- Machine Learning: large-scale learning, transfer learning, semi-supervised learning, data mining
Prior experience is not necessarily required: we will get you interested! Good programming skills and the ability to work as part of a team are required.
Internships will be available for a 10-week period, June 4 to August 10, 2012.
What to Expect: Johns Hopkins is a very busy place during the summer. In addition to the HLTCOE workshop (about 20 people), there are several CLSP summer workshops, as well as weekly seminar speakers. Besides getting a lot of work done, you should expect to learn a lot about other people's research.
About Us: The HLTCOE is an independent research center that is part of Johns Hopkins University and is located near its Homewood Campus. We work closely with the Center for Language and Speech Processing (CLSP) and with the Departments of Computer Science, Electrical and Computer Engineering, and Applied Math and Statistics.
The HLTCOE is about 10 minutes from Baltimore's beautiful Inner Harbor and its many attractions.
Your Accommodations & Travel: Interns will be expected to find their own housing. A generous stipend will be offered to cover housing, local transportation, and food. The HLTCOE will also pay for your travel to and from the workshop.
Salary: These are (well) paid internships! More information will be available during the application process.
Application requirements:
- Curriculum vitae (please include a list of the languages you speak; after all, we do work on language!)
- Letter of recommendation from your adviser
Citizenship: Applicants will be required to obtain a US security clearance, which requires US citizenship. If you do not already have a clearance, we will work with you to obtain one.
DESCRIPTION OF TOPIC: Vertex Nomination on Attributed Graphs. If I know of a few "interesting" people, how can human language technology and graph theory help me find other "interesting" people? If I know of a few people committing a crime (e.g., fraud), how can I determine who their co-conspirators are?
If I can infer basic properties of an individual, does this help? Given a set of actors deemed "interesting", we aim to find other actors who are similarly "interesting". We are given a collection of informal communications (written and spoken) and a corresponding communications graph. In this graph, each vertex represents either a communication handle or a communication (e.g., an email), and each edge connects a handle to a communication in which that handle participated. Our goals are threefold: (1) posit a set of actors, each of whom uses one or more handles; (2) associate author attributes with actors based on communication content; and (3) nominate an actor as "interesting" based on other actors already labeled "interesting".
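To make the graph structure concrete, the following Python sketch stores the bipartite communications graph as two adjacency maps (handle to communications, and communication to handles), then projects it onto a handle-to-handle graph in which each pair of handles is weighted by the number of communications they share. The handles, communication IDs, and function names are hypothetical illustrations, not the workshop's data or tooling.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: each communication ID maps to the handles that took part in it.
communications = {
    "email-1": {"alice@corp", "bob@corp"},
    "email-2": {"alice@corp", "admin@corp"},
    "email-3": {"bob@corp", "admin@corp", "carol@corp"},
}


def build_bipartite(comms):
    """Return both sides of the bipartite graph: handle -> comms and comm -> handles."""
    handle_to_comms = defaultdict(set)
    comm_to_handles = {}
    for comm_id, handles in comms.items():
        comm_to_handles[comm_id] = set(handles)
        for handle in handles:
            # Each (handle, communication) pair is one edge of the bipartite graph.
            handle_to_comms[handle].add(comm_id)
    return dict(handle_to_comms), comm_to_handles


def project_onto_handles(comm_to_handles):
    """Weight each unordered handle pair by the number of communications they share."""
    weights = defaultdict(int)
    for handles in comm_to_handles.values():
        # Sorting gives each pair a canonical (a, b) key with a < b.
        for a, b in combinations(sorted(handles), 2):
            weights[(a, b)] += 1
    return dict(weights)


handle_to_comms, comm_to_handles = build_bipartite(communications)
handle_weights = project_onto_handles(comm_to_handles)
```

Here `handle_to_comms["admin@corp"]` is `{"email-2", "email-3"}`, and the projection contains five weighted handle pairs. Projecting onto handles discards which communication linked a pair, so content-based attributes would still have to be read from the bipartite side.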
As an illustrative example, consider the email corpus of a hypothetical corporation, consisting of communications between actors, a few of whom are committing fraud. Some of their fraudulent activity is captured in emails between them, alongside many innocuous emails (both among the fraudsters and among the other employees of the company). Some accounts may be used by multiple actors, such as an administrative account shared by several administrators. Some actors may use multiple accounts, such as an administrator who uses the administrative account as well as their individual email address. We are to assign basic properties to the actors based on their language use.
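The many-to-many relation between actors and accounts can be sketched as a pair of mappings. In this minimal Python sketch, the posited actors and their handles are hypothetical; given such an assignment, a communication involving a shared account maps to every actor who might have used that account, which is the ambiguity that positing actors (goal 1) has to resolve.

```python
from collections import defaultdict

# Hypothetical posited actors (goal 1). Note the many-to-many relation:
# "admin@corp" is shared, and "admin-A" also has a personal address.
actor_to_handles = {
    "admin-A": {"admin@corp", "dave@corp"},
    "admin-B": {"admin@corp"},
    "alice": {"alice@corp"},
}


def invert(actor_to_handles):
    """Build the reverse mapping: handle -> set of actors who may use it."""
    handle_to_actors = defaultdict(set)
    for actor, handles in actor_to_handles.items():
        for handle in handles:
            handle_to_actors[handle].add(actor)
    return dict(handle_to_actors)


def candidate_actors(comm_handles, handle_to_actors):
    """Return every actor who may have taken part in a communication, given its handles."""
    actors = set()
    for handle in comm_handles:
        # Handles with no posited actor contribute nothing.
        actors |= handle_to_actors.get(handle, set())
    return actors


handle_to_actors = invert(actor_to_handles)
```

For example, an email between `admin@corp` and `alice@corp` yields the candidates `{"admin-A", "admin-B", "alice"}`: the shared account alone cannot say which administrator wrote it, so attributes inferred from language use would have to help decide.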
We are then given the identities of a few fraudster vertices and asked to nominate one other vertex in the graph as likely representing another actor committing fraud.
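Only as a point of reference, one naive baseline for this nomination step scores each non-seed handle by how often it co-participates in communications with the known fraudster handles, then nominates the top scorer. The Python sketch below assumes the communication-to-handles representation of the graph described above and uses hypothetical handles; it ignores content and author attributes entirely, so it is a simple co-participation heuristic, not the method the workshop will develop.

```python
from collections import Counter


def nominate(comm_to_handles, seeds):
    """Return the non-seed handle with the highest co-participation score, or None.

    Each communication adds one point to every non-seed participant for each
    seed handle that also took part in it.
    """
    seeds = set(seeds)
    scores = Counter()
    for handles in comm_to_handles.values():
        n_seeds = len(handles & seeds)
        if n_seeds == 0:
            continue  # no known fraudster involved; this communication adds no evidence
        for handle in handles - seeds:
            scores[handle] += n_seeds
    if not scores:
        return None
    # Highest score wins; ties are broken alphabetically so the result is deterministic.
    return min(scores, key=lambda h: (-scores[h], h))


# Hypothetical communications graph and seed (known fraudster) handles.
comm_to_handles = {
    "email-1": {"alice@corp", "bob@corp"},
    "email-2": {"alice@corp", "carol@corp"},
    "email-3": {"bob@corp", "carol@corp", "dave@corp"},
    "email-4": {"dave@corp", "erin@corp"},
}
seeds = {"alice@corp", "bob@corp"}
nominee = nominate(comm_to_handles, seeds)
```

With these seeds, `carol@corp` scores 2 (one email with each seed) and `dave@corp` scores 1, so `carol@corp` is nominated. A baseline like this treats a busy shared account as highly suspicious, which is one reason resolving handles to actors and using language-derived attributes could matter.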