Overview

Whitepages, founded in 1997, helps you contact, research, and verify people in your world. More than 30 million people per month use our people search engine to get in touch with extended friends and family, research backgrounds, and verify that people are who they say they are. Whitepages identity verification is also used by leading companies including JetBlue, Lego, and Intuit to prevent fraudulent transactions while delivering great online experiences.

We are looking for a Software Engineer to help our team improve and maintain our big data pipelines at scale. Our group is responsible for the Whitepages Identity Graph – linking people, businesses, locations, phone numbers, email addresses, and URLs. The graph is enormous (5 billion data elements and 150TB in size) and updating it is challenging both in complexity and scale – we process more than 1B data events per month with highly optimized Spark jobs and synthesize data from more than 20 different providers.

We use a wide variety of technologies:

• Programming in Scala
• Hive and Spark
• Redis, DynamoDB, ElasticSearch, and Solr
• AWS products: EMR and S3
• We are heavily oriented toward Linux and open source software

In this role you will:

• Design and develop Spark jobs that process large data files according to product requirements
• Ensure code follows the design and insist on the highest coding standards
• Maintain solutions that are sufficiently generic yet simple and economical
• Follow and create best practices for clean code and architecture
• Mentor and develop teammates, leading by example on best practices, design, and code reviews
• Manage urgency and risks on project timelines, and propose creative strategies for delivering constant business value
• Develop a deep understanding of the data and a good sense of signal vs. noise to help the business shape new products

In this role you will need:

• 3+ years of experience building complex ETLs, Data Warehousing or custom pipelines from multiple data sources, including proper monitoring, alerting, verification, and metrics in a commercial environment
• 1+ years of experience with the Spark ecosystem in a production environment
• Deep understanding of MapReduce and “big data” tools like Hive or Spark
• AWS Cloud experience with EC2, S3, RDS, Lambda, and VPC in a big data environment
• Proven track record building multi-tenant, scalable enterprise software in the cloud
• Solid understanding of JVM fundamentals and garbage collection optimization
• Bachelor’s degree in Computer Science or related area

Nice-to-have experience:

• Deep understanding of Spark internals
• Experience with Databricks and related platforms

Combine all this with a dynamic, can-do culture, and Whitepages is a pretty awesome place to work for folks who want to have an impact. We are a small team with a passion for what we do, and we keep our employees at the center of our mission. We host weekly events, including catered lunches and happy hours, enjoy unlimited vacation, keep a fully stocked kitchen, and work in some great cities, with headquarters in downtown Seattle and offices in New York City and Budapest, Hungary.

If this sounds like the kind of place you want to spend your days, then visit us at: http://about.whitepages.com/.  Whitepages Inc. prides itself on being an equal-opportunity employer.
