LexisNexis is dedicated to advancing the rule of law around the world, which is vital for building peace and prosperity in all societies. To accomplish this goal, LexisNexis is transforming legal research globally. Our search index contains more than 81 billion richly annotated legal documents, creating an unprecedented legal knowledge graph. Our technology changes how lawyers practice law by providing fast, relevant answers to their most difficult questions. The data engineer will be dedicated to solving the scalability issues of the data ingestion pipelines for the LexisNexis search backend, dramatically improving both the velocity and the consistency of ETLs from the data lake to Solr. We are looking for someone who can bring their own perspective on how to address a variety of internal and external challenges. We expect this person to be versatile, display leadership qualities, and be enthusiastic about tackling new problems as we continue to push technology forward.

Required skills & experience:
- Minimum 2 years developing and maintaining ETL pipelines in Spark, Hadoop, or Kafka
- Minimum 2 years of experience in Java or Scala
- Minimum 2 years scaling search server clusters to accommodate increasing traffic and meet specific performance requirements
- Experience parsing data from XML documents
- Experience in data modeling, design and manipulation, optimization, and best practices
- Ability to complete complex bug fixes
- 5+ years of software engineering experience
- BS in Engineering/Computer Science or equivalent experience

Preferred skills & experience:
- Advanced degree in Engineering/Computer Science
- Expertise in containerization techniques such as Docker and cloud orchestration platforms such as Kubernetes
- Expertise in enterprise development languages such as Java or Scala
- Expertise in test-driven development and maintenance, including techniques for applying best practices for overall project benefit (Java and Cucumber scripting)
- Software development process expertise in applicable methodologies (e.g., Agile, test-driven development)
- 5+ years of experience with AWS products, including intimate familiarity with Spark
- Familiarity with EC2, Redshift, RDS, and S3
- Hands-on experience with Athena, DynamoDB, API Gateway, Lambda, and EMR
- AWS certification is a plus
- Advanced Linux shell expertise; must be able to analyze loads and tune job scheduling
- Familiarity with natural language processing (NLP)
- Expert Java or Scala programmer
- Expert with SQL dialects (ANSI, Postgres, MySQL, etc.)
- Expertise in server backup and recovery