Job Header
- Job Title: Data Engineer
- Employment Type: Full-time
- Location: Fully Remote
- Working Time Zone: EST
Role Summary
The Data Engineer designs, builds, and maintains reliable data pipelines and platforms that power analytics, reporting, and data products. You will partner with engineering, analytics, and business stakeholders to ensure data is accurate, accessible, well-modeled, and secure.
Key Responsibilities
- Design, build, and maintain scalable batch and/or streaming data pipelines (ETL or ELT) from source systems into data lakes and warehouses.
- Develop and maintain data models and curated datasets that enable trusted reporting and analytics, with clear definitions and consistent business logic.
- Implement data quality checks, monitoring, and alerting to keep pipelines reliable and to diagnose and resolve pipeline and data issues quickly.
- Collaborate with cross-functional stakeholders to gather requirements, plan deliveries, and continuously improve data platform performance, cost, and usability.
Required Qualifications
- Education: Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent experience.
- Experience: 3+ years in data engineering, analytics engineering, or backend engineering with strong data pipeline ownership.
- Core Skills:
  - Strong SQL and experience building datasets in a warehouse or lakehouse environment.
  - Proficiency in Python (or similar language) for data processing, automation, and pipeline development.
  - Experience with orchestration and pipeline operations (scheduling, retries, SLAs, incident troubleshooting).
  - Strong fundamentals in data modeling, data quality, and debugging complex data issues.
- Certifications or Licenses: Not required.
- Language: English fluency required.
Preferred Qualifications
- Experience with modern data stacks and tools such as Airflow, dbt, Spark, Databricks, or similar technologies.
- Experience with cloud data platforms (AWS, GCP, or Azure) and common services for storage, compute, and warehousing.
- Familiarity with streaming technologies (Kafka, Kinesis, Pub/Sub) and event-driven data patterns.
- Cloud or data engineering certifications (optional) and/or strong evidence of advanced platform ownership and delivery.
Skills and Competencies
- Hard Skills: data pipeline engineering, data modeling, SQL optimization, orchestration, data quality and observability.
- Soft Skills: communication, ownership and accountability, structured problem-solving, collaboration.
- Leadership: influence without authority, mentoring where applicable, proactive stakeholder management.
- Behavioral Expectations: integrity and confidentiality, attention to detail, bias for action, client focus.
Tools, Tech, and Methods
- Primary Tools: SQL, Python, Git and code review, workflow orchestration, and a cloud data warehouse or lakehouse platform.
- Tech Stack (optional): cloud services (AWS, GCP, Azure), storage (S3, GCS), warehouses (Snowflake, BigQuery, Redshift), transformation (dbt), compute (Spark, Databricks), streaming (Kafka), IaC (Terraform), CI/CD tooling.
- Methodologies: Agile delivery, DataOps practices, automated testing for pipelines, monitoring and alerting, performance and cost optimization.
- Documentation: maintain clear data contracts, model definitions, pipeline runbooks, and operational notes in shared documentation (Notion, Confluence, Google Docs).
- Security or Privacy Tooling: follow access controls, least-privilege permissions, secrets management, encryption, and secure data-sharing practices.
Security and Confidentiality
This role may involve access to confidential and sensitive data. Candidates must:
- Handle sensitive information with extreme care.
- Follow policies for data access, storage, and sharing.
- Maintain strict confidentiality and professional integrity in all situations.
Application Instructions
- Email your resume to: careers [at] valuenode [dot] com.
- Optional: include your LinkedIn profile and a short note summarizing data pipeline ownership, warehouse or lakehouse experience, and the tools you have used (SQL, Python, orchestration, cloud).