
Node.js and HDF5....

Now that I know I want to process my data in an AWS Lambda function (YES, I LOVE LAMBDAS ON AWS!!!), I had one more question for ChatGPT: "can my lambda node js read hdf5 files?"

Here is an answer that saved me a lot of time.... I have enjoyed working in JavaScript since its interpreter runs right in my browser.... glad I asked first!


Perfect! You're in a great position then — using Python in AWS Lambda to process the HDF5 files, and JavaScript in the browser to request only the relevant data is an ideal setup. Let's walk through a simple, working solution step-by-step:

✅ Your Architecture Overview:

  1. HDF5 files are stored in S3

  2. Python Lambda:

    • Reads the HDF5 file from S3

    • Extracts only the required data

    • Returns it via an API Gateway

  3. Browser (JavaScript):

    • Sends a request to your API Gateway

    • Receives a lightweight JSON response

    • Uses it for display/processing
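Before diving into the steps, it helps to see the contract between the two halves of this setup. The Lambda returns a response object whose body is a JSON *string*; the browser's response.json() parses that string back into an object. A quick stdlib-only sanity check (the numbers here are made up for illustration):

```python
import json

# Hypothetical Lambda-style response: note that 'body' is a JSON string,
# not a dict -- API Gateway passes it through to the browser as-is.
response = {
    "statusCode": 200,
    "headers": {"Content-Type": "application/json"},
    "body": json.dumps({"data": [1.5, 2.5, 3.5]}),
}

# What the JavaScript side's response.json() effectively does:
payload = json.loads(response["body"])
print(payload["data"])  # [1.5, 2.5, 3.5]
```

If you forget to json.dumps the body and return a raw dict instead, API Gateway may serialize it differently than you expect, so keeping the body an explicit string is the safe habit.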

🧠 Step 1: Create Python Lambda Function to Read HDF5

📦 Requirements:

You’ll need the h5py library. Since it’s a C-based package, you’ll either:

  • Use a Lambda Layer that includes h5py, or

  • Deploy your Lambda using a Docker container (which gives you full control)

🚀 We'll go with the Lambda Layer method first (simpler if you're not using Docker yet).

🧱 Step 2: Build a Lambda Layer with h5py

Here’s how you can build it on your local machine (you need Docker or a Linux environment for compatibility):

A. Create directory structure:

mkdir -p python
cd python
pip install h5py -t .
cd ..
zip -r h5py-layer.zip python

If you’re on Windows, use WSL or a Docker container so the compiled h5py binaries match the Lambda architecture (Amazon Linux).

B. Upload as a Lambda Layer:

  • Go to AWS Console → Lambda → Layers → Create layer

  • Name: h5py-layer

  • Upload h5py-layer.zip

  • Compatible runtimes: Python 3.9 (or the version you’ll use)

  • Click Create

🧑‍💻 Step 3: Write the Python Lambda Function

import json
import boto3
import h5py
import io

def lambda_handler(event, context):
    # S3 parameters (could also come from event)
    bucket = 'your-bucket-name'
    key = 'your-file.h5'

    # Fetch file from S3
    s3 = boto3.client('s3')
    response = s3.get_object(Bucket=bucket, Key=key)
    file_content = response['Body'].read()

    # Open as HDF5 and read dataset
    with h5py.File(io.BytesIO(file_content), 'r') as hdf:
        dataset = hdf['your_dataset_name'][:]
    
    # Convert to Python list so it can be returned as JSON
    data_list = dataset.tolist()

    return {
        'statusCode': 200,
        'body': json.dumps({
            'data': data_list
        }),
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'  # Allow from browser
        }
    }

Replace 'your_dataset_name' with the actual path of the dataset inside your .h5 file (e.g. 'group1/my_dataset').

🌐 Step 4: Set Up API Gateway to Call Lambda

  1. In AWS Console → API Gateway

  2. Create a new HTTP API

  3. Set up a route like /get-data

  4. Connect it to your Lambda

  5. Deploy and grab the API endpoint URL

🧪 Step 5: Call it from the Browser (JavaScript)

fetch('https://your-api-id.execute-api.your-region.amazonaws.com/get-data')
  .then(response => response.json())
  .then(data => {
    console.log('HDF5 data:', data.data); // Use as needed
  })
  .catch(err => console.error('Error fetching data:', err));

🔐 Bonus Tips:

  • For larger HDF5 files, consider passing the filename or key in the API request (e.g., via query string) so Lambda can load different files.

  • For security, lock down the API Gateway with auth (e.g., IAM, Cognito, or API keys) depending on your app's needs.

✅ Summary:

Task                 | Tool
---------------------|-----------------------
Read HDF5            | Python + h5py
Host data            | AWS Lambda
Store files          | Amazon S3
Serve via API        | API Gateway
Consume in browser   | JavaScript (Fetch API)

If you'd prefer a Docker-based deployment (gives more control over Python libraries), or want help creating a dynamic API where the browser can choose which dataset to load — happy to walk through that next!

© 2020 by Silvia Mazzoni, Silvia's Brainery, Santa Monica, CA
