Apple Patches Everything: March 31st 2025 Edition, (Mon, Mar 31st)

Today, Apple released updates across all its products: iOS, iPadOS, macOS, tvOS, visionOS, Safari, and Xcode. Interestingly, watchOS was missing from the patch lineup. These are feature updates for the operating systems, but in addition to new features they include patches for 145 different vulnerabilities. The update includes a patch for CVE-2025-24200 and CVE-2025-24201, two already exploited iOS vulnerabilities, for older iOS/iPadOS versions. Current versions received this patch a few weeks ago.

Accelerate operational analytics with Amazon Q Developer in Amazon OpenSearch Service

Today, I’m happy to announce Amazon Q Developer support for Amazon OpenSearch Service, providing AI-assisted capabilities to help you investigate and visualize operational data. Amazon Q Developer enhances the OpenSearch Service experience by reducing the learning curve for query languages, visualization tools, and alerting features. The new capabilities complement existing dashboards and visualizations by enabling natural language exploration and pattern detection. After incidents, you can rapidly create additional visualizations to strengthen your monitoring infrastructure. This enhanced workflow accelerates incident resolution and optimizes engineering resource usage, helping you focus more time on innovation rather than troubleshooting.

Amazon Q Developer in Amazon OpenSearch Service improves operational analytics by integrating natural language exploration and generative AI capabilities directly into OpenSearch workflows. During incident response, you can now quickly gain context on alerts and log data, leading to faster analysis and resolution times. When alert monitors trigger, Amazon Q Developer provides summaries and insights directly in the alerts interface, helping you understand the situation quickly without waiting for specialists or consulting documentation. From there, you can use Amazon Q Developer to explore the underlying data, build visualizations using natural language, and identify patterns to determine root causes. For example, you can create visualizations that break down errors by dimensions such as Region, data center, or endpoint. Additionally, Amazon Q Developer assists with dashboard configuration and recommends anomaly detectors for proactive alerting, improving both initial monitoring setup and troubleshooting efficiency.

Get started with Amazon Q Developer in OpenSearch Service
To get started, I go to my OpenSearch user interface and sign in. From the home page, I choose a workspace to test Amazon Q Developer in OpenSearch Service. For this demonstration, I use a preconfigured environment with the sample logs dataset available on the user interface.

This feature is enabled by default through the Amazon Q Developer Free tier, which is itself on by default. You can disable the feature by clearing the Enable natural language query generation checkbox under the Artificial Intelligence (AI) and Machine Learning (ML) section during domain creation or by editing the cluster configuration in the console.

In OpenSearch Dashboards, I navigate to Discover from the left navigation pane. To explore the data using natural language, I switch to the PPL language, which displays the prompt box.

I choose the Amazon Q icon in the main navigation bar to open the Amazon Q panel. You can use this panel to create recommended anomaly detectors to drive alerting and to generate visualizations using natural language.

I enter the following prompt in the Ask a natural language question text box:

Show me a breakdown of HTTP response codes for the last 24 hours

When results appear, Amazon Q automatically generates a summary of these results. You can control the summary display using the Show result summarization option under the Amazon Q panel to hide or show the summary. You can use the thumbs up or thumbs down buttons to provide feedback, and you can copy the summary to your clipboard using the copy button.

Other capabilities of Amazon Q Developer in OpenSearch Service include generating visualizations directly from natural language descriptions, providing conversational assistance for OpenSearch-related queries, providing AI-generated summaries and insights for your OpenSearch alerts, and analyzing your data to suggest appropriate anomaly detectors.

Let’s look at how to generate visualizations directly from natural language descriptions. I choose Generate visualization from the Amazon Q panel. I enter Create a bar chart showing the number of requests by HTTP status code in the input field and choose Generate.

To refine the visualization, you can choose Edit visual and add style instructions such as Show me a pie chart or Use a light gray background with a white grid.

Now available
You can now use Amazon Q Developer in OpenSearch Service to reduce mean time to resolution, enable more self-service troubleshooting, and help your teams extract greater value from observability data.

The service is available today in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (London), Europe (Paris), and South America (São Paulo) AWS Regions.

To learn more, visit the Amazon Q Developer documentation and start using Amazon Q Developer in your OpenSearch Service domain today.

— Esra


Amazon API Gateway now supports dual-stack (IPv4 and IPv6) endpoints

Today, we are launching IPv6 support for Amazon API Gateway across all endpoint types, custom domains, and management APIs, in all commercial and AWS GovCloud (US) Regions. You can now configure REST, HTTP, and WebSocket APIs, and custom domains, to accept calls from IPv6 clients alongside the existing IPv4 support. You can also call API Gateway management APIs from dual-stack (IPv6 and IPv4) clients. As organizations globally confront growing IPv4 address scarcity and increasing costs, implementing IPv6 becomes critical for future-proofing network infrastructure. This dual-stack approach helps organizations maintain future network compatibility and expand global reach. To learn more about dualstack in the Amazon Web Services (AWS) environment, see the IPv6 on AWS Documentation.

Creating new dual-stack resources

This post focuses on two ways to create an API or a domain name with a dualstack IP address type: AWS Management Console and AWS Cloud Development Kit (CDK).

AWS Console

When creating a new API or domain name in the console, select IPv4 only or dualstack (IPv4 and IPv6) for the IP address type.

As shown in the following image, you can select the dualstack option when creating a new REST API.
For custom domain names, you can similarly configure dualstack as shown in the next image.

If you need to revert to IPv4-only for any reason, you can modify the IP address type setting, with no need to redeploy your API for the update to take effect.

REST APIs of all endpoint types (EDGE, REGIONAL, and PRIVATE) support dualstack; private REST APIs support only the dualstack configuration.

AWS CDK

With AWS CDK, start by configuring a dual-stack REST API and domain name.

import * as apigateway from 'aws-cdk-lib/aws-apigateway';

// REST API configured with the dualstack IP address type
const api = new apigateway.RestApi(this, "Api", {
  restApiName: "MyDualStackAPI",
  endpointConfiguration: {ipAddressType: "dualstack"}
});

// Custom domain name configured to accept both IPv4 and IPv6 clients
const domainName = new apigateway.DomainName(this, "DomainName", {
  regionalCertificateArn: 'arn:aws:acm:us-east-1:111122223333:certificate/a1b2c3d4-5678-90ab',
  domainName: 'dualstack.example.com',
  endpointConfiguration: {
    types: ['Regional'],
    ipAddressType: 'dualstack'
  },
  securityPolicy: 'TLS_1_2'
});

// Map the API onto the custom domain
const basePathMapping = new apigateway.BasePathMapping(this, "BasePathMapping", {
  domainName: domainName,
  restApi: api
});

IPv6 Source IP and authorization

When your API begins receiving IPv6 traffic, client source IPs will be in IPv6 format. If you use resource policies, Lambda authorizers, or AWS Identity and Access Management (IAM) policies that reference source IP addresses, make sure they’re updated to accommodate IPv6 address formats.

For example, the following resource policy permits traffic from a specific IPv4 range and a specific IPv6 range:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:stage-name/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "192.0.2.0/24",
            "2001:db8:1234::/48"
          ]
        }
      }
    }
  ]
}
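
Lambda authorizers that allow-list source IPs need a similar update. Below is a minimal Python sketch of an IPv6-aware check using the standard ipaddress module; the allowed ranges, and the assumption that the client IP arrives in requestContext.identity.sourceIp of a REST API request authorizer event, are purely illustrative.

import ipaddress

# Hypothetical allow-list covering both IPv4 and IPv6 ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:1234::/48"),
]

def lambda_handler(event, context):
    # For REST API request authorizers the client IP is typically found at
    # requestContext.identity.sourceIp; treat this event path as an assumption.
    source_ip = ipaddress.ip_address(event["requestContext"]["identity"]["sourceIp"])
    # Membership checks return False when the address families differ,
    # so one list can mix IPv4 and IPv6 networks.
    allowed = any(source_ip in network for network in ALLOWED_NETWORKS)
    effect = "Allow" if allowed else "Deny"
    return {
        "principalId": "caller",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
    }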

Summary

API Gateway dual-stack support helps manage IPv4 address scarcity and costs, comply with government and industry mandates, and prepare for the future of networking. The dualstack implementation provides a smooth transition path by supporting both IPv4 and IPv6 clients simultaneously.

To get started with API Gateway dual-stack support, visit the Amazon API Gateway documentation. You can configure dualstack for new APIs or update existing APIs with minimal configuration changes.

Betty

Special thanks to Ellie Frank (elliesf), Anjali Gola (anjaligl), and Pranika Kakkar (pranika) for providing resources, answering questions, and offering valuable feedback during the writing process. This blog post was made possible through the collaborative support of the service and product management teams.


AWS Weekly Roundup: Amazon Bedrock, Amazon QuickSight, AWS Amplify, and more (March 31, 2025)

It’s AWS Summit season! Free events are now rolling out worldwide, bringing our cloud computing community together to connect, collaborate, and learn. Whether you prefer joining us online or in-person, these gatherings offer valuable opportunities to expand your AWS knowledge. I’ll be attending the AWS Amsterdam Summit and would love to meet you—if you’re planning to be there, please stop by to say hello! Visit the AWS Summit website today to find events in your area, sign up for registration alerts, and reserve your spot at an AWS Summit near you.

Speaking of AWS news, let’s look at last week’s new announcements.

Last week’s launches
Here are the launches that got my attention.

AWS WAF integration with AWS Amplify Hosting now generally available – You can now directly attach AWS WAF to your AWS Amplify applications through a one-click integration in the Amplify console or using infrastructure as code (IaC). This integration provides access to the full range of AWS WAF capabilities, including managed rules that protect against common web exploits like SQL injection and cross-site scripting (XSS). You can also create custom rules based on your application needs, implement rate-based rules to protect against distributed denial of service (DDoS) attacks by limiting request rates from IP addresses, and configure geo-blocking to restrict access from specific countries. Firewall support is available in all AWS Regions in which Amplify Hosting operates.

Amazon Bedrock Custom Model Import introduces real-time cost transparency – If you’re using Amazon Bedrock Custom Model Import to run your customized foundation models (FMs), you can now access full transparency into compute resources and calculate inference costs in real time. Before model invocation, you can view the minimum compute resources (custom model units or CMUs) required through both the Amazon Bedrock console and Amazon Bedrock APIs. As models scale to handle increased traffic, Amazon CloudWatch metrics provide real-time visibility into total CMUs used, enabling better cost control through near-instant visibility. This helps you make on-the-fly model configuration changes to optimize costs. The feature is available in all Regions where Amazon Bedrock Custom Model Import is supported, with additional details available in Calculate the cost of running a custom model in the Amazon Bedrock User Guide.

Amazon Bedrock Knowledge Bases now supports Amazon OpenSearch Managed Cluster for vector storage – Amazon Bedrock Knowledge Bases securely connects FMs to company data sources for Retrieval Augmented Generation (RAG), delivering more relevant and accurate responses. With this launch, you can use Amazon OpenSearch Managed Cluster as a vector database while using the full suite of Amazon Bedrock Knowledge Bases features. This integration expands the list of supported vector databases, which already includes Amazon OpenSearch Serverless, Amazon Aurora, Amazon Neptune Analytics, Pinecone, MongoDB Atlas, and Redis. The native integration with vector databases helps mitigate the need to build custom data source integrations. This feature is now generally available in all existing Amazon Bedrock Knowledge Bases and OpenSearch Service Regions.

Amazon Bedrock Guardrails announces the general availability of industry-leading image content filters – This new capability offers industry-leading text and image content safeguards that help you block up to 88% of harmful multimodal content without building custom safeguards or relying on error-prone manual content moderation. Image content filters can be applied across all categories within the content filter policy, including hate, insults, sexual, violence, misconduct, and prompt attacks. Amazon Bedrock Guardrails provides configurable safeguards to detect and block harmful content and prompt attacks, deny specific topics, redact personally identifiable information (PII), and block specific words. It also provides contextual grounding checks to detect and block model hallucinations, assess the relevance of model responses and claims, and identify, correct, and explain factual claims in model responses using Automated Reasoning checks. This capability is generally available in the US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Tokyo) Regions. To learn more, visit Amazon Bedrock Guardrails image content filters provide industry-leading safeguards in the AWS Machine Learning Blog and Stop harmful content in models using Amazon Bedrock Guardrails in the Amazon Bedrock User Guide.

Scenarios capability now generally available for Amazon Q in QuickSight – This capability guides you through data analysis by uncovering hidden trends, making recommendations for your business, and intelligently suggesting next steps for deeper exploration using natural language interactions. Now you can explore past trends, forecast future scenarios, and model solutions without needing specialized skill, analyst support, or manual manipulation of data in spreadsheets. With its intuitive interface and step-by-step guidance, the scenarios capability of Amazon Q in QuickSight helps you perform complex data analysis up to 10x faster than spreadsheets. Whether you’re optimizing marketing budgets, streamlining supply chains, or analyzing investments, Amazon Q makes advanced data analysis accessible so you can make data-driven decisions across your organization. This capability is accessible from any Amazon QuickSight dashboard, so you can move seamlessly from visualizing data to asking what-if questions and comparing alternatives. Previous analyses can be easily modified, extended, and reused, helping you quickly adapt to changing business needs.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

We also launched existing services and instance types in additional Regions.

Other AWS events
Check your calendar and sign up for upcoming AWS events.

AWS GenAI Lofts are collaborative spaces and immersive experiences that showcase AWS expertise in cloud computing and AI. They provide startups and developers with hands-on access to AI products and services, exclusive sessions with industry leaders, and valuable networking opportunities with investors and peers. Find a GenAI Loft location near you and don’t forget to register.

Browse all upcoming AWS-led in-person and virtual events here.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Esra

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!


Apache Camel Exploit Attempt by Vulnerability Scan (CVE-2025-27636, CVE-2025-29891), (Mon, Mar 31st)

About three weeks ago, Apache patched two vulnerabilities in Apache Camel. The two vulnerabilities (CVE-2025-27636 and CVE-2025-29891) may lead to remote code execution, but not in the default configuration. The vulnerabilities are caused by Apache Camel using case-sensitive filters to restrict which headers may be used. However, HTTP headers are not case-sensitive, so an attacker may trivially bypass the filter.

A Tale of Two Phishing Sites, (Fri, Mar 28th)

In phishing and in malspam, as in any other field, one can see certain trends develop over time. For obvious reasons, most threat actors like to use techniques and approaches that are novel and, thus, more effective. This commonly leads to adoption of the same techniques and technologies by multiple threat actors at the same time, which applies even to the use of the same phishing kits. Still, the same kit may end up looking completely different in the hands of different actors, as the following example shows.

Since our main “handler” e-mail address has been publicly listed for years on the ISC website, it has been scraped countless times by various bots and concurrently added to many address lists used in phishing campaigns. We therefore receive quite a lot of different phishing and malspam samples on it. Two of these caught my attention yesterday – not because of the content of the messages themselves (both were run-of-the-mill phishing messages using the usual lures – an “almost full mailbox” and an “expiring domain registration”), but because of the websites they linked to.

Links from both messages led to legitimate domains that had clearly been compromised. The credential-stealing pages were nearly identical in both cases, indicating that the same phishing kit was the basis for both…

It soon turned out that this was where the similarities ended, however.

The code of the first page was not obfuscated in any way, and it was easy to identify how and where the credentials were supposed to be sent – specifically, to another compromised web server.

Although it looked nearly the same, under the proverbial hood, the second page was significantly different.

Since its authors left in the corresponding banner, we can clearly see that the HTML code was obfuscated using a simple function offered by the Snap Builder service…

This – trivial to bypass – layer of protection wasn’t the only one present. If one were to decode the HTML code into a readable form, one would still find some portions of it obfuscated through a common substitution mechanism.

Bypassing similar protection is of course reasonably simple, and in this instance, it could even be done manually, as the following example shows.

The URL to which credentials should have been sent was:

hxxps[:]//api.telegram[.]org/bot7246282440:AAHJb7KssReEsgMVGaXOjj0TL_3mJGAMIcA/sendMessage

As we can see, although – given the visual similarities – the starting point for both credential stealing web pages was almost certainly the same phishing kit, the pages turned out to be quite different in the way they functioned and in the way they were protected.

This shows quite well that although the aforementioned claim about threat actors aligning in their use of similar techniques and tools holds true, this doesn’t necessarily mean that the end result will be the same…

———–
Jan Kopriva
LinkedIn
Nettles Consulting

(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.

Accelerating CI with AWS CodeBuild: Parallel test execution now available

I’m excited to announce that AWS CodeBuild now supports parallel test execution, so you can run your test suites concurrently and reduce build times significantly.

With the demo project I wrote for this post, the total test time went down from 35 minutes to six minutes, including the time to provision the environments. These two screenshots from the AWS Management Console show the difference.

Sequential execution of the test suite

CodeBuild Parallel Test Results

Parallel execution of the test suite

CodeBuild Parallel Test Results

Very long test times pose a significant challenge when running continuous integration (CI) at scale. As projects grow in complexity and team size, the time required to execute comprehensive test suites can increase dramatically, leading to extended pipeline execution times. This not only delays the delivery of new features and bug fixes, but also hampers developer productivity by forcing them to wait for build results before proceeding with their tasks. I have experienced pipelines that took up to 60 minutes to run, only to fail at the last step, requiring a complete rerun and further delays. These lengthy cycles can erode developer trust in the CI process, contribute to frustration, and ultimately slow down the entire software delivery cycle. Moreover, long-running tests can lead to resource contention, increased costs because of wasted computing power, and reduced overall efficiency of the development process.

With parallel test execution in CodeBuild, you can now run your tests concurrently across multiple build compute environments. This feature implements a sharding approach where each build node independently executes a subset of your test suite. CodeBuild provides environment variables that identify the current node number and the total number of nodes, which are used to determine which tests each node should run. There is no control build node or coordination between nodes at build time—each node operates independently to execute its assigned portion of your tests.

To enable test splitting, configure the build-fanout section under batch in your buildspec.yml, specifying the desired parallelism level and other relevant parameters. Additionally, use the codebuild-tests-run utility in your build step, along with the appropriate test commands and the chosen splitting method.

The tests are split based on the sharding strategy you specify. codebuild-tests-run offers two sharding strategies:

  • Equal-distribution. This strategy sorts test files alphabetically and distributes them in chunks equally across parallel test environments. Changes in the names or quantity of test files might reassign files across shards.
  • Stability. This strategy fixes the distribution of tests across shards by using a consistent hashing algorithm. It maintains existing file-to-shard assignments when new files are added or removed.
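
To make the two strategies concrete, here is a minimal Python sketch of the idea behind each one. It is an illustration only, not the actual codebuild-tests-run implementation, and the chunking and hashing details are assumptions.

import hashlib

def equal_distribution(test_files, shard_count, shard_index):
    # Sort alphabetically and hand out contiguous, equally sized chunks;
    # renaming or adding files can shift which shard a given file lands on.
    ordered = sorted(test_files)
    chunk = -(-len(ordered) // shard_count)  # ceiling division
    return ordered[shard_index * chunk:(shard_index + 1) * chunk]

def stability(test_files, shard_count, shard_index):
    # Consistent hashing keeps an existing file on the same shard even when
    # other files are added or removed.
    return [
        name for name in test_files
        if int(hashlib.sha256(name.encode()).hexdigest(), 16) % shard_count == shard_index
    ]

tests = [f"tests/test_{i:03d}.py" for i in range(12)]
print(equal_distribution(tests, shard_count=4, shard_index=0))
print(stability(tests, shard_count=4, shard_index=0))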

CodeBuild supports automatic merging of test reports when running tests in parallel. With automatic test report merging, CodeBuild consolidates test reports into a single test summary, simplifying result analysis. The merged report includes aggregated pass/fail statuses, test durations, and failure details, reducing the need for manual report processing. You can view the merged results in the CodeBuild console, retrieve them using the AWS Command Line Interface (AWS CLI), or integrate them with other reporting tools to streamline test analysis.

Let’s look at how it works
Let me demonstrate how to implement parallel testing in a project. For this demo, I created a very basic Python project with hundreds of tests. To speed things up, I asked Amazon Q Developer on the command line to create a project and 1,800 test cases. Each test case is in a separate file and takes one second to complete. Running all tests in a sequence requires 30 minutes, excluding the time to provision the environment.

In this demo, I run the test suite on ten compute environments in parallel and measure how long it takes to run the suite.

To do so, I added a buildspec.yml file to my project.

version: 0.2

batch:
  fast-fail: false
  build-fanout:
    parallelism: 10 # ten runtime environments 
    ignore-failure: false

phases:
  install:
    commands:
      - echo 'Installing Python dependencies'
      - dnf install -y python3 python3-pip
      - pip3 install --upgrade pip
      - pip3 install pytest
  build:
    commands:
      - echo 'Running Python Tests'
      - |
         codebuild-tests-run \
          --test-command 'python -m pytest --junitxml=report/test_report.xml' \
          --files-search "codebuild-glob-search 'tests/test_*.py'" \
          --sharding-strategy 'equal-distribution'
  post_build:
    commands:
      - echo "Test execution completed"

reports:
  pytest_reports:
    files:
      - "*.xml"
    base-directory: "report"
    file-format: JUNITXML 

There are three parts to highlight in the YAML file.

First, there’s a build-fanout section under batch. The parallelism setting tells CodeBuild how many test environments to run in parallel. The ignore-failure setting indicates whether failures in any of the fan-out build tasks can be ignored.

Second, I use the pre-installed codebuild-tests-run command to run my tests.

This command receives the complete list of test files and decides which of the tests must be run on the current node.

  • Use the sharding-strategy argument to choose between the equal-distribution and stability strategies explained above.
  • Use the files-search argument to pass all the files that are candidates for a run. We recommend using the provided codebuild-glob-search command for performance reasons, but any file search tool, such as find(1), will work.
  • I pass the actual test command to run on the shard with the test-command argument.

Lastly, the reports section instructs CodeBuild to collect and merge the test reports on each node.

Then, I open the CodeBuild console to create a project and a batch build configuration for this project. There’s nothing new here, so I’ll spare you the details. The documentation has all the details to get you started. Parallel testing works on batch builds, so make sure to configure your project to run in batch.

CodeBuild : create a batch build

Now, I’m ready to trigger an execution of the test suite. I can commit new code on my GitHub repository or trigger the build in the console.

CodeBuild : trigger a new build

After a few minutes, I see a status report of the different steps of the build, with a status for each test environment or shard.

CodeBuild: status

When the test is complete, I select the Reports tab to access the merged test reports.

CodeBuild: test reports

The Reports section aggregates all test data from all shards and keeps the history for all builds. I select my most recent build in the Report history section to access the detailed report.

CodeBuild: Test Report

As expected, I can see the aggregated and the individual status for each of my 1,800 test cases. In this demo, they’re all passing, and the report is green.

The 1,800 tests of the demo project take one second each to complete. When I ran this test suite sequentially, it took 35 minutes to complete. When I ran it in parallel on ten compute environments, it took six minutes to complete, including the time to provision the environments. The parallel run took 17.1 percent of the time of the sequential run. Actual numbers will vary with your projects.

Additional things to know
This new capability is compatible with all testing frameworks. The documentation includes examples for Django, Elixir, Go, Java (Maven), JavaScript (Jest), Kotlin, PHPUnit, Pytest, Ruby (Cucumber), and Ruby (RSpec).

For test frameworks that don’t accept space-separated lists, the codebuild-tests-run CLI provides a flexible alternative through the CODEBUILD_CURRENT_SHARD_FILES environment variable. This variable contains a newline-separated list of test file paths for the current build shard. You can use it to adapt to different test framework requirements and format test file names.
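
For instance, a small helper script could reformat that list before handing it to such a framework. The following Python sketch only demonstrates reading and reshaping the variable; the comma-separated target format is a hypothetical example.

import os

# CODEBUILD_CURRENT_SHARD_FILES holds a newline-separated list of the test
# files assigned to the current shard.
shard_files = [
    path for path in os.environ.get("CODEBUILD_CURRENT_SHARD_FILES", "").splitlines()
    if path.strip()
]

# Reformat the list for a hypothetical framework that expects a single
# comma-separated argument instead of space-separated file names.
print(",".join(shard_files))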

You can further customize how tests are split across environments by writing your own sharding script and using the CODEBUILD_BATCH_BUILD_IDENTIFIER environment variable, which is automatically set in each build. You can use this technique to implement framework-specific parallelization or optimization.

Pricing and availability
With parallel test execution, you can now complete your test suites in a fraction of the time previously required, accelerating your development cycle and improving your team’s productivity. The demo project I created to illustrate this post consumes 17.1 percent of the time of a sequential build.

Parallel test execution is available on all three compute modes offered by CodeBuild: on-demand, reserved capacity, and AWS Lambda compute.

This capability is available today in all AWS Regions where CodeBuild is offered, with no additional cost beyond the standard CodeBuild pricing for the compute resources used.

I invite you to try parallel test execution in CodeBuild today. Visit the AWS CodeBuild documentation to learn more and get started with parallelizing your tests.

— seb

PS: Here’s the prompt I used to create the demo application and its test suite: “I’m writing a blog post to announce codebuild parallel testing. Write a very simple python app that has hundreds of tests, each test in a separate test file. Each test takes one second to complete.”


Sitecore "thumbnailsaccesstoken" Deserialization Scans (and some new reports) CVE-2025-27218, (Thu, Mar 27th)

On March 6th, Searchlight Cyber published a blog revealing details about a new deserialization vulnerability in Sitecore [1]. Sitecore calls itself a "Digital Experience Platform (DXP)," which is a fancy content management system (CMS). Sitecore itself is written in .Net and is often sold as part of a solution offered by Sitecore partners. Like other CMSs, it makes it easy to manage a website's content. It offers several attractive features to marketing professionals seeking more insight into user patterns.

Firewall support for AWS Amplify hosted sites

Today, we’re announcing the general availability of the AWS WAF integration with AWS Amplify Hosting.

Web application owners are constantly working to protect their applications from a variety of threats. Previously, if you wanted to implement a robust security posture for your Amplify Hosted applications, you needed to create architectures using Amazon CloudFront distributions with AWS WAF protection, which required additional configuration steps, expertise, and management overhead.

With the general availability of AWS WAF in Amplify Hosting, you can now directly attach a web application firewall to your AWS Amplify apps through a one-click integration in the Amplify console or using infrastructure as code (IaC). This integration gives you access to the full range of AWS WAF capabilities including managed rules, which provide protection against common web exploits and vulnerabilities like SQL injection and cross-site scripting (XSS). You can also create your own custom rules based on your specific application needs.

This new capability helps you implement defense-in-depth security strategies for your web applications. You can take advantage of AWS WAF rate-based rules to protect against distributed denial of service (DDoS) attacks by limiting the rate of requests from IP addresses. Additionally, you can implement geo-blocking to restrict access to your applications from specific countries, which is particularly valuable if your service is designed for specific geographic regions.

Let’s see how it works
Setting up AWS WAF protection for your Amplify app is straightforward. From the Amplify console, navigate to your app settings, select the Firewall tab, and choose the predefined rules you want to apply to your configuration.

AWS WAF integration in AWS Amplify Hosting

Amplify Hosting simplifies configuring firewall rules. You can activate four categories of protection.

  • Amplify-recommended firewall protection – Protect against the most common vulnerabilities found in web applications, block IP addresses from potential threats based on Amazon internal threat intelligence, and protect against malicious actors discovering application vulnerabilities.
  • Restrict access to amplifyapp.com – Restrict access to the default Amplify generated amplifyapp.com domain. This is useful when you add a custom domain to prevent bots and search engines from crawling the domain.
  • Enable IP address protection – Restrict web traffic by allowing or blocking requests from specified IP address ranges.
  • Enable country protection – Restrict access based on specific countries.

Protections enabled through the Amplify console will create an underlying web access control list (ACL) in your AWS account. For fine-grained rulesets, you can use the AWS WAF console rule builder.

After a few minutes, the rules are associated with your app and AWS WAF blocks suspicious requests.

If you want to see AWS WAF in action, you can simulate an attack and monitor it using the AWS WAF request inspection capabilities. For example, you can send a request with an empty User-Agent value. It will trigger a blocking rule in AWS WAF.

Let’s first send a valid request to my app.

curl -v -H "User-Agent: MyUserAgent" https://main.d3sk5bt8rx6f9y.amplifyapp.com/
* Host main.d3sk5bt8rx6f9y.amplifyapp.com:443 was resolved.
...(redacted for brevity)...
> GET / HTTP/2
> Host: main.d3sk5bt8rx6f9y.amplifyapp.com
> Accept: */*
> User-Agent: MyUserAgent
> 
* Request completely sent off
< HTTP/2 200 
< content-type: text/html
< content-length: 0
< date: Mon, 10 Mar 2025 14:45:26 GMT
 

We can observe that the server returned an HTTP 200 (OK) message.

Then, send a request with no value associated with the User-Agent HTTP header.

 curl -v -H "User-Agent: " https://main.d3sk5bt8rx6f9y.amplifyapp.com/ 
* Host main.d3sk5bt8rx6f9y.amplifyapp.com:443 was resolved.
... (redacted for brevity) ...
> GET / HTTP/2
> Host: main.d3sk5bt8rx6f9y.amplifyapp.com
> Accept: */*
> 
* Request completely sent off
< HTTP/2 403 
< server: CloudFront
... (redacted for brevity) ...
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>

We can observe that the server returned an HTTP 403 (Forbidden) message.

AWS WAF provides visibility into request patterns, helping you fine-tune your security settings over time. You can access logs through Amplify Hosting or the AWS WAF console to analyze traffic trends and refine security rules as needed.

AWS WAF integration in AWS Amplify Hosting - Dashboard

Availability and pricing
Firewall support is available in all AWS Regions in which Amplify Hosting operates. This integration falls under an AWS WAF global resource, similar to Amazon CloudFront. Web ACLs can be attached to multiple Amplify Hosting apps, but they must reside in the same Region.

The pricing for this integration follows the standard AWS WAF pricing model. You pay for the AWS WAF resources you use based on the number of web ACLs, rules, and requests. On top of that, AWS Amplify Hosting adds $15/month when you attach a web application firewall to your application. This is prorated by the hour.

This new capability brings enterprise-grade security features to all Amplify Hosting customers, from individual developers to large enterprises. You can now build, host, and protect your web applications within the same service, reducing the complexity of your architecture and streamlining your security management.

To learn more, visit the AWS WAF integration documentation for Amplify or try it directly in the Amplify console.

— seb


[Guest Diary] Leveraging CNNs and Entropy-Based Feature Selection to Identify Potential Malware Artifacts of Interest, (Wed, Mar 26th)

[This is a Guest Diary by Wee Ki Joon, an ISC intern as part of the SANS.edu Bachelor's Degree in Applied Cybersecurity (BACS) program [1].]

Executive Summary

This diary explores a novel methodology for classifying malware by integrating entropy-driven feature selection [2] with a specialized Convolutional Neural Network (CNN) [3]. Motivated by the increasingly sophisticated obfuscation tactics used by modern malware authors, we focus on capturing high-entropy segments within files (the regions most likely to harbor malicious functionality) and feeding these distinct byte patterns into our model.

The result is a multi-class classification model [4] capable of delivering accurate, easily accessible malware categorizations, in turn creating new opportunities for deeper threat correlation and more efficient triage.

We will also discuss the following sections:

  • Entropy-Based Sliding Window Extraction: Rather than analyzing only the initial segment of each file, we apply a sliding window mechanism that computes entropy in overlapping chunks. This locates suspicious, high-entropy hotspots without being limited by fixed-size segments.
     
  • CNN Architecture and Training: A multi-layer convolutional neural network design ingesting byte-image representation was employed. Techniques such as class weighting, batch normalization, and early stopping ensure balanced, high-fidelity learning.
     
  • Evaluation and Results: Tested against a large corpus of malicious and benign binaries, the model achieved approximately 91% overall accuracy, with notable performance in distinguishing multiple malware families. Confusion matrices also highlight pitfalls among closely related classes (e.g. droppers and downloaders).

 

Motivations and Background

Analysts today are inundated with vast volumes of data, most of which comprise routine or low-value threats. This overwhelming noise complicates efforts to identify genuinely novel or targeted cyber threats, which often hide within seemingly mundane data. Inspired by previous diaries exploring DBScan for clustering [5][6], it may also be worthwhile to explore machine learning as a means of identifying potential malware artifacts of interest for deeper analysis.

While machine learning offers numerous practical benefits [7], one compelling example relevant to DShield honeypot analysts is the ability to generate accurate, easily accessible malware categorizations.

Such capabilities can streamline the triage process by allowing analysts to rapidly correlate suspicious behaviors and indicators with established malware categories, thus accelerating informed decision-making. Enhanced categorization directly contributes to broader threat intelligence clustering initiatives, enabling analysts to more effectively discern campaign-level patterns and common Tactics, Techniques, and Procedures (TTPs). Additionally, it can uncover subtle yet critical connections among malware samples that initially seem unrelated, driving further investigations that may reveal extensive attacker campaigns or shared infrastructure.

 

Attempts to Build a Better Mousetrap

My introduction to CNN-based malware classification began in SEC595 [8], where the initial N bytes of a file were converted into image representations for malware analysis.

Malware today is frequently obfuscated—through packing, encryption, or other structural manipulations designed to evade detection. These techniques mean that critical identifying features do not always appear within just the first N bytes of a file.

Fig 1. First N Bytes of Traditional Malware

 

While the first N bytes approach is effective at discerning traditional malware from benign files, it struggles significantly when faced with more sophisticated, obfuscated threats.

Fig 2. First N Bytes is Insufficient to Distinguish Obfuscated Malware

 

To overcome these challenges, a multi-segment feature extraction approach, guided by Shannon entropy [9], was implemented. This strategy enables more robust detection by capturing meaningful patterns throughout the entire file, rather than relying exclusively on its initial segment.

Our methodology consists of 2 major components:

  • Entropy-based Sliding Window Feature Extraction, which finds and extracts multiple informative byte regions from each malware sample.
  • A CNN Architecture that ingests the composite byte-region image and learns to classify the malware.

 

Part 1: Entropy-Based Sliding Window Feature Extraction

While we could split a file into fixed-size chunks, a sliding window guided by entropy highlights the most suspicious regions, enabling the extraction of fewer—yet more informative—byte segments. This approach both reduces noise and pinpoints high-risk code segments, ensuring the classifier focuses on content most likely to distinguish a specific malware.
Building on the concepts from a paper which analyzes an entire file’s entropy by creating a time series [10] and applying wavelet transforms to capture variations, our approach similarly capitalizes on distinct 'entropy signatures' in different malware families.

The idea is that different malware categories can have distinct entropy 'signatures' (some might have two spikes corresponding to two encrypted sections, others a single long spike).  We will extend those same concepts to our process, since those same 'signatures' should translate to our byte-image representation.

While the entropy time series in the paper was captured using non-overlapping, fixed-size chunks, our sliding window ensures we don’t miss critical high-entropy regions, even when we sample only part of the file.

Consider the following example entropy profile of a binary file: the high-entropy regions (light blue) stand out from low-entropy areas. A sliding window approach allows such regions to be captured regardless of alignment, unlike fixed-size segmentation (red blocks, in this case aligned to every 1000 blocks), which is unable to accurately capture the full entropy 'signature' (yellow region).

Fig 3. Example entropy profile of a binary file across its length (dark blue curve is entropy per block). *Image modified from [11]*

In the following section, we walk through each step of our sliding window approach, from defining the window to applying a dynamic threshold and assembling the selected regions into concise feature vectors:

 

Fig 4. Overview of Entropy-Based Sliding Window Feature Extraction Process.

 

  1. Defining Sliding Windows


    Fig 5. Code to Define Sliding Window

     

    • Windows of 12,544 bytes are created that 'slide' across the file.
    • Each stride is 1/4 of the window size (3,136 bytes), so four entropy 'snapshots' are captured within a single window length.
    • The sliding window begins at byte 0 and extracts bytes 0 through 12,543 for the first region, then moves forward 3,136 bytes at a time until the end of the file.


      Fig 6. Example of sliding window implementation
       

  2. Entropy Calculation for Each Window
    • For each window, the Shannon entropy is calculated and stored along with the window's start/end positions.


      Fig 7. Snippet of Entropy Calculation Code

       

  3.  Entropy Threshold Calculation


    Fig 8. Snippet of Entropy Threshold Code

     

    • Entropy scores from all window positions are collected.
    • Mean (average entropy for this specific file) and standard deviation (entropy variation within this specific file) are then calculated.
       
  4. Setting a Dynamic Threshold


    Fig 9. Snippet of Dynamic Threshold Setting

     

    • The threshold is set at mean + 1.5 standard deviations for each file. Files with higher overall entropy therefore get a higher threshold and files with lower entropy a lower one, so that the most anomalous parts are selected relative to each file.
    • If no windows pass the threshold, we simply use the top-entropy window. This safety net ensures we always capture some features, even if a file has relatively uniform entropy, which can itself be a distinguishing characteristic of certain malware categories (for example, files with artificially manipulated entropy designed to evade detection).
       
  5. Rank and Select Representative Windows
    • Filtered windows are then sorted by entropy value.
    • The highest-entropy window is selected first; for each subsequent window, the overlap with the already selected windows is calculated, and any window overlapping by more than 50% is discarded to enforce diversity.


      Fig 10. Function Selecting Entropy Windows
       

  6. Feature Vector Construction
    We now have 4 segments for each file:

    1. First 12KB Bytes: Capturing headers, imports, and static indicators.
    2. Highest Entropy 12KB Byte Region: Chosen based on observations of packed malware samples, where obfuscation techniques such as packing and encryption often manifest as high-entropy regions.
    3. 2nd Highest Entropy 12KB Byte Region: Captures additional entropy 'signature'.
    4. Last 12KB Bytes: Captures potential artifacts and obfuscation techniques used by threat actors such as APT28.


      Fig 11. Captured Segments of a Malware Sample

These segments are combined into a single 50,176-byte vector, where:

  1. A list of feature_arrays is first initialized. We will use np.frombuffer() to convert the binary data in first_region into a NumPy array.
  2. We then loop through the compiled high_entropy_regions and convert them into NumPy arrays before adding them to the same feature_arrays.
  3. The last region is also converted and appended the same way.
  4. We will use np.hstack() to concatenate them along the first axis, creating a feature vector.
  5. We will double check the size, truncating the vector if oversized, and padding with zeroes if undersized.


    Fig 12. Snippet for Final Feature Vector Construction
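
Because the code in the figures above is shown as screenshots, the following condensed Python sketch reimplements the pipeline from the description in this diary (window size, stride, mean + 1.5 standard deviation threshold, overlap filtering, and the 50,176-byte layout). Treat it as an approximation of the approach rather than the exact code behind the figures.

import numpy as np

WINDOW = 12_544            # bytes per sliding window
STRIDE = WINDOW // 4       # 3,136 bytes; four entropy snapshots per window length
SEGMENT = 12_544           # bytes kept per extracted region
VECTOR_SIZE = 4 * SEGMENT  # 50,176 bytes in the final feature vector

def shannon_entropy(data):
    # Entropy in bits per byte over the byte-value histogram.
    if not data:
        return 0.0
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    probs = counts[counts > 0] / len(data)
    return float(-(probs * np.log2(probs)).sum())

def high_entropy_regions(data, max_regions=2):
    # Slide the window across the file and record entropy per position.
    windows = []
    for start in range(0, max(len(data) - WINDOW, 0) + 1, STRIDE):
        end = min(start + WINDOW, len(data))
        windows.append((shannon_entropy(data[start:end]), start, end))
    scores = np.array([w[0] for w in windows])
    # Dynamic threshold relative to this file; fall back to the top window.
    threshold = scores.mean() + 1.5 * scores.std()
    candidates = [w for w in windows if w[0] >= threshold]
    if not candidates:
        candidates = [max(windows, key=lambda w: w[0])]
    candidates.sort(key=lambda w: w[0], reverse=True)
    selected = []
    for entropy, start, end in candidates:
        # Discard windows overlapping an already selected one by more than 50%.
        overlaps = any(min(end, e) - max(start, s) > 0.5 * WINDOW for _, s, e in selected)
        if not overlaps:
            selected.append((entropy, start, end))
        if len(selected) == max_regions:
            break
    return selected

def feature_vector(data):
    # First 12KB, up to two high-entropy regions, and the last 12KB.
    arrays = [np.frombuffer(data[:SEGMENT], dtype=np.uint8)]
    for _, start, _ in high_entropy_regions(data):
        arrays.append(np.frombuffer(data[start:start + SEGMENT], dtype=np.uint8))
    arrays.append(np.frombuffer(data[-SEGMENT:], dtype=np.uint8))
    vector = np.hstack(arrays)[:VECTOR_SIZE]
    if vector.size < VECTOR_SIZE:
        vector = np.pad(vector, (0, VECTOR_SIZE - vector.size))  # zero-pad small files
    return vector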

 

What This Means in Practice

Given a malware file of 500 KB:

  • The sliding window now generates ~160 entropy measurements instead of only 40 with fixed-size chunks.
  • The adaptive threshold captures what is 'significant': for example, if the file's mean entropy is 6 bits and only one window reaches 7.5 bits while the second-highest region is at 5.9 bits, we do not capture the second-highest region, since it is not meaningfully different from the rest of the file.

 

BODMAS Dataset

We used the BODMAS (Blue Hexagon Open Dataset for Malware Analysis) dataset [12] for this project. BODMAS is a comprehensive dataset of 57,293 malicious and 77,142 benign Windows PE files, including disarmed malware binaries, feature vectors, and metadata, kindly provided by Zhi Chen and Dr. Gang Wang from the University of Illinois Urbana-Champaign for this exploration. The dataset offers more up-to-date and well-annotated samples than many existing sources, which often lack family/feature details, withhold binaries, or contain outdated threats.
 


Fig 13. Excerpt from a Previous Attack Observation Highlighting Potential Issues with Public Labels

 

Data Preprocessing

While there are 14 malware categories within the BODMAS dataset, not all of them contain sufficient samples, so we focus only on the following categories and store them in a dictionary mapping:

  • Worm
  • Trojan
  • Downloader
  • Dropper
  • Backdoor
  • Ransomware
  • Information Stealer
  • Virus

 
Fig 14. Data Preprocessing Code for Creating Category Mappings

 

We categorize the benign files first, since any unlabeled files are benign, before categorizing the malicious files.


Fig 15. Data Preprocessing Code for Filtering Out Benign Files

We then create integer labels for all the categories in ascending order, since we are using sparse_categorical_crossentropy. The 'benign' category is placed last, which assigns all uncategorized files to the benign class.


Fig 16. Integer Labeling for All Assigned Categories
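
A minimal sketch of this labeling step follows; the lowercase category strings and helper name are illustrative and assumed to match the family labels in the metadata.

# Category names assumed to match the lowercase family labels in the metadata.
MALWARE_CATEGORIES = [
    "worm", "trojan", "downloader", "dropper",
    "backdoor", "ransomware", "informationstealer", "virus",
]

# Integer labels in ascending order for sparse_categorical_crossentropy;
# 'benign' is placed last so every uncategorized file falls into that class.
category_to_label = {name: idx for idx, name in enumerate(MALWARE_CATEGORIES)}
category_to_label["benign"] = len(MALWARE_CATEGORIES)

def label_for(sample_category):
    # Samples without a known category mapping are treated as benign.
    return category_to_label.get(sample_category, category_to_label["benign"])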

 

Tangent: Importance of Logging

During the first several runs of the training process, the logging added to catch edge cases showed an excessive number of files with low entropy. Duplicate '._<hash>' copies of our malware samples were being picked up by our self.executables_dir.glob("*.exe") glob pattern and treated as valid executable files, since they had the same extension as our original malware samples.


Fig 17. Logging Messages Showing ._<hash> Files

These files were AppleDouble files [13] that are automatically created when macOS interacts with files on non-macOS filesystems, which was not accounted for during the switch to an external drive to better organize the large BODMAS malware dataset.

This negatively impacted the initial training process and would probably have gone unnoticed if not for the deliberate logging added during training, as these files remained hidden even when using system commands to show hidden files in Finder.

A fix was subsequently added to the data preparation script to filter out the files.


Fig 18. Fix to Filter out Resource Fork Files
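
The fix amounts to skipping anything whose file name starts with '._'. A minimal sketch, assuming the samples live in an executables directory:

from pathlib import Path

def valid_samples(executables_dir: Path):
    # Skip macOS AppleDouble resource-fork copies, which start with '._'
    # but otherwise share the extension of the real samples.
    return [
        path for path in executables_dir.glob("*.exe")
        if not path.name.startswith("._")
    ]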

 

Part 2: CNN Architecture (and Training Strategy)

To address class imbalance in our dataset before CNN training, we used sklearn.utils.class_weight to compute the class weights [14] based on the precomputed y_labels, where the weights scale inversely with class frequency. The weights are calculated after the train/test split to properly reflect the training and testing distributions.


Fig 19. Class Weight Balancing Based on Class Frequencies

The combined data from the previous data preparation steps were split into training and test sets. We did not shuffle the data before this stage and instead use stratified sampling [15] (stratify=y_labels) to maintain the class distribution in our test split. This ensures that our test set properly represents the entire population. We use a random seed of 42 because *insert pop culture reference here*. [16]


Fig 20. Stratified Sampling on Train Test Split
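
A minimal sketch of the split and the class-weight computation, using placeholder data in place of the real feature matrix and labels:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Placeholder data standing in for the real 50,176-byte feature matrix and labels.
rng = np.random.default_rng(42)
X = rng.integers(0, 256, size=(1000, 50_176), dtype=np.uint8)
y_labels = rng.integers(0, 9, size=1000)

# Stratified 80/20 split keeps the class distribution in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_labels, test_size=0.2, stratify=y_labels, random_state=42
)

# Weights scale inversely with class frequency, computed after the split.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}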

 

The final CNN model consists of multiple convolutional layers with batch normalization and dropout to enhance feature extraction and prevent overfitting. The architecture is as follows:

  1. Convolutional Layers Progression: 32, 64, 128, 256 filters
  2. Kernel Size: 3×3
  3. Padding: 'same'
  4. Activation Layer: 'relu'
  5. Batch Normalization
  6. MaxPooling: 2×2
  7. Progressive Dropout Rates: 0.2, 0.3, 0.4, 0.5
  8. Activation Layer: 'Softmax'
  9. Loss: Sparse Categorical Cross Entropy
  10. Optimizer: Adam
  11. Learning Rate: 0.001


Fig 21. CNN Model Parameters
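
Pieced together from the parameters listed above, a Keras definition of the model might look roughly like the following; the 224×224×1 input shape is an assumption based on the 50,176-byte vector (224 × 224 = 50,176), and the flatten-plus-dense head is illustrative.

from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes=9):
    # 224 x 224 = 50,176, so the byte vector is reshaped into a single-channel image.
    model = keras.Sequential([keras.Input(shape=(224, 224, 1))])
    # Convolutional blocks with progressive filter counts and dropout rates.
    for filters, dropout in [(32, 0.2), (64, 0.3), (128, 0.4), (256, 0.5)]:
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(dropout))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model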

The model was compiled with an Adam optimizer (learning rate = 0.0001) and trained using the following techniques to stabilize learning and prevent overfitting:

  • BatchNormalization to stabilize learning.
  • ReduceLROnPlateau: Adjusts the learning rate when validation loss plateaus after 3 epochs.


    Fig 22. ReduceLROnPlateau triggering on Epoch 9

     

  • EarlyStopping: Prevents overfitting by stopping training if validation loss stops improving after 10 epochs (patience = 10, restores best weights).
  • ModelCheckpoint: Saves the best-performing model based on validation loss.
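
A sketch of the corresponding Keras callbacks; the monitored metric, learning-rate factor, and checkpoint path are assumptions, while the patience values follow the description above.

from tensorflow import keras

callbacks = [
    # Reduce the learning rate when validation loss plateaus for 3 epochs.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Stop after 10 epochs without improvement and restore the best weights.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Save the best-performing model based on validation loss.
    keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
]

# history = model.fit(X_train, y_train, validation_split=0.1, epochs=30,
#                     class_weight=class_weight, callbacks=callbacks)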
     

Model Evaluation and Benchmarking

The model was evaluated on a held-out test set of 11,442 malware samples (20% of the BODMAS malware dataset, never seen during training).


Fig 23. Distributions of Test Set

Key evaluation metrics include:

  • Precision
  • Recall
  • Confusion matrix
  • Additionally, we also tracked per-class accuracy, along with F1-score for each class
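
A minimal sketch of how these metrics could be produced with scikit-learn; the placeholder predictions stand in for the trained model's output.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# With the real model: y_pred = np.argmax(model.predict(X_test), axis=1)
# Placeholder labels and predictions so the snippet runs standalone.
rng = np.random.default_rng(0)
y_test = rng.integers(0, 9, size=200)
y_pred = rng.integers(0, 9, size=200)

# Per-class precision, recall, and F1, plus macro and weighted averages.
print(classification_report(y_test, y_pred, digits=2))
print(confusion_matrix(y_test, y_pred))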

With a final test score of 90.88% and test loss of 0.3317, the results were very promising, demonstrating that the model not only achieves high overall accuracy, but does so consistently across all malware types. Below, we dive into each evaluation aspect and then compare our approach with other malware detection solutions.

The model faced the most difficulty when classifying droppers and downloaders, with both showing high recall and low precision [17], indicating that the model misclassifies other samples as droppers and downloaders.


Fig 24. Per Class Metrics

The figure below plots the model's training progress over 30 epochs. The training accuracy (blue curve) shows a smooth upward trend, reaching 92.42% accuracy by the final epoch, while the validation accuracy (orange curve) climbs to 91.76%, tracking closely with the training curve, indicating to us that our model is generalizing well (not overfitting) and that there is likely still room to increase model capacity.


Fig 25. Model Training and Validation Performance History

Let us break down the per class accuracy:

 

  • Given 1466 unseen backdoors tested: 96.6% (1416) were accurately determined
  • Given 206 unseen downloaders tested: 91.7% (189) were accurately determined
  • Given 143 unseen droppers tested: 91.6% (131) were accurately determined
  • Given 90 information stealers tested: 88.9% (80) were accurately determined
  • Given 164 unseen ransomware tested: 88.4% (145) were accurately determined
  • Given 5995 unseen trojan tested: 87.0% (5215) were accurately determined
  • Given 38 unseen virus tested: 94.7% (36) were accurately determined
  • Given 3340 unseen worm tested: 95.4% (3187) were accurately determined


Fig 26. Per-Class Accuracy Breakdown

 

Contextualizing Multi-class Model Performance

While accuracy is a common and intuitive metric for evaluating machine learning models, it can be misleading, particularly in multi-class classification scenarios like ours.

Consider a different scenario: a multi-class model classifying data into eight distinct categories achieves an accuracy of 50%. Initially, this result might seem unimpressive and akin to random guessing. However, compared to random guessing—which would yield only 12.5% accuracy—a 50% accuracy indicates significant predictive capability.


Fig 27. "I Swear All These Look The Same"

Nevertheless, accuracy alone doesn't reveal the entire performance story. A confusion matrix provides valuable insights by clearly displaying how the model’s predictions are distributed among different categories. It helps identify specific categories that are frequently confused with each other, enabling targeted improvements such as adding distinguishing features, reconsidering category definitions, or merging categories that are inherently similar.

Analyzing the confusion matrix [18], we observe clear diagonal dominance indicating accurate class separation overall, yet some notable misclassification patterns emerge:

  1. Trojans are often misclassified as other classes:
    • This could be due to malware categories generally sharing similar byte-level patterns with Trojans (which deliberately disguise themselves as other software)
    • Our class-weighting strategies might also lead the model to overly emphasize minority-class characteristics (as the model is penalized heavily for misclassifying the minority), inadvertently causing proportionally more misclassifications into minority classes like downloaders and droppers.
       
  2. Worms are notably misclassified as Droppers and Trojans:
    • Worms and Droppers, in particular, may possess common characteristics that are challenging to differentiate clearly, such as similar network call patterns or byte-level features.


      Fig 28. Model's Confusion Matrix [18]

 

(Short) Comparative Analysis Against Contemporary Models

To place our model’s performance in context, we compared its results against DeepGray [19] due to its contemporary relevance and extensive performance evaluation using several established pre-trained image recognition models, such as:

  • VGG16 [20]
  • InceptionV3 [21]
  • EfficientnetV2B0 [22]
  • Vision Transformers (ViT-B32) [23]

It is important to recognize that, due to the slight difference in category labels chosen, this is not a direct 'apples to apples' comparison. However, these comparisons still help validate our model’s efficacy relative to the latest methodologies, offer valuable insight into the general effectiveness of our approach, and highlight areas where further refinement could drive even stronger results.

Overall, our self-taught model did admirably well, even when compared to established pre-trained image recognition models.

Our accuracy of 91% puts it right in the middle of the pack, although due to our class imbalance, our macro averages—which treat each class equally—were significantly lower than our weighted averages, which adjust performance by class frequency. This disparity highlights the nuances of multi-class evaluation: while the model is highly effective in aggregate, certain minority classes (e.g. droppers, downloaders) drive down the macro metrics.
 

Model                                   Accuracy  Averaging  Precision  Recall  F1-Score
VGG16 (DeepGray)                        0.82      Macro      0.83       0.80    0.80
                                                  Weighted   0.83       0.82    0.82
InceptionV3 (DeepGray)                  0.90      Macro      0.90       0.90    0.90
                                                  Weighted   0.90       0.90    0.90
Our Model                               0.91      Macro      0.73       0.92    0.80
                                                  Weighted   0.93       0.91    0.91
EfficientnetV2B0 (DeepGray)             0.93      Macro      0.93       0.93    0.93
                                                  Weighted   0.93       0.93    0.93
ViT-B32 Vision Transformer (DeepGray)   0.95      Macro      0.96       0.96    0.96
                                                  Weighted   0.95       0.95    0.95

Table 1. Comparison of our model against models used in DeepGray

 

Conclusion

By leveraging entropy-driven feature selection and a specialized CNN architecture, we demonstrated a robust multi-class malware classification approach that effectively distinguishes obfuscated threats. Through dynamic thresholding of high-entropy regions, we ensure the model focuses on byte-level segments most indicative of malicious behavior, delivering approximately 91% accuracy on a held-out test set. Notably, the model maintains strong performance across varied malware families—including trojans, worms, and ransomware.

Beyond pure accuracy metrics, the model’s confusion matrix reveals targeted misclassification challenges. Categories like droppers and downloaders show overlapping features, emphasizing the need for continuous refinement of feature engineering to reduce false positives. Nonetheless, these results are encouraging, even when compared to leading pre-trained image recognition models on analogous malware detection tasks, validating the general effectiveness of our methodology.

From a practical standpoint, the work also highlights the importance of rigorous data preprocessing, class balancing, and proper logging. Identifying and filtering out spurious files (e.g., AppleDouble files) proved crucial to maintaining dataset integrity. Similarly, stratified sampling and class-weight adjustments helped ensure fairer representation of minority classes, mitigating skewed distribution and improving both model accuracy and reliability.

Also, scripts for performing the data preparation and classification are shared on GitHub [24].

References

[1] https://www.sans.edu/cyber-security-programs/bachelors-degree/
[2] https://en.wikipedia.org/wiki/Feature_selection
[3] https://en.wikipedia.org/wiki/Convolutional_neural_network
[4] https://developers.google.com/machine-learning/crash-course/classification/multiclass
[5] https://isc.sans.edu/diary/31050
[6] https://isc.sans.edu/diary/31194
[7] https://www.researchgate.net/publication/378288472_A_comprehensive_review_of_machine_learning's_role_in_enhancing_network_security_and_threat_detection
[8] https://www.sans.org/cyber-security-courses/applied-data-science-machine-learning/
[9] https://www.quantamagazine.org/how-claude-shannons-concept-of-entropy-quantifies-information-20220906/
[10] https://www.researchgate.net/publication/365110043_Classification_of_Malware_by_Using_Structural_Entropy_on_Convolutional_Neural_Networks
[11] https://deadhacker.com/2007/05/13/finding-entropy-in-binary-files/
[12] L. Yang, A. Ciptadi, I. Laziuk, A. Ahmadzadeh and G. Wang, "BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware," 2021 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 2021, pp. 78-84, doi: 10.1109/SPW53761.2021.00
[13] https://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_formats
[14] https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
[15] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
[16] https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life,_the_Universe,_and_Everything_(42)
[17] https://en.wikipedia.org/wiki/Precision_and_recall
[18] https://en.wikipedia.org/wiki/Confusion_matrix
[19] https://www.researchgate.net/publication/380818154_DeepGray_Malware_Classification_Using_Grayscale_Images_with_Deep_Learning
[20] https://arxiv.org/pdf/1409.1556
[21] https://arxiv.org/pdf/1512.00567
[22] https://arxiv.org/pdf/2104.00298
[23] https://arxiv.org/pdf/2101.03771
[24] https://github.com/weekijoon/entropy-malware-classifier-cnn

 


Jesse La Grew
Handler

(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.