How to use AWS StepFunction to dynamically pass commands to ECS tasks
I recently stumbled upon an interesting problem and would like to share its solution here. I had a hard time finding a suitable solution on the Internet, which is why I wrote this article.
tl;dr
Use this YAML code to create a StepFunction, triggered by a CloudWatch/EventBridge event, that starts an arbitrary ECS task and passes a start command to it.
Background
I had the task of migrating a cluster of applications from the customer's own data center to AWS. It turned out that many of these applications were really the same application, called with different parameters. The customer had already done some preliminary work and containerized the application. The result was a single image whose container had to be started with different start commands. The applications were to be started at fixed times (batch processing), and the infrastructure was to be made available to the application team in the form of CloudFormation templates.
The idea
The first thing that came to mind was ECS as the service for running the containers. Coupled with CloudWatch events, it offers an easy way to start containers on a schedule. However, it quickly became apparent that a task definition and its associated resources would have had to be created for each of these containers, even though they differed only in their start command. That would have bloated the templates considerably and made them hard to maintain. Unfortunately, CloudFormation also offers few (if any) ways to iterate over arrays or the like. Further thought led me to the idea of storing the commands in the CloudWatch events and handing them to the containers at start. Unfortunately, this turned out to be a dead end, as it was not technically supported.
So I thought about an alternative solution and came up with the following idea:
Instead of starting the containers (tasks) directly via CloudWatch Events, I put a StepFunction in between. It can receive parameters from CloudWatch and pass them on to the ECS task. In addition, you have the option of chaining several calls one after another, should that be necessary. With a StepFunction I can now start all tasks with different start commands as long as they use the same image. An initial search on the Internet was disappointing, however: there was hardly any suitable documentation to be found. So I had to figure out how to do it myself.
The implementation
First, two roles need to be created. One role is required for the CloudWatch event, the other for the execution of the actual StepFunction:
InvokeStatemachineRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - events.amazonaws.com
          Action:
            - "sts:AssumeRole"
    Path: /
    Policies:
      - PolicyName: CloudWatchLogsDeliveryFullAccessPolicy
        PolicyDocument:
          Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - "logs:CreateLogDelivery"
                - "logs:GetLogDelivery"
                - "logs:UpdateLogDelivery"
                - "logs:DeleteLogDelivery"
                - "logs:ListLogDeliveries"
                - "logs:PutResourcePolicy"
                - "logs:DescribeResourcePolicies"
                - "logs:DescribeLogGroups"
              Resource: "*"
      - PolicyName: StatemachineInvokePolicy
        PolicyDocument:
          Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - "states:DescribeExecution"
                - "states:StartExecution"
                - "states:StopExecution"
              Resource: "*"
StatemachineRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - states.amazonaws.com
          Action:
            - "sts:AssumeRole"
    Path: /
    Policies:
      - PolicyName: StatemachineExecutionPolicy
        PolicyDocument:
          Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - states:DescribeExecution
                - states:StartExecution
                - states:StopExecution
                - ecs:DescribeTaskDefinition
                - ecs:DescribeTasks
                - ecs:ListTaskDefinitions
                - ecs:ListTasks
                - ecs:RunTask
                - ecs:StartTask
                - ecs:StopTask
                - logs:CreateLogGroup
                - logs:CreateLogDelivery
                - logs:CreateLogStream
                - logs:DescribeResourcePolicies
                - logs:DescribeLogGroups
                - logs:DescribeLogStreams
                - logs:DeleteDestination
                - logs:GetLogDelivery
                - logs:ListTagsLogGroup
                - logs:PutMetricFilter
                - logs:TagLogGroup
                - logs:DescribeSubscriptionFilters
                - logs:FilterLogEvents
                - logs:PutSubscriptionFilter
                - logs:PutResourcePolicy
                - logs:PutDestination
                - logs:PutDestinationPolicy
                - logs:UpdateLogDelivery
                - logs:DeleteLogDelivery
                - logs:ListLogDeliveries
                - events:PutTargets
                - events:PutRule
                - events:DescribeRule
                - iam:PassRole
              Resource: "*"
To find out what happens when the StepFunction is executed, logs should be collected in CloudWatch. A so-called LogGroup is required for this:
CloudwatchLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    RetentionInDays: 7
    LogGroupName: my/loggroup
Now that the roles exist, the EventRules can be created. An example is shown here:
ScheduledRule:
  Type: AWS::Events::Rule
  Properties:
    Description: "ScheduledRule"
    ScheduleExpression: "cron(45 21 * * ? *)" # daily at 21:45 UTC (22:45 CET, 23:45 CEST)
    State: "ENABLED"
    Targets:
      - Arn: !Ref MYStateMachine
        Id: MYStateMachine
        Input: |
          {
            "commands": [
              "put-your-command-here"
            ]
          }
        RoleArn: !GetAtt InvokeStatemachineRole.Arn
    RoleArn: !GetAtt InvokeStatemachineRole.Arn
The EventRule transfers the start commands to the StepFunction in the form of a JSON object.
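Because the command lives in the rule's input rather than in the task definition, a further schedule for the same image only needs another rule. The following is a minimal sketch of that idea, not part of the original template: the rule name `SecondScheduledRule`, the target id, the schedule and the command values are all hypothetical placeholders.

```yaml
# Hypothetical second rule: same state machine, different start command.
SecondScheduledRule:
  Type: AWS::Events::Rule
  Properties:
    Description: "SecondScheduledRule"
    # Cron expressions of event rules are evaluated in UTC.
    ScheduleExpression: "cron(15 3 * * ? *)"
    State: "ENABLED"
    Targets:
      - Arn: !Ref MYStateMachine
        Id: MYStateMachineSecondCommand
        # Placeholder command; each array element becomes one argument.
        Input: |
          {
            "commands": [
              "put-your-other-command-here",
              "--some-flag",
              "some-value"
            ]
          }
        RoleArn: !GetAtt InvokeStatemachineRole.Arn
```

A command with arguments is split into one array element per argument, because the ECS container override expects the command as a list of strings.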
Now comes the interesting part: the StepFunction.
MYStateMachine:
  Type: AWS::StepFunctions::StateMachine
  Properties:
    StateMachineName: MYStateMachine
    DefinitionString: !Sub |-
      {
        "Comment": "This is your state machine",
        "StartAt": "ECS RunTask",
        "States": {
          "ECS RunTask": {
            "Type": "Task",
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
              "LaunchType": "FARGATE",
              "Cluster": "your-ecs-cluster-arn",
              "TaskDefinition": "your-ecs-taskdefinition-arn",
              "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                  "Subnets": [
                    "your-subnet-id",
                    "your-subnet-id"
                  ],
                  "SecurityGroups": ["your-security-group"],
                  "AssignPublicIp": "DISABLED"
                }
              },
              "Overrides": {
                "ContainerOverrides": [{
                  "Command.$": "$.commands",
                  "Name": "your-container-name"
                }]
              }
            },
            "End": true
          }
        }
      }
    RoleArn: !GetAtt StatemachineRole.Arn
    LoggingConfiguration:
      Destinations:
        - CloudWatchLogsLogGroup:
            LogGroupArn: !GetAtt CloudwatchLogGroup.Arn
      Level: ERROR
The so-called DefinitionString defines how the StepFunction behaves, i.e. which states it comprises, in which order they are executed, and what their input and output look like. The following line is particularly interesting:
"Command.$": "$.commands",
This line ensures that the commands section of the provided input JSON object is passed to the ECS task as is. The path must match the key in the rule's input exactly, including upper and lower case.
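To make the path resolution concrete, here is a small worked example in YAML notation. It is an illustration rather than a template fragment: the keys `executionInput` and `resolvedOverrides` are labels made up for this sketch, and the command values are placeholders. A parameter key ending in `.$` tells the StepFunction to treat its value as a JSONPath into the state input, and the suffix is dropped from the resulting parameter name.

```yaml
# Illustration only, not part of the CloudFormation template.

# 1) State input, as delivered by the event rule's Input:
executionInput:
  commands:
    - "put-your-command-here"
    - "--some-flag"

# 2) Overrides as the ECS RunTask call receives them once
#    "Command.$": "$.commands" has been resolved:
resolvedOverrides:
  ContainerOverrides:
    - Name: "your-container-name"
      Command:
        - "put-your-command-here"
        - "--some-flag"
```

The array is handed over unchanged, so whatever the rule puts into its input is what the container is started with.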
Putting it all together
Here's the whole YAML file including the roles, the event rule and the state machine:
Resources:
  InvokeStatemachineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - events.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: CloudWatchLogsDeliveryFullAccessPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "logs:CreateLogDelivery"
                  - "logs:GetLogDelivery"
                  - "logs:UpdateLogDelivery"
                  - "logs:DeleteLogDelivery"
                  - "logs:ListLogDeliveries"
                  - "logs:PutResourcePolicy"
                  - "logs:DescribeResourcePolicies"
                  - "logs:DescribeLogGroups"
                Resource: "*"
        - PolicyName: StatemachineInvokePolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "states:DescribeExecution"
                  - "states:StartExecution"
                  - "states:StopExecution"
                Resource: "*"
  StatemachineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - states.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: AdminPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - states:DescribeExecution
                  - states:StartExecution
                  - states:StopExecution
                  - ecs:DescribeTaskDefinition
                  - ecs:DescribeTasks
                  - ecs:ListTaskDefinitions
                  - ecs:ListTasks
                  - ecs:RunTask
                  - ecs:StartTask
                  - ecs:StopTask
                  - logs:CreateLogGroup
                  - logs:CreateLogDelivery
                  - logs:CreateLogStream
                  - logs:DescribeResourcePolicies
                  - logs:DescribeLogGroups
                  - logs:DescribeLogStreams
                  - logs:DeleteDestination
                  - logs:GetLogDelivery
                  - logs:ListTagsLogGroup
                  - logs:PutMetricFilter
                  - logs:TagLogGroup
                  - logs:DescribeSubscriptionFilters
                  - logs:FilterLogEvents
                  - logs:PutSubscriptionFilter
                  - logs:PutResourcePolicy
                  - logs:PutDestination
                  - logs:PutDestinationPolicy
                  - logs:UpdateLogDelivery
                  - logs:DeleteLogDelivery
                  - logs:ListLogDeliveries
                  - events:PutTargets
                  - events:PutRule
                  - events:DescribeRule
                  - iam:PassRole
                Resource: "*"
  ScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Description: "ScheduledRule"
      ScheduleExpression: "cron(45 21 * * ? *)" # daily at 21:45 UTC (22:45 CET, 23:45 CEST)
      State: "ENABLED"
      Targets:
        - Arn: !Ref MYStateMachine
          Id: MYStateMachine
          Input: |
            {
              "commands": [
                "put-your-command-here"
              ]
            }
          RoleArn: !GetAtt InvokeStatemachineRole.Arn
      RoleArn: !GetAtt InvokeStatemachineRole.Arn
  CloudwatchLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 7
      LogGroupName: my/loggroup
  MYStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: MYStateMachine
      DefinitionString: !Sub |-
        {
          "Comment": "This is your state machine",
          "StartAt": "ECS RunTask",
          "States": {
            "ECS RunTask": {
              "Type": "Task",
              "Resource": "arn:aws:states:::ecs:runTask.sync",
              "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "your-ecs-cluster-arn",
                "TaskDefinition": "your-ecs-taskdefinition-arn",
                "NetworkConfiguration": {
                  "AwsvpcConfiguration": {
                    "Subnets": [
                      "your-subnet-id",
                      "your-subnet-id"
                    ],
                    "SecurityGroups": ["your-security-group"],
                    "AssignPublicIp": "DISABLED"
                  }
                },
                "Overrides": {
                  "ContainerOverrides": [{
                    "Command.$": "$.commands",
                    "Name": "your-container-name"
                  }]
                }
              },
              "End": true
            }
          }
        }
      RoleArn: !GetAtt StatemachineRole.Arn
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt CloudwatchLogGroup.Arn
        Level: ERROR
Looking beyond
When thinking about StepFunctions, you usually have a use case that requires sequential or parallel execution of multiple tasks. This is of course also possible with the solution shown above. However, one problem I faced was that the output of the first task was passed as input to the second task, which meant that the initial commands were lost. To solve this issue, add the following line to the state definition of your first task: "ResultPath": null
This discards the result of the first task and passes the raw execution input on to the subsequent task. See this SO article for more information on this.
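A minimal sketch of such a chain could look like the following. It is an assumption-laden example, not taken from the original template: the resource name `MyChainedStateMachine`, the state names and the second input key `followUpCommands` are hypothetical, and the ARNs and ids are the same placeholders as above.

```yaml
# Sketch of a two-step state machine (hypothetical names throughout).
# Expected execution input:
#   {"commands": ["..."], "followUpCommands": ["..."]}
MyChainedStateMachine:
  Type: AWS::StepFunctions::StateMachine
  Properties:
    RoleArn: !GetAtt StatemachineRole.Arn
    DefinitionString: |-
      {
        "StartAt": "First Task",
        "States": {
          "First Task": {
            "Type": "Task",
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
              "LaunchType": "FARGATE",
              "Cluster": "your-ecs-cluster-arn",
              "TaskDefinition": "your-ecs-taskdefinition-arn",
              "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                  "Subnets": ["your-subnet-id"],
                  "SecurityGroups": ["your-security-group"],
                  "AssignPublicIp": "DISABLED"
                }
              },
              "Overrides": {
                "ContainerOverrides": [{
                  "Command.$": "$.commands",
                  "Name": "your-container-name"
                }]
              }
            },
            "ResultPath": null,
            "Next": "Second Task"
          },
          "Second Task": {
            "Type": "Task",
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
              "LaunchType": "FARGATE",
              "Cluster": "your-ecs-cluster-arn",
              "TaskDefinition": "your-ecs-taskdefinition-arn",
              "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                  "Subnets": ["your-subnet-id"],
                  "SecurityGroups": ["your-security-group"],
                  "AssignPublicIp": "DISABLED"
                }
              },
              "Overrides": {
                "ContainerOverrides": [{
                  "Command.$": "$.followUpCommands",
                  "Name": "your-container-name"
                }]
              }
            },
            "End": true
          }
        }
      }
```

Because the first state discards its result, the second state still sees the original input and can read its own command list from it.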