Why you should invest in AWS Big Data & 8 steps to becoming certified

A decision many engineers face at some point in their career is what to focus on next. One of the great advantages of working in a consultancy is being exposed to many different technologies, giving you the opportunity to explore any emerging trends you might be interested in. I’ve been lucky enough to work with a huge variety of clients, ranging from industry leaders in the FTSE 100 to smaller start-ups disrupting the same technology space.

So why did I pick Big Data?

A common pattern I’ve noticed is that everyone has access to data – large amounts of raw, unstructured data. Business and technology leaders all recognise the importance of it, and the value and insight that it can deliver. Processes have been established to extract, transform and store this large amount of information, but the architecture is usually inefficient and incomplete.

Years ago these steps may have defined an efficient data pipeline, but with emerging technologies such as Kinesis Streams, Redshift and even serverless databases there is now another way. We now have the possibility of a real-time, cost-efficient solution with low operational overhead.

Alongside this, companies have set their sights on creating a data lake in the cloud. In doing so, they take advantage of a whole suite of technologies to store information in the formats they leverage today and in configurations they may harness in the future. These are all clear steps in the journey towards digital transformation, and with the current pace of development in AWS technologies it is the perfect time to become more acquainted with Big Data.

 

But why is the certification necessary?

The AWS Certified Big Data Specialty exam introduces and validates several key big data fundamentals. The exam itself is not limited to AWS-specific technologies but also explores the wider big data ecosystem. Taken straight from the exam guide, the domains cover:

  1. Collection
  2. Storage
  3. Processing
  4. Analysis
  5. Visualization
  6. Data Security

These domains span a broad range of technical roles, from data engineers and data scientists to individuals in SecOps. Personally, I’ve had some exposure to the collection and storage of data but much less to visualisation and security. You certainly have to be comfortable wearing many different hats when tackling this exam, as it tests not only your technical understanding of the solutions but also the business value created by the implementation. It’s equally important to consider the costs involved, including any forecasts as the solution scales.

Having already completed several associate exams, I found this certification much more difficult because you are required to deep dive into Big Data concepts and the relevant technologies. One of the benefits of this certification is that its scope extends to the application of these technologies, so be prepared to dive into Machine Learning and popular frameworks like Spark and Presto.

 

Okay so how do I pass the exam?

1. A Cloud Guru’s Certified Big Data Specialty course provides an excellent introduction and overview.

2. Get some practical experience of Big Data in AWS; theoretical knowledge alone is not enough to pass this exam:

  1. Practice architecting data pipelines and consider when Kinesis Streams or Kinesis Firehose would be appropriate (see the sketch below).
  2. Think about how the solution would differ according to the size of the data transfer; at large enough scales even Snowmobile becomes the efficient option.
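To make the Streams vs Firehose trade-off concrete, here is a minimal boto3 sketch (the stream and delivery stream names are hypothetical): with Kinesis Streams you size the shards and build your own consumers, whereas Firehose simply delivers records into S3, Redshift or Elasticsearch for you.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")
firehose = boto3.client("firehose", region_name="eu-west-1")

record = {"sensor_id": "abc-123", "temperature": 21.4}
payload = json.dumps(record).encode("utf-8")

# Kinesis Streams: you manage shards and write the consumers (KCL app,
# Lambda, Spark Streaming...), which suits custom real-time processing.
kinesis.put_record(
    StreamName="sensor-stream",            # hypothetical stream
    Data=payload,
    PartitionKey=record["sensor_id"],
)

# Kinesis Firehose: fully managed delivery straight into S3, Redshift or
# Elasticsearch, which suits near-real-time load-and-store pipelines.
firehose.put_record(
    DeliveryStreamName="sensor-delivery",  # hypothetical delivery stream
    Record={"Data": payload},
)
```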

3. Understand the different storage options on AWS – S3, DynamoDB, RDS, Redshift, HDFS vs EMRFS, HBase…

4. Understand the differences and use cases of popular Big Data frameworks e.g. Presto, Hive, Spark. 

5. Data Security contributes the most to your overall exam score at 20% and is involved in every single AWS service. There are always options for making the solution more secure and sometimes they’re enabled by default.

  1. Understand how to enable encryption at rest and in transit, whether to use KMS-managed or S3-managed keys, and client-side vs server-side encryption (see the sketch below).
  2. How to grant privileged access to data e.g. IAM, Redshift Views.
  3. Authentication flows with Cognito and integrations with external identity providers.
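As a quick illustration of encryption at rest, here is a hedged boto3 sketch (the bucket and object names are hypothetical): the first call encrypts a single object with SSE-KMS, and the second makes KMS encryption the default for the whole bucket.

```python
import boto3

s3 = boto3.client("s3")

# Server-side encryption of a single object with an AWS-managed KMS key.
s3.put_object(
    Bucket="my-data-lake",                     # hypothetical bucket
    Key="raw/events/2018-03-01.json",
    Body=b'{"event": "signup"}',
    ServerSideEncryption="aws:kms",
)

# Enforce SSE-KMS by default for every new object in the bucket.
s3.put_bucket_encryption(
    Bucket="my-data-lake",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
        ]
    },
)
```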

6. Performance is a key theme

  1. Have a sound understanding of what GSIs and LSIs are in DynamoDB (see the sketch below).
  2. Consider primary & sort keys, and distribution styles, across all of the database services.
  3. Know the different compression types and the speed of compressing/decompressing.
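It also helps to have actually created a DynamoDB table with a GSI at least once. A minimal boto3 sketch with a hypothetical orders table: the base table is keyed on customer and date, while the GSI allows queries by order status with its own provisioned throughput.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")

dynamodb.create_table(
    TableName="orders",                                       # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "status-index",
            "KeySchema": [
                {"AttributeName": "status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5,
            },
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```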

7. Dive into Machine Learning (ML)

  1. The Cloud Guru course mentioned above gives a good overview of the different ML models.
  2. If you have time, I would recommend this machine learning course by Andrew Ng on Coursera. The technical depth goes lower level than you will need for the exam, but it gives a novice a very good introduction to the whole machine learning landscape.

8. Dive into Visualisation

  1. The A Cloud Guru course provides more than enough knowledge to tackle any questions here.
  2. Again, if you have the time, there’s an excellent data science course on Udemy with a data visualisation chapter that would prove useful here.

 

Exam prep

It can’t be emphasised enough that AWS themselves provide amazing resources for learning. As preparation for the exam, definitely watch re:Invent videos and read AWS blogs & case studies.

 

Watch these videos:

  1. AWS re:Invent 2017: Big Data Architectural Patterns and Best Practices on AWS 
  2. AWS re:Invent 2017: Best Practices for Building a Data Lake in Amazon S3 and Amazon
  3. AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns  
  4. AWS Summit Series 2016 | Chicago – Deep Dive + Best Practices for Real-Time Streaming Applications 

 

Read these AWS blogs:

  1. Secure Amazon EMR with Encryption 
  2. Building a Near Real-Time Discovery Platform with AWS 

 

Whitepapers

  1. Streaming Data Solutions on AWS with Amazon Kinesis
  2. Big Data Analytics Options on AWS 
  3. Lambda Architecture for Batch and Real-Time Processing on AWS with Spark Streaming and Spark SQL 

 

Also read the developer guides for all of the Big Data services.

 

One last note….

This exam will expect you to consider the question from many different perspectives. You’ll need to think about not just the technical feasibility of the solution presented but also the business value it can create. The majority of questions are scenario-specific and often there is more than one valid answer; look for subtle clues to determine which solution is more ‘correct’ than the others, e.g. whether speed is a factor or whether the question expects you to answer from a cost perspective.

Finally, this exam is very long (3 hours) and requires a lot of reading. I found that the time given was more than enough, but remember to pace yourself otherwise you can burn out quite easily.

Hopefully my experience and tips will help you prepare for the exam. Let us know if they helped you.

Good Luck!!!

Visit our services to explore how we enable organisations to transform their internal cultures, to make it easier for teams to collaborate, and adopt practices such as Continuous Integration, Continuous Delivery, and Continuous Testing. 

ECS Digital
Our first DevOps Playground in Edinburgh – Hands on with AWS Fargate

On the 27th February, we had the privilege of launching ECS Digital’s very first DevOps Playground in Edinburgh. Such an occasion called for a special topic, and this time we focused on AWS Fargate, a brand-new and often misunderstood technology from Amazon Web Services.

For those who don’t know, DevOps Playground is a small meetup run by ECS Digital consultants to give people a hands-on, practical introduction to DevOps technologies and to help evangelise them.

 

So, what is Fargate?

Amazon Web Services defines it as a ‘technology that allows you to run containers without managing the underlying instances’. This means that by using Fargate as the launch type, your container is downloaded and launched somewhere inside the AWS data centres, away from your direct management.

The Playground

As is tradition at DevOps Playground, we prepared a hands-on example to give the audience a taste of what this technology can do. There was a brief introduction to AWS Elastic Container Service, with an explanation of the logical division between Clusters, Services and Tasks. Then we proceeded to create the cluster, register the container specification (Task Definition), and finally run a service: a single instance of the famous Ghost blog engine.

The foundation that makes all of this possible is the new networking mode called awsvpc, which allows us to attach an Elastic Network Interface to a container, rather than to an EC2 Instance. This gives us direct access to container services, rather than the underlying hosts, in turn making Fargate possible.
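If you want to reproduce the same steps programmatically rather than through the console, here is a rough boto3 sketch (the cluster, service and resource IDs are hypothetical, and the subnet and security group would come from your own VPC):

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # Fargate's launch region

# 1. Create the cluster.
ecs.create_cluster(clusterName="playground-cluster")

# 2. Register the container specification as a task definition.
ecs.register_task_definition(
    family="ghost-blog",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",              # attaches an ENI to the task itself
    cpu="256",
    memory="512",
    containerDefinitions=[
        {
            "name": "ghost",
            "image": "ghost:latest",
            "portMappings": [{"containerPort": 2368}],
            "essential": True,
        }
    ],
)

# 3. Run a single instance of the Ghost blog as a service.
ecs.create_service(
    cluster="playground-cluster",
    serviceName="ghost-service",
    taskDefinition="ghost-blog",
    desiredCount=1,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],     # hypothetical subnet
            "securityGroups": ["sg-0123456789abcdef0"],  # hypothetical SG
            "assignPublicIp": "ENABLED",
        }
    },
)
```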

Limitations

Of course, since Fargate is pretty new, there are some limitations on its usage. Amongst the most important are the following:

  • Currently, it’s available in only one region: us-east-1. This is because it has limited availability at the moment and us-east-1 is considered the experimental region.
  • It is not yet possible to attach persistent EBS volumes directly to the container.
  • There is a limit of 20 Fargate services per account, per region.
  • There is a limit of 20 public IP addresses for Fargate services per account, per region.

Reception

Despite the heavy snowstorm, we had a good turnout, with many questions during and after the presentation that led to discussions about containers, service management and related problems. It was great to talk to passionate engineers and gain an insight into the tech environment of Edinburgh.

All in all, it was a fantastic experience, and I feel there is a lot of potential for this new technology in the Scottish capital.

Looking forward to the next DevOps Playground in Edinburgh.

Resources

In case you want to play with Fargate, we have publicly released the walkthrough we ran at the meetup. You can find it here, in our public GitHub repository.

For more information about DevOps Playground in your city click below:

Interested in attending one of our DevOps Playground events? Follow us on Meetup to receive a notification about the next event – join us!

Enzo Rivello
5 Common AWS EC2 Challenges – and How to Tackle Them!

The Cloud revolution is well and truly underway, as demonstrated by the 44,000 attendees at the recent Amazon Web Services (AWS) re:Invent conference. Many businesses have adopted a Cloud provider like AWS in some form or another.

AWS’s Elastic Compute Cloud (EC2) has been a service offering since 2006. It allows users to launch virtual computing environments on demand. EC2 is one of over 100 services from AWS, and each provides incredible value in your business’ journey to the Cloud.

Many businesses will be in the early stages of their Cloud adoption journey. We have seen some really successful transitions and some…not so great transitions. This post will explore some of the challenges with AWS EC2, and how they can be solved.

Challenges with AWS EC2

Resource Utilisation

Challenge: AWS EC2 makes it easy for businesses to scale. EC2 gives you complete control over your instances, with a range of instance types at your disposal. The challenge in these cases is how you manage the number of instances you have, so that costs aren’t impacted by large, long-running instances.

Solution(s):

  • Limit the number of acceptable instances by using Infrastructure as Code tools such as AWS CloudFormation or HashiCorp’s Terraform as your provisioning strategy. These tools can also display resource graphs, giving you further insight into your infrastructure.
  • Understand the type of instances you require. AWS has four purchasing options for instances: on-demand, reserved, spot and dedicated. Choosing the right mix can significantly reduce cost and help a business understand whether EC2 is being used in the right way. For example, if all your instances are dedicated, you are most likely not leveraging the true benefits of the Cloud.
  • Use AWS CloudWatch to detect and shut down idle instances (see the sketch below). This will remove any long-running instances that are not used, and keep the environment uncluttered.
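As a rough illustration of the CloudWatch approach, here is a hedged boto3 sketch (the instance ID and thresholds are hypothetical) that stops an instance whose CPU has been essentially idle for six consecutive hours:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

cloudwatch.put_metric_alarm(
    AlarmName="stop-idle-instance",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=3600,                   # one-hour evaluation periods
    EvaluationPeriods=6,           # six hours below the threshold
    Threshold=2.0,                 # average CPU below 2% counts as idle
    ComparisonOperator="LessThanThreshold",
    # Built-in CloudWatch EC2 action that stops the instance when it fires.
    AlarmActions=["arn:aws:automate:eu-west-1:ec2:stop"],
)
```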

Security

Challenge: Whilst EC2 places importance on security, many organisations still face challenges when ensuring that instances are running securely. What happens when you have an instance that is public-facing? Who has access, and how is this monitored?

Solution(s):

  • Use AWS CloudTrail. This will track all user and API activity which, as a minimum, will help with auditing and begin to satisfy compliance controls.
  • Create rules that restrict misconfigured instances, such as those that allow public IPs. These rules could be integrated into your CloudFormation or Terraform templates.
  • Use Amazon GuardDuty to monitor your AWS accounts and workloads. This uses intelligent threat detection to identify any malicious activity, and can take action with automated remediation (see the sketch below).
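To make that concrete, a minimal boto3 sketch for switching both services on (the trail name and logging bucket are hypothetical, and the bucket would already need a policy that allows CloudTrail to write to it):

```python
import boto3

# GuardDuty: enable intelligent threat detection for this account and region.
guardduty = boto3.client("guardduty", region_name="eu-west-1")
guardduty.create_detector(Enable=True)

# CloudTrail: record API activity across all regions into an S3 bucket.
cloudtrail = boto3.client("cloudtrail", region_name="eu-west-1")
cloudtrail.create_trail(
    Name="org-audit-trail",
    S3BucketName="my-cloudtrail-logs",   # hypothetical, pre-existing bucket
    IsMultiRegionTrail=True,
)
cloudtrail.start_logging(Name="org-audit-trail")
```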

Deploying at Scale

Challenge: Running hundreds (or even thousands) of instances can result in unmanageable and cluttered environments. This can make it difficult to determine who owns which instance, which region it is running in and what it’s being used for.

Solution(s):

  • As your business scales, separate it into different AWS accounts to maintain control. AWS Organizations enables policy-based management for these separate accounts.
  • Use CloudFormation or Terraform to enforce a tagging strategy for the separation of environments, applications, business units and more (see the sketch below).
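The tagging strategy itself would normally live in your CloudFormation or Terraform templates, but the end result is the same as this hedged boto3 sketch (the instance ID and tag values are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Standard tags you might enforce on every instance.
standard_tags = [
    {"Key": "Environment", "Value": "staging"},
    {"Key": "Application", "Value": "payments-api"},
    {"Key": "Owner", "Value": "platform-team"},
    {"Key": "BusinessUnit", "Value": "retail"},
]

# Apply (or re-apply) the standard tags to an existing instance.
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],   # hypothetical instance ID
    Tags=standard_tags,
)

# Tags then drive reporting, e.g. listing everything owned by one team.
owned_by_team = ec2.describe_instances(
    Filters=[{"Name": "tag:Owner", "Values": ["platform-team"]}]
)
```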

Configuration Management

Challenge: Businesses will use some of the default Amazon Machine Images (AMIs) provided by AWS. However, as adoption matures, many find that custom configurations – such as additional users and patching – need to be made.

Solution(s):

  • Create a process to manage the lifecycle of your AMIs, starting from the default AMIs. Then use HashiCorp’s Packer to bake further changes into the image.
  • Use cloud-init to handle the early initialisation of an instance (see the sketch below). This, along with config management tools such as Puppet and Ansible, can be used to make custom changes.
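For example, here is a hedged boto3 sketch that launches an instance whose early initialisation is handled by cloud-init user data (the AMI, key pair and subnet IDs are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# cloud-init user data: add a deploy user and apply updates on first boot.
user_data = """#cloud-config
users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
package_update: true
package_upgrade: true
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # hypothetical base AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                  # hypothetical key pair
    SubnetId="subnet-0123456789abcdef0",    # hypothetical subnet
    UserData=user_data,                     # picked up by cloud-init at boot
)
```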

Serverless

Challenge: Managing EC2 instances at all! What if we could deploy code without worrying about the instances it gets deployed to?

Solution: Use AWS Lambda. AWS Lambda lets you run code without provisioning or managing servers. With Lambda, you can run code for virtually any type of application or back-end service. Just upload your code, and Lambda will take care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services, or call it directly from any web or mobile app. The learning curve for Lambda can be steep, but once you have passed that barrier, you will never look at code deployment in the same way again.
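At its core a Lambda function is just a handler; here is a minimal Python sketch that could be triggered by an API call or another AWS service (the event shape shown is an assumption for illustration):

```python
import json

def handler(event, context):
    """Minimal Lambda handler: greet whoever is named in the event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello, " + name + "!"}),
    }
```

Upload this as the function code, point the handler setting at it (for example module.handler), and Lambda runs and scales it for you; there are no instances to provision or patch.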

How to best adopt AWS Cloud

Many of the solutions we’ve mentioned in this post are tools available within AWS. When starting your Cloud journey, it’s important to understand these resources (and the many others) available in AWS, to ensure a successful implementation.

Some additional general practices that should be considered when adopting any Cloud include:

Cost Management

Using tools like AWS Cost Explorer will enable you to see patterns and trends in your spend over time, helping you understand your Cloud costs. This data can then be used to forecast Cloud costs over the next quarter and to set budgets for your Cloud spend. Tools like AWS Budgets can alert businesses when costs or usage are forecast to exceed their budgets, and provide oversight of where overspend is occurring.
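Pulling that data programmatically is straightforward too; here is a hedged boto3 sketch that uses the Cost Explorer API to break monthly spend down by service (the date range is illustrative):

```python
import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer endpoint

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2018-01-01", "End": "2018-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Print cost per AWS service for each month in the range.
for period in response["ResultsByTime"]:
    print(period["TimePeriod"]["Start"])
    for group in period["Groups"]:
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print("  {}: ${:.2f}".format(group["Keys"][0], amount))
```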

Build with fault tolerance in mind

Nowadays, companies that don’t achieve 99.99% uptime risk a grave loss of both business and client trust, simply because they’re not available (usually during high-traffic periods). Services such as Amazon S3 are designed for 99.99% availability of your static assets, while services like RDS and CloudFront are designed with failover in mind, providing high availability and data availability at any given time. AWS also publishes regular whitepapers that illustrate how to architect and build resilient applications, helping businesses to decrease infrastructure and ownership costs.

There will always be challenges and learnings involved in the adoption of new technology. To make sure your Cloud migration is as effective, efficient and valuable as possible, it’s important to consider potential challenges and solutions of Cloud configuration, migration and management, before you adopt. Being aware of the challenges your business might experience using AWS or any other Cloud platform will enable you to tackle any issues and recover much more quickly.

If you’re experiencing issues with your AWS implementation, or Cloud infrastructure in general, please get in touch. ECS Digital offers a Cloud Health Assessment to help businesses realise the potential of Cloud and ensure their applications are truly native in the Cloud.

Thivan Visvanathan
AWS reveals Managed Kubernetes: EKS

There were many product announcements at the AWS re:Invent 2017 conference in November that have got the team at ECS Digital excited, particularly in the compute space.

As announced by Andy Jassy during his re:Invent keynote, the goal for AWS is to create a platform that provides everything builders require: services, platforms and tooling that can be utilised effectively and securely within an enterprise environment. Werner Vogels, CTO at Amazon, expanded on this concept in his keynote speech a day later, discussing a platform that not only helps businesses achieve their goals today, but enables them to build for 2020.

With that in mind, AWS launched Elastic Container Service for Kubernetes (EKS), “a managed service that makes it easy for you to run Kubernetes on AWS without needing to install and operate your own Kubernetes clusters“.

This long-awaited move realigns the cloud colossus with other service providers (Azure and GCP) that already offer native support for this technology. EKS is fully integrated with the existing AWS ecosystem, offering:

  • Fully managed user authentication to the K8S masters through IAM
  • Restricted access through the newly revealed PrivateLink
  • Native AZ Cluster distribution to provide High Availability

You can read more about how to use this service in the following blog post, produced by Jeff Barr, Chief Evangelist at AWS.

So what do these new developments mean for our customers? Why was this solution so sought after, even though AWS had already launched its own container solution, ECS?

To answer this question, we need to step back and understand what Kubernetes is and its role in the modern containerisation scene.

Kubernetes takes its name from a Greek word meaning “helmsman”, and is a container scheduler that is perhaps better described as an “operating system for clusters”.

It was first released in 2014, when Google open-sourced a version of its own internal scheduler, Borg. In the past three years it has gained huge momentum thanks to an active community directly involved in its roadmap. It’s designed with stability and high availability in mind, hiding the complexity of managing an entire cluster behind a single endpoint.

Even with all this support, managing large clusters can be complex and challenging. Problems like missing system containers and incorrect scheduling are real, and can introduce failures into a critical microservice, which in turn can cause downtime across the entire service.

On this matter, AWS recognised that managing production workloads “is not for the faint of heart”, with many moving pieces contributing to their unpredictability.

EKS is a fully managed solution: you decide the number of nodes, autoscaling rules, instance types and access policies, and AWS takes care of the rest. There is no need to worry about scalability or accessibility. Want more machines? Just add them to the cluster. Want to access it from the command line? Just use kubectl!

Kubernetes in the Financial Sector

Amazon calculated that approximately 66% of the world’s Kubernetes workloads run on AWS. Among them are new banking companies like Monzo, who are using and heavily contributing to this technology, enabling them to scale and grow much faster than the competition.

Bearing in mind the successes that the challenger banks have had with microservices and containerisation, fintech companies stand to benefit enormously from the structured and resilient architecture of Kubernetes, paired with the ease of management and scalability offered by EKS.

If you’d like to find out more about how you can leverage these services in the Cloud please contact our experts today.

ECS Digital
DevOps Playground Meetup #6: Hands on with HashiCorp’s Terraform

A successful sixth meetup!

This Tuesday, we hosted our sixth monthly #DevOpsPlayground meetup. It was a successful evening, attended by many.

These meetups allow us to explore and present DevOps tools – as well as providing others with the opportunity to give them a try.

This month, Mourad Trabelsi talked about HashiCorp’s Terraform.

Terraform

HashiCorp’s Terraform allows you to write your infrastructure as code.

Writing configuration files and then running terraform apply allows you to easily spin up new infrastructure. You can do this using multiple providers, including AWS, DigitalOcean, Docker and many more.

You can then provision them if needed.

Hands on!

During this meetup, Mourad guided us through writing a configuration file that creates two webservers sharing one security group, and then a load balancer with its own security group in front of those webservers, all in AWS.

Schema of the final infrastructure: two webservers behind a load balancer, each layer with its own security group.

You can find a walkthrough of the technical steps on our GitHub page, here.

 

A big thank you to everyone who participated in this meetup.

We hope to see you all again in the next one!

Michel Lebeau