Security and Performance Analysis of Intrusion Detection in Water Distribution Automation Networks – by Silvio Rocha da Silva

The convergence toward open Ethernet-TCP/IP systems made automation networks and SCADA systems accessible through corporate networks or the internet, exposing them to new security threats. Critical infrastructures in the water distribution sector need protection methods that minimize the risks to public health and the environment caused by attacks on, or vulnerabilities in, the automation network, without degrading the performance of the network and its systems. The paper proposed, developed, and validated the performance of a security mechanism for real-time intrusion detection, considering availability and response-time properties.

Use of Quality of Service in a Data Network Running a Supervisory System – by Luciano Santos de Lima

This work analyzes and applies quality of service in an automation network running a supervisory (SCADA) system in a transportation environment. It examines the traffic of the SCADA automation network and proposes the use of quality of service in such a network by adapting RFC 4594. Traffic volume, network latency, and application latency are evaluated.

Making Facebook Self-Healing

When your infrastructure is the size of Facebook’s, there are always broken servers and pieces of software that have gone down or are generally misbehaving. In most cases, our systems are engineered such that these issues cause little or no impact to people using the site. But sometimes small outages can become bigger outages, causing errors or poor performance on the site. If a piece of broken software or hardware does impact the site, then it’s important that we fix it or replace it as quickly as possible. Even if it’s not causing issues for users yet, it could in the future so we need to take care of it quickly.

Facebook’s Site Reliability team is dedicated to keeping the site up and fast and stable. We handle everything from the smallest outages on individual servers to the largest outages across the entire site. When I joined the Site Reliability team a couple of years ago, it was clear that the infrastructure was growing too fast for us to be able to handle small repetitive outages manually. We had to find an automated way to handle these sorts of issues so that the human engineers could focus on solving and preventing the larger, more complex outages. So, I started writing scripts when I had time to automate the fixes for various types of broken servers and pieces of software.

Introducing FBAR (Not FUBAR)

Over time, I developed the scripts more and more. As they got better, they saved me more time, which I used to continue improving them. Eventually my team started benefiting from my scripts enough that I was asked to work on them full time. I separated out the common parts into generic APIs that model our infrastructure and I turned the rest into remediation modules that use these APIs to implement the business logic for individual components of the Facebook back end. Then I wrote a daemonized service that executes workflows comprising these remediation plugins to handle outages detected by our monitoring system. I named the whole system “Facebook Auto-Remediation” or “FBAR” for short. (I originally wanted to name it “FUBAR”, but I couldn’t come up with anything good for the “U” to stand for, so “FBAR” it is.)

To understand how FBAR works, let’s look at what happens when an individual server goes down. Imagine a hard drive goes bad on one of our Web servers. First, the monitoring system will detect the failed hardware and report this outage as an “alert”. FBAR’s “Alert Fetch Loop” runs continuously in the background querying the monitoring system to find new alerts. When it finds alerts, it processes them and calculates appropriate workflows to execute to handle the outages. The workflows get placed on a job queue for the FBAR Job Engine to execute.
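The fetch-and-dispatch cycle described above can be sketched roughly as follows. Everything here is hypothetical — the monitoring client, alert fields, workflow names, and queue are stand-ins, since FBAR's internals are not public:

```python
import time
from collections import deque

# Hypothetical job queue shared with the FBAR Job Engine.
job_queue = deque()

# Hypothetical mapping from alert type to a remediation workflow.
WORKFLOWS = {
    "hardware_failure": "verify_hardware_and_drain",
    "service_down": "restart_service",
}

class FakeMonitoring:
    """Stand-in for the real monitoring system's query API."""
    def __init__(self, alerts):
        self.alerts = list(alerts)

    def get_unhandled_alerts(self):
        pending, self.alerts = self.alerts, []
        return pending

def alert_fetch_loop(monitoring, iterations, poll_interval=0.0):
    """Poll monitoring and turn each new alert into a queued workflow job."""
    for _ in range(iterations):
        for alert in monitoring.get_unhandled_alerts():
            workflow = WORKFLOWS.get(alert["type"])
            if workflow is not None:
                job_queue.append({"host": alert["host"], "workflow": workflow})
        time.sleep(poll_interval)

# Example: a failed hard drive on a web server surfaces as one alert.
mon = FakeMonitoring([{"type": "hardware_failure", "host": "web042"}])
alert_fetch_loop(mon, iterations=1)
```

In the real system the loop runs continuously as a daemon; the `iterations` parameter here just keeps the sketch finite.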

FBAR Alert Fetch Loop

The FBAR Job Engine will then pull the job for this server off of the job queue and begin executing remediation plugins in precedence order. Each plugin is written against the FBAR API. This API gives the plugin access to hardware and configuration data about the host and to the alert that describes the detected outage. The API also provides access to power control, command execution on the host and to the host’s entries in our site-wide service configuration database.
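A plugin interface of that shape might look like the sketch below. The class names, `precedence` attribute, and return convention are all invented for illustration; the actual FBAR API is not public:

```python
from abc import ABC, abstractmethod

class RemediationPlugin(ABC):
    """Hypothetical plugin base class, not the real FBAR API."""
    precedence = 100  # lower values run earlier

    @abstractmethod
    def run(self, host, alert):
        """Inspect the host and alert; return data for the workflow."""

class VerifyHardware(RemediationPlugin):
    precedence = 10

    def run(self, host, alert):
        # Stand-in for a real hardware probe against the host.
        if alert["type"] == "hardware_failure":
            return {"failure_type": "hard_drive"}
        return {}

class CheckService(RemediationPlugin):
    precedence = 20

    def run(self, host, alert):
        return {"service_checked": True}

def execute_job(host, alert, plugins):
    """Run plugins in precedence order, merging the data each returns."""
    results = {}
    for plugin in sorted(plugins, key=lambda p: p.precedence):
        results.update(plugin.run(host, alert))
    return results
```

Sorting by a precedence value lets teams slot new plugins into a workflow without editing a central ordered list.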


When the job runs on our hypothetical Web server, the first remediation plugin would verify that the machine has damaged hardware, classify the failure type as hard_drive, then return that data to FBAR. At this point the workflow would branch. Rather than moving on to handle the next outage (like SSH or HTTP), FBAR would execute the plugin to remove the Web server from production service and then flag the machine as needing a part replacement.
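The branch described above — drain a machine with bad hardware instead of continuing to service-level checks — could be sketched like this. All of the helper functions are illustrative stubs, not real FBAR calls:

```python
# Illustrative state; in reality this lives in a service configuration DB.
production_pool = {"web042", "web043"}
repair_queue = []

def verify_hardware(host):
    """Stand-in for a real probe; here we pretend the drive is bad."""
    return {"failure_type": "hard_drive"}

def remove_from_production(host):
    production_pool.discard(host)

def flag_for_repair(host, part):
    repair_queue.append((host, part))

def remediate(host):
    """Branch the workflow: damaged hardware drains the host for repair
    rather than falling through to service checks (SSH, HTTP)."""
    report = verify_hardware(host)
    if report.get("failure_type"):
        remove_from_production(host)
        flag_for_repair(host, report["failure_type"])
        return "awaiting_repair"
    return "continue_service_checks"
```

Once a technician marks the part replaced, the reverse path (verify, then re-enable) would re-add the host to the production pool.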

Remediation Workflow

When the data center technician has replaced the bad drive on the machine, they would flag the machine as repaired. At this point, FBAR again takes control of the machine and verifies that it is ready for production service and re-enables it. The only human interaction with the machine is when a person replaces the physical hard drive. The rest of the process happens automatically without any manual intervention.

Automating the Work of Hundreds

Today, the FBAR service is developed and maintained by two full time engineers, but according to the most recent metrics, it's doing the work of approximately 200 full time system administrators. FBAR now manages more than 50% of the Facebook infrastructure, and we've found that services see dramatic increases in reliability when they come under FBAR control. Recently, we've opened up development of remediation plugins to other teams working on Facebook's back end services so they can implement their service-specific business logic. As these teams write their own remediation plugins, we're expanding FBAR coverage to more and more of the infrastructure. This is making the site more reliable for end users while reducing the workload of the supporting engineers.

Facebook is an amazing place to work for many reasons but I think my favorite part of the job is that engineers like me are encouraged to come up with our own ideas and implement them. Management here is very technical and there is very little bureaucracy, so when someone builds something that works, it gets adopted quickly. Even though Facebook is one of the biggest websites in the world it still feels like a start-up work environment because there’s so much room for individual employees to have a huge impact.

Like building infrastructure? Facebook is hiring infrastructure engineers. Apply here.

Patrick is a software engineer at Facebook.

Amazon Web Services Moves into New Territory…Again – ReadWriteCloud



Amazon Web Services is on an aggressive development cycle. Its latest announcement comes today with what it calls AWS CloudFormation, a service that Amazon’s Jeff Barr describes in a manner that makes it feel quite similar to cloud management technologies such as Puppet and Chef.

With AWS CloudFormation, developers can create their own templates for provisioning the resources needed for their applications. Barr's description shows how far the "recipe" metaphor has spread through the cloud computing world — it echoes Chef, which serves as a configuration environment. He tells a story about how much cooking is done in his house and the need for precise measurements when baking. The recipe has to be just right.

In this case, the recipe automates the creation of the stack for the developer. Christopher Peter replied to my question about how this can integrate with WordPress: "…I was referring to the example template. I like the programmatic approach to convert manual setup into 1 efficient command."

That’s exactly it: AWS is programmable. Now it’s becoming automated to some extent, too.


To date, many people have used AWS in what we’ll have to think of as cooking mode. They launch some instances, assign some Elastic IP addresses, create some message queues, and so forth. Sometimes this is semi-automated with scripts or templates, and sometimes it is a manual process. As overall system complexity grows, launching the right combination of AMIs, assigning them to roles, dealing with error conditions, and getting all the moving parts into the proper positions becomes more and more challenging.

The mechanism allows the developer to describe what resources are required; AWS CloudFormation then configures the setup accordingly.


Using CloudFormation, you can create an entire stack with one function call. The stack can comprise multiple Amazon EC2 instances, each one fully decked out with security groups, EBS (Elastic Block Store) volumes, and an Elastic IP address (if needed). The stack can contain Load Balancers, Auto Scaling Groups, RDS (Relational Database Service) Database Instances and security groups, SNS (Simple Notification Service) topics and subscriptions, Amazon CloudWatch alarms, Amazon SQS (Simple Queue Service) message queues, and Amazon SimpleDB domains. Here's a diagram of the entire process:
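To make the "entire stack in one call" idea concrete, here is a minimal sketch of what a CloudFormation-style template might look like, built as a Python dict and serialized to JSON. The AMI ID and property values are placeholders, the resource set is far smaller than a real stack, and the provisioning call itself (via an AWS client library) is omitted because it needs real credentials:

```python
import json

# Minimal CloudFormation-style template: one web server, its security
# group, and an Elastic IP. Property values are illustrative placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Web server plus security group and Elastic IP",
    "Resources": {
        "WebSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Allow inbound HTTP",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": "80",
                     "ToPort": "80", "CidrIp": "0.0.0.0/0"}
                ],
            },
        },
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-12345678",  # placeholder AMI ID
                "InstanceType": "m1.small",
                "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
            },
        },
        "WebAddress": {
            "Type": "AWS::EC2::EIP",
            "Properties": {"InstanceId": {"Ref": "WebServer"}},
        },
    },
}

# This JSON body is what would be handed to the single create-stack call.
template_body = json.dumps(template)
```

The `Ref` intrinsic function is what lets CloudFormation wire the resources together and create them in dependency order — the part a developer would otherwise script by hand in "cooking mode."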


Barr goes into some depth about how these recipes work, and the conversation on Twitter was buzzing on the topic. Christian Reilly (@reilly) tried the service this morning and was impressed, calling it a solid end-to-end service from virtual machine to app. It's free, and partly for that reason it could be a killer in the market as the lines of differentiation blur between it, Chef, and other services.

But after just a day, it is already spawning integrations with Chef. A post on Hacker News details how this can be done.

Reaction has been generally positive.


AWS has been on a tear lately. Last week it announced hosting for static websites. Earlier in the month it launched Elastic Beanstalk, a PaaS environment. These are all new efforts.

But there are lots of other movements in the market that AWS has to be watching. OpenStack and hosting providers are starting to build their own cloud environments.

In any case, these are fast-moving times. I wonder what will come from AWS next week.
