Does it seem strange to you that we collaborate on code collectively, yet training material is produced individually and in private? Why should each company or person produce their own material when it could be sourced from a central location, under a business-friendly license, and built on and modified? Or perhaps you simply see better ways of producing content. Either way, come along and hear what the Apache Training project is doing. You’ll find out how to make attractive presentations with simple markup that can be put under version control and exported to many formats.
Availability of content and training sets is a major bottleneck for chatbot development today. Relying on Apache OpenNLP and its sub-project OpenNLP.chatbot, we introduce a number of tools and components for designing a chatbot, and its training set, to be knowledgeable and intelligent.

In this talk we will analyze why it is so hard today to find a chatbot demo for a nontrivial task, or to observe intelligent behavior in a chatbot. It is easy to see how successes in AI can boost chatbot development on the one hand, but hard to detect intelligence in the chatbots available to the public on the other.

We will present an advanced search engine for chatbots with a focus on linguistic features and discourse-level analysis for dialogue management. We will introduce a tool that builds a dialogue from an arbitrary document to form a training dataset for deep learning chatbots. We will demo a chatbot supporting virtual dialogue, where a user joins a virtual community built on the fly, whose members answer questions in that user’s current area of interest. Extended content for this talk is available in the speaker’s recently published book, “Developing Enterprise Chatbots”.
With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data and hours to train models. Scaling is also hard, with datasets increasingly larger than the capacity of any single server. The size of the data also makes it hard to incrementally test and retrain models in near real time to improve results. Learn how Apache Ignite and GridGain help address these limitations of model training and execution, and help achieve near-real-time, continuous learning. We will explain how ML/DL work with Apache Ignite, and how to get started. Topics include:

— Overview of distributed ML/DL, including design, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including the prebuilt ML/DL algorithms and how to add your own
— Model execution with Apache Ignite, including how to build models with Apache Spark and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and execution
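The core idea behind this kind of distributed training can be sketched in a framework-agnostic way: each node trains on its local data partition, and a coordinator averages the resulting parameters. This is a hypothetical illustration of the pattern only, not Apache Ignite’s actual API:

```python
# Toy sketch of data-parallel training with parameter averaging, the general
# pattern behind distributed ML frameworks. NOT Apache Ignite's API; the
# function names here are invented for illustration.

def train_local(partition, weights, lr=0.05):
    """One pass of gradient descent for a linear model y = w*x on one partition."""
    w = weights
    for x, y in partition:
        grad = 2 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
        w -= lr * grad
    return w

def train_distributed(partitions, rounds=20):
    """Each 'node' trains on its local partition; a coordinator averages weights."""
    w = 0.0
    for _ in range(rounds):
        local_weights = [train_local(p, w) for p in partitions]
        w = sum(local_weights) / len(local_weights)  # parameter averaging step
    return w

# Data spread across three partitions, all drawn from y = 3x.
partitions = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]
w = train_distributed(partitions)  # converges near 3.0
```

The key property this sketch shows is that the data never leaves its partition; only small model parameters cross the network, which is what makes near-real-time retraining over large datasets feasible.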
In mid-2018 the Cassandra community committed to making Cassandra 4.0 the most stable major release in the project’s history. In September, the community shifted focus from feature work and development to ensuring the quality of the release. To this end, we have adopted several new approaches to testing and validation, including the replaying of production traffic, code audits, and property-based testing. This talk will explore the methodologies we’ve adopted and the results of their application, as well as the costs and benefits of this level of commitment to testing.
Cloud database offerings have expanded over the past decade, encompassing everything from virtualized machines in the cloud to entirely serverless databases. Given this pace of innovation in the cloud ecosystem, Cassandra is uniquely positioned to serve its users, with advantages over any other system, and it is also in an interesting position to compete with the ongoing innovations in the cloud. Internal architecture and storage mechanisms aside, citizen developers in the community are looking for a broader set of operability features and a richer ecosystem around a service before making a long-term investment in it. At this juncture, it is vital for the Cassandra dev community to build the muscle to support the ecosystem the community is looking for, so Cassandra is on par with the rest of the cloud services and competing offerings in the industry. Specifically, it is important for us to innovate and focus on improving the following areas along with the rest of the product.
– Ease of use
– Simplified operability in the cloud, particularly in hybrid/multi-cloud environments
– Pluggability with other infrastructure services such as metrics, discovery, and monitoring
– Painless rollouts of version and protocol upgrades
– Elegant developer experience in polyglot environments
– Dev education of Cassandra best practices
– Unified access across the complex systems
As Cassandra stands today, operating the database requires considerable labor, complex automation, or both. Some of this complexity is an unavoidable result of operating a distributed system, but much of it stems from properties of C* itself. As a result, C* operators spend too much time dealing with issues the database should solve on its own, and are unable to reap the full benefit of Cassandra’s powerful distributed data model. In this talk, we will focus on how to address these challenges to keep Cassandra competitive among cloud offerings and database services, with simplified operability and an elegant developer and operator experience. We also hope to get Apache Cassandra 4.0 up and running in any cloud without hassle, with the help of the sidecar. Finally, we hope to prompt the Cassandra dev community to start thinking about these areas for upcoming releases of Cassandra.
In eventually consistent systems, when a node failure or network partition occurs, we are presented with a trade-off: execute a request and sacrifice consistency, or reject it and sacrifice availability. In such systems, quorums (overlapping node subsets that guarantee at least one node holds the most recent value) can be a good middle ground: we can tolerate failures and loss of connectivity for some nodes while still serving the latest results. Quorum-based replication schemes incur high storage costs, since we have to store redundant values on several nodes to guarantee enough copies are available in case of failure. It turns out that we do not have to store data on every replica. We can reduce storage and compute resources by storing the data on only a subset of nodes, and use the other nodes (Transient Replicas) solely for redundancy in failure scenarios. In this talk, we discuss Witness Replicas, a replication scheme used in Spanner and Megastore, and Apache Cassandra’s implementation of this concept, called Transient Replication and Cheap Quorums.
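The overlap property that makes quorums work (a read quorum and a write quorum must share at least one node, which holds exactly when R + W > N) can be checked exhaustively for small clusters. A minimal sketch, independent of Cassandra’s actual implementation:

```python
from itertools import combinations

def quorums_intersect(n, write_quorum, read_quorum):
    """Check that every write quorum of the given size shares at least one
    node with every read quorum, i.e. a read is guaranteed to contact a node
    holding the most recent write. Holds exactly when W + R > N."""
    nodes = range(n)
    for w in combinations(nodes, write_quorum):
        for r in combinations(nodes, read_quorum):
            if not set(w) & set(r):  # found a disjoint pair: reads can miss writes
                return False
    return True

# Classic majority quorums on a 5-node replica set: W = R = 3, so W + R > N.
assert quorums_intersect(5, 3, 3)
# W = R = 2 on 5 nodes does NOT guarantee overlap: a read may miss the write.
assert not quorums_intersect(5, 2, 2)
```

Transient replication trims the cost of this scheme: the intersection guarantee is preserved, but some of the nodes counted toward the quorum only hold data temporarily, during failure scenarios.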
Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies that enable applications and business operations to scale massively and rapidly. While Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store and retrieve data at very low latency, Kubernetes is a container orchestration technology that automates application deployment and scales application clusters. In this presentation, we will reveal how we architected a massive-scale deployment of a streaming data pipeline with Kafka and Cassandra to serve an example anomaly detection application running on a Kubernetes cluster and generating and processing a massive amount of events. Anomaly detection is a method for detecting unusual events in an event stream. It is widely used in applications such as financial fraud detection, security and threat detection, website user analytics, sensors, IoT, and system health monitoring. When such applications operate at massive scale, generating millions or billions of events, they impose significant computational, performance and scalability challenges on anomaly detection algorithms and data layer technologies. We will demonstrate the scalability, performance and cost-effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from our experiments scaling the anomaly detection application to 19 billion anomaly checks per day.
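To make the “anomaly check” concrete: a common minimal approach is a sliding-window z-score test, flagging events that deviate far from the recent baseline. This toy detector is illustrative only and is not the algorithm used in the experiments above:

```python
from collections import deque
from statistics import mean, stdev

class AnomalyDetector:
    """Toy sliding-window z-score detector (hypothetical example, not the
    talk's actual algorithm): flag values far from the recent baseline."""

    def __init__(self, window=50, threshold=3.0):
        self.history = deque(maxlen=window)  # recent values only
        self.threshold = threshold           # how many std-devs counts as unusual

    def check(self, value):
        anomalous = False
        if len(self.history) >= 10:  # need a baseline before judging anything
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.history.append(value)
        return anomalous

detector = AnomalyDetector()
events = [10.0] * 40 + [10.5] + [100.0]   # a spike after a steady stream
flags = [detector.check(v) for v in events]
# Only the final spike is flagged; the steady stream passes.
```

At the scale described in the talk, the interesting problems are around this kernel rather than inside it: partitioning the event stream in Kafka, keeping per-key state, and reading/writing baselines in Cassandra fast enough to sustain billions of checks per day.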
This presentation will describe our initial experience building and using CassKop, an operator developed for running Cassandra on top of Kubernetes. CassKop works as a usual K8s controller (reconciling the actual state with a desired state) and automates Cassandra operations through JMX. All operations are launched by calling standard K8s APIs (kubectl apply …) or by using a K8s plugin (kubectl casskop …). CassKop is developed in Go, based on the CoreOS operator-sdk framework.
Among the main features:
- deploying a rack-aware cluster (or AZ-aware cluster)
- scaling up & down (including cleanups)
- setting and modifying configuration parameters (C* and JVM parameters)
- adding/removing a datacenter in Cassandra
- rebuilding nodes
- removing or replacing a node (in case of hardware failure)
- upgrading C* or Java versions (including upgradesstables)
- monitoring (using Prometheus/Grafana)
- …
By using local persistent volumes, it is possible to handle failures or stop/start nodes for maintenance operations with no transfer of data between nodes. Moreover, cassandra-reaper is deployed in K8s and used for scheduling repair sessions (thanks to the Spotify and TheLastPickle teams). The Cassandra exporter for Prometheus and the backup/restore tooling developed by Instaclustr are also used (thanks to the Instaclustr team). During this session, we will delve into the architecture and implementation of this operator and share our learnings.
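The reconcile loop mentioned above is the heart of any operator: observe the current state, diff it against the desired spec, and emit the actions that converge them. A simplified sketch of the pattern, with invented names (not CassKop’s actual Go code or the operator-sdk API):

```python
# Toy illustration of the controller "reconcile" pattern an operator such as
# CassKop follows. Hypothetical state shapes and action names for illustration.

def reconcile(desired, observed):
    """Return the operations needed to move observed state toward desired."""
    actions = []
    if observed["nodes"] < desired["nodes"]:
        actions.append(("scale_up", desired["nodes"] - observed["nodes"]))
    elif observed["nodes"] > desired["nodes"]:
        actions.append(("scale_down", observed["nodes"] - desired["nodes"]))
    if observed["version"] != desired["version"]:
        actions.append(("upgrade", desired["version"]))
    return actions

desired = {"nodes": 6, "version": "4.0"}
observed = {"nodes": 4, "version": "3.11"}
actions = reconcile(desired, observed)
# A real operator would apply these actions via the Kubernetes API, re-observe
# the cluster, and repeat until reconcile() returns no actions.
```

In a real controller this loop runs continuously, so a failed node or an edited spec (via kubectl apply) is simply another state divergence to reconcile away.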
Netflix relies on Apache Cassandra as a critical source of truth database, and while Cassandra is a remarkably resilient database, it does, ever so occasionally, break. This talk explores how complex Cassandra deployments fail in production, but more importantly, the techniques, tools, and approaches our distributed systems engineers use to debug and mitigate these failures. We will first cover software-based failure modes that come either from our software or the software that Cassandra builds upon. For example, retry storms or unbounded queues can effectively overwhelm the database. Cassandra and Linux give you many tools to detect these, and there are numerous strategies you can use to avoid them. Along the way we will also visit common JVM failure modes and how to assess and remediate these. Next, we will cover hardware-based failure modes that come from a typical cloud environment. We will cover the broad classes of drive and network failures that we cope with every day, as well as the metrics and tests we use to detect them. With these understandings, we will learn how to automatically heal these failures and safely meet our latency SLOs. Finally, we will cover systemic failure modes, including complex interactions of multiple components and systems. This will involve a number of concrete failures we observed that involved multiple bugs, system failures, or data modeling issues in combination. At the end of the talk, we hope that the audience leaves understanding how large distributed databases can fail, but also how to use DevOps skills and tools to debug, mitigate, and automate away these failure modes going forward.
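One standard mitigation for the retry storms mentioned above is to bound retries and add jittered exponential backoff, so that clients back off instead of piling load onto an already struggling node. An illustrative sketch under those assumptions, not Netflix’s or Cassandra’s actual client code:

```python
import random
import time

def call_with_backoff(request, max_attempts=4, base_delay=0.05, cap=1.0):
    """Retry a flaky call with capped exponential backoff and full jitter.
    Bounded attempts prevent the unbounded retries that overwhelm a database."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error to the caller
            delay = min(cap, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # jitter de-synchronizes clients

# Demo: a request that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("node overloaded")
    return "ok"

result = call_with_backoff(flaky)
```

The jitter matters as much as the backoff: without it, a fleet of clients that failed together retries together, re-creating the very spike that caused the failures.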
We have been providing a stream-based data pipeline using Google Cloud Dataflow and Apache Beam since fall 2018. It collects event logs from microservices running on GKE, then transforms and forwards the logs to GCS and BigQuery for analytics and other uses. As you may know, implementing and operating streaming jobs is challenging, and we encountered various issues along the way.
I’d like to share our knowledge from both the development and the operations perspective. There are three topics in the development part: 1) implementing stream jobs using spotify/scio, 2) how to debug the jobs, especially those using DynamicDestination, and 3) how to load test to ensure our jobs are stable. The operations part covers: 1) how to deploy new jobs safely (avoiding data loss), 2) how to monitor the jobs and surrounding systems, and miscellaneous topics.