FeatherCast

The voice of The Apache Software Foundation

Apache NetBeans – Shameless Marketing Tool Anton Epple

September 13, 2019
timothyarthur

NetBeans has completed its transition to Apache and is now a top-level Apache project with a strong and dedicated community and millions of users worldwide. NetBeans has always had great support for Apache Maven, and having an IDE of our own is a great chance to further promote Apache projects to a large audience. In this session I’ll show you how to plug in your own language, tool, library, server, database or framework and make it easy for developers to get started with it. Use NetBeans as a marketing tool to shamelessly plug your own cool project.

Apache Training – Contributing more than just code Justin Mclean

September 13, 2019
timothyarthur

Does it seem strange to you that we collectively collaborate on code but training material is produced individually in private? Why would each company or person produce their own material when it can be sourced from a central location, under a business-friendly license, and built on and modified? Or perhaps you just see better ways of producing content? Then come along and listen to what the Apache Training project is doing. You’ll find out how to make nice presentations with simple markup that can be put under version control and exported to many formats.

How to Raise an Erudite Chatbot Boris Galitsky Jay Taylor

September 13, 2019
timothyarthur

Availability of content and training sets is a major bottleneck for chatbot development today. Relying on Apache OpenNLP and its sub-project OpenNLP.chatbot, we introduce a number of tools and components to design a chatbot and its training set to be knowledgeable and intelligent.

In this talk we will analyze why it is so hard to find a chatbot demo today for a nontrivial task, or to observe intelligent behavior in a chatbot. It is easy to see how success in AI can boost chatbot development, but it is hard to detect intelligence in the chatbots that are available to the public.

We will present an advanced search engine for chatbots with a focus on linguistic features and discourse-level analysis for dialogue management. We will introduce a tool that builds a dialogue from an arbitrary document to form a training dataset for deep learning chatbots. We will demo a chatbot supporting virtual dialogue, where a user joins a virtual community built on the fly, whose members answer questions in this user’s current area of interest. Extended content for this talk is available in the speaker’s recently published book, “Developing Enterprise Chatbots”.

Continuous Machine and Deep Learning at Scale with Apache Ignite Denis Magda

September 13, 2019
timothyarthur

With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data and hours to train models. It’s also hard to scale, with data sets increasingly larger than the capacity of any single server. The size of the data also makes it hard to incrementally test and retrain models in near real-time to improve results. Learn how Apache Ignite and GridGain help to address these limitations in model training and execution, and help achieve near-real-time, continuous learning. This talk explains how ML/DL work with Apache Ignite and how to get started. Topics include:
– Overview of distributed ML/DL, including design, implementation, usage patterns, pros and cons
– Overview of Apache Ignite ML/DL, including prebuilt ML/DL, and how to add your own ML/DL algorithms
– Model execution with Apache Ignite, including how to build models with Apache Spark and deploy them in Ignite
– How Apache Ignite and TensorFlow can be used together to build distributed DL model training and execution

Cassandra 4.0: Our Most Stable Major Release Jordan West

September 13, 2019
timothyarthur

In mid-2018 the Cassandra community committed to making Cassandra 4.0 the most stable major release of Cassandra in the project’s history. In September, the community shifted focus from feature work and development to ensuring the quality of the release. To this end, we have adopted several new approaches to testing and validation, including the replaying of production traffic, code audits, and property-based testing. This talk will explore the methodologies we’ve adopted and the results of their application, as well as the costs and benefits of this level of commitment to testing.

Apache Cassandra Sidecar, let’s make C* attractive and easy to operate Vinay Chella Dinesh Joshi

September 13, 2019
timothyarthur

Cloud database offerings have expanded over the past decade, encompassing everything from virtualized machines in the cloud to entirely serverless databases. With this pace of innovation in the cloud ecosystem, Cassandra is in a unique position to serve its users with advantages over other systems, but it must also compete with the ongoing innovations in the cloud. Internal architecture and storage mechanisms aside, citizen developers in the community look at the operability and surrounding ecosystem of a service when weighing long-term investment in it. At this juncture, it is vital for the Cassandra dev community to build muscle around supporting such an ecosystem, to be on par with the rest of the cloud services and other competing offerings in the industry. Specifically, it is important for us to innovate and focus on improving these areas along with the rest of the product:
– Ease of use
– Simplified operability in the cloud, particularly in hybrid and multi-cloud environments
– Pluggability with other infrastructure services such as metrics, discovery, and monitoring
– Painless rollouts of version and protocol upgrades
– Elegant developer experience in polyglot environments
– Dev education of Cassandra best practices
– Unified access across complex systems
As Cassandra stands today, operating the database requires either considerable labor, complex automation, or both. Some of this complexity is an unavoidable result of operating a distributed system, but much of it is operational complexity stemming from properties of C* itself. As a result, C* operators spend too much time dealing with issues that the database should solve on its own, and are unable to reap the full benefit of Cassandra’s powerful distributed data model. In this talk, we will focus on how to address these challenges to keep Cassandra competitive among cloud offerings and database services, with simplified operability and an elegant developer and operator experience. We also hope to get Apache Cassandra 4.0 up and running in any cloud without hassle with the help of the sidecar, and to prompt the Cassandra dev community to start thinking about these areas in upcoming releases of Cassandra.

Reduce your Storage Costs with Transient Replication and Cheap Quorums Blake Eggleston

September 13, 2019
timothyarthur

In eventually consistent systems, when a node failure or network partition occurs, we’re presented with a trade-off: execute a request and sacrifice consistency, or reject execution and sacrifice availability. In such systems, quorums (overlapping node subsets that guarantee at least one node holds the most recent value) can be a good middle ground. We can tolerate failures and loss of connectivity for some nodes while still serving the latest results. However, quorum-based replication schemes incur high storage costs: we have to store redundant values on several nodes to guarantee that enough copies will be available in case of failure. It turns out that we do not have to store data on each replica. We can reduce storage and compute resources by storing the data on only a subset of nodes, and using the other nodes (Transient Replicas) only for redundancy in failure scenarios. In this talk, we discuss Witness Replicas, a replication scheme used in Spanner and Megastore, and the Apache Cassandra implementation of this concept, called Transient Replication and Cheap Quorums.
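
The quorum overlap property and the storage saving described above can be illustrated with a small Python sketch. It is a simplified model, not Cassandra's implementation: the function names are hypothetical, quorums are assumed to be simple majorities, and transient replicas are assumed to hold no data while all nodes are healthy.

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def all_quorums_overlap(n: int) -> bool:
    """Check by enumeration that every pair of majority quorums shares a node."""
    q = quorum_size(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    # A non-empty intersection means some node saw both operations.
    return all(a & b for a in quorums for b in quorums)


def full_copies(n: int, transient: int) -> int:
    """Copies stored in the healthy state under the simplifying assumption
    that `transient` of the n replicas keep no data (hypothetical model)."""
    if not 0 <= transient < n:
        raise ValueError("transient must be between 0 and n - 1")
    return n - transient


if __name__ == "__main__":
    for n in (3, 5):
        print(n, quorum_size(n), all_quorums_overlap(n))
    print(full_copies(3, 1))
```

With three replicas, any two majority quorums of two nodes must share a node, which is why a read quorum can always reach the latest write. Under the assumption above, making one of the three replicas transient leaves two full copies instead of three.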
