UMBC Training Centers

Apache Accumulo for Developers

 

1. Accumulo Background

  • A history of NoSQL
  • A survey of lightly structured stores
  • Design drivers for Accumulo

2. Installation and startup (hands-on)

  • Environment setup
  • Basic Accumulo configuration
  • Running process control scripts
  • Using Accumulo administrative tools (shell and monitor)

3. Overview of Accumulo Architecture

  • Defining the sorted Key/Value space
  • Range selection and filtering
  • Table/Tablet organization
  • Processes and inter-process communication
  • Control and data flow for read and write operations

4. API Introduction (hands-on)

  • Keys, Values, and Mutations
  • Instances and Connectors
  • BatchWriter, Scanner, BatchScanner

5. Application Design (hands-on)

  • Diagramming table schemas / flexible schemas
  • Basic indexing theory
  • Information retrieval design patterns
  • Joins and pre-joins

6. Advanced Topics

  • Hadoop ecosystem integration
  • Relational operations on Accumulo
  • Iterators

7. Advanced API

  • Iterators
  • Constraints
  • Bulk load
  • ACID/BASE semantics

8. Cell-level security

  • Defining domain-specific authorizations
  • Trust boundaries

9. Partition Management

  • Column- and Row-orientation
  • Row schemas
  • Locality groups

10. Information retrieval

  • Joins
  • Document-distributed indexes
  • Partitioned joins with the IntersectingIterator

11. Statistics

  • OLAP cube shells
  • Query-time vs. compaction-time aggregation

12. Additional Applications

  • Graph search
  • Machine learning
  • Geohashing

13. Relationship to relational databases

  • Data definition languages
  • Pre-joined indexes

14. Custom iterators

  • The SortedKeyValueIterator interface
  • Filters and Combiners
  • Lookup Iterators and seeking

15. Performance optimization

  • Hot spots and bottlenecks
  • Managing parallelism
  • Troubleshooting