John Hiller, Executive Sales & Technology Consultant

SUPERCOMPUTER ARTIFICIAL INTELLIGENCE CONSULTING

Falcon ML Consulting Competitive Advantage for Investment Banks

  • Top Investment Banking Competitive Advantages Using Machine Learning (ML)


  • Machine Learning (ML) is a technological breakthrough for the financial industry: millisecond improvements in decision making translate into more profitable trading decisions


  • Developing ML solutions for the Financial Industry is one of the riskiest, most costly and time-sensitive tasks facing this industry. 


  • Developing new ML applications for the Financial Industry & running these applications with high performance architectures requires experienced practitioners with stellar track records.

Competitive Success – Senior Executive Page 1


  • Situation: CEO/President of a large investment firm


  • Needs: Projected yearly revenue and profit growth of 30%; a competitive advantage that improves trade decisions and accuracy; reductions in R&D costs, risks, and schedules


  • Reason: The electronic trading staff is not keeping up with advances in supercomputer hardware/software architecture and artificial intelligence mathematics. R&D planning and tracking are poor, costs are high, integration schedules are lengthy, and less funding is available for innovation projects

Competitive Success – Senior Executive Page 2


  • FALCON ML Consulting Solution:


  • A High Performance Computing (HPC) consulting company descended from the Oryx Corporation


  • Oryx Corp. Founder: John Hiller


  • Dr. Narendra Ahuja – Artificial Intelligence & Computer Science (PhD advisor: Azriel Rosenfeld), University of Illinois; student advisees: 60 PhD, 20 MS, 100 undergraduate researchers, 14 postdocs; developed the Computer Vision & Robotics Lab at the Beckman Institute (research)


  • Oryx Corp. developed both commercial and militarized supercomputers (HPC)


  • $19M in funding from Eastman Kodak, venture capital, Grumman Corp., and Lockheed Martin


  • One of the earliest developers/users of graphical user interface programming software for mathematics


  • Used by DARPA for the first successful Autonomous Land Vehicle (AI)


  • Oryx Corp. Accomplishments:  


  • Designed/Completed/Tested a Crossbar-based Supercomputer within 18 months


  • Experience developing and testing a rich library of parallel math algorithms used in sonar, image, and radar processing applications that require high computational and memory bandwidth, directly applicable to AI and ML:


  • Linear Algebra


  • Neural Net


  • Vision Processing, Optimization

Competitive Success – Senior Executive Page 3


  • Results:


  • Rapid entry into the AI/ML market arena
  • Faster stock trades and better stock trade accuracy
  • Reduced risks in R/D
  • Reduced costs in R/D
  • Reduced schedules in R/D
      

PAST: Oryx Corp. Supercomputer Was Built & Tested Using Advanced Algorithms from Linpack, DARPA


  • Sponsors: Kodak, Grumman, General Electric, Martin Marietta, Unisys, Lockheed, Raytheon, etc.
  • Applications: Radar, sonar, image processing, electronic warfare, autonomous land vehicle, communications intercept, etc.
  • Founder: John Hiller (Founder, CEO, CTO)
  • Consultants: Dr. Azriel Rosenfeld (father of image processing), University of Maryland; lead architect of the DARPA benchmark suite – Hough transform (finding straight lines in an image), traveling salesman, etc.
  • Products: The Oryx SSP R&D computer architecture (hardware & software) broke world records on DARPA and Linpack algorithms – directly applicable to the millisecond speed needed to make better trading decisions efficiently.

PAST: Oryx Funded by a $3.6M R&D Contract from Eastman Kodak Corporate, Followed by a $15M Venture Round A

  • Oryx initial development funding ($3.6M): all technical milestones (hardware, math algorithms, and flowgraph editor) demonstrated on a parallel processor within an 18-month schedule
  • Oryx second-round funding of $15M from top venture firms in the USA:
  • New Enterprise Associates, Oxford Partners, Polyventures, Grumman Ventures, Eastman Kodak Ventures, Spectra Ventures, Investech, & Allied Signal Corporate
  • Delivered on-board units to Grumman, Lockheed, etc.


John Hiller, Oryx Corporation Founder, Supercomputer Architecture Cornerstone Patent


  •  Highly parallel computer architecture employing crossbar switch with selectable pipeline delay
  • United States Patent 5081575


  • A crossbar switch which connects N (N=2k ; k=0, 1, 2, 3) coarse grain processing elements (rated at 20 million floating point operations per second) to a plurality of memories provides for a parallel processing system free of memory conflicts over a wide range of arithmetic computations (i.e. scalar, vector and matrix). The configuration of the crossbar switch, i.e., the connection between each processing element unit and each parallel memory module, may be changed dynamically on a cycle-by-cycle basis in accordance with the requirements of the algorithm under execution. Although there are certain crossbar usage rules which must be obeyed, the data is mapped over parallel memory such that the processing element units can access and operate on input streams of data in a highly parallel fashion with an effective memory transfer rate and computational throughput power comparable in performance to present-day supercomputers. The crossbar switch is comprised of two basic sections; a multiplexer and a control section. The multiplexer provides the actual switching of signal paths, i.e. connects each processing element unit to a particular parallel memory on each clock cycle. The control section determines which connections are made on each clock cycle in accordance with the algorithm under execution. Selectable pipelined delay in the control section provides for optimal data transfer efficiency between the processors and memory modules over a wide range of array processing algorithms. The crossbar switch also provides for graceful system degradation in computational throughput power without the need to download a new program.
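
To make the idea concrete, here is a minimal Python sketch of the general technique (an illustration only, not the patented Oryx design): data is skewed across N parallel memory modules, and the crossbar configuration is changed every clock cycle so that each processing element always reads from a distinct module, i.e. access is conflict-free.

```python
# Minimal sketch of a cycle-by-cycle reconfigurable crossbar (illustration only,
# not the patented Oryx design). N processing elements read from N parallel
# memory modules; the connection pattern changes every clock so that no two PEs
# address the same module on the same cycle (conflict-free access).

N = 4  # processing elements / memory modules (N = 2**k)

# Skewed storage: matrix element a[i][j] lives in module (i + j) % N at address j.
# With this mapping both a full row and a full column fall in N distinct modules,
# so either can be fetched in a single conflict-free cycle.
module = [[None] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        module[(i + j) % N][j] = (i, j)

def crossbar_config(row):
    """Crossbar setting for reading row `row`: PE p is wired to module (row + p) % N."""
    return [(p, (row + p) % N) for p in range(N)]

for row in range(N):
    config = crossbar_config(row)              # new switch setting this cycle
    assert len({m for _, m in config}) == N    # every PE sees a distinct module
    fetched = [module[m][p] for p, m in config]
    print(f"cycle {row}: PE<->module {config} fetches row elements {fetched}")
```

The skewed data mapping and the per-cycle switch setting are the two ingredients that let all N processing elements stream data in parallel without memory conflicts, which is the property the patent describes for scalar, vector, and matrix computations.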



John Hiller, Oryx Corporation Founder, Supercomputer Software Patent


  • Method for Automated Deployment of a Software Program onto a Multi-Processor Architecture
  • United States Patent 5418953


  • The method performs pre-assignment and scheduling of tasks, enabling allocation across multiple physical processors arranged in a variety of architectures. The assignment attempts to arrive at a minimal cost value across all tasks comprising the problem.
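
Below is a minimal Python sketch of the general idea (a simple greedy, longest-task-first heuristic chosen for illustration; the patented method's actual cost model and assignment procedure are not reproduced here).

```python
# Minimal greedy sketch of cost-based task pre-assignment across processors
# (illustration of the general idea only, not the patented method). Each task
# is placed on the processor where it adds the least to the overall cost,
# approximating a minimal total-cost schedule.

def assign_tasks(task_costs, num_processors):
    """task_costs: list of per-task execution costs (e.g., estimated cycles).
    Returns (processor -> list of task indices, makespan)."""
    load = [0.0] * num_processors
    assignment = {p: [] for p in range(num_processors)}
    # Longest-processing-time-first: sort tasks by descending cost, then place
    # each task on the currently least-loaded processor.
    for task in sorted(range(len(task_costs)), key=lambda t: -task_costs[t]):
        p = min(range(num_processors), key=lambda q: load[q])
        assignment[p].append(task)
        load[p] += task_costs[task]
    return assignment, max(load)  # makespan = cost of the most loaded processor

# Hypothetical example: 8 tasks spread over 4 processing elements.
schedule, makespan = assign_tasks([7, 3, 5, 2, 6, 4, 1, 8], num_processors=4)
print(schedule, "makespan:", makespan)
```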


Past: Oryx High Performance Supercomputer System Capabilities Directly Applicable to AI Performance

  • Multiple processing elements (floating-point, fixed-point, and logical units) on computational nodes (2, 4, 8, etc.) connected to multiple memories (2, 4, 8, etc.) via a crossbar switch

  • Scalar, vector, and matrix computation capabilities

  • Fully connected parallel crossbar switch, free of memory-access conflicts over a wide range of applications

  • High effective memory transfer rates

  • Data mapped to parallel memories so it is readily accessible for input/output and matrix processing


Dr. Narendra Ahuja Artificial Intelligence Math & Computer Science Capabilities – Page 1

  • Alma mater: University of Maryland, College Park
  • Known for: Face detection, video understanding, motion planning, 3D vision, computational cameras, pattern recognition
  • Awards: IEEE Emanuel R. Piore Award (1999); IEEE Fellow (1992); ACM Fellow (1996); Presidential Young Investigator Award (1984); AAAI Fellow (1992); SPIE Technology Achievement Award (1998)
  • Fields: Computer science, computer vision, artificial intelligence, machine learning, robotics
  • Institutions: Professor, University of Illinois at Urbana-Champaign; Founding Director, International Institute of Information Technology, Hyderabad; Founding Director, Information Technology Research Academy, Delhi
  • Thesis: Mosaic Models for Image Analysis and Synthesis (1979)
  • Doctoral advisor: Azriel Rosenfeld

Dr. Narendra Ahuja Artificial Intelligence Math & Computer Science Capabilities – Page 2

  • Computer vision, advanced artificial intelligence, machine learning/pattern recognition, computer architecture, probability theory, robotics, virtual environments, digital signal processing and knowledge networks
  • Computational approach to automatically extract syntax of images & use it for automated image understanding
  • Special purpose cameras
  • Understanding 3D scenes from Stereo, Motion and Texture
  • Machine learning: Efficient, Explainable, Physics inspired, Multimodal
  • Algorithms and computational complexity
  • PhD in computer science from the University of Maryland, College Park, advised by Azriel Rosenfeld (father of image processing); thesis: Mosaic Models for Image Analysis and Synthesis
  • Publications: 3 Books Co-Authored, 20 Book Chapters, 115 Journal Articles, 350 Conference Papers, 4 Patents
  • Student Advisees: 60 PhDs, 30 MS
  • New labs developed: Computer Vision and Robotics Lab (research), Robotics (teaching)
  • New Courses Started: Computer Vision, Pattern Recognition, Robotics
  • Consulting: SAIC, Battelle Corp., AT&T, Lockheed, Eastman Kodak, Honeywell, Westinghouse, HRL Labs
  • Development: Automated rail inspection; Active and Passive Cameras - omnifocus, hemispherical, high-dynamic-range, stereo; Fingerprint recognition

Dr. Narendra Ahuja Artificial Intelligence Math & Computer Science Capabilities – Page 3

Patents

  • N. Ahuja and M. Tabb, Multiscale Image Edge and Region Detection Method and Apparatus, U.S. Patent, issued September 1998.
  • R. Dugad and N. Ahuja, Transform Domain Significant Coefficient Digital Image Watermarking Method, U.S. patent application filed June 2000.
  • H. Hua and N. Ahuja, Method and Apparatus for a High-Resolution and Real-Time Panoramic Camera, patent application filed November 2001.
  • A. Krishnan and N. Ahuja, Imaging Apparatus and Method for Determining Range from Focus and Focus Information, U.S. Patent, issued September 1995; European patent issued October 2001.
  • R. Dugad and N. Ahuja, Transformation of Image Parts in Different Domains to Obtain Resultant Image Size Different From Initial Image Size, U.S. Patent, issued October 2004.

Books Authored

  • N. Ahuja and B. Schachter, Pattern Models, Wiley, 1983.
  • J. Weng, T. S. Huang and N. Ahuja, Motion and Structure from Image Sequences, Springer-Verlag, 1992.
  • Ming-Hsuan Yang and N. Ahuja, Face Detection and Hand Gesture Recognition for Vision-Based Human Computer Interaction, Kluwer Academic Publishers, 2001.

Enabling Nvidia to Make Faster & More Accurate Stock Trades

Nvidia Streaming Multiprocessor GA100


  • 128 (2^7) SMs per GPU
  • 32 (2^5) CUDA cores per SM
  • 4,096 (2^12) CUDA cores total per GPU
  • Two FP32 multiply-or-add units per CUDA core; 64 FP32 operations per SM per clock
  • One FP64 multiply-or-add unit per CUDA core; 32 FP64 operations per SM per clock
  • 40 GB HBM2 total, 6 device memories per GPU
  • 40 MB L2 cache total, 6 L2s per GPU
  • 128 L1 data caches per GPU, 192 KB per L1
  • 8 KB of register file per CUDA core (4 register files)
  • 1.2 GHz clock
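
As a quick sanity check, the peak arithmetic rates implied by the figures above can be computed directly. The sketch below uses only the numbers listed here; shipping GA100-based products differ in enabled SM count and clock rates.

```python
# Back-of-the-envelope peak throughput implied by the figures listed above
# (a sketch using only those numbers, not a measured or official specification).

sms_per_gpu       = 128
cuda_cores_per_sm = 32
fp32_ops_per_core = 2          # two FP32 multiply-or-add units per CUDA core
fp64_ops_per_core = 1
clock_hz          = 1.2e9      # 1.2 GHz

cuda_cores = sms_per_gpu * cuda_cores_per_sm
peak_fp32  = cuda_cores * fp32_ops_per_core * clock_hz   # mult-or-add ops per second
peak_fp64  = cuda_cores * fp64_ops_per_core * clock_hz

print(f"CUDA cores per GPU : {cuda_cores}")
print(f"Peak FP32 ops/s    : {peak_fp32 / 1e12:.2f} T")
print(f"Peak FP64 ops/s    : {peak_fp64 / 1e12:.2f} T")
```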


256 x 256 Matrix Multiply on Nvidia GA100 GPU Efficiencies

  • Dense memories to L1 transfer: 27,306 / 41,130 = 66.4%
  • L1 to CUDA core (register files) transfer: 8,448 / 41,130 = 20.5%
  • CUDA core FP32 unit 0: 4,096 / 41,130 = 10.0%
  • CUDA core FP32 unit 1: 5,120 / 41,130 = 12.4%


256 x 256 Matrix Multiply on Nvidia GA100 GPU Host to Host Runtimes

  • Host Gen 4 PCIe to 6 device memories: 32 GB/s transfer of 786,432 bytes = 25 microseconds
  • 6 device memories to 6 L2s: 177 GB/s transfer of 786,432 bytes = 4.5 microseconds
  • 6 L2s to 128 L1s: 280 GB/s transfer of 786,432 bytes = 2.8 microseconds
  • L1 to 32 CUDA cores: 10.6 GB/s transfer of 137,216 bytes = 13.0 microseconds
  • CUDA core computation at the 1.2 GHz clock (0.833 ns per clock), 13,824 clocks = 10.7 microseconds
  • Total runtime = 56 microseconds plus software overhead (a sketch of this arithmetic follows)
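
The total is simply the sum of the per-stage transfer and compute times. The sketch below reproduces that arithmetic with the byte counts, bandwidths, and clock figures listed above; per-stage values are left unrounded, so individual lines differ slightly from the rounded figures in the text while the total still lands near the quoted ~56 microseconds.

```python
# Host-to-host runtime model for the 256 x 256 matrix multiply, reproducing the
# stage-by-stage arithmetic above (a simple sum of stages, ignoring overlap and
# software overhead; byte counts and bandwidths as listed in the text).

GB = 1e9
stages = [
    ("PCIe Gen4 host -> 6 device memories", 786_432, 32 * GB),
    ("device memories -> 6 L2s",            786_432, 177 * GB),
    ("6 L2s -> 128 L1s",                    786_432, 280 * GB),
    ("L1 -> 32 CUDA cores",                 137_216, 10.6 * GB),
]

total_us = 0.0
for name, nbytes, bandwidth in stages:
    t_us = nbytes / bandwidth * 1e6
    total_us += t_us
    print(f"{name:38s} {t_us:5.1f} us")

compute_clocks  = 13_824
clock_period_ns = 1 / 1.2          # 1.2 GHz -> ~0.833 ns per clock
compute_us = compute_clocks * clock_period_ns / 1e3
total_us  += compute_us
print(f"{'CUDA core compute':38s} {compute_us:5.1f} us")
print(f"total (excluding software overhead)    {total_us:5.1f} us")
```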

256 x 256 Matrix Multiply on Nvidia GA100 GPU Runtimes in Clock Cycles

  • 27,306 clocks of data movement into and out of the dense memories, L2s, and L1s: 5,461 clocks to move the A matrix, evenly distributed across the 6 L2s (each A row is sent to a pair of L1s, and 64 pairs of L1s hold the A matrix); 16,384 clocks to move one half of the B matrix into 2 L2s (one L2 broadcasts to 64 of the 128 L1s, the other L2 broadcasts to the remaining 64); 5,461 clocks to move the C matrix back to the dense memories via the L2s
  • 13,824 clocks of computation in 4 passes (1,024 C matrices per pass)
  • 41,130 clocks total
  • The 256 x 256 C matrix ends up in the 6 dense memories tied to the host bus (see the sketch below)
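
This clock budget also yields the efficiency percentages quoted earlier; the short Python sketch below restates the arithmetic.

```python
# Clock-cycle breakdown for the 256 x 256 matrix multiply and the efficiency
# ratios quoted above (a restatement of the listed arithmetic, not a measurement).

a_in, b_in, c_out = 5_461, 16_384, 5_461       # A in, half-of-B broadcasts, C out
memory_to_l1 = a_in + b_in + c_out             # 27,306 clocks of memory traffic
compute      = 13_824                          # 4 passes through the CUDA cores
total        = memory_to_l1 + compute          # 41,130 clocks

l1_to_cores = 8_448                            # L1 <-> register-file traffic
fp32_pipe   = 4_096                            # CUDA-core FP32 compute clocks

print(f"total clocks               : {total}")
print(f"dense memory -> L1         : {memory_to_l1 / total:.1%}")
print(f"L1 -> CUDA core registers  : {l1_to_cores / total:.1%}")
print(f"CUDA core FP32 utilization : {fp32_pipe / total:.1%}")
```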

2048 x 2048 2D Convolution on Nvidia GA100 GPU Efficiencies

  • Dense Memories to L1 transfer: 174,762/188,458 = 92.7%
  • L1 to CUDA core transfer: 2,528/ 188,458 = 1.3%
  • ALUs:   11,168/188,458 = 5.9%

2048 x 2048 2D Convolution on Nvidia GA100 GPU Host to Host Runtimes

  • Host Gen 4 PCIe to 6 device memories: 32 GB/s transfer of 8,388,608 bytes = 262 microseconds
  • 6 device memories to 6 L2s: 177 GB/s transfer of 8,388,608 bytes = 47 microseconds
  • 6 L2s to 128 L1s: 280 GB/s transfer of 8,388,608 bytes = 30 microseconds
  • L1 to 32 CUDA cores: 10.6 GB/s transfer of 65,536 bytes = 6.2 microseconds
  • CUDA core computation at the 1.2 GHz clock (0.833 ns per clock), 11,168 clocks = 9.3 microseconds
  • Total runtime = 354 microseconds plus software overhead (see the sketch below)
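
The same staged model gives the convolution total; a compact sketch of the arithmetic, using the byte counts and bandwidths listed above:

```python
# Stage-by-stage runtime model for the 2048 x 2048 2D convolution (a simple sum
# of the listed transfer and compute stages, ignoring overlap and software overhead).

GB = 1e9
transfer_stages = {
    "PCIe Gen4 host -> device memories": (8_388_608, 32 * GB),
    "device memories -> L2s":            (8_388_608, 177 * GB),
    "L2s -> L1s":                        (8_388_608, 280 * GB),
    "L1 -> CUDA cores":                  (65_536, 10.6 * GB),
}
compute_us = 11_168 / 1.2e9 * 1e6   # 11,168 clocks at 1.2 GHz

total_us = compute_us + sum(nbytes / bw * 1e6 for nbytes, bw in transfer_stages.values())
print(f"total: {total_us:.0f} us (dominated by the host PCIe transfer)")
```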


2048 x 2048 2D Convolution on Nvidia GA100 GPU Runtimes in Clock Cycles

  • 174,762 clocks for movement of the input and output matrices, evenly distributed across the 6 L2s: 128 (16 x 16) sub-matrices to/from each of the 128 L1s
  • 13,440 clocks of computation in 32 passes (128 (16 x 16) matrices per pass): 352 compute clocks per pass plus 68 clocks for data to/from the L1s and CUDA cores, 420 clocks total per pass
  • 188,202 clocks total
  • The 2048 x 2048 matrix ends up in the 6 dense memories tied to the host bus

John Hiller Sales & Technology Consultant

294B Shore Drive (1041), Montague, NJ 07827, USA

+1.8456725431

Copyright © 2025 John Hiller Sales & Technology Consultant - All Rights Reserved.
