Thread: Is there maximum amount of loaded kernels?

  1. #1
    Junior Member
    Join Date
    Jul 2011
    Posts
    8

    Is there maximum amount of loaded kernels?

    Hello,

    If a kernel is interpreted as a "logical operation", a full application implementation is going to contain plenty of them.

    Is there a practical limit on the number of kernels loaded in memory simultaneously?

    I'm asking so that I know whether the code needs to be "merged" into fewer kernels that use conditional blocks to select the proper code, or whether we can simply keep each operation as a separate kernel.


    Kalle

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Is there maximum amount of loaded kernels?

    > Is there a practical limit on the number of kernels loaded in memory simultaneously?
    The only limit should be the available memory. A kernel is just a function.

    > I'm asking so that I know whether the code needs to be "merged" into fewer kernels that use conditional blocks to select the proper code
    I do not recommend doing that.
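
    To make the two options in the question concrete, here is a minimal sketch of what a code generator could emit in each case. It is written in Python and only builds OpenCL C source strings, so it runs without an OpenCL device. The event names, the function names `separate_kernels` and `merged_kernel`, and the placeholder kernel bodies are all assumptions for illustration; nothing here comes from the poster's actual generator. The reply does not say why merging is discouraged; divergent branching inside a kernel is one commonly cited cost, but that is an assumption about the reasoning.

```python
# Hypothetical event names; the thread only mentions "around 100 different events".
EVENTS = ["spawn", "move", "collide"]


def separate_kernels(events):
    """Emit one small __kernel function per event, all in one program source."""
    parts = []
    for code, name in enumerate(events):
        parts.append(
            f"__kernel void on_{name}(__global float *data) {{\n"
            f"    size_t i = get_global_id(0);\n"
            f"    data[i] += {code + 1}.0f;  /* placeholder body */\n"
            f"}}\n"
        )
    return "\n".join(parts)


def merged_kernel(events):
    """Emit a single __kernel that selects the event body with a switch."""
    cases = "".join(
        f"    case {code}: data[i] += {code + 1}.0f; break;  /* {name} */\n"
        for code, name in enumerate(events)
    )
    return (
        "__kernel void on_any(__global float *data,\n"
        "                     __global const int *event_code) {\n"
        "    size_t i = get_global_id(0);\n"
        "    switch (event_code[i]) {\n"
        f"{cases}"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(separate_kernels(EVENTS))
    print(merged_kernel(EVENTS))
```

    With the first shape, the host creates one kernel object per function from the same built program (in the C API, with `clCreateKernel` by name or `clCreateKernelsInProgram`), and each object is small. With the second shape there is a single kernel object, but every work-item pays for the `switch`.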
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.

  3. #3
    Junior Member
    Join Date
    Jul 2011
    Posts
    8

    Re: Is there maximum amount of loaded kernels?

    Hi,

    My impression of the GPU core/thread structure (naming conventions aside) is that a kernel is loaded per core and that all threads within a core execute the same kernel.
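
    In OpenCL's own terms, a kernel is enqueued over an index range of work-items, the work-items are split into work-groups, and each work-group runs on one compute unit. Every work-item in that range executes the same kernel function and differs only in its IDs. A minimal sketch of the 1-D ID arithmetic follows, in Python so that it runs without a device. The helper name `work_item_ids` is hypothetical, and the sketch assumes a zero global offset and a global size that is a multiple of the work-group size.

```python
def work_item_ids(global_size, local_size):
    """Yield (global_id, group_id, local_id) for a 1-D range with zero offset.

    Mirrors what get_global_id(0), get_group_id(0) and get_local_id(0)
    would return inside a kernel under the assumptions stated above.
    """
    if global_size % local_size != 0:
        raise ValueError("this sketch assumes global_size is a multiple of local_size")
    for gid in range(global_size):
        yield gid, gid // local_size, gid % local_size


if __name__ == "__main__":
    # 8 work-items in work-groups of 4: two groups, local IDs 0..3 in each.
    for ids in work_item_ids(8, 4):
        print(ids)
```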

    Our technology/methodology behaves like a "high-level compiler": we arrange the executed code appropriately through code generation.

    This in turn means that we can align and bundle the executed event code (say we have around 100 different events) with conditional branching, down to as many bundles as there are cores, so that each core can then parallelize its bundle across threads.

    This will make the kernels more complex, and I wouldn't be surprised if the conditional branching introduced issues of its own, for example by hampering proper register allocation, and ate up the benefit altogether.

    Then again, the code generation phase is an entirely modifiable, first-class code element, and it can include all the logic needed to bundle similar events (those with similar parameters) to get the best possible register usage.
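
    The effect of bundling similar events can be estimated with a crude cost model before any device code exists. The sketch below assumes that items executing in lock-step run every branch taken by any item in their group, one branch after another, so the cost of a group is the number of distinct event codes it contains. The function name `branch_passes`, the group size of 4, and the event codes are hypothetical; real lock-step widths are device-specific and this model ignores register pressure entirely.

```python
def branch_passes(event_codes, group_size):
    """Sum, over consecutive groups, of the distinct event codes in each group.

    A rough proxy for how many branch bodies a merged kernel executes
    when each group of items runs in lock-step.
    """
    total = 0
    for start in range(0, len(event_codes), group_size):
        total += len(set(event_codes[start:start + group_size]))
    return total


# 16 events of 4 types, first interleaved, then bundled by type.
interleaved = [0, 1, 2, 3] * 4
bundled = sorted(interleaved)

if __name__ == "__main__":
    print(branch_passes(interleaved, 4))  # every group sees all 4 types
    print(branch_passes(bundled, 4))      # every group sees a single type
```

    Under this model the interleaved order costs 16 passes and the bundled order costs 4, which is the case where a merged kernel behaves like separate kernels dispatched one after another.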

    I'm totally new to OpenCL structures and the technology constraints behind them, hence the questions. This will be easier to discuss once I push out a first demo that shows it in practice, but I'm trying to get a "best guess" of how to approach that first demo.


    Kalle
